
WO2024125267A1 - Image processing method and apparatus, computer-readable storage medium, electronic device and computer program product - Google Patents

Image processing method and apparatus, computer-readable storage medium, electronic device and computer program product Download PDF

Info

Publication number
WO2024125267A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
target
original image
category
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2023/134020
Other languages
French (fr)
Chinese (zh)
Inventor
朱渊略
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd filed Critical Beijing Zitiao Network Technology Co Ltd
Publication of WO2024125267A1 publication Critical patent/WO2024125267A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • The present disclosure relates to an image processing method, an apparatus, a computer-readable storage medium, an electronic device, and a computer program product.
  • Artificial intelligence technology is increasingly being used in the field of images. Artificial intelligence technology is usually used to convert a specified object in an original image into a target object, thereby obtaining a target image including the target object.
  • At present, the target image obtained by processing the original image using related technologies has problems such as unclear edges of the target object, poor display effect of the processed area, abnormal colors, and missing texture details. Therefore, a solution is needed that can convert a specified object in an original image into a target object.
  • The present disclosure provides an image processing method, an apparatus, a computer-readable storage medium, an electronic device, and a computer program product.
  • According to a first aspect, an image processing method is provided, the method comprising:
  • acquiring an original image to be processed, wherein the original image includes a first category object, and the first category object corresponds to a first area in the original image; determining, based on the first area, a mask image corresponding to the original image; performing segmentation processing on the original image to obtain a segmented image including at least part of preset category semantics; and
  • obtaining a target image based on the original image, the mask image and the segmented image, wherein the target image includes a second category object converted from the first category object in the original image.
  • According to a second aspect, an image processing apparatus is provided, comprising:
  • an acquisition module, configured to acquire an original image to be processed, wherein the original image includes a first category object, and the first category object corresponds to a first area in the original image;
  • a determination module, configured to determine a mask image corresponding to the original image based on the first area;
  • a segmentation module, configured to perform segmentation processing on the original image to obtain a segmented image including at least part of preset category semantics;
  • a processing module, configured to obtain a target image based on the original image, the mask image and the segmented image, wherein the target image includes a second category object converted from the first category object in the original image.
  • According to a third aspect, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the method according to any one of the above first aspects.
  • According to a fourth aspect, an electronic device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of the first aspects.
  • According to a fifth aspect, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method according to any one of the first aspects.
  • FIG. 1 is a schematic diagram of an image processing scenario according to an exemplary embodiment of the present disclosure;
  • FIG. 2 is a schematic diagram of a training scenario of an image processing model according to an exemplary embodiment of the present disclosure;
  • FIG. 3 is a flow chart of an image processing method according to an exemplary embodiment of the present disclosure;
  • FIG. 4 is a flow chart of a method for training an image processing model according to an exemplary embodiment of the present disclosure;
  • FIG. 5 is a block diagram of an image processing device according to an exemplary embodiment of the present disclosure;
  • FIG. 6 is a schematic block diagram of an electronic device provided by some embodiments of the present disclosure;
  • FIG. 7 is a schematic block diagram of another electronic device provided by some embodiments of the present disclosure;
  • FIG. 8 is a schematic diagram of a storage medium provided by some embodiments of the present disclosure.
  • It should be understood that although the terms first, second, third, etc. may be used in the present disclosure to describe various information, such information should not be limited to these terms. These terms are only used to distinguish the same type of information from each other.
  • For example, without departing from the scope of the present disclosure, first information may also be referred to as second information, and similarly, second information may also be referred to as first information.
  • Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when" or "in response to determining".
  • Artificial intelligence technology is increasingly being used in the field of images. Artificial intelligence technology is usually used to convert a specified object in an original image into a target object, thereby obtaining a target image that includes the target object. For example, the hairstyle of a person in a person image can be changed (e.g., long hair can be changed to short hair), or the person in a person image can be changed into a different costume, etc.
  • At present, the target image obtained by processing the original image using related technologies has problems such as unclear edges of the target object, poor display effect of the processed area, abnormal colors, and missing texture details. Therefore, a solution is needed that can convert a specified object in an original image into a target object.
  • The present disclosure provides an image processing solution which, by combining the mask image and the segmented image corresponding to the original image, converts a specified first category object in the original image into a second category object to obtain a target image. Since the conversion is performed on a specified object in the original image, only the partial area corresponding to the specified object needs to be significantly changed.
  • This solution combines the mask image and the segmented image in the course of processing the image, so that the process of converting the specified object in the original image is more targeted, and the changed area can be repaired under the guidance of the mask image and the segmented image.
  • As a result, the edges of the target object in the target image are clearer, the texture details are richer, and the image is more realistic, which improves the display effect and solves the problem of abnormal colors.
  • Referring to FIG. 1, a schematic diagram of an image processing scenario according to an exemplary embodiment is shown.
  • The solution of the present disclosure is schematically described below in combination with a complete and specific application example.
  • The application example describes a specific image processing process.
  • As shown in FIG. 1, the original image A1 is the image to be processed, and the original image A1 includes an object m.
  • The mask image B1 corresponding to the original image A1 can be generated according to the region S corresponding to the object m in the original image A1.
  • For example, the value of the pixels corresponding to the region S can be set to 1, and the value of the pixels of the region S' other than the region S can be set to 0.
  • Meanwhile, the original image A1 can also be semantically parsed to obtain the segmented image C1 corresponding to the original image A1.
  • The original image A1, the mask image B1 and the segmented image C1 are merged, for example stacked, to obtain the merged data D1.
  • The number of channels of the merged data D1 is the sum of the numbers of channels of the original image A1, the mask image B1 and the segmented image C1. For example, if the number of channels of the original image A1 is a, the number of channels of the mask image B1 is b, and the number of channels of the segmented image C1 is c, then the number of channels of the merged data D1 is a+b+c, as in the sketch below.
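  • A minimal sketch of this channel-wise merge, assuming NumPy arrays in channel-first layout (the shapes and channel counts below are illustrative, not taken from the patent):

```python
import numpy as np

# Illustrative shapes: a 3-channel original image, a 1-channel mask image,
# and a c-channel segmented image, all with the same height and width.
h, w = 256, 256
original = np.random.rand(3, h, w).astype(np.float32)            # a = 3
mask = np.random.randint(0, 2, (1, h, w)).astype(np.float32)     # b = 1
segmentation = np.random.rand(19, h, w).astype(np.float32)       # c = 19

# Stacking along the channel axis yields merged data with a + b + c channels.
merged = np.concatenate([original, mask, segmentation], axis=0)
assert merged.shape == (3 + 1 + 19, h, w)
```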
  • The merged data D1 can be input into the image processing network, which processes the multi-channel merged data D1.
  • The same or different convolution kernels can be used in the shallow layers of the image processing network to perform convolution processing on each channel of the merged data D1, and the convolution results corresponding to the channels are added. In this way, the information of the mask image B1 and the segmented image C1 can guide the image processing process.
  • The image processing network can output image data with a+b+c channels, where the data of a channels can constitute the target image A2, the data of b channels can constitute the mask image B2, and the data of the remaining c channels can constitute the segmented image C2.
  • The target image A2 includes an object n, which is converted from the object m included in the original image A1.
  • The mask image B2 and the segmented image C2 both relate to the object n and correspond to the target image A2.
  • Thus, the target image A2 can be obtained based on the image data with a+b+c channels output by the image processing network; a sketch of this channel split follows.
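  • Conversely, splitting the (a+b+c)-channel output back into its three components is a slicing operation. A sketch with the same illustrative channel counts as above:

```python
import numpy as np

a, b, c = 3, 1, 19
h, w = 256, 256
network_output = np.random.rand(a + b + c, h, w).astype(np.float32)

# The first a channels constitute the target image A2, the next b channels
# the mask image B2, and the remaining c channels the segmented image C2.
target_image = network_output[:a]
mask_image = network_output[a:a + b]
segmented_image = network_output[a + b:]
assert mask_image.shape == (b, h, w) and segmented_image.shape == (c, h, w)
```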
  • Referring to FIG. 2, a schematic diagram of a training scenario of an image processing model according to an exemplary embodiment is shown.
  • The scheme of the present disclosure is schematically described below in combination with a complete and specific application example.
  • The application example describes the training process of a specific image processing model.
  • A sample image A3, a sample mask image B3 and a sample segmentation image C3 corresponding to the sample image A3, a label image A5, and a label mask image B5 and a label segmentation image C5 corresponding to the label image A5 can be obtained from a pre-created training set.
  • The sample image A3, the sample mask image B3 and the sample segmentation image C3 are merged to obtain merged data D2.
  • The number of channels of the merged data D2 is the sum of the numbers of channels of the sample image A3, the sample mask image B3 and the sample segmentation image C3.
  • Following the example above, the number of channels of the merged data D2 is a+b+c.
  • The merged data D2 can be input into the image processing network to be trained, which processes the multi-channel merged data D2.
  • The image processing network to be trained can output image data with a+b+c channels, where the data of a channels can constitute the predicted image A4, the data of b channels can constitute the predicted mask image B4, and the data of the remaining c channels can constitute the predicted segmentation image C4.
  • The predicted image A4, the predicted mask image B4 and the predicted segmentation image C4 can thus be obtained from the image data with a+b+c channels output by the image processing network to be trained.
  • The predicted image A4 and the label image A5 are respectively input to the discriminator P to be trained, and the discriminator P can be used to discriminate the authenticity of the predicted image A4 and the label image A5.
  • The network parameters of the image processing network to be trained and of the discriminator P can be adjusted in turn with reference to the label image A5, the label mask image B5 and the label segmented image C5.
  • The network parameters of the discriminator P can be adjusted based on the output of the discriminator P, so that its judgments of image authenticity become more and more accurate.
  • The prediction loss can be determined based on the predicted image A4, the predicted mask image B4, the predicted segmentation image C4, the result output by the discriminator P, the label image A5, the label mask image B5 and the label segmentation image C5.
  • The prediction loss can be composed of the sum of loss terms L1, L2, L3, L4, L5, L6 and L7, and the network parameters of the image processing network are adjusted with the goal of reducing the prediction loss.
  • The loss terms L1, L2 and L3 are all determined based on the predicted image A4 and the label image A5.
  • The loss term L1 can represent the difference in image features between the predicted image A4 and the label image A5, and can specifically be determined based on the mean absolute error between the pixel values of the predicted image A4 and the label image A5.
  • The loss term L2 can represent the difference in visual perception between the predicted image A4 and the label image A5, and the loss term L3 can represent the difference in image style between the predicted image A4 and the label image A5.
  • Specifically, the predicted image A4 and the label image A5 can be respectively input into a pre-trained convolutional neural network to obtain two feature maps, and the loss term L2 is obtained by calculating the difference between the two feature maps.
  • The loss term L3 is obtained by calculating the difference between the Gram matrices corresponding to the two feature maps, as sketched below.
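  • The L2/L3 pair follows the common perceptual-loss and Gram-matrix style-loss pattern. A minimal PyTorch sketch, where `features` stands for some pre-trained convolutional feature extractor and the L1 distance between maps is an assumption (the text only speaks of "the difference"):

```python
import torch

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    # feat: (batch, channels, height, width) feature map.
    b, ch, h, w = feat.shape
    f = feat.reshape(b, ch, h * w)
    # Channel-by-channel inner products, normalized by the map size.
    return torch.bmm(f, f.transpose(1, 2)) / (ch * h * w)

def perceptual_and_style_loss(features, predicted, label):
    # features: a frozen, pre-trained CNN mapping images to feature maps.
    feat_pred, feat_label = features(predicted), features(label)
    l2 = torch.mean(torch.abs(feat_pred - feat_label))       # visual perception
    l3 = torch.mean(torch.abs(gram_matrix(feat_pred) -
                              gram_matrix(feat_label)))      # image style
    return l2, l3
```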
  • The loss term L4 is determined based on the difference between the predicted image A4 and the label image A5 in a target area M, where the target area M is the area in which the sample mask image B3 and the label mask image B5 differ.
  • The loss term L4 can represent the difference in color of the target area M between the predicted image A4 and the label image A5.
  • The loss term L5 is determined based on the predicted mask image B4 and the label mask image B5, and can be a weighted sum of the binary cross-entropy loss and the regional mutual information loss between the predicted mask image B4 and the label mask image B5.
  • The loss term L6 is determined based on the predicted segmented image C4 and the label segmented image C5, and can be a weighted sum of the binary cross-entropy loss and the regional mutual information loss between the predicted segmented image C4 and the label segmented image C5.
  • The loss term L7 is determined based on the result output by the above-mentioned discriminator P, and can be a loss term representing the loss of the generative adversarial network. It can be understood that the prediction loss can also include other loss terms; this embodiment is not limited in this respect.
  • FIG. 3 is a flow chart of an image processing method according to an exemplary embodiment.
  • The execution subject of the method can be implemented as any device, platform, server or device cluster with computing and processing capabilities.
  • The method includes the following steps:
  • In step 301, an original image to be processed is obtained.
  • The original image is an image to be processed, and the original image includes a first category object.
  • The processing of the original image involved in this embodiment may be to convert the first category object included in the original image into a second category object.
  • For example, the original image may be a person image,
  • the object to be converted in the original image may be the person's hair,
  • and the first category object may be, for example, the long hair of the person in the original image.
  • Processing the original image may then be to convert the long hair included in the original image into short hair to obtain a target image.
  • The target image includes the short hair of the person after conversion (i.e., the second category object).
  • As another example, the original image may be a full-body image of a person,
  • the object to be converted in the original image may be the person's clothing,
  • and the first category object may be, for example, a long skirt of the person in the original image.
  • Processing the original image may then be converting the long skirt included in the original image into a short skirt to obtain a target image.
  • The target image may include the short skirt of the person after conversion (i.e., the second category object).
  • In step 302, a mask image is determined based on the first region corresponding to the first category object in the original image.
  • The first category object corresponds to the first area in the original image,
  • and a mask image corresponding to the original image can be generated based on the first area.
  • The mask image can be a single-channel binary image.
  • For example, the value of the pixels corresponding to the first area can be set to 1 or 255, and the value of the pixels in areas other than the first area can be set to 0. Therefore, in a scene where the object to be converted is a person's hair, the mask image corresponding to the original image shows the hair area as white and the other areas as black; in a scene where the object to be converted is a person's clothing, the mask image shows the person's clothing area as white and the other areas as black. A minimal sketch of constructing such a mask follows.
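  • A minimal sketch of building such a single-channel binary mask, assuming the first area is given as a boolean per-pixel membership array (the 0/255 convention is one of the two options mentioned above):

```python
import numpy as np

def make_mask(region: np.ndarray) -> np.ndarray:
    """region: (h, w) boolean array, True where the first category object lies."""
    mask = np.zeros(region.shape, dtype=np.uint8)
    mask[region] = 255  # object area white, all other areas black
    return mask

# Toy example: a rectangular "hair" region in an 8x8 image.
region = np.zeros((8, 8), dtype=bool)
region[1:4, 2:6] = True
print(make_mask(region))
```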
  • In step 303, the original image is segmented to obtain a segmented image.
  • The original image may be subjected to, for example, semantic segmentation processing to obtain a segmented image including at least part of the semantics of preset categories.
  • Specifically, a semantic segmentation model may be used to process the original image, where the semantic segmentation model is trained based on images including the semantics of the preset categories.
  • For example, n preset categories of semantics may be set in advance according to the scene and needs, and a semantic segmentation model may then be trained based on the semantics of the n preset categories and images including those semantics.
  • The semantic segmentation model is used to perform semantic segmentation processing on the original image, and the obtained segmented image includes any number of the n preset categories of semantics.
  • The semantics of the n preset categories may include, but are not limited to: background, hair, face, clothes, body skin, hands, glasses, hats, etc. Since the segmented image is later consumed as an n-channel input, one common representation is a one-hot encoding of the per-pixel class map, as sketched below.
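  • A sketch of that one-hot representation, under the assumption that the segmentation model outputs a per-pixel category index (the toy category list mirrors the examples above):

```python
import numpy as np

CATEGORIES = ["background", "hair", "face", "clothes",
              "body skin", "hands", "glasses", "hats"]  # n = 8 in this toy setup

def one_hot_segmentation(class_map: np.ndarray, n: int) -> np.ndarray:
    """class_map: (h, w) integer array of per-pixel category indices in [0, n)."""
    h, w = class_map.shape
    onehot = np.zeros((n, h, w), dtype=np.float32)
    onehot[class_map, np.arange(h)[:, None], np.arange(w)[None, :]] = 1.0
    return onehot

class_map = np.random.randint(0, len(CATEGORIES), (64, 64))
seg = one_hot_segmentation(class_map, len(CATEGORIES))
assert seg.sum(axis=0).max() == 1.0  # exactly one category per pixel
```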
  • In step 304, a target image is obtained based on the original image, the mask image and the segmented image.
  • Based on the original image, the mask image and the segmented image, the first category object included in the original image can be converted into the second category object to obtain a target image.
  • Specifically, the original image, the mask image and the segmented image can be converted using a pre-trained target model to obtain the target image.
  • The target model can be a deep convolutional neural network model; optionally, the target model can specifically be a u2net (U2-Net) type model.
  • The target model can be trained by traditional supervised learning, or by a supervised generative adversarial network. It can be understood that this embodiment is not limited in this respect.
  • Specifically, the original image, the mask image and the segmented image can be merged to obtain merged data, so that the number of channels of the merged data is the sum of the numbers of channels of the original image, the mask image and the segmented image.
  • The merging process can be a channel-based merging process (such as a stacking process).
  • For example, the original image can be a 3-channel RGB image,
  • the mask image can be a 1-channel binary image,
  • and the segmented image can be an n-channel semantic segmentation image (n being the number of preset semantic categories),
  • so that merged data with 4+n channels can be obtained.
  • The merged data can be input into the target model for conversion processing to obtain the target image; a sketch of this inference flow follows.
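  • Putting step 304 together, a hedged PyTorch sketch of the inference flow; `target_model` stands in for the pre-trained conversion network (its exact architecture is not fixed here), and the channel layout follows the 3/1/n example above:

```python
import torch

@torch.no_grad()
def convert(target_model: torch.nn.Module,
            original: torch.Tensor,       # (1, 3, h, w) RGB image
            mask: torch.Tensor,           # (1, 1, h, w) binary mask image
            segmentation: torch.Tensor    # (1, n, h, w) one-hot segmented image
            ) -> torch.Tensor:
    # Merge along the channel dimension: (1, 4 + n, h, w).
    merged = torch.cat([original, mask, segmentation], dim=1)
    # The model is assumed to return (1, 4 + n, h, w), mirroring its input layout.
    output = target_model(merged)
    # The first 3 channels constitute the target image.
    return output[:, :3]
```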
  • In summary, the present disclosure provides an image processing method which, by combining the mask image and the segmented image corresponding to the original image, converts a specified first category object in the original image into a second category object to obtain a target image. Since the specified object in the original image is converted, only the partial area corresponding to the specified object needs to be significantly changed.
  • This solution combines the mask image and the segmented image, so that the process of converting the specified object in the original image is more targeted, and the changed area can be repaired under the guidance of semantic information.
  • As a result, the edges of the target object converted from the specified object in the target image are clearer, the texture details are richer, and the image is more realistic, which improves the display effect and solves the problem of abnormal colors.
  • FIG. 4 is a flow chart of a training method for an image processing model according to an exemplary embodiment.
  • The model may be the target model for processing images involved in the embodiment of FIG. 3.
  • The execution subject of the method may be implemented as any device, platform, server or device cluster with computing and processing capabilities.
  • The method may include iteratively executing the following steps:
  • In step 401, a sample image, a sample mask image, a sample segmented image, a label image, a label mask image and a label segmented image are obtained.
  • Specifically, a plurality of candidate images may first be acquired, where each candidate image includes a first category object,
  • and the candidate label image corresponding to each candidate image is obtained.
  • The candidate label image corresponding to any candidate image includes a second category object converted from the first category object in the candidate image. For example, in a scene where the object to be converted is a person's hair, if the first category object is a person's long hair and the second category object is a person's short hair, then each candidate image includes the person's long hair, and the candidate label image corresponding to each candidate image includes the person's short hair converted from the person's long hair in the candidate image.
  • Similarly, in a scene where the object to be converted is a person's clothing, each candidate image includes a person's long skirt,
  • and the candidate label image corresponding to each candidate image includes the person's short skirt converted from the person's long skirt in the candidate image.
  • The candidate label image corresponding to a candidate image can be obtained by any reasonable method.
  • For example, the candidate image can be manually modified to obtain the candidate label image corresponding to the candidate image.
  • The candidate image can also first be processed by another model with similar functions but poorer effect, and then manually modified to obtain the candidate label image corresponding to the candidate image. This embodiment does not limit the specific method for obtaining the candidate label image.
  • Then, the mask image and segmentation image corresponding to each candidate image and to each candidate label image can be obtained; the acquisition method can refer to the acquisition of the mask image and segmented image corresponding to the original image in the embodiment of FIG. 3.
  • The candidate images and their corresponding mask images and segmentation images, together with the candidate label images and the mask images and segmentation images corresponding to the candidate label images, are used to construct a training set.
  • During training, a candidate image can be taken from the training set as a sample image, and the sample mask image, sample segmentation image and label image corresponding to the sample image can be obtained, as well as the label mask image and label segmentation image corresponding to the label image.
  • The sample image includes a first category object, the first category object corresponds to a second region in the sample image, and the sample mask image is determined based on the second region.
  • The label image is an image obtained by converting the first category object in the sample image into a second category object, the second category object corresponds to a third region in the label image, and the label mask image is determined based on the third region.
  • In step 402, based on the sample image, the sample mask image and the sample segmentation image, the target model to be trained is used to obtain a predicted image, a predicted mask image and a predicted segmentation image.
  • Specifically, the sample image, the sample mask image and the sample segmentation image may be merged (for example, stacked) to obtain merged sample data.
  • The number of channels of the merged sample data is the sum of the numbers of channels of the sample image, the sample mask image and the sample segmentation image.
  • The merged sample data is input into the target model to be trained, and the target model to be trained may output a predicted image, as well as a predicted mask image and a predicted segmentation image corresponding to the predicted image.
  • The predicted image includes a second category object converted from the first category object in the sample image.
  • It should be noted that the second category object included in the label image and the second category object included in the predicted image are not completely the same, although both are converted from the first category object included in the sample image.
  • The label image is a reference image, with a better display effect, for comparison with the predicted image.
  • In step 403, the prediction loss is determined based on the predicted image, the predicted mask image, the predicted segmented image, the label image, the label mask image and the label segmented image; and in step 404, the model parameters of the target model to be trained are adjusted with the goal of reducing the prediction loss.
  • Specifically, the prediction loss can be determined based on the predicted image, the predicted mask image, the predicted segmented image, the label image, the label mask image and the label segmented image.
  • The prediction loss may include a first loss term, a second loss term and a third loss term.
  • The first loss term is determined based on the predicted image and the label image; it may be a loss term representing the difference in image features between the predicted image and the label image, and may specifically be determined based on the mean absolute error between the predicted image and the label image.
  • The second loss term is determined based on the predicted mask image and the label mask image, and may be a weighted sum of a binary cross-entropy loss and a regional mutual information loss between the predicted mask image and the label mask image.
  • The third loss term is determined based on the predicted segmented image and the label segmented image, and may be a weighted sum of a binary cross-entropy loss and a regional mutual information loss between the predicted segmented image and the label segmented image.
  • The prediction loss may also include a fourth loss term, which is determined based on the difference between the predicted image and the label image in a target area.
  • The target area is the area in which the sample mask image and the label mask image differ.
  • Specifically, the target area can be determined by calculating the absolute value of the difference between the pixel values of corresponding pixels of the sample mask image and the label mask image. Since the fourth loss term for guiding the adjustment of the model parameters is obtained based on the target area, the trained target model can process the specified object in the original image more accurately and in a more targeted manner.
  • In one implementation, the fourth loss term can be determined in the following manner: first, a first difference value between the predicted image and the label image over the target area in a first color space is determined, and a second difference value between the predicted image and the label image over the target area in a second color space is determined; the fourth loss term is then determined based on a weighted sum of the first difference value and the second difference value.
  • The first color space can be, for example, the RGB color space,
  • and the second color space can be, for example, the LAB color space.
  • Accordingly, the first difference value can be the mean absolute error between the predicted image and the label image over the target area in the RGB color space,
  • and the second difference value can be the mean absolute error between the predicted image and the label image over the target area in the LAB color space.
  • Optionally, the weight of the first difference value is greater than the weight of the second difference value. Since this embodiment obtains the loss term for guiding the adjustment of model parameters based on the weighted sum of the difference values of the target area in different color spaces, the color effect of the image obtained by processing the original image with the trained target model is better. A sketch of this loss follows.
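  • A hedged sketch of this fourth loss term; the LAB conversion via scikit-image and the specific weight values are assumptions (the text only requires the RGB weight to exceed the LAB weight):

```python
import numpy as np
from skimage.color import rgb2lab  # assumed dependency for the LAB conversion

def fourth_loss(pred_rgb, label_rgb, sample_mask, label_mask,
                w_rgb=1.0, w_lab=0.5):
    """Images: (h, w, 3) float arrays in [0, 1]; masks: (h, w) arrays in {0, 1}.
    The weights are illustrative; the embodiment only states w_rgb > w_lab."""
    # Target area: pixels where the sample mask and the label mask differ.
    target = np.abs(sample_mask - label_mask) > 0
    if not target.any():
        return 0.0
    # First difference value: mean absolute error over the target area in RGB.
    d_rgb = np.abs(pred_rgb[target] - label_rgb[target]).mean()
    # Second difference value: mean absolute error over the target area in LAB.
    d_lab = np.abs(rgb2lab(pred_rgb)[target] - rgb2lab(label_rgb)[target]).mean()
    # Fourth loss term: weighted sum of the two difference values.
    return w_rgb * d_rgb + w_lab * d_lab
```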
  • In one implementation, the target model can be trained by a supervised generative adversarial network. In that case, the predicted image and the label image need to be input into the discriminator to be trained, and the discriminator is adjusted based on the result output by the discriminator.
  • A fifth loss term included in the prediction loss can also be determined based on the result output by the discriminator.
  • The fifth loss term can be a loss term representing the loss of the generative adversarial network. It can be understood that the prediction loss can also include other loss terms; this embodiment is not limited in this respect. A sketch of the alternating adversarial updates follows.
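  • A minimal sketch of the alternating adversarial updates, with the generator, the discriminator and the supervised loss terms left abstract; this is the generic supervised GAN training pattern, not the patent's exact training code:

```python
import torch
import torch.nn.functional as F

def train_step(generator, discriminator, opt_g, opt_d,
               merged_sample, label_image, supervised_loss_fn):
    # supervised_loss_fn is assumed to close over the label image, label mask
    # and label segmentation and to return the sum of the non-adversarial terms.

    # 1) Discriminator step: real label images vs. detached generated images.
    pred = generator(merged_sample)[:, :3].detach()
    d_real, d_fake = discriminator(label_image), discriminator(pred)
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Generator step: supervised loss terms plus the adversarial (fifth) term.
    output = generator(merged_sample)
    d_fake = discriminator(output[:, :3])
    loss_adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    loss_g = supervised_loss_fn(output) + loss_adv
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```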
  • In summary, the present embodiment combines the label image, label mask image and label segmentation image corresponding to the sample image to obtain the prediction loss.
  • The target model is trained based on the prediction loss, so that the target model converts the specified object in the original image in a more targeted manner; as a result, the edges of the target object in the target image are clearer, the texture details are richer, and the image is more realistic, which improves the display effect and solves the problem of abnormal colors.
  • Corresponding to the foregoing method embodiments, the present disclosure also provides an embodiment of an image processing device.
  • FIG. 5 is a block diagram of an image processing device according to an exemplary embodiment of the present disclosure; the device may include: an acquisition module 501, a determination module 502, a segmentation module 503 and a processing module 504.
  • The acquisition module 501 is configured to acquire an original image to be processed, wherein the original image includes a first category object, and the first category object corresponds to a first area in the original image.
  • The determination module 502 is configured to determine a mask image corresponding to the original image based on the first area.
  • The segmentation module 503 is configured to perform segmentation processing on the original image to obtain a segmented image including at least part of the preset category semantics.
  • The processing module 504 is configured to obtain a target image based on the original image, the mask image and the segmented image.
  • The target image includes a second category object converted from the first category object in the original image.
  • In one implementation, the segmentation module 503 is configured to: process the original image using a semantic segmentation model to obtain the segmented image, wherein the semantic segmentation model is trained based on images including the preset category semantics.
  • In one implementation, the processing module 504 may include a conversion submodule (not shown in the figure).
  • The conversion submodule is configured to convert the original image, the mask image and the segmented image using the target model to obtain the target image.
  • The conversion processing converts the first category object into the second category object.
  • In one implementation, the conversion submodule is configured to: merge the original image, the mask image and the segmented image to obtain merged data,
  • wherein the number of channels of the merged data is the sum of the numbers of channels of the original image, the mask image and the segmented image; and input the merged data into the target model for the above conversion processing to obtain the target image.
  • In one implementation, the target model is trained by iteratively executing the following steps: obtaining a sample image and a label image, wherein a second region in the sample image corresponds to the first category object, and a third region in the label image corresponds to the second category object;
  • obtaining a predicted image, and determining a prediction loss based on the predicted image and the label image,
  • wherein the prediction loss includes a fourth loss term determined based on the difference between the predicted image and the label image in a target region,
  • and the target region is determined based on the second region and the third region; and adjusting the model parameters of the target model to be trained with the goal of reducing the prediction loss.
  • In one implementation, the fourth loss term can be determined by: determining a first difference value between the predicted image and the label image over the target area in a first color space, determining a second difference value between the predicted image and the label image over the target area in a second color space, and determining the fourth loss term based on a weighted sum of the first difference value and the second difference value.
  • For the relevant parts, reference may be made to the description of the method embodiments.
  • The device embodiments described above are only schematic, wherein the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place, or they may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment solutions of the present disclosure. A person of ordinary skill in the art can understand and implement this without creative effort.
  • FIG. 6 is a schematic block diagram of an electronic device provided by some embodiments of the present disclosure.
  • As shown in FIG. 6, the electronic device 910 includes a processor 911 and a memory 912, and can be used to implement a client or a server.
  • The memory 912 is used to store computer-executable instructions (e.g., one or more computer program modules) non-transiently.
  • The processor 911 is used to run the computer-executable instructions; when the computer-executable instructions are run by the processor 911, one or more steps in the image processing method described above can be executed, thereby implementing the image processing method described above.
  • The memory 912 and the processor 911 can be interconnected via a bus system and/or other forms of connection mechanisms (not shown).
  • For example, the processor 911 may be a central processing unit (CPU), a graphics processing unit (GPU), or another form of processing unit having data processing capabilities and/or program execution capabilities.
  • For example, the central processing unit (CPU) may be of an X86 or ARM architecture.
  • The processor 911 may be a general-purpose processor or a dedicated processor, and may control other components in the electronic device 910 to perform desired functions.
  • For example, the memory 912 may include any combination of one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • Volatile memory may include, for example, random access memory (RAM) and/or cache memory.
  • Non-volatile memory may include, for example, read-only memory (ROM), a hard disk, erasable programmable read-only memory (EPROM), portable compact disk read-only memory (CD-ROM), USB memory, flash memory, etc.
  • One or more computer program modules may be stored on the computer-readable storage medium, and the processor 911 may run one or more computer program modules to implement various functions of the electronic device 910.
  • Various applications and various data, as well as various data used and/or generated by the application, etc. may also be stored in the computer-readable storage medium.
  • FIG. 7 is a schematic block diagram of another electronic device provided by some embodiments of the present disclosure.
  • The electronic device 920 is, for example, suitable for implementing the image processing method provided by the embodiments of the present disclosure.
  • The electronic device 920 may be a terminal device or the like, and may be used to implement a client or a server.
  • The electronic device 920 may include, but is not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), vehicle-mounted terminals (such as vehicle-mounted navigation terminals), wearable electronic devices, etc., and fixed terminals such as digital TVs, desktop computers, smart home devices, etc.
  • As shown in FIG. 7, the electronic device 920 may include a processing device (e.g., a central processing unit, a graphics processing unit, etc.) 921, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 922 or a program loaded from a storage device 928 into a random access memory (RAM) 923.
  • In the RAM 923, various programs and data required for the operation of the electronic device 920 are also stored.
  • The processing device 921, the ROM 922 and the RAM 923 are connected to each other via a bus 924.
  • An input/output (I/O) interface 925 is also connected to the bus 924.
  • Generally, the following devices may be connected to the I/O interface 925: input devices 926 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 927 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 928 including, for example, a magnetic tape, a hard disk, etc.; and communication devices 929.
  • The communication devices 929 may allow the electronic device 920 to communicate with other electronic devices wirelessly or by wire to exchange data.
  • Although FIG. 7 shows an electronic device 920 having various components, it should be understood that it is not required to implement or have all of the components shown, and the electronic device 920 may alternatively implement or have more or fewer components.
  • According to an embodiment of the present disclosure, the above-mentioned image processing method can be implemented as a computer software program.
  • For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the above-mentioned image processing method.
  • In such an embodiment, the computer program can be downloaded and installed from a network through the communication devices 929, or installed from the storage device 928, or installed from the ROM 922.
  • When the computer program is executed by the processing device 921, the functions defined in the image processing method provided by the embodiments of the present disclosure can be implemented.
  • FIG. 8 is a schematic diagram of a storage medium provided by some embodiments of the present disclosure.
  • As shown in FIG. 8, the storage medium 930 may be a non-transitory computer-readable storage medium for storing non-transitory computer-executable instructions 931.
  • When the non-transitory computer-executable instructions 931 are executed by a processor, one or more steps in the image processing method described above can be performed, thereby implementing the image processing method described in the embodiments of the present disclosure.
  • The storage medium 930 may be applied to the above-mentioned electronic device.
  • For example, the storage medium 930 may be a memory in the electronic device.
  • For example, the storage medium may include a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disk read-only memory (CD-ROM), flash memory, any combination of the above storage media, or other applicable storage media.
  • For the description of the storage medium 930, reference may be made to the description of the memory in the embodiments of the electronic device; repeated parts are not described again.
  • For the specific functions and technical effects of the storage medium 930, reference may be made to the description of the image processing method above, which is not repeated here.
  • It should be noted that a computer-readable medium may be a tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus or device.
  • A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • A computer-readable storage medium may be, for example, but not limited to: an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above.
  • Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • A computer-readable storage medium may be any tangible medium that contains or stores a program that may be used by or in connection with an instruction execution system, apparatus or device.
  • A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which may send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device.
  • The program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Color Image Communication Systems (AREA)
  • Facsimile Image Signal Circuits (AREA)

Abstract

Provided in the present disclosure are an image processing method and apparatus, a computer-readable storage medium, an electronic device and a computer program product. The method comprises: acquiring an original image to be processed, the original image comprising an object of a first category, and the object of the first category corresponding to a first area in the original image; on the basis of the first area, determining a mask image corresponding to the original image; segmenting the original image to obtain a segmented image having at least some of the semantics of preset categories; and, on the basis of the original image, the mask image and the segmented image, obtaining a target image, the target image comprising an object of a second category converted from the object of the first category in the original image. The embodiments of the present disclosure may allow the target object converted from the designated object in the target image to have clearer edges and richer texture details, thus making the image more realistic, improving the display effect, and solving the problem of color anomaly.

Description

Image processing method, device, computer readable storage medium, electronic device and computer program product

This application claims priority to Chinese Patent Application No. 202211594249.0, filed on December 13, 2022, the disclosure of which is hereby incorporated by reference in its entirety as a part of this application.

Technical Field

The present disclosure relates to an image processing method, an apparatus, a computer-readable storage medium, an electronic device, and a computer program product.

Background

Artificial intelligence technology is increasingly being used in the field of images. It is usually used to convert a specified object in an original image into a target object, thereby obtaining a target image including the target object. At present, the target image obtained by processing the original image using related technologies has problems such as unclear edges of the target object, poor display effect of the processed area, abnormal colors, and missing texture details. Therefore, a solution is needed that can convert a specified object in an original image into a target object.

Summary of the Invention

The present disclosure provides an image processing method, an apparatus, a computer-readable storage medium, an electronic device, and a computer program product.

According to a first aspect, an image processing method is provided, the method comprising:

acquiring an original image to be processed, wherein the original image includes a first category object, and the first category object corresponds to a first area in the original image;

determining, based on the first area, a mask image corresponding to the original image;

performing segmentation processing on the original image to obtain a segmented image including at least part of preset category semantics; and

obtaining a target image based on the original image, the mask image and the segmented image, wherein the target image includes a second category object converted from the first category object in the original image.

According to a second aspect, an image processing apparatus is provided, the apparatus comprising:

an acquisition module, configured to acquire an original image to be processed, wherein the original image includes a first category object, and the first category object corresponds to a first area in the original image;

a determination module, configured to determine a mask image corresponding to the original image based on the first area;

a segmentation module, configured to perform segmentation processing on the original image to obtain a segmented image including at least part of preset category semantics; and

a processing module, configured to obtain a target image based on the original image, the mask image and the segmented image, wherein the target image includes a second category object converted from the first category object in the original image.

According to a third aspect, a computer-readable storage medium is provided, the storage medium storing a computer program which, when executed by a processor, implements the method according to any one of the above first aspects.

According to a fourth aspect, an electronic device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of the first aspects.

According to a fifth aspect, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method according to any one of the first aspects.

It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the present disclosure.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

为了更清楚地说明本公开实施例的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本公开中记载的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for use in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in the present disclosure. For ordinary technicians in this field, other drawings can be obtained based on these drawings without creative work.

图1是本公开根据一示例性实施例示出的一种图像处理的场景示意图;FIG1 is a schematic diagram of an image processing scene according to an exemplary embodiment of the present disclosure;

图2是本公开根据一示例性实施例示出的一种图像处理模型的训练场景示意图;FIG2 is a schematic diagram of a training scenario of an image processing model according to an exemplary embodiment of the present disclosure;

图3是本公开根据一示例性实施例示出的一种图像处理方法的流程图; FIG3 is a flow chart of an image processing method according to an exemplary embodiment of the present disclosure;

图4是本公开根据一示例性实施例示出的一种图像处理模型的训练方法的流程图;FIG4 is a flow chart of a method for training an image processing model according to an exemplary embodiment of the present disclosure;

图5是本公开根据一示例性实施例示出的一种图像处理装置框图;FIG5 is a block diagram of an image processing device according to an exemplary embodiment of the present disclosure;

图6是本公开一些实施例提供的一种电子设备的示意框图;FIG6 is a schematic block diagram of an electronic device provided by some embodiments of the present disclosure;

图7是本公开一些实施例提供的另一种电子设备的示意框图;FIG7 is a schematic block diagram of another electronic device provided by some embodiments of the present disclosure;

图8是本公开一些实施例提供的一种存储介质的示意图。FIG. 8 is a schematic diagram of a storage medium provided by some embodiments of the present disclosure.

具体实施方式Detailed ways

为了使本技术领域的人员更好地理解本公开中的技术方案,下面将结合本公开实施例中的附图,对本公开实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本公开一部分实施例,而不是全部的实施例。基于本公开中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都应当属于本公开保护的范围。In order to enable those skilled in the art to better understand the technical solutions in the present disclosure, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below in conjunction with the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only part of the embodiments of the present disclosure, not all of the embodiments. Based on the embodiments of the present disclosure, all other embodiments obtained by ordinary technicians in the field without creative work should fall within the scope of protection of the present disclosure.

下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本公开相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本公开的一些方面相一致的装置和方法的例子。When the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings represent the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present disclosure. Instead, they are only examples of devices and methods consistent with some aspects of the present disclosure as detailed in the attached claims.

在本公开使用的术语是仅仅出于描述特定实施例的目的,而非旨在限制本公开。在本公开中所使用的单数形式的“一种”、“所述”和“该”也旨在包括多数形式,除非上下文清楚地表示其他含义。还应当理解,本文中使用的术语“和/或”是指并包含一个或多个相关联的列出项目的任何或所有可能组合。The terms used in this disclosure are only for the purpose of describing specific embodiments and are not intended to limit the disclosure. The singular forms of "a", "the" and "the" used in this disclosure are also intended to include plural forms unless the context clearly indicates other meanings. It should also be understood that the term "and/or" used herein refers to and includes any or all possible combinations of one or more associated listed items.

应当理解,尽管在本公开可能采用术语第一、第二、第三等来描述各种信息,但这些信息不应限于这些术语。这些术语仅用来将同一类型的信息彼此区分开。例如,在不脱离本公开范围的情况下,第一信息也可以被称为第二信息,类似地,第二信息也可以被称为第一信息。取决于语境,如在此所使用的词语“如果”可以被解释成为“在……时”或“当……时”或“响应于确定”。It should be understood that although the terms first, second, third, etc. may be used in the present disclosure to describe various information, such information should not be limited to these terms. These terms are only used to distinguish the same type of information from each other. For example, without departing from the scope of the present disclosure, the first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information. Depending on the context, the word "if" as used herein may be interpreted as "at the time of" or "when" or "in response to determining".

Artificial intelligence technology is increasingly applied in the field of images. It is commonly used to convert a specified object in an original image into a target object, obtaining a target image that includes the target object. For example, the hairstyle of a person in a portrait image may be converted (e.g., long hair changed to short hair), or the person's clothing may be changed. At present, target images obtained by processing original images with related technologies suffer from problems such as unclear edges of the target object, poor display effects in the processed regions, abnormal colors, and missing texture details. Therefore, a solution that converts a specified object in an original image into a target object is needed.

The present disclosure provides an image processing solution that combines the mask image and the segmented image corresponding to an original image to convert a specified first-category object in the original image into a second-category object, obtaining a target image. Since only the specified object in the original image is converted, only the partial region corresponding to that object requires substantial modification. By combining the mask image and the segmented image during processing, the present solution makes the conversion of the specified object more targeted, and the modified region can be repaired under the guidance of the mask image and the segmented image. As a result, the edges of the target object in the target image are clearer, the texture details are richer, and the image is more realistic, which improves the display effect and resolves the problem of abnormal colors.

Referring to FIG. 1, a schematic diagram of an image processing scenario according to an exemplary embodiment is shown. With reference to FIG. 1, the solution of the present disclosure is schematically described below in conjunction with a complete and specific application example, which describes a specific image processing procedure.

As shown in FIG. 1, the original image A1 is the image to be processed and includes an object m. A mask image B1 corresponding to the original image A1 can be generated according to the region S corresponding to the object m in the original image A1. For example, the values of the pixels within the region S can be set to 1, and the values of the pixels in the region S' outside the region S can be set to 0. Meanwhile, semantic parsing can be performed on the original image A1 to obtain a segmented image C1 corresponding to the original image A1. The original image A1, the mask image B1, and the segmented image C1 are merged, for example by stacking, to obtain merged data D1. The number of channels of the merged data D1 is the sum of the numbers of channels of the original image A1, the mask image B1, and the segmented image C1. For example, if the original image A1 has a channels, the mask image B1 has b channels, and the segmented image C1 has c channels, the merged data D1 has a+b+c channels.
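For illustration, the following minimal sketch (in Python/PyTorch; the image size, the rectangular region S, and the number of semantic channels n are hypothetical stand-ins, not values from the disclosure) builds the mask image B1 from the region S and stacks A1, B1, and C1 along the channel dimension:

```python
import torch

# Hypothetical inputs for one image of size H x W.
H, W, n = 256, 256, 8
a1 = torch.rand(3, H, W)            # original image A1, a = 3 channels
region_s = torch.zeros(H, W, dtype=torch.bool)
region_s[64:192, 64:192] = True     # placeholder for the region S of object m

# Mask image B1: pixels inside S are 1, pixels in S' are 0 (b = 1 channel).
b1 = region_s.float().unsqueeze(0)

# Segmented image C1: c = n semantic channels, assumed one-hot here.
c1 = torch.zeros(n, H, W)
c1[0] = 1.0                         # placeholder semantics

# Merged data D1 with a + b + c channels; per-channel content is unchanged.
d1 = torch.cat([a1, b1, c1], dim=0)
print(d1.shape)                     # torch.Size([3 + 1 + n, H, W])
```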

Next, the merged data D1 can be input into an image processing network, which processes the multi-channel merged data D1. When processing the merged data D1, the shallow layers of the image processing network can convolve each channel of the merged data D1 with the same or different convolution kernels and sum the convolution results of the respective channels. In this way, the information of the mask image B1 and the segmented image C1 can guide the image processing procedure.
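This behaviour corresponds to an ordinary 2-D convolution: each output channel sums the per-input-channel convolutions, so the mask and segmentation channels contribute directly to the earliest features. A minimal sketch, with layer sizes chosen purely for illustration:

```python
import torch
import torch.nn as nn

n = 8                                          # number of semantic categories (assumption)
x = torch.rand(1, 3 + 1 + n, 256, 256)         # merged data D1 with a batch dimension

# Every output channel of a Conv2d sums the convolutions of all 3 + 1 + n input
# channels, so B1 and C1 directly shape the shallow features.
shallow = nn.Conv2d(in_channels=3 + 1 + n, out_channels=32, kernel_size=3, padding=1)
features = shallow(x)
print(features.shape)                          # torch.Size([1, 32, 256, 256])
```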

Finally, the image processing network can output image data with a+b+c channels, of which a channels constitute the target image A2, b channels constitute the mask image B2, and the remaining c channels constitute the segmented image C2. The target image A2 includes an object n, which is converted from the object m included in the original image A1. The mask image B2 and the segmented image C2 both correspond to the object n and to the target image A2. The target image A2 can be obtained from the image data with a+b+c channels output by the image processing network.
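Recovering the three outputs from the a+b+c-channel network output is a simple channel slice; a sketch under the same assumed layout (a = 3, b = 1):

```python
import torch

n = 8
out = torch.rand(1, 3 + 1 + n, 256, 256)   # stand-in for the network output

a2 = out[:, :3]          # target image A2 (a = 3 channels)
b2 = out[:, 3:4]         # mask image B2 (b = 1 channel)
c2 = out[:, 4:]          # segmented image C2 (c = n channels)
```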

Referring to FIG. 2, a schematic diagram of a training scenario of an image processing model according to an exemplary embodiment is shown. With reference to FIG. 2, the solution of the present disclosure is schematically described below in conjunction with a complete and specific application example, which describes the training procedure of a specific image processing model.

As shown in FIG. 2, first, a sample image A3, a sample mask image B3 and a sample segmented image C3 corresponding to the sample image A3, a label image A5, and a label mask image B5 and a label segmented image C5 corresponding to the label image A5 can be obtained from a pre-created training set. The sample image A3, the sample mask image B3, and the sample segmented image C3 are merged to obtain merged data D2. The number of channels of the merged data D2 is the sum of the numbers of channels of the sample image A3, the sample mask image B3, and the sample segmented image C3. For example, if the sample image A3 has a channels, the sample mask image B3 has b channels, and the sample segmented image C3 has c channels, the merged data D2 has a+b+c channels. The merged data D2 can be input into the image processing network to be trained, which processes the multi-channel merged data D2.

Then, the image processing network to be trained can output image data with a+b+c channels, of which a channels constitute a predicted image A4, b channels constitute a predicted mask image B4, and the remaining c channels constitute a predicted segmented image C4. The predicted image A4, the predicted mask image B4, and the predicted segmented image C4 can be obtained from the image data with a+b+c channels output by the image processing network to be trained. The predicted image A4 and the label image A5 are each input into a discriminator P to be trained, which can be used to discriminate whether the predicted image A4 and the label image A5 are real or fake.

Finally, based on the predicted image A4, the predicted mask image B4, the predicted segmented image C4, and the output of the discriminator P, and with reference to the label image A5, the label mask image B5, and the label segmented image C5, the network parameters of the image processing network to be trained and of the discriminator P can be adjusted in turn. Specifically, in the discriminator training stage, the network parameters of the discriminator P can be adjusted based on its output, so that its judgments of image authenticity become increasingly accurate.

In the training stage of the image processing network, a prediction loss can be determined based on the predicted image A4, the predicted mask image B4, the predicted segmented image C4, the output of the discriminator P, the label image A5, the label mask image B5, and the label segmented image C5. The prediction loss can be composed of the sum of loss terms L1, L2, L3, L4, L5, L6, and L7, and the network parameters of the image processing network are adjusted with the goal of reducing the prediction loss.

The loss terms L1, L2, and L3 are all determined based on the predicted image A4 and the label image A5. The loss term L1 can represent the difference in image features between the predicted image A4 and the label image A5, and can specifically be determined based on the mean absolute error of the pixel values of the predicted image A4 and the label image A5. The loss term L2 can represent the difference in visual perception between the predicted image A4 and the label image A5, and the loss term L3 can represent the difference in image style between the predicted image A4 and the label image A5. Specifically, the predicted image A4 and the label image A5 can each be input into a pre-trained convolutional neural network to obtain two feature maps; the loss term L2 is obtained by calculating the difference between the two feature maps, and the loss term L3 is obtained by calculating the difference between the Gram matrices of the two feature maps.
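A minimal sketch of L1, L2, and L3, assuming a frozen VGG16 feature extractor as the pre-trained convolutional neural network (the disclosure does not name a backbone, so the network choice and the layer cut-off are assumptions):

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# Frozen feature extractor (first convolutional blocks of VGG16, an assumption).
feat = vgg16(weights="DEFAULT").features[:16].eval()
for p in feat.parameters():
    p.requires_grad_(False)

def gram(x):
    b, c, h, w = x.shape
    f = x.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)   # Gram matrix per sample

def losses(pred, label):
    l1 = F.l1_loss(pred, label)                  # L1: mean absolute error of pixel values
    fp, fl = feat(pred), feat(label)
    l2 = F.l1_loss(fp, fl)                       # L2: perceptual difference of feature maps
    l3 = F.l1_loss(gram(fp), gram(fl))           # L3: style difference via Gram matrices
    return l1, l2, l3
```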

The loss term L4 is determined based on the difference of a target region M between the predicted image A4 and the label image A5, where the target region M is the region in which the sample mask image B3 and the label mask image B5 differ. The loss term L4 can represent the color difference of the target region M between the predicted image A4 and the label image A5.

The loss term L5 is determined based on the predicted mask image B4 and the label mask image B5, and can be a weighted sum of a binary cross-entropy loss and a region mutual information loss between the predicted mask image B4 and the label mask image B5. The loss term L6 is determined based on the predicted segmented image C4 and the label segmented image C5, and can be a weighted sum of a binary cross-entropy loss and a region mutual information loss between the predicted segmented image C4 and the label segmented image C5. The loss term L7 is determined based on the output of the above discriminator P and can be a loss term representing the generative adversarial network loss. It can be understood that the prediction loss may also include other loss terms; this embodiment is not limited in this respect.
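A sketch of how L5 (and, analogously, L6) could be assembled; the region-mutual-information function is passed in by the caller because the disclosure does not detail its formulation, and the weights are assumptions:

```python
import torch.nn.functional as F

def mask_loss(pred_mask, label_mask, rmi_fn, w_bce=1.0, w_rmi=0.5):
    """Loss term L5: weighted sum of binary cross-entropy and region mutual
    information. `rmi_fn` is supplied by the caller, since the disclosure does
    not specify the RMI formulation; the weights are assumptions."""
    bce = F.binary_cross_entropy_with_logits(pred_mask, label_mask)
    rmi = rmi_fn(pred_mask.sigmoid(), label_mask)
    return w_bce * bce + w_rmi * rmi
```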

The present disclosure is described in detail below in conjunction with specific embodiments.

FIG. 3 is a flow chart of an image processing method according to an exemplary embodiment. The execution subject of the method can be implemented as any device, platform, server, or device cluster having computing and processing capabilities. The method includes the following steps.

As shown in FIG. 3, in step 301, an original image to be processed is obtained.

In this embodiment, the original image is an image to be processed and includes a first-category object. The processing of the original image involved in this embodiment may be converting the first-category object included in the original image into a second-category object. For example, in one scenario, the original image may be a portrait image, the object to be converted may be the person's hair, and the first-category object may be, for example, the person's long hair in the original image. Processing the original image may be converting the long hair included in the original image into short hair to obtain a target image, where the target image includes the person's short hair after conversion (i.e., the second-category object).

For another example, in another scenario, the original image may be a full-body image of a person, the object to be converted may be the person's clothing, and the first-category object may be, for example, the person's long skirt in the original image. Processing the original image may be converting the long skirt included in the original image into a short skirt to obtain a target image, where the target image may include the person's short skirt after conversion (i.e., the second-category object). It can be understood that the present solution may also be applied in other scenarios; this embodiment does not limit the specific application scenarios.

In step 302, a mask image is determined based on a first region corresponding to the first-category object in the original image.

In this embodiment, the first-category object corresponds to a first region in the original image, and the mask image corresponding to the original image can be generated according to the first region; the mask image may be a single-channel binary image. For example, the values of the pixels within the first region can be set to 1 or 255, and the values of the pixels in the regions other than the first region can be set to 0. Thus, in a scenario where the object to be converted is a person's hair, the mask image corresponding to the original image appears with the hair region in white and the other regions in black. In a scenario where the object to be converted is a person's clothing, the mask image corresponding to the original image appears with the clothing region in white and the other regions in black.

In step 303, the original image is segmented to obtain a segmented image.

In this embodiment, semantic segmentation processing, for example, may be performed on the original image to obtain a segmented image including at least part of the preset-category semantics. Optionally, a semantic segmentation model may be used to process the original image, where the semantic segmentation model is trained based on images including the preset-category semantics. For example, the semantics of n preset categories may be set in advance according to the scenario and needs, and the semantic segmentation model is then trained based on the semantics of the n preset categories and images including those semantics. Using this semantic segmentation model to perform semantic segmentation on the original image yields a semantic image that includes any number of the semantics of the n preset categories. For example, in a scenario where the object to be converted is a person's hair, the semantics of the n preset categories may include, but are not limited to: background, hair, face, clothes, body skin, hands, glasses, hats, and so on.
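A sketch of producing the n-channel segmented image from a segmentation model's per-pixel class scores (the stand-in logits and the category count are hypothetical; the disclosure does not prescribe a model architecture):

```python
import torch
import torch.nn.functional as F

n = 8                                        # preset categories, e.g. background, hair, ...
logits = torch.rand(1, n, 256, 256)          # stand-in for a segmentation model's output

# One channel per preset category: one-hot encode the per-pixel argmax.
labels = logits.argmax(dim=1)                # (1, H, W)
seg = F.one_hot(labels, num_classes=n)       # (1, H, W, n)
seg = seg.permute(0, 3, 1, 2).float()        # (1, n, H, W) segmented image
```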

In step 304, a target image is obtained based on the original image, the mask image, and the segmented image.

In this embodiment, based on the original image, the mask image, and the segmented image, the first-category object included in the original image can be converted into the second-category object to obtain the target image. Optionally, a pre-trained target model may be used to perform conversion processing on the original image, the mask image, and the segmented image to obtain the target image. The target model may be a deep convolutional neural network model; optionally, it may specifically be a U2-Net-type model. The target model may be trained by conventional supervised learning or by a supervised generative adversarial network; it can be understood that this embodiment is not limited in this respect.

Specifically, the original image, the mask image, and the segmented image can first be merged to obtain merged data, such that the number of channels of the merged data is the sum of the numbers of channels of the original image, the mask image, and the segmented image. The merging may be channel-based merging (such as stacking). For example, the original image may be a 3-channel RGB image, the mask image may be a 1-channel binary image, and the segmented image may be an n-channel semantic segmentation image (n being the number of preset semantic categories); merging then yields merged data with 4+n channels. It should be noted that the information of each channel is unchanged by the merging. The merged data can be input into the target model for conversion processing to obtain the target image.

In the image processing method provided by the present disclosure, the mask image and the segmented image corresponding to the original image are combined to convert a specified first-category object in the original image into a second-category object, obtaining a target image. Since only the specified object in the original image is converted, only the partial region corresponding to that object requires substantial modification. By combining the mask image and the segmented image during processing, the present solution makes the conversion of the specified object more targeted, and the modified region can be repaired under the guidance of the semantic information. As a result, the edges of the target object converted from the specified object are clearer in the target image, the texture details are richer, and the image is more realistic, which improves the display effect and resolves the problem of abnormal colors.

FIG. 4 is a flow chart of a training method for an image processing model according to an exemplary embodiment. The model may be the target model for processing images involved in the embodiment of FIG. 3. The execution subject of the method can be implemented as any device, platform, server, or device cluster having computing and processing capabilities. The method may include iteratively executing the following steps.

As shown in FIG. 4, in step 401, a sample image, a sample mask image, a sample segmented image, a label image, a label mask image, and a label segmented image are obtained.

In this embodiment, a large number of candidate images for training can be collected in advance, each including a first-category object, and the candidate label image corresponding to each candidate image is obtained. The candidate label image corresponding to any candidate image includes the second-category object converted from the first-category object in that candidate image. For example, in a scenario where the object to be converted is a person's hair, if the first-category object is long hair and the second-category object is short hair, each candidate image includes a person's long hair, and the corresponding candidate label image includes the short hair converted from the long hair in that candidate image. For another example, in a scenario where the object to be converted is a person's clothing, if the first-category object is a long skirt and the second-category object is a short skirt, each candidate image includes a person's long skirt, and the corresponding candidate label image includes the short skirt converted from the long skirt in that candidate image.

It can be understood that the candidate label image corresponding to a candidate image can be obtained in any reasonable manner. For example, a candidate image can be manually modified to obtain its candidate label image. Alternatively, the candidate image can first be processed by another model of similar function but poorer effect and then manually modified to obtain its candidate label image. This embodiment does not limit the specific manner of obtaining the candidate label images.

Then, the mask image and the segmented image corresponding to each candidate image are obtained (for the manner of obtaining, reference may be made to the manner of obtaining the mask image and the segmented image corresponding to the original image in the embodiment of FIG. 3), as are the mask image and the segmented image corresponding to each candidate label image. Next, a training set is constructed from the candidate images and their corresponding mask images, segmented images, and candidate label images, together with the mask images and segmented images corresponding to the candidate label images.

When training the target model, a candidate image can be taken from the training set as a sample image, and the sample mask image, sample segmented image, and label image corresponding to the sample image are obtained, as are the label mask image and label segmented image corresponding to the label image. The sample image includes the first-category object, which corresponds to a second region in the sample image, and the sample mask image is determined based on the second region. The label image is the image obtained by converting the first-category object in the sample image into the second-category object; the second-category object corresponds to a third region in the label image, and the label mask image is determined based on the third region.

In step 402, a predicted image, a predicted mask image, and a predicted segmented image are obtained based on the sample image, the sample mask image, the sample segmented image, and the target model to be trained.

In this embodiment, the sample image, the sample mask image, and the sample segmented image may be merged (e.g., by stacking) to obtain merged sample data, whose number of channels is the sum of the numbers of channels of the sample image, the sample mask image, and the sample segmented image. The merged sample data is then input into the target model to be trained, which can output a predicted image together with the predicted mask image and the predicted segmented image corresponding to the predicted image. The predicted image includes the second-category object converted from the first-category object in the sample image.

It should be noted that although the second-category object included in the label image and the second-category object included in the predicted image are both converted from the first-category object included in the sample image, they are not exactly the same. Moreover, the label image is a reference image used for comparison with the predicted image and has a better display effect.

In step 403, a prediction loss is determined based on the predicted image, the predicted mask image, the predicted segmented image, the label image, the label mask image, and the label segmented image; and in step 404, the model parameters of the target model to be trained are adjusted with the goal of reducing the prediction loss.
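A minimal sketch of one training iteration covering steps 403 and 404 (the stand-in model, the optimizer choice, and the simplified loss are assumptions; the actual prediction loss is the sum of the terms described below):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n = 8                                                       # preset semantic categories (assumption)
model = nn.Conv2d(3 + 1 + n, 3 + 1 + n, 3, padding=1)       # stand-in for the target model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # optimizer choice is an assumption

def train_step(merged_sample, label_img, label_mask, label_seg):
    out = model(merged_sample)                 # (B, 3 + 1 + n, H, W)
    pred_img, pred_mask, pred_seg = out[:, :3], out[:, 3:4], out[:, 4:]
    # Stand-in for the prediction loss: in the disclosure it is the sum of the
    # loss terms described below, not three plain L1 terms.
    loss = (F.l1_loss(pred_img, label_img)
            + F.l1_loss(pred_mask, label_mask)
            + F.l1_loss(pred_seg, label_seg))
    optimizer.zero_grad()
    loss.backward()                            # step 404: reduce the prediction loss
    optimizer.step()
    return loss.item()
```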

In this embodiment, the prediction loss can be determined based on the predicted image, the predicted mask image, the predicted segmented image, the label image, the label mask image, and the label segmented image. The prediction loss may include a first loss term, a second loss term, and a third loss term. The first loss term is determined based on the predicted image and the label image; it may be a loss term representing the image-feature difference between the predicted image and the label image, and may specifically be determined based on the mean absolute error between the predicted image and the label image. The second loss term is determined based on the predicted mask image and the label mask image, and may be a weighted sum of a binary cross-entropy loss and a region mutual information loss between the predicted mask image and the label mask image. The third loss term is determined based on the predicted segmented image and the label segmented image, and may be a weighted sum of a binary cross-entropy loss and a region mutual information loss between the predicted segmented image and the label segmented image.

Optionally, the prediction loss may also include a fourth loss term, which is determined based on the difference of a target region between the predicted image and the label image. The target region is the region in which the sample mask image and the label mask image differ. Specifically, the target region can be determined by calculating the absolute value of the difference between the pixel values of corresponding pixels in the sample mask image and the label mask image. Since this embodiment obtains, based on the target region, the fourth loss term used to guide the adjustment of the model parameters, the trained target model processes the specified object in the original image more precisely and in a more targeted manner.

Optionally, the fourth loss term may be determined as follows. A first difference value of the target region between the predicted image and the label image in a first color space is determined, and a second difference value of the target region between the predicted image and the label image in a second color space is determined; the fourth loss term is determined based on a weighted sum of the first difference value and the second difference value. The first color space may be, for example, the RGB color space, and the second color space may be, for example, the LAB color space. The first difference value may be the mean absolute error of the target region between the predicted image and the label image in the RGB color space, and the second difference value may be the mean absolute error of the target region between the predicted image and the label image in the LAB color space. The weight corresponding to the first difference value is greater than the weight corresponding to the second difference value. Since this embodiment obtains the loss term used to guide the adjustment of the model parameters based on the weighted sum of the difference values of the target region between the predicted image and the label image in different color spaces, the images obtained by processing original images with the trained target model have a better color effect.
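A sketch of this fourth loss term, using kornia's RGB-to-LAB conversion (a third-party choice; any equivalent conversion works); the concrete weights, with the RGB weight larger as stated above, are assumptions:

```python
import torch
import kornia.color as KC

def fourth_loss(pred, label, sample_mask, label_mask, w_rgb=1.0, w_lab=0.5):
    # Target region M: pixels where the sample and label masks differ.
    m = (sample_mask - label_mask).abs() > 0                 # (B, 1, H, W) boolean

    # First difference value: mean absolute error over M in RGB space.
    d_rgb = ((pred - label).abs() * m).sum() / (m.sum() * 3 + 1e-8)

    # Second difference value: mean absolute error over M in LAB space.
    d_lab = ((KC.rgb_to_lab(pred) - KC.rgb_to_lab(label)).abs() * m).sum() / (m.sum() * 3 + 1e-8)

    return w_rgb * d_rgb + w_lab * d_lab                     # RGB weight > LAB weight
```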

Further optionally, the target model may be trained by a supervised generative adversarial network. In that case, the predicted image and the label image are input into a discriminator to be trained, and the discriminator is adjusted based on its output. Moreover, a fifth loss term included in the prediction loss can also be determined based on the output of the discriminator; the fifth loss term may be a loss term representing the generative adversarial network loss. It can be understood that the prediction loss may also include other loss terms; this embodiment is not limited in this respect.
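A sketch of the adversarial part using the standard non-saturating GAN losses (this particular formulation is an assumption; the disclosure only states that the discriminator and the network are trained alternately and that the fifth loss term represents the GAN loss):

```python
import torch
import torch.nn.functional as F

def discriminator_step(D, pred_img, label_img, d_optimizer):
    # Train D to score label images as real and (detached) predictions as fake.
    real_logits = D(label_img)
    fake_logits = D(pred_img.detach())
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    d_optimizer.zero_grad()
    d_loss.backward()
    d_optimizer.step()

def fifth_loss(D, pred_img):
    # Fifth loss term: the generator is rewarded when D judges its output as real.
    logits = D(pred_img)
    return F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
```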

In this embodiment, during the training of the target model, the prediction loss is obtained by combining the label image, the label mask image, and the label segmented image corresponding to the sample image, and the target model is trained based on the prediction loss. This makes the target model's conversion of the specified object in the original image more targeted, so that the edges of the target object in the target image are clearer, the texture details are richer, and the image is more realistic, which improves the display effect and resolves the problem of abnormal colors.

It should be noted that although the operations of the methods of the embodiments of the present disclosure are described above in a particular order, this does not require or imply that the operations must be performed in that particular order, or that all of the operations shown must be performed to achieve the desired results. On the contrary, the steps depicted in the flow charts may be performed in a different order. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.

Corresponding to the foregoing embodiments of the image processing method, the present disclosure further provides embodiments of an image processing apparatus.

As shown in FIG. 5, FIG. 5 is a block diagram of an image processing apparatus according to an exemplary embodiment of the present disclosure. The apparatus may include: an acquisition module 501, a determination module 502, a segmentation module 503, and a processing module 504.

The acquisition module 501 is configured to acquire an original image to be processed, where the original image includes a first-category object, and the first-category object corresponds to a first region in the original image.

The determination module 502 is configured to determine, based on the first region, a mask image corresponding to the original image.

The segmentation module 503 is configured to perform segmentation processing on the original image to obtain a segmented image including at least part of the preset-category semantics.

The processing module 504 is configured to obtain a target image based on the original image, the mask image, and the segmented image, where the target image includes a second-category object converted from the first-category object in the original image.

In some implementations, the segmentation module 503 is configured to process the original image using a semantic segmentation model to obtain the segmented image, where the semantic segmentation model is trained based on images including the preset-category semantics.

In some other implementations, the processing module 504 may include a conversion submodule (not shown in the figure).

The conversion submodule is configured to perform conversion processing on the original image, the mask image, and the segmented image using a target model to obtain the target image, where the conversion processing converts the first-category object into the second-category object.

In some other implementations, the conversion submodule is configured to merge the original image, the mask image, and the segmented image to obtain merged data, where the number of channels of the merged data is the sum of the numbers of channels of the original image, the mask image, and the segmented image, and to input the merged data into the target model for the above conversion processing to obtain the target image.

In some other implementations, the target model is trained by iteratively executing the following steps: obtaining a sample image and a label image, where a second region in the sample image corresponds to the first-category object and a third region in the label image corresponds to the second-category object; obtaining a predicted image based on the sample image and the target model to be trained; determining a prediction loss based on the predicted image and the label image, where the prediction loss includes a fourth loss term determined based on the difference of a target region between the predicted image and the label image, the target region being determined based on the second region and the third region; and adjusting the model parameters of the target model to be trained with the goal of reducing the prediction loss.

In some other implementations, the fourth loss term may be determined by: determining a first difference value of the target region between the predicted image and the label image in a first color space; determining a second difference value of the target region between the predicted image and the label image in a second color space; and determining the fourth loss term based on a weighted sum of the first difference value and the second difference value.

As the apparatus embodiments substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant parts. The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present disclosure. Those of ordinary skill in the art can understand and implement them without creative effort.

FIG. 6 is a schematic block diagram of an electronic device provided by some embodiments of the present disclosure. As shown in FIG. 6, the electronic device 910 includes a processor 911 and a memory 912 and can be used to implement a client or a server. The memory 912 is configured to store computer-executable instructions (e.g., one or more computer program modules) non-transitorily. The processor 911 is configured to run the computer-executable instructions, which, when run by the processor 911, can perform one or more steps of the image processing method described above, thereby implementing that method. The memory 912 and the processor 911 may be interconnected by a bus system and/or other forms of connection mechanisms (not shown).

For example, the processor 911 may be a central processing unit (CPU), a graphics processing unit (GPU), or another form of processing unit having data processing capabilities and/or program execution capabilities. For example, the central processing unit (CPU) may be of the X86 or ARM architecture. The processor 911 may be a general-purpose processor or a dedicated processor and may control other components in the electronic device 910 to perform desired functions.

For example, the memory 912 may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory. Non-volatile memory may include, for example, read-only memory (ROM), a hard disk, erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, flash memory, and the like. One or more computer program modules may be stored on the computer-readable storage medium, and the processor 911 may run them to implement various functions of the electronic device 910. Various applications and various data, as well as various data used and/or generated by the applications, may also be stored on the computer-readable storage medium.

It should be noted that, in the embodiments of the present disclosure, for the specific functions and technical effects of the electronic device 910, reference may be made to the description of the image processing method above, which is not repeated here.

FIG. 7 is a schematic block diagram of another electronic device provided by some embodiments of the present disclosure. The electronic device 920 is, for example, suitable for implementing the image processing method provided by the embodiments of the present disclosure. The electronic device 920 may be a terminal device or the like and may be used to implement a client or a server. The electronic device 920 may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), vehicle-mounted terminals (e.g., vehicle-mounted navigation terminals), and wearable electronic devices, as well as fixed terminals such as digital TVs, desktop computers, and smart home devices. It should be noted that the electronic device 920 shown in FIG. 7 is merely an example and does not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.

As shown in FIG. 7, the electronic device 920 may include a processing device (e.g., a central processing unit, a graphics processing unit, etc.) 921, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 922 or a program loaded from a storage device 928 into a random access memory (RAM) 923. The RAM 923 also stores various programs and data required for the operation of the electronic device 920. The processing device 921, the ROM 922, and the RAM 923 are connected to one another via a bus 924, to which an input/output (I/O) interface 925 is also connected.

Typically, the following devices may be connected to the I/O interface 925: input devices 926 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; output devices 927 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; storage devices 928 including, for example, a magnetic tape and a hard disk; and a communication device 929. The communication device 929 may allow the electronic device 920 to communicate wirelessly or by wire with other electronic devices to exchange data. Although FIG. 7 shows the electronic device 920 with various devices, it should be understood that implementing or providing all of the devices shown is not required; the electronic device 920 may alternatively implement or provide more or fewer devices.

For example, according to embodiments of the present disclosure, the image processing method described above may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the image processing method described above. In such embodiments, the computer program may be downloaded and installed from a network via the communication device 929, installed from the storage device 928, or installed from the ROM 922. When the computer program is executed by the processing device 921, the functions defined in the image processing method provided by the embodiments of the present disclosure can be implemented.

FIG. 8 is a schematic diagram of a storage medium provided by some embodiments of the present disclosure. For example, as shown in FIG. 8, the storage medium 930 may be a non-transitory computer-readable storage medium for storing non-transitory computer-executable instructions 931. When the non-transitory computer-executable instructions 931 are executed by a processor, the image processing method described in the embodiments of the present disclosure can be implemented; for example, when executed by a processor, they can perform one or more steps of the image processing method described above.

For example, the storage medium 930 may be applied to the electronic device described above; for example, the storage medium 930 may include the memory in the electronic device.

For example, the storage medium may include a memory card of a smartphone, a storage component of a tablet computer, a hard disk of a personal computer, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), flash memory, any combination of the foregoing storage media, or other applicable storage media.

For example, for a description of the storage medium 930, reference may be made to the description of the memory in the embodiments of the electronic device, which is not repeated here. For the specific functions and technical effects of the storage medium 930, reference may be made to the description of the image processing method above, which is not repeated here.

It should be noted that, in the context of the present disclosure, a computer-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; it can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.

Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the present disclosure. The present disclosure is intended to cover any variations, uses, or adaptations thereof that follow the general principles of the present disclosure and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the claims.

It should be understood that the present disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

一种图像处理方法,包括:An image processing method, comprising: 获取待处理的原始图像,其中,所述原始图像中包括第一类别对象,所述第一类别对象在所述原始图像中对应于第一区域;Acquire an original image to be processed, wherein the original image includes a first category of objects, and the first category of objects corresponds to a first area in the original image; 基于所述第一区域,确定所述原始图像对应的掩膜图像;Based on the first area, determining a mask image corresponding to the original image; 对所述原始图像进行分割处理,得到包括至少部分预设类别语义的分割图像;Performing segmentation processing on the original image to obtain a segmented image including at least part of the preset category semantics; 基于所述原始图像、所述掩膜图像和所述分割图像,得到目标图像,其中,所述目标图像包括由所述原始图像中的所述第一类别对象转换而成的第二类别对象。A target image is obtained based on the original image, the mask image and the segmented image, wherein the target image includes objects of the second category converted from objects of the first category in the original image. 根据权利要求1所述的方法,其中,所述对所述原始图像进行分割处理,得到包括至少部分预设类别语义的分割图像,包括:The method according to claim 1, wherein the segmenting of the original image to obtain a segmented image including at least part of the preset category semantics comprises: 采用语义分割模型对所述原始图像进行处理,得到所述分割图像;其中,所述语义分割模型基于包括所述预设类别语义的图像而训练得到。The original image is processed by using a semantic segmentation model to obtain the segmented image; wherein the semantic segmentation model is trained based on an image including the preset category semantics. 根据权利要求1或2所述的方法,其中,所述基于所述原始图像、所述掩膜图像和所述分割图像,得到目标图像,包括:The method according to claim 1 or 2, wherein obtaining the target image based on the original image, the mask image and the segmented image comprises: 利用目标模型对所述原始图像、所述掩膜图像和所述分割图像进行转换处理,得到所述目标图像,其中,所述转换处理为将所述第一类别对象转换成所述第二类别对象的处理。The target image is obtained by performing conversion processing on the original image, the mask image and the segmented image using the target model, wherein the conversion processing is a processing of converting the first category object into the second category object. 根据权利要求3所述的方法,其中,所述利用目标模型对所述原始图像、所述掩膜图像和所述分割图像进行转换处理,得到所述目标图像,包括:The method according to claim 3, wherein the converting process of the original image, the mask image and the segmented image using the target model to obtain the target image comprises: 将所述原始图像、所述掩膜图像和所述分割图像进行合并处理,得到合并数据,其中,所述合并数据的通道数为所述原始图像、所述掩膜图像和所述分割图像的通道数之和;Merging the original image, the mask image and the segmented image to obtain merged data, wherein the number of channels of the merged data is the sum of the number of channels of the original image, the mask image and the segmented image; 将所述合并数据输入至所述目标模型进行所述转换处理,得到所述目标图像。The combined data is input into the target model for the conversion process to obtain the target image. 
根据权利要求3或4所述的方法,所述目标模型通过迭代执行以下步骤训练得到:According to the method of claim 3 or 4, the target model is trained by iteratively performing the following steps: 获取样本图像以及标签图像;其中,所述样本图像中的第二区域对应于 所述第一类别对象;所述标签图像中的第三区域对应于所述第二类别对象;Obtain a sample image and a label image; wherein the second area in the sample image corresponds to the first category object; the third area in the label image corresponds to the second category object; 基于所述样本图像以及待训练的目标模型,得到预测图像;Based on the sample image and the target model to be trained, a predicted image is obtained; 基于所述预测图像和所述标签图像,确定预测损失,其中,所述预测损失包括第四损失项,所述第四损失项基于目标区域在所述预测图像和所述标签图像中的差异而确定,所述目标区域基于所述第二区域与所述第三区域确定;Determining a prediction loss based on the predicted image and the label image, wherein the prediction loss includes a fourth loss term, the fourth loss term is determined based on a difference between a target region in the predicted image and the label image, and the target region is determined based on the second region and the third region; 以减小所述预测损失为目标,调整所述待训练的目标模型的模型参数。With the goal of reducing the prediction loss, the model parameters of the target model to be trained are adjusted. 根据权利要求5所述的方法,其中,所述第四损失项通过如下方式确定:The method according to claim 5, wherein the fourth loss term is determined by: 确定第一颜色空间下所述目标区域在所述预测图像和所述标签图像中的第一差异值;Determine a first difference value of the target area in the predicted image and the label image in a first color space; 确定第二颜色空间下所述目标区域在所述预测图像和所述标签图像中的第二差异值;Determine a second difference value of the target area in the predicted image and the label image in a second color space; 基于所述第一差异值和所述第二差异值的加权和,确定所述第四损失项。The fourth loss term is determined based on a weighted sum of the first difference value and the second difference value. 一种图像处理装置,包括:An image processing device, comprising: 获取模块,被配置为获取待处理的原始图像,其中,所述原始图像中包括第一类别对象,所述第一类别对象在所述原始图像中对应于第一区域;An acquisition module is configured to acquire an original image to be processed, wherein the original image includes a first category of objects, and the first category of objects corresponds to a first area in the original image; 确定模块,被配置为基于所述第一区域,确定所述原始图像对应的掩膜图像;a determination module, configured to determine a mask image corresponding to the original image based on the first area; 分割模块,被配置为对所述原始图像进行分割处理,得到包括至少部分预设类别语义的分割图像;以及a segmentation module, configured to perform segmentation processing on the original image to obtain a segmented image including at least part of the preset category semantics; and 处理模块,被配置为基于所述原始图像、所述掩膜图像和所述分割图像,得到目标图像,其中,所述目标图像包括由所述原始图像中的所述第一类别对象转换而成的第二类别对象。The processing module is configured to obtain a target image based on the original image, the mask image and the segmented image, wherein the target image includes objects of the second category converted from objects of the first category in the original image. 一种计算机可读存储介质,存储有计算机程序,其中,当所述计算机程序在计算机中执行时,令所述计算机执行权利要求1-6中任一项所述的方法。A computer-readable storage medium stores a computer program, wherein when the computer program is executed in a computer, the computer is caused to execute the method according to any one of claims 1 to 6. 一种电子设备,包括存储器和处理器,其中,所述存储器中存储有可执行代码,所述处理器执行所述可执行代码时,实现权利要求1-6中任一项所述的方法。 An electronic device comprises a memory and a processor, wherein the memory stores executable code, and when the processor executes the executable code, the method according to any one of claims 1 to 6 is implemented. 
10. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 6.
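Claim 5's training loop reduces, in each iteration, a prediction loss that includes the fourth loss term. A minimal sketch of one such iteration follows, reusing target_model and fourth_loss_term from the sketches above; the Adam optimizer, the learning rate and the restriction of the prediction loss to the fourth term alone are assumptions for illustration, since the claims allow further loss terms.

```python
import torch

# Reusing target_model and fourth_loss_term from the sketches above.
optimizer = torch.optim.Adam(target_model.parameters(), lr=1e-4)

def train_step(sample_merged, label_image, region2, region3):
    """One iteration of claim 5: predict, score, adjust parameters.

    sample_merged is the 12-channel merged data built from the sample
    image, its mask and its segmentation map, as in the first sketch.
    """
    predicted = target_model(sample_merged)  # predicted image
    # The full prediction loss may include further terms (e.g. adversarial
    # or perceptual); only the region-based fourth term is sketched here.
    loss = fourth_loss_term(predicted, label_image, region2, region3)
    optimizer.zero_grad()
    loss.backward()   # gradients that reduce the prediction loss
    optimizer.step()  # adjust the model parameters
    return loss.item()
```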

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211594249.0 2022-12-13
CN202211594249.0A CN116229054A (en) 2022-12-13 2022-12-13 Image processing method, device and electronic equipment

Publications (1)

Publication Number Publication Date
WO2024125267A1

Family

ID=86590031

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/134020 Ceased WO2024125267A1 (en) 2022-12-13 2023-11-24 Image processing method and apparatus, computer-readable storage medium, electronic device and computer program product

Country Status (2)

Country Link
CN (1) CN116229054A (en)
WO (1) WO2024125267A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116229054A (en) * 2022-12-13 2023-06-06 北京字跳网络技术有限公司 Image processing method, device and electronic equipment
CN117058168A (en) * 2023-08-09 2023-11-14 北京字跳网络技术有限公司 Image processing methods, devices and electronic equipment
CN120707372A (en) * 2024-03-25 2025-09-26 北京字跳网络技术有限公司 Image processing method, device, equipment and medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109903257A (en) * 2019-03-08 2019-06-18 上海大学 A Virtual Hair Dyeing Method Based on Image Semantic Segmentation
CN111062237A (en) * 2019-09-05 2020-04-24 商汤国际私人有限公司 Method and apparatus, electronic device, and storage medium for recognizing sequences in images
CN111047509B (en) * 2019-12-17 2024-07-05 中国科学院深圳先进技术研究院 Image special effect processing method, device and terminal
CN111325212A (en) * 2020-02-18 2020-06-23 北京奇艺世纪科技有限公司 Model training method and device, electronic equipment and computer readable storage medium
CN111047548B (en) * 2020-03-12 2020-07-03 腾讯科技(深圳)有限公司 Attitude transformation data processing method and device, computer equipment and storage medium
CN111598968B (en) * 2020-06-28 2023-10-31 腾讯科技(深圳)有限公司 Image processing method and device, storage medium and electronic equipment
CN113096140B (en) * 2021-04-15 2022-11-22 北京市商汤科技开发有限公司 Instance partitioning method and device, electronic device and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114511580A (en) * 2022-01-28 2022-05-17 北京字跳网络技术有限公司 Image processing method, device, equipment and storage medium
CN114863482A (en) * 2022-05-17 2022-08-05 北京字跳网络技术有限公司 Image processing method, image processing apparatus, electronic device, and storage medium
CN116229054A (en) * 2022-12-13 2023-06-06 北京字跳网络技术有限公司 Image processing method, device and electronic equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118587443A (en) * 2024-08-07 2024-09-03 之江实验室 Image segmentation method and device based on self-training and prior guidance
CN118587443B (en) * 2024-08-07 2024-12-03 之江实验室 Image segmentation method and device based on self-training and priori guidance
CN120599549A (en) * 2025-08-08 2025-09-05 上海源控自动化技术有限公司 Window status identification and control method, device, electronic device and storage medium

Also Published As

Publication number Publication date
CN116229054A (en) 2023-06-06

Similar Documents

Publication Publication Date Title
WO2024125267A1 (en) Image processing method and apparatus, computer-readable storage medium, electronic device and computer program product
JP6994588B2 (en) Face feature extraction model training method, face feature extraction method, equipment, equipment and storage medium
CN110189336B (en) Image generation method, system, server and storage medium
JP2023545565A (en) Image detection method, model training method, image detection device, training device, equipment and program
JP2023547917A (en) Image segmentation method, device, equipment and storage medium
US10719693B2 (en) Method and apparatus for outputting information of object relationship
WO2019242416A1 (en) Video image processing method and apparatus, computer readable storage medium and electronic device
WO2020155907A1 (en) Method and apparatus for generating cartoon style conversion model
CN110321958A Training method of neural network model and video similarity determination method
WO2020062493A1 (en) Image processing method and apparatus
WO2020248841A1 (en) Au detection method and apparatus for image, and electronic device and storage medium
US20210406305A1 (en) Image deformation control method and device and hardware device
EP4425423A1 (en) Image processing method and apparatus, device, storage medium and program product
CN114332553A (en) Image processing method, device, equipment and storage medium
WO2020238321A1 (en) Method and device for age identification
CN113780326A (en) Image processing method and device, storage medium and electronic equipment
CN112836692A Method, apparatus, device and medium for processing images
CN113284206B (en) Information acquisition method and device, computer readable storage medium, and electronic device
CN112966592A (en) Hand key point detection method, device, equipment and medium
WO2025031323A1 (en) Image processing method and apparatus, and electronic device
US20230036366A1 (en) Image attribute classification method, apparatus, electronic device, medium and program product
CN113225488B (en) Video processing method and device, electronic equipment and storage medium
CN113989121A (en) Normalization processing method and device, electronic equipment and storage medium
CN110059739B (en) Image synthesis method, image synthesis device, electronic equipment and computer-readable storage medium
CN115937020B (en) Image processing methods, devices, equipment, media and program products

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23902465

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE