
WO2025130944A1 - Picture editing method and system, and related device


Info

Publication number
WO2025130944A1
Authority
WO
WIPO (PCT)
Prior art keywords
editing
area
image
picture
instruction
Prior art date
Legal status
Pending (assumed; not a legal conclusion)
Application number
PCT/CN2024/140387
Other languages
French (fr)
Chinese (zh)
Inventor
王旭东 (Wang Xudong)
时金魁 (Shi Jinkui)
Current Assignee (listing may be inaccurate)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of WO2025130944A1


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 Interaction techniques based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/0484 Interaction techniques for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04842 Selection of displayed objects or displayed text elements
    • G06F3/04845 Interaction techniques for image manipulation, e.g. dragging, rotation, expansion or change of colour
    • G06F3/0487 Interaction techniques using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text

Definitions

  • the present application relates to the field of electronic technology, and in particular to a picture editing method, related equipment and system.
  • an embodiment of the present application provides a picture editing method, which can be applied to a picture editing system. The method may include: displaying a first picture; detecting a user operation on the first picture for selecting an editing area; determining an editing area from the first picture according to the operation position of the user operation on the first picture, and displaying the editing area distinctly in the first picture; then, generating recommended editing instructions according to image features of the editing area, and displaying the recommended editing instructions; detecting that a user inputs a first editing instruction for the editing area, the first editing instruction including a recommended editing instruction; then, performing image editing processing on the first picture according to the first editing instruction; and finally, displaying the first picture after the image editing processing.
  • the method provided in the first aspect is based on an understanding of the semantic content of the original image. It recommends editing instructions to the user according to the image features of the editing area selected by the user, providing the user with editing ideas. This not only reduces the complexity of use but also ensures that the content of the edited image is reasonable, spares users extensive trial and error, and significantly improves the efficiency of image output.
  • generating recommended editing instructions based on image features of the editing area may specifically include: using a fused feature vector of multiple image features of the editing area as the first input of the first artificial intelligence algorithm; using one or more preset editing types as the second input of the first artificial intelligence algorithm; and obtaining recommended editing instructions through the operation of the first artificial intelligence algorithm, the recommended editing instructions including editing parameters corresponding to the preset editing types; wherein the multiple image features may include two or more of: mask features, depth features, contour features, and color features.
  • the preset editing type may include one or more of the following: delete, drag, replace, add, or color adjustment.
  • the editing parameter corresponding to the deletion may include the deleted content
  • the editing parameter corresponding to the replacement may include the replaceable content
  • the editing parameter corresponding to the drag may include the drag target position
  • the editing parameter corresponding to the addition may include the newly added content
  • the editing parameter corresponding to the color adjustment may include the color adjustment value, etc.
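To make this concrete, here is a minimal Python sketch of how an editing instruction pairing a preset editing type with its corresponding editing parameter might be encoded. The schema and field names are assumptions for illustration; the application does not fix a concrete data structure.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EditInstruction:
    # One of the preset editing types listed above (names assumed):
    edit_type: str                              # "delete" | "drag" | "replace" | "add" | "color_adjust"
    deleted_content: Optional[str] = None       # parameter for "delete"
    replacement_content: Optional[str] = None   # parameter for "replace"
    drag_target: Optional[tuple] = None         # (x, y) target position for "drag"
    added_content: Optional[str] = None         # parameter for "add"
    color_adjustment: Optional[dict] = None     # e.g. {"brightness": 0.82} for "color_adjust"

# A recommended instruction like the "sky" example used later in the text:
recommended = EditInstruction(edit_type="add", added_content="flying geese")
```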
  • detecting that the user inputs the first editing instruction for the editing area may specifically include: detecting that the user selects a recommended editing instruction, in which case the selected recommended editing instruction is determined as the first editing instruction.
  • the picture editing system is not limited to displaying recommended editing instructions; it can also prompt the user to enter editing instructions in the following manner: displaying a first input box that can be used to receive voice or text editing instructions; the first editing instruction then also includes a voice or text editing instruction entered through the input box.
  • detecting that the user enters the first editing instruction for the editing area can specifically include: detecting the voice or text instruction entered by the user in the first input box, which is determined as the first editing instruction. That is, the first editing instruction can be a text instruction entered in the input box or a voice instruction entered by pressing the voice key.
  • the image editing system is not limited to displaying recommended editing instructions, and can also prompt the user to enter editing instructions in the following manner: display one or more preset editing instructions, such as commonly used editing instructions or editing instructions saved in advance by the user.
  • detecting that the user enters a first editing instruction for the editing area can specifically include: detecting that the user selects a preset editing instruction, which is then determined to be the first editing instruction.
  • the user operation for selecting an editing area may include: a user operation of selecting a first object in a first picture, the operation position of the user operation on the first picture falls on the first object in the first picture, and the image area where the first object is located is the editing area; the image area where the first object is located is determined by performing image segmentation processing on the first picture.
  • the user operation for selecting the editing area may include: a user operation of drawing the editing area in the first picture.
  • distinctively displaying the editing area in the first picture may include one or more of the following methods: highlighting the outline of the editing area, highlighting the entire editing area, or displaying a dotted frame along the outline of the editing area.
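As an illustration of the first of these display styles, the sketch below outlines the editing area by drawing its contour onto a copy of the picture with OpenCV. The function name, color, and thickness are assumptions, not part of the application.

```python
import cv2
import numpy as np

def highlight_editing_area(image_bgr, region_mask):
    """Highlight the outline of the editing area (boolean HxW mask)."""
    contours, _ = cv2.findContours(region_mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    out = image_bgr.copy()
    cv2.drawContours(out, contours, -1, (0, 255, 255), thickness=2)  # yellow outline
    return out
```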
  • the method of the first aspect may further include: obtaining preprocessing information of the first image, and determining the editing area from the area indicated by the preprocessing information. In this way, the editing area can be determined based on the preprocessing information of the image, rather than based on image segmentation technology, and repeated online calculations are not required.
  • the preprocessing information may include indication information of multiple regions, wherein the preprocessing information includes coordinates of contour points of each of the multiple regions, a binary image, and a grayscale image, the values of the multiple regions in the binary image are first values, and the grayscale values of the multiple regions in the grayscale image are first grayscale values or first grayscale ranges.
  • determining the editing region from the region indicated by the preprocessing information may specifically include: determining the region where the operation position of the user operation is located in the multiple regions as the editing region, or determining the region closest to the operation position in the multiple regions as the editing region.
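A minimal sketch of this selection logic, assuming the regions indicated by the preprocessing information are available as boolean masks (the application equally allows contour-point, binary-image, or grayscale encodings):

```python
import numpy as np

def pick_editing_region(masks, click_xy):
    """Return the index of the region containing the operation position,
    or, failing that, the region closest to it."""
    x, y = click_xy
    for i, m in enumerate(masks):
        if m[y, x]:                    # operation position falls inside this region
            return i
    best, best_d = None, np.inf       # otherwise: nearest region
    for i, m in enumerate(masks):
        ys, xs = np.nonzero(m)
        d = np.min((xs - x) ** 2 + (ys - y) ** 2)
        if d < best_d:
            best, best_d = i, d
    return best
```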
  • the preprocessing information may only include indication information of a region, such as coordinates of contour points of a region, a binary image or a grayscale image indicating a region, etc.
  • the preprocessing information of the first image may not directly indicate one or more regions, but may include other data, such as multi-layer information, or contour information, or depth information.
  • the other data may be used to first determine one or more regions, and then the editing region may be determined from the one or more regions.
  • Methods for determining one or more regions indirectly indicated by the pre-processing information may include, but are not limited to:
  • the preprocessing information includes layer information of multiple layers, and the layer information of each layer includes the coordinates of opaque pixels in the layer, then: before determining the editing area from the area indicated by the preprocessing information, the connected opaque pixels in each layer can be determined as an area indicated by the preprocessing information.
  • pixels with the same or similar depth values may be determined as an area indicated by the preprocessing information based on the depth information.
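The two indirect methods above might look as follows. Treating "opaque" as a nonzero alpha value and "similar" depth values as falling into the same quantization bin are both assumptions made for the sketch:

```python
import numpy as np
from scipy import ndimage

def regions_from_layer(alpha):
    """Each set of connected opaque pixels in a layer becomes one region."""
    labeled, n = ndimage.label(alpha > 0)          # assumption: opaque = alpha > 0
    return [labeled == k for k in range(1, n + 1)]

def regions_from_depth(depth, tol=0.05):
    """Group pixels whose depth values are the same or similar."""
    bins = np.round(depth / tol).astype(int)       # tol is an illustrative threshold
    return [bins == v for v in np.unique(bins)]
```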
  • the method of the first aspect may further include: before performing image editing processing on the first picture according to the first editing instruction, if it is determined that applying the first editing instruction to the editing area is unreasonable, re-recommending the editing instruction or re-recommending the editing area. This avoids editing results that defy common-sense logic when the user inputs arbitrary editing instructions, thereby reducing the number of trial-and-error attempts when the user edits the picture.
  • re-recommending the editing area may specifically include: traversing the area outside the editing area in the first image, comparing the fused feature vector of the traversed area with the feature vector corresponding to the first editing instruction, finding an area that can be reasonably matched with the first editing instruction, and re-recommending the found area as the editing area.
  • re-recommending editing instructions may specifically include: traversing the editing types and/or editing parameters in the recommendation pool, comparing the feature vectors corresponding to the traversed editing types and/or editing parameters with the fused feature vector of the editing area, finding the editing types and/or editing parameters that can reasonably match the editing area, and modifying the first editing instruction according to the found editing type and/or editing parameters, and re-recommending the modified first editing instruction.
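Both re-recommendation strategies reduce to comparing feature vectors. Below is a sketch of the area re-recommendation case; cosine similarity and the threshold are assumptions, since the application does not name a specific comparison metric:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def rerecommend_area(instruction_vec, region_vecs, current_idx, threshold=0.6):
    """Traverse areas other than the current editing area and return the one
    whose fused feature vector best matches the first editing instruction."""
    best, best_s = None, threshold
    for i, v in enumerate(region_vecs):
        if i == current_idx:
            continue
        s = cosine(instruction_vec, v)
        if s > best_s:
            best, best_s = i, s
    return best    # None if no area reasonably matches the instruction
```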
  • the editing instructions input by the user for the editing area may involve adding a new object
  • the editing instructions involving adding a new object may include: adding, replacing, dragging and other types of editing instructions.
  • replacing is equivalent to deleting an object from the original image and then adding another object
  • dragging is equivalent to deleting an object from a certain position in the original image and then adding the object to another position.
  • the editing instruction involving adding a new object may mean that the editing process corresponding to the editing instruction includes: adding a new object to the editing area.
  • the depth features of the first pixel area are replaced by the depth features of the new object, and the depth features of the second pixel area remain the depth features of the original object, wherein the perspective relationship of the first pixel area is that the new object is in front of the original object, and the perspective relationship of the second pixel area is that the original object is in front of the new object.
  • performing image editing processing on the first picture according to the first editing instruction may include: correcting image features of the first picture, and regenerating the first picture using the corrected image features of the first picture.
  • the image feature may further include a first image feature, which is an image feature other than a depth feature and may include one or more of the following: a mask feature, a contour feature, and a color feature.
  • Correcting the image features of the first picture may also include: in the editing area, if the depth feature of a pixel area is replaced with the corrected depth feature of the new object, then using the first image feature of the pixel area on the new object to replace the original first image feature of the pixel area in the editing area.
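A sketch of this depth-aware replacement, assuming a convention where smaller depth values mean closer to the camera (the application does not fix the convention):

```python
import numpy as np

def paste_with_depth(orig_depth, new_depth, orig_feat, new_feat, region):
    """Within the editing area `region` (boolean HxW mask), replace depth and
    first image features only where the new object is in front (the first
    pixel area); elsewhere the original object's features are kept (the
    second pixel area)."""
    in_front = region & (new_depth < orig_depth)   # assumption: smaller depth = closer
    out_depth = np.where(in_front, new_depth, orig_depth)
    out_feat = np.where(in_front[..., None], new_feat, orig_feat)  # HxWxC features
    return out_depth, out_feat
```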
  • an embodiment of the present application provides a terminal device, which may include: a human-computer interaction module, a processor and a memory, wherein the human-computer interaction module is coupled to the processor, and the memory is coupled to the processor; the human-computer interaction module may include input and output components such as a touch screen; wherein the memory may be used to store computer program code, and the computer program code may include computer instructions, and when the processor executes the computer instructions, the terminal device executes the method described in any one or more embodiments of the first aspect mentioned above.
  • the terminal device provided in the second aspect has powerful computing capabilities, so complex calculations can be performed directly on the computing module (including the processor) of the terminal device.
  • the human-computer interaction module and the computing module mentioned in subsequent embodiments can both be deployed on the terminal device; in that case, the steps performed by the two modules are steps performed by the terminal device, and the communication or data interaction between them is intra-device communication.
  • an embodiment of the present application provides a computer-readable storage medium, comprising instructions, characterized in that when the instructions are executed on a terminal device, the terminal device executes the method described in any one or more embodiments of the first aspect above.
  • an embodiment of the present application provides a method for editing an image, which can be applied to a human-computer interaction module.
  • the human-computer interaction module is included in an image editing system, and the image editing system also includes: a computing module.
  • the method may include: a human-computer interaction module displays a first image, detects a user operation on the first image for selecting an editing area, and displays the editing area in the first image; then, the human-computer interaction module receives a recommended editing instruction sent by a computing module, and displays the recommended editing instruction, which is generated by the computing module according to image features of the editing area; the human-computer interaction module detects that a user inputs a first editing instruction for the editing area, and the first editing instruction includes: a recommended editing instruction; finally, the human-computer interaction module receives the first image after image editing processing sent by the computing module, and displays the first image after image editing processing, and the image editing processing is performed by the computing module according to the first editing instruction.
  • the first picture may be a picture opened by a user through an application for browsing, managing or processing pictures, such as a photo album (also known as a gallery), a photo editing application, a drawing design program, etc.
  • the first picture may be stored on a terminal device such as a mobile phone, or may be stored on a network.
  • detecting that the user inputs the first editing instruction for the editing area may specifically include: detecting that the user selects a recommended editing instruction, in which case the selected recommended editing instruction is determined as the first editing instruction.
  • the human-computer interaction module is not limited to displaying recommended editing instructions, and can also prompt the user to enter editing instructions in the following manner: display a first input box, the first input box can be used to receive voice or text editing instructions; the first editing instruction also includes a voice or text editing instruction entered through the input box.
  • the first editing instruction can include the user entering a text instruction in the input box or pressing the voice key to enter a voice instruction.
  • the human-computer interaction module may also prompt the user to enter the editing instruction in the following manner: display one or more preset editing instructions, such as commonly used editing instructions or editing instructions saved in advance by the user.
  • detecting that the user enters the first editing instruction for the editing area may specifically include: detecting that the user selects a preset editing instruction, which is then determined as the first editing instruction.
  • the user operation for selecting an editing area may include: a user operation of selecting a first object in a first picture, the operation position of the user operation on the first picture falls on the first object in the first picture, and the image area where the first object is located is the editing area; the image area where the first object is located is determined by performing image segmentation processing on the first picture.
  • the user operation for selecting the editing area may include: a user operation for drawing the editing area in the first picture.
  • distinctively displaying the editing area in the first picture may include one or more of the following methods: highlighting the outline of the editing area, highlighting the entire editing area, or displaying a dotted box along the outline of the editing area.
  • the method may further include: if the first editing instruction is not reasonable to be applied to the editing area, the human-computer interaction module re-recommends the editing area; the re-recommended editing area is found by the calculation module from an area outside the editing area based on the feature vector corresponding to the first editing instruction.
  • the method may also include: if the application of the first editing instruction to the editing area is unreasonable, the human-computer interaction module recommends a modified first editing instruction; the editing type and/or editing parameters of the modified first editing instruction are found by the calculation module from the recommendation pool based on the fused feature vector of the editing area.
  • an embodiment of the present application provides a method for editing a picture, the method is applied to a computing module, the computing module is included in a picture editing system, and the picture editing system further includes: a human-computer interaction module;
  • the method may include: a computing module generates a recommended editing instruction based on image features of an editing area of a first image, and sends the recommended editing instruction to a human-computer interaction module, so that the human-computer interaction module displays the recommended editing instruction; then, the computing module receives the first editing instruction sent by the human-computer interaction module, and performs image editing processing on the first image according to the first editing instruction; finally, the computing module sends the first image after image editing processing to the human-computer interaction module, so that the human-computer interaction module displays the first image after image editing processing.
  • the preprocessing information of the first image may not directly indicate one or more regions, but may include other data, such as multi-layer information, or contour information, or depth information.
  • the other data may be used to first determine one or more regions, and then the editing region may be determined from the one or more regions.
  • the calculation module may determine the area surrounded by the contour indicated by the contour information as the area indicated by the preprocessing information.
  • re-recommending editing instructions may specifically include: a computing module traversing the editing types and/or editing parameters in the recommendation pool, comparing the feature vectors corresponding to the traversed editing types and/or editing parameters with the fused feature vector of the editing area, finding the editing types and/or editing parameters that can reasonably match the editing area, and modifying the first editing instruction according to the found editing type and/or editing parameters, and re-recommending the modified first editing instruction.
  • the editing process corresponding to the first editing instruction may include: adding a new object to the editing area.
  • the depth feature of the first pixel area is replaced with the depth feature of the new object, and the depth feature of the second pixel area remains the depth feature of the original object, wherein the perspective relationship of the first pixel area is that the new object is in front of the original object, and the perspective relationship of the second pixel area is that the original object is in front of the new object.
  • the image feature may also include a first image feature, which is an image feature other than a depth feature, and includes one or more of the following: a mask feature, a contour feature, and a color feature.
  • an embodiment of the present application provides a server, which may include: a processor and a memory, wherein the memory is coupled to the processor; the memory is used to store computer program code, and the computer program code includes computer instructions; when the processor executes the computer instructions, the server executes the method described in any one or more embodiments of the aforementioned fifth aspect.
  • an embodiment of the present application provides a computer-readable storage medium, comprising instructions, characterized in that when the instructions are executed on a terminal device, the terminal device executes the method described in any one or more embodiments of the aforementioned fifth aspect.
  • an embodiment of the present application provides a picture editing system, which may include: a terminal device and a server, wherein the terminal device is the terminal device described in the sixth aspect, and the server is the server described in the seventh aspect.
  • an embodiment of the present application provides a picture editing system, which may include: a human-computer interaction module and a computing module, wherein:
  • the human-computer interaction module may be used to display the first image, detect a user operation on the first image for selecting an editing area, and inform the calculation module of an operation position of the user operation on the first image;
  • the calculation module may be used to determine an editing area from the first image according to an operation position of a user operation on the first image, and inform the human-computer interaction module of the editing area;
  • the human-computer interaction module may also be used to distinguish and display the editing area in the first picture
  • the calculation module may also be used to generate a recommended editing instruction according to the image features of the editing area, and inform the human-computer interaction module of the recommended editing instruction;
  • the human-computer interaction module can also be used to display recommended editing instructions, then detect that the user inputs a first editing instruction for the editing area, and send the first editing instruction to the calculation module; the first editing instruction includes: a recommended editing instruction;
  • the calculation module may also be used to perform image editing processing on the first picture according to the first editing instruction, and send the first picture after image editing processing to the human-computer interaction module;
  • the human-computer interaction module can also be used to display the first image after image editing.
  • the image editing system provided in the eleventh aspect can be deployed in the same device (such as a terminal device such as a mobile phone with powerful computing capabilities) or in two devices (a terminal device and a server).
  • the human-computer interaction module can execute the method described in any one or more embodiments of the aforementioned fourth aspect, and the computing module can execute the method described in any one or more embodiments of the aforementioned fifth aspect.
  • an embodiment of the present application provides a chip system, which is applied to a terminal device.
  • the chip system includes one or more processors, which are used to call computer instructions so that the terminal device can execute the method described in the aforementioned first aspect, or any one or more embodiments of the fourth aspect.
  • an embodiment of the present application provides a chip system, which is applied to a server, and the chip system includes one or more processors, which are used to call computer instructions so that the server can execute the method described in any one or more embodiments of the aforementioned fifth aspect.
  • the present application provides a computer program product comprising instructions.
  • the electronic device can execute the method described in any one or more embodiments of the above-mentioned first aspect, fourth aspect, or fifth aspect.
  • FIG1 shows a picture editing system provided by an embodiment of the present application
  • FIG2 shows a terminal device provided in an embodiment of the present application
  • FIG3 shows a server provided in an embodiment of the present application
  • FIG4 shows the relationship between various method embodiments of the present application
  • FIG5 shows a method for editing a picture provided by an embodiment of the present application
  • FIG6 shows an example of a user opening a picture
  • FIG. 7 shows an example in which a user selects the editing area “sky” in a picture
  • FIG8 shows an example of a user drawing a heart-shaped editing area in a picture
  • FIG9 shows an example of prompting a user to input an editing instruction by voice or text
  • FIG10 shows an example of outputting a recommended editing instruction to a user
  • FIG11 shows another example of outputting a recommended editing instruction to a user
  • FIG12 exemplarily shows the digitized form of the mask feature of the image
  • FIG13 exemplarily shows a visualization form of a mask feature
  • FIG14 exemplarily shows a visualization form of a deep feature
  • FIG15 exemplarily shows a visualization form of contour features
  • FIG17 exemplarily shows the result of executing the editing command “add ‘flying geese’” for the editing area “sky”;
  • FIG18 exemplarily shows the result after executing the editing instruction “adjust hue: reduce brightness to 82%, reduce saturation to 75%, and change hue to red” for the editing area “sky”;
  • FIG23 exemplarily shows a method of determining a plurality of regions indicated by preprocessing information based on preprocessing information including contour information
  • FIG. 25 shows another picture editing method provided by an embodiment of the present application.
  • FIG. 26 shows a method flow for determining whether a first editing instruction and an editing area are reasonably matched, provided in an embodiment of the present application
  • FIG28 shows an example of prompting the user to change the editing area
  • FIG29 shows an example of prompting the user to change the editing instruction
  • FIG30 shows a method flow, provided by an embodiment of the present application, for executing an editing instruction involving a newly added object
  • FIG31 exemplarily shows a situation where the perspective relationship between the new object and the original object in the picture is unreasonable
  • FIG32 exemplarily shows a new object image
  • FIG33 exemplarily shows a pixel area in an edit area where a new object mask feature cannot be used
  • FIG34 exemplarily shows a case where the perspective relationship between the new object and the original object in the picture is reasonable
  • FIG35 exemplarily shows the size comparison between the new object image area and the area in the original image where feature replacement (such as depth feature replacement, mask feature replacement, contour feature replacement, etc.) occurs.
  • some simple image editing functions are widely used, such as erasing a local area of the image, moving the position of an object in the image, adjusting the color or brightness of an object in the image, etc.
  • with these image editing functions, users do not need to perform complex editing processes such as cutting out, redrawing, and coloring a local area of the image; they only need to select the target area or target object by clicking, and enter editing instructions such as deleting, adjusting color, or moving the object's position.
  • this type of image editing function only supports a few built-in editing operations and does not support user-defined editing operations; moreover, it only lets users process content that already appears in the image and does not support adding new content to the image; in addition, it cannot prompt users as to whether an editing operation is reasonable, for example, that "dragging a street bench into the sky" is unreasonable.
  • the embodiments of the present application provide a picture editing method, related equipment and system, which can lower the threshold for picture editing, support user-defined editing instructions, and understand the user's operating intentions, provide users with editing ideas, avoid users from experiencing a lot of trial and error, and significantly improve the efficiency of picture output.
  • the picture editing method provided in the embodiment of the present application can be implemented based on the picture editing system 10 shown in Figure 1.
  • the picture editing system 10 may include a human-computer interaction module 100 and a computing module 200.
  • the human-computer interaction module 100 and the computing module 200 work together to jointly implement the picture editing method provided in the embodiment of the present application.
  • the human-computer interaction module 100 can serve as the human-computer interaction interface through which users access the picture editing function, for example in picture editing programs such as photo albums (also known as galleries), picture editing applications, and drawing design programs.
  • the human-computer interaction module 100 can provide input capabilities such as touch input, audio input, gesture input, and output capabilities such as display output and audio output, so that users can use picture editing functions by clicking, dragging, text input, gestures, etc., such as adding new objects to the picture, replacing an object in the picture, erasing a local area in the picture, moving the position of an object in the picture, adjusting the color or brightness of an object in the picture, etc.
  • the human-computer interaction module 100 can also be used to communicate with the calculation module 200 to transmit editing parameters (such as the position clicked by the user, the image features of the editing area, etc.) to the calculation module 200, so that the calculation module performs the complex calculations involved in the picture editing function according to the editing parameters. Which editing parameters are transmitted, and what calculations the calculation module performs according to them, will be described in detail in the following embodiments and are not expanded here.
  • the calculation module 200 can be responsible for the complex calculations involved in the picture editing function, such as editing instruction recommendation algorithm, image processing algorithm, image understanding algorithm, etc.
  • the calculation module 200 can also be used to communicate with the human-computer interaction module 100 to receive the editing parameters transmitted by the human-computer interaction module 100, and return the processing results to the human-computer interaction module 100, so that the human-computer interaction module 100 provides editing suggestions to the user or displays the edited picture according to the processing results.
  • the human-computer interaction module 100 and the computing module 200 can be in different devices respectively.
  • the human-computer interaction module 100 can be in a terminal device such as a mobile phone, a tablet computer, a smart screen, a smart watch, etc.
  • the computing module 200 can be in a device such as a cloud-side server that can provide stronger computing power, or in other terminal devices with surplus computing power.
  • the communication between the two is the communication between devices, which can be mobile communications such as 2G/3G/4G/5G, wireless fidelity (Wi-Fi) communication, satellite communication and other wireless communications, or Ethernet communication, Universal Serial Bus (USB) communication and other wired communications.
  • the human-computer interaction module 100 and the computing module 200 may also be integrated into the same device, for example, both are in a terminal device such as a mobile phone, a tablet computer, a personal computer, etc. that has both human-computer interaction capabilities and complex computing capabilities.
  • the communication between the two is intra-device communication, which can be intra-device communication such as bus communication and shared memory communication.
  • FIG. 2 exemplarily shows a terminal device 300 provided in an embodiment of the present application.
  • the terminal device 300 may have both human-computer interaction capability and computing capability.
  • the device type of the terminal device 300 may be any of a mobile phone, a tablet computer, a handheld computer, a desktop computer, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), and smart home devices such as smart large screens, wearable devices such as smart watches and smart glasses, extended reality (XR) devices such as augmented reality (AR), virtual reality (VR), and mixed reality (MR), in-vehicle devices or smart city devices, etc.
  • the terminal device 300 may include: a processor 110, a memory 120, a display 130, a display driver integrated circuit (DDIC) 140, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, and a subscriber identification module (SIM) card interface 195, etc.
  • the sensor module 180 may include a gyroscope sensor 180B, an acceleration sensor 180E, and a touch sensor 180K, etc.
  • the various parts in the terminal device 300 may be connected through a bus.
  • the processor 110 can be responsible for providing computing power and can be used as a computing module of the terminal device; the display 130, audio module 170, speaker 170A, receiver 170B, microphone 170C, headphone jack 170D, sensor module 180, button 190, motor 191, indicator 192, camera 193 and other input and output components can be responsible for providing human-computer interaction capabilities and can be used as a human-computer interaction module of the terminal device.
  • the terminal device 300 can be independently implemented as the picture editing system 10 shown in Figure 1; at this time, the part of the terminal device 300 responsible for providing computing power (such as the processor) can constitute the computing module 200 in the picture editing system 10, and the part of the terminal device 300 responsible for providing human-computer interaction capabilities (such as the display screen, touch sensor, etc.) can constitute the human-computer interaction module 100 in the picture editing system 10.
  • the terminal device 300 can also be implemented as only the human-computer interaction module 100 in the picture editing system 10, and the computing module 200 in the picture editing system 10 can be implemented by the cloud-side server.
  • there may be one or more processors 110, and they may be integrated into a system on chip (SOC), a system-level integrated circuit.
  • the processor 110 may include a central processing unit (CPU), a graphic processing unit (GPU), a neural-network processing unit (NPU), etc.
  • the CPU may include an application processor (AP), a baseband processor chip (BP), etc., wherein the AP may be responsible for running the operating system, user interface, and application program on the terminal device; the BP may be responsible for receiving and sending wireless signals and managing radio frequency services.
  • the GPU may be responsible for graphics rendering, coloring according to the rendering instructions and data from the CPU, filling, rendering, and outputting materials, etc.
  • the NPU quickly processes input information by drawing on the biological neural network structure, such as the transmission mode between neurons in the human brain, and can also continuously self-learn.
  • the NPU can be used to run artificial intelligence algorithms, such as editing instruction recommendation algorithms, image processing algorithms, image understanding algorithms, etc.
  • the CPU and GPU can be used to render and synthesize the images to be sent to the display 130.
  • the processor 110 may include one or more interfaces, such as an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general purpose input output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the processor 110 may be provided with a cache memory for storing instructions or data just used or cyclically used by the processor 110. If the processor 110 needs to use the instruction or data again, it can be directly called from the cache memory, which can reduce the waiting time of the processor 110 and improve the program running efficiency.
  • the memory 120 may include a program storage area and a user data storage area, wherein the program storage area may store an operating system and one or more application programs (such as game applications), and the data storage area may store data (such as photos, contacts) created by the user when using the terminal device 300.
  • the memory 120 may be a high-speed random access memory or a non-volatile memory, such as a disk, a flash memory, a universal flash storage (UFS), etc.
  • the memory 120 may also be an external memory card, such as a Micro SD card.
  • the memory 120 may also store code instructions of the image editing method provided in the embodiment of the present application.
  • the processor 110 reads the code instructions from the memory 120 and runs the code instructions
  • the terminal device 300 can execute the steps performed by the human-computer interaction module and/or the computing module in the image editing method provided in the embodiment of the present application.
  • the memory 120 may also be integrated with the processor 110 in an integrated circuit of a SOC.
  • the terminal device 300 can realize the display function through SOC, DDIC 140, and display 130.
  • the display 130 has multiple refresh rates.
  • the refresh rate is the number of times the display refreshes its screen in 1 second; for example, a 60 Hz refresh rate means the screen is refreshed 60 times per second.
  • the display 130 may use an LTPO display panel, allowing the refresh rate to be reduced to a low value, such as 10 Hz or 1 Hz, thereby reducing the power consumption of the display.
  • the display driver integrated circuit (DDIC) 140 can be used as the control core of the display 130 to drive the display 130 to work and receive data from the SOC (processor 110), such as image data and some instructions.
  • the DDIC 140 can send driving signals and data to the display panel of the display 130 in the form of electrical signals, and then realize the control of the screen brightness and color, so that image information such as letters and pictures can be displayed on the screen, completing the screen refresh.
  • the image data of the frame to be displayed, sent by the SOC to the DDIC 140, can first be stored in the frame buffer; the DDIC 140 then takes the image data out of the frame buffer and drives the display 130 to display it, completing the screen refresh.
  • the wireless communication function of the terminal device 300 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor.
  • Antenna 1 and antenna 2 are used to transmit and receive electromagnetic wave signals.
  • each antenna in the terminal device 300 can be used to cover a single communication frequency band or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization.
  • for example, antenna 1 can be multiplexed as a diversity antenna for a wireless local area network.
  • the antenna can be used in combination with a tuning switch.
  • the mobile communication module 150 can provide solutions for wireless communications including 2G/3G/4G/5G applied to the terminal device 300.
  • the mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), etc.
  • the mobile communication module 150 can receive electromagnetic waves from the antenna 1, and filter, amplify, and process the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modulation and demodulation processor, and convert it into electromagnetic waves for radiation through the antenna 1.
  • at least some of the functional modules of the mobile communication module 150 can be set in the processor 110.
  • at least some of the functional modules of the mobile communication module 150 can be set in the same device as at least some of the modules of the processor 110.
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low-frequency baseband signal to be sent into a medium-high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
  • the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the application processor outputs a sound signal through an audio device (not limited to a speaker 170A, a receiver 170B, etc.), or displays an image or video through a display 130.
  • the modem processor may be an independent device.
  • the modem processor may be independent of the processor 110 and be set in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) network), bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) and the like applied to the terminal device 300.
  • the wireless communication module 160 can be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the frequency of the electromagnetic wave signal and performs filtering, and sends the processed signal to the processor 110.
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110, modulate the frequency of the signal, amplify the signal, and convert it into electromagnetic waves for radiation through the antenna 2.
  • the antenna 1 of the terminal device 300 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the terminal device 300 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology.
  • the GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a Beidou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS) and/or a satellite based augmentation system (SBAS).
  • the terminal device 300 can realize the shooting function through ISP, camera 193, video codec, GPU, display 130 and application processor.
  • Video codecs are used to compress or decompress digital videos.
  • the terminal device 300 may support one or more video codecs.
  • the terminal device 300 may play or record videos in multiple coding formats, such as Moving Picture Experts Group (MPEG)-1, MPEG-2, MPEG-3, and MPEG-4.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signals.
  • the audio module 170 can also be used to encode and decode audio signals.
  • the audio module 170 can be arranged in the processor 110, or some functional modules of the audio module 170 can be arranged in the processor 110.
  • Microphone 170C, also called a "mic" or "sound transmitter", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can input a sound signal into the microphone 170C by speaking close to it.
  • the terminal device 300 can be provided with at least one microphone 170C. In other embodiments, the terminal device 300 can be provided with two microphones 170C, which can not only collect sound signals but also realize noise reduction function. In other embodiments, the terminal device 300 can also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify the sound source, realize directional recording function, etc.
  • the human-computer interaction module may detect that the user inputs a first editing instruction for the first editing area.
  • the methods for prompting the user to input editing instructions may include but are not limited to the following methods:
  • the first editing instruction may be a text instruction input by the user in an input box or a voice instruction input by pressing a voice key.
  • the human-computer interaction module may display one or more recommended editing instructions, which are generated by the calculation module according to the image features of the first editing area.
  • the editing area selected by the user is a heart-shaped area 131 in the sky
  • the recommended editing instruction generated for the editing area is “recommended adding new screen content ‘flying geese’”, where “add” is the editing type and “flying geese” is the editing parameter.
  • the editing area selected by the user is the sky area
  • the recommended editing instruction generated for the editing area is “Recommended adjustment of hue: reduce brightness to 82%, saturation to 75%, and hue to red”, where “adjust hue” is the editing type, and “reduce brightness to 82%, saturation to 75%, and hue to red” are the editing parameters.
  • the embodiment of the present application is based on the understanding of the semantic content of the original image, and recommends editing types and editing parameters to users according to the image features of the editing area selected by the user, thereby providing users with editing ideas. This not only reduces the complexity of use, but also ensures the rationality of the content of the edited image, avoids users from experiencing a lot of trial and error, and significantly improves the efficiency of image output.
  • the image features of the first editing area may include, but are not limited to: mask features, depth features, contour features, and color features. These features may be vector features extracted from the first editing area using various existing technologies.
  • the calculation module can use an image segmentation algorithm (such as SAM, SEEM algorithm) to extract the mask features of the image.
  • the mask features can carry the category information corresponding to each pixel in the image.
  • Figure 12 exemplifies the matrix storage form of the mask features of a picture including the sky, castle, mountain, valley, forest and tree.
  • the size of the matrix is the same as the size of the original image, and the coordinates of the matrix correspond one-to-one with the coordinates of the original image.
  • the values in the matrix represent the category to which the pixels at the corresponding position of the original image belong, such as “0” for “sky” and “1” for “forest”.
  • the storage form of the mask features can be varied, as long as it can reflect the category information of each pixel, and the embodiments of the present application do not limit this. Regardless of the storage form, the mask features can be converted to obtain the visualization results shown in Figure 13.
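For instance, a toy mask matrix in that storage form, with the category mapping from FIG12, could be queried as follows (values illustrative):

```python
import numpy as np

CATEGORIES = {0: "sky", 1: "forest"}     # per the example mapping above

mask = np.array([[0, 0, 0, 0],           # 4x4 mask, same size as a 4x4 image
                 [0, 0, 1, 1],
                 [1, 1, 1, 1],
                 [1, 1, 1, 1]])

# Category of the pixel at row 0, column 2 of the original image:
print(CATEGORIES[int(mask[0, 2])])       # -> "sky"
```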
  • the calculation module can determine that the category of the pixels in the heart-shaped area 131 is "sky" based on the mask features, and then determine the recommended editing type as "add" and the corresponding editing parameter as "flying geese" based on the specific recommendation rule.
  • the specific recommendation rule includes: when the first editing area is the sky, the editing type is "add" and the editing parameter is "flying geese" (a minimal sketch of such a rule follows below).
  • the recommendation rules can also be set otherwise, and the embodiments of the present application do not limit this.
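The following is a minimal Python sketch of such a rule table; the dictionary layout and the "forest" entry are illustrative assumptions, with only the "sky" rule taken from the example above.

```python
# A minimal sketch of the specific recommendation rule described above; the
# "forest" entry and the field names are illustrative assumptions.
RECOMMENDATION_RULES = {
    "sky": {"edit_type": "add", "edit_params": "flying geese"},
    "forest": {"edit_type": "adjust hue", "edit_params": "increase saturation to 90%"},
}

def recommend(region_category: str):
    """Return the recommended editing instruction for a region category, if any."""
    return RECOMMENDATION_RULES.get(region_category)

print(recommend("sky"))  # {'edit_type': 'add', 'edit_params': 'flying geese'}
```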
  • the calculation module can also extract depth features using a depth estimation algorithm, extract contour features using an edge detection and contour extraction algorithm, extract color features using a primary color extraction algorithm (such as a Kmeans clustering algorithm), and so on.
  • the visualization form of the depth feature can be shown in FIG. 14
  • the visualization form of the contour feature can be shown in FIG. 15
  • the visualization form of the color feature can be shown in FIG. 16.
  • FIG. 16 visualizes the color moments of the picture and intuitively describes the color distribution in the picture through second-order moments.
  • the storage form of the three features, i.e., the depth feature, the contour feature, and the color feature, can also be a matrix form similar to FIG. 12, except that the values in the matrix have different meanings:
  • the matrix value of the depth feature represents the depth value of each pixel
  • the matrix value of the contour feature represents whether each pixel lies on a contour
  • the matrix value of the color feature represents the color of each pixel.
  • the calculation module can use weighted summation and other techniques to fuse these features to obtain a fused feature vector, and use the fused feature vector as one of the inputs of the specific artificial intelligence algorithm.
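As an illustration of the weighted-summation fusion, here is a minimal sketch assuming the four feature types have already been encoded as equal-length vectors; the vector length (128) and the weights are assumptions.

```python
import numpy as np

# A sketch of feature fusion by weighted summation, assuming the mask, depth,
# contour, and color features are already encoded as equal-length vectors.
def fuse_features(features: dict[str, np.ndarray],
                  weights: dict[str, float]) -> np.ndarray:
    """Weighted sum of per-feature vectors into one fused feature vector."""
    fused = np.zeros_like(next(iter(features.values())), dtype=np.float64)
    for name, vec in features.items():
        fused += weights.get(name, 1.0) * vec
    return fused

features = {
    "mask": np.random.rand(128),
    "depth": np.random.rand(128),
    "contour": np.random.rand(128),
    "color": np.random.rand(128),
}
weights = {"mask": 0.4, "depth": 0.3, "contour": 0.2, "color": 0.1}  # illustrative
fused_vector = fuse_features(features, weights)
```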
  • the calculation module can also map some editing types into numbers (such as "add" mapped to 1 and "delete" mapped to 2) as the second input of the specific artificial intelligence algorithm.
  • the calculation module can obtain the editing parameters corresponding to various editing types through the calculation of the specific artificial intelligence algorithm.
  • each training sample can include an input sample and an output sample, wherein the input sample includes the fused feature vector of the image area and the mapping ID of the editing type, and the output sample includes the editing parameters used when the editing type is applied to the image area. The more reasonable the training samples and the larger the sample set, the more reliable the trained model.
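One hypothetical layout for such a training sample is sketched below; the field names and the 6-dimensional feature vector are illustrative, with only the type-ID mapping ("add" mapped to 1) taken from the example above.

```python
# A hypothetical layout for one training sample; field names are illustrative.
training_sample = {
    "input": {
        "fused_feature_vector": [0.52, 0.79, 0.13, 1.48, -3.96, 1.50],
        "edit_type_id": 1,              # "add" mapped to 1, per the example above
    },
    "output": {
        "edit_params": "flying geese",  # parameters used when "add" was applied
    },
}
```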
  • the computing module can also generate editing questions based on the image features of the editing area and the preset editing type.
  • the editing question asks which editing parameters of the editing type would match the editing area; the question can then be input into a large model such as ChatGPT to obtain the answer.
  • the ChatGPT model can output an answer, such as: "Add 'clouds' to the sky", where 'clouds' is the large model's answer to the question.
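A sketch of how such an editing question might be assembled from the region category and a preset editing type is shown below; the template wording is an assumption, and the call to the large model itself is omitted.

```python
# A sketch of generating an editing question for a large model from the region
# category (taken from the mask features) and a preset editing type.
def build_editing_question(region_category: str, edit_type: str) -> str:
    return (f"The selected editing area of the picture is the {region_category}. "
            f"If the editing type is '{edit_type}', what editing parameters "
            f"(content to {edit_type}) would be reasonable?")

question = build_editing_question("sky", "add")
# The question string can then be sent to a large model such as ChatGPT,
# which may answer, for example: "Add 'clouds' to the sky".
```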
  • the computing module may further traverse the editing instructions in the recommendation pool and compare the feature vectors corresponding to the traversed editing instructions with the fused feature vector of the editing area (for details, see S52 in Embodiment 3), so as to find editing instructions that reasonably match the editing area and finally recommend them.
  • the first editing instruction input by the user may be selected from the recommended editing instructions generated by the calculation module, for example, the user clicks on a recommended editing instruction to confirm inputting the recommended editing instruction as the first editing instruction, or the user drags a recommended editing instruction to the first editing area to confirm inputting the recommended editing instruction as the first editing instruction.
  • the example is only used to explain the embodiments of the present application, and may be different in actual applications. The embodiments of the present application do not limit the manner in which the user inputs the first editing instruction.
  • the human-computer interaction module may display one or more preset editing instructions, such as commonly used editing instructions or editing instructions saved in advance by the user.
  • the first editing instruction entered by the user may be selected from the preset editing instructions displayed by the human-computer interaction module; for example, the user clicks on a preset editing instruction to confirm entering it as the first editing instruction, or the user drags a preset editing instruction to the first editing area to confirm entering it as the first editing instruction.
  • the examples are only used to explain the embodiments of the present application, and may be different in actual applications. The embodiments of the present application do not limit the manner in which the user enters the first editing instruction.
  • the above-described methods of prompting the user to input the editing instruction can also be implemented in combination; for example, the human-computer interaction module pops up an input box to prompt the user to input the editing instruction as in method 1, displays the recommended editing instructions as in method 2, and displays the preset editing instructions as in method 3.
  • the user can see prompts of multiple methods on the interface at the same time, and choose to input the editing instruction according to a certain prompt.
  • the human-computer interaction module can send the first editing instruction to the calculation module, and then, as described in S21, the calculation module can perform corresponding image editing processing on the first picture according to the first editing instruction, and return the first picture after the image editing processing to the human-computer interaction module as described in S22. In this way, as described in S23, the human-computer interaction module can display the first picture after the image editing processing.
  • the editing instruction may include the following content: the editing type and its corresponding editing parameters.
  • the editing type may be, for example, deletion, dragging, replacement, addition, color adjustment, etc.
  • the corresponding editing parameters may be, for example, replaceable content, drag target position, newly added content, color adjustment value, etc.
  • the editing area selected by the user is "sky” and the first editing instruction is “delete 'dark clouds'", then compared with the image before editing, the "dark clouds” in the "sky” of the first picture after image editing are deleted.
  • the editing area selected by the user is "sky” and the first editing instruction is "add 'flying geese'”, then compared with the image before editing, the "flying geese” in the "sky” of the first picture after image editing are added.
  • Figure 17 exemplifies the first picture after editing when the first editing instruction is "add 'flying geese'"
  • Figure 18 exemplifies the first picture after editing when the first editing instruction is "adjust the color tone: reduce the brightness to 82%, reduce the saturation to 75%, and adjust the hue toward red".
  • the computing module can also obtain the image content to be added to the first picture according to the editing parameters in the first editing instruction, such as "flying geese", and add the image content to the first picture to complete the image editing process corresponding to the first editing instruction (such as "add 'flying geese'").
  • the image content can be carried in a material picture, which can come from a network or a terminal device, or the material picture can be generated by the computing module using artificial intelligence.
  • Figure 19 exemplifies the material picture obtained according to the editing parameters ("flying geese") of the editing instruction "add 'flying geese'".
  • Embodiment 2 is an alternative to Embodiment 1 and also introduces the overall process of the image editing method.
  • the editing area can be determined based on the preprocessing information of the image instead of image segmentation technology, and repeated online calculations are not required.
  • the overall process of the picture editing method provided in the second embodiment may include:
  • the human-computer interaction module can detect the operation of the user selecting the editing area in the first picture, such as clicking on an object in the picture. In response to this, as described in S35, the human-computer interaction module can trigger the calculation module to obtain the preprocessing information of the first picture; in addition, as described in S36, the human-computer interaction module can also transmit the operation position of the user's operation on the first picture to the calculation module.
  • the calculation module can determine which area of the picture the editing area selected by the user is based on the preprocessing information, or further combined with the operation position, and inform the human-computer interaction module of the indication information (such as contour information, binary image, grayscale image) of the editing area selected by the user as described in S38. Then, as described in S39, the human-computer interaction module can distinguish and display the editing area in the first picture according to the contour information of the editing area selected by the user.
  • the user's operation of selecting the editing area in the first picture can be an operation of clicking or long pressing an object in the first picture.
  • the user selects "sky” as the editing area by clicking the object "sky”.
  • the way to distinguish and display the editing area in the first picture may include but is not limited to: highlighting the outline of the editing area, highlighting the entire editing area, or displaying a dotted box along the outline of the editing area, etc.
  • the preprocessing information may be used to indicate one or more regions.
  • the one or more regions refer to image regions in the first picture, and the image regions in the first picture may be divided into units of objects, and all pixels of an object constitute an image region.
  • the editing region may be determined from the one or more regions.
  • the one or more regions may be regions that the user prefers to edit, or regions that are recommended to edit, or regions that are allowed to edit, and so on.
  • the editing region may be quickly determined based on the preprocessing information without performing image segmentation processing on the first picture.
  • the calculation module may first read the preprocessing information; if the preprocessing information indicates only one region, that region can be directly determined as the editing area.
  • the preprocessing information may only contain the coordinates of the contour points of a region.
  • the preprocessing information may be in the form of an array in Json format, where an array represents the contour of a region, and each element in the array is a tuple, where the first value of the tuple represents the x-coordinate value of a contour point of the region, and the second value of the tuple represents the y-coordinate value of the contour point. Therefore, the region enclosed by the contour recorded in the array can be determined, and the region can be determined as the editing region.
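A small sketch of this Json contour format and its parsing is shown below; the coordinate values are illustrative.

```python
import json

# A sketch of the Json-format contour array described above: one array per
# region, each element an (x, y) tuple of a contour point; values illustrative.
contour_json = '[[10, 12], [48, 12], [48, 40], [10, 40]]'

contour = [tuple(point) for point in json.loads(contour_json)]
xs, ys = zip(*contour)
print(f"region bounding box: x {min(xs)}..{max(xs)}, y {min(ys)}..{max(ys)}")
# The region enclosed by these contour points can then be taken as the editing area.
```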
  • the preprocessing information is not limited to the Json format and may also adopt other data formats.
  • the data content in the preprocessing information can also be a binary image, a grayscale image, etc.
  • the preprocessing information may also be a binary image containing only one region. In the binary image, only one region has a first value.
  • the binary image may be exemplified as shown in FIG. 22, wherein one element corresponds to one pixel, the coordinates of the element may represent the position of the pixel, and the value of the element may represent the pixel category of the pixel in binarized form; for example, "1" may represent that the pixel at this position belongs to the "sky", and "0" may represent that the pixel at this position does not belong to the "sky".
  • an area with a value of "1" (the "sky" area, where the first value is "1") or an area with a value of "0" (the non-"sky" area, where the first value is "0") may be determined as an editing area.
  • the preprocessing information may be a grayscale image, in which only one region may have a grayscale value of a specific grayscale value or be within a specific grayscale range. Therefore, the only region may be determined as the editing region based on the grayscale image.
  • the editing area can be further determined in combination with the action position of the second user operation on the first picture. Specifically, it can be determined which area of the picture the action position falls in, or which area it is closest to, and that area is determined as the editing area. In other words, among the multiple areas indicated by the preprocessing information, the area where the action position falls can be determined as the editing area, or the area closest to the action position can be determined as the area selected by the user; the area selected by the user can then be determined as the editing area.
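The following sketch illustrates this selection logic, assuming the preprocessing information has been turned into a label matrix in which each region carries a distinct integer label (0 for background); this representation is an assumption for illustration.

```python
import numpy as np

# A sketch of picking the editing area from the regions indicated by the
# preprocessing information, given the position of the user's operation.
def select_region(labels: np.ndarray, click_xy: tuple[int, int]) -> int:
    x, y = click_xy
    label = labels[y, x]                      # region the click falls inside
    if label != 0:
        return label
    # Otherwise choose the region whose pixels are closest to the click.
    best_label, best_dist = 0, np.inf
    for candidate in np.unique(labels):
        if candidate == 0:
            continue
        ys, xs = np.nonzero(labels == candidate)
        dist = np.min((xs - x) ** 2 + (ys - y) ** 2)
        if dist < best_dist:
            best_label, best_dist = candidate, dist
    return best_label
```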
  • the preprocessing information may include the coordinates of the contour points of the multiple regions, wherein the coordinates of the contour of each region may be expressed as an array.
  • the preprocessing information may also be a binary image including multiple regions, wherein the values of the multiple regions in the binary image are the first value (such as "1"). For example, "1" indicates that the pixel at this position belongs to "flower", and "0" indicates that the pixel at this position does not belong to "flower".
  • the preprocessing information may also be a grayscale image, wherein the grayscale values of multiple regions in the grayscale image are specific grayscale values or are in a specific grayscale range, for example, there are multiple regions whose grayscale values are in the grayscale range of 0-50.
  • the preprocessing information that can directly indicate multiple regions is not limited to the coordinates of the contour points of the regions, binary images, and grayscale images.
  • the preprocessing information may also have other forms of data content, which is not limited in the embodiments of the present application.
  • the preprocessing information of the first image may not directly indicate one or more regions, but may include other data, such as multi-layer information, contour information, or depth information.
  • the calculation module may use the other data to determine one or more regions, and then determine the editing region from the one or more regions based on the method described in 1 or 2 above. That is, the preprocessing information of the first image may indirectly indicate one or more regions.
  • Methods for determining one or more regions indirectly indicated by the pre-processing information may include, but are not limited to:
  • the layer information of each layer may include the coordinates of the opaque pixels in the layer; then, for each layer in the layer information, the opaque pixels in the layer that are connected together can be regarded as a region. Furthermore, layer regions that largely overlap can be merged into one region, for example, using the non-maximum suppression (NMS) algorithm, and the editing region is determined using the merged one or more regions.
  • the contour information can record the pixels on the contour; the useless internal contours in the contour information can then be filtered out using dilation and erosion techniques, and one or more completely closed regions can be obtained from the contour information based on graph theory and other techniques.
  • dilation and erosion can be used to eliminate noise, segment independent image elements, and connect adjacent elements;
  • graph theory and other techniques can be used for connected component analysis, which refers to finding independent connected components in an image and marking them out.
  • a connected component of an image refers to a region composed of pixels that have the same pixel value and are adjacent in position. Generally, a connected component contains only one pixel value; therefore, to prevent pixel value fluctuations from affecting the extraction of connected components, connected component analysis often processes binarized images.
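A minimal sketch of this pipeline using OpenCV is shown below; the morphological closing (dilation followed by erosion) filters small internal contours, and cv2.connectedComponents labels the independent connected components. The test image is illustrative.

```python
import cv2
import numpy as np

# A sketch of connected component analysis on a binarized image.
binary = np.zeros((100, 100), dtype=np.uint8)
binary[10:40, 10:40] = 255    # one illustrative region
binary[60:90, 50:95] = 255    # another illustrative region

# Morphological closing (dilate, then erode) removes small holes/inner contours.
kernel = np.ones((3, 3), np.uint8)
cleaned = cv2.erode(cv2.dilate(binary, kernel), kernel)

num_labels, labels = cv2.connectedComponents(cleaned)
print(f"{num_labels - 1} regions found (label 0 is the background)")
```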
  • FIG. 23 exemplarily shows a plurality of regions obtained based on the contour information of the picture.
  • the depth information may record the depth values of multiple pixels, and the pixels may be divided into different regions according to the depth value of each pixel in the depth information. Specifically, pixels with the same or similar depth values may be divided into the same region.
  • FIG. 24 exemplarily shows a plurality of areas obtained based on the depth information of the picture, wherein the "castle" area 241 may be composed of pixel points with the same or similar depth values, and the "tower" area 242 may be composed of pixel points with the same or similar depth values.
  • similar depth values may mean that the difference in depth values does not exceed a specific value, such as 0.1.
  • the depth value of each pixel can be expressed in the range of 0 to 1, where 0 represents the depth value of the pixel farthest from the camera that took the first picture, and 1 represents the depth value of the pixel closest to the camera that took the first picture.
  • the depth value may also be expressed in other ways, which are not limited in the embodiment of the present application.
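A simple sketch of grouping pixels by depth is shown below, assuming depth values in [0, 1] and treating depths within 0.1 of each other as similar, per the example above; binning is one simple realization, not necessarily the patent's method.

```python
import numpy as np

# A sketch of dividing pixels into regions by depth: pixels falling into the
# same 0.1-wide bin differ in depth by less than 0.1 and form one band.
depth = np.random.rand(4, 4)                 # illustrative depth map in [0, 1]
bins = np.floor(depth / 0.1).astype(int)
for b in np.unique(bins):
    ys, xs = np.nonzero(bins == b)
    print(f"depth band {b * 0.1:.1f}-{(b + 1) * 0.1:.1f}: {len(xs)} pixels")
```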
  • the preprocessing information may include multiple contents introduced in 3.1 to 3.3 above at the same time.
  • the preprocessing information may include layer information, depth information, and contour information at the same time.
  • if the areas indicated by the preprocessing information are determined based on these three contents, they can be calibrated against each other to improve recognition accuracy. For example, if both the layer information and the depth information indicate the same area, the recognition of that area is usually accurate; conversely, if the areas indicated by the layer information and the depth information conflict, the area recognition based on the depth information or the layer information is inaccurate, and the algorithm can be optimized and the recognition performed again.
  • embodiment 2 can determine the editing area from one or more areas directly or indirectly indicated by the pre-processing information, without first using image segmentation processing to identify and divide the image area where each object in the first picture is located and then identify the editing area according to the user's operation position. Therefore, the area that the user wants to edit can be predicted more quickly, and image segmentation calculations can be avoided.
  • Embodiment 3 is a supplementary refinement of Embodiments 1 and 2.
  • a step of judging whether the editing instruction input by the user and the editing area selected by the user are reasonably matched can be added.
  • if the editing area selected by the user and the input editing instruction do not match, not only will the user be prompted that the editing instruction cannot be executed, but the user can also be recommended to select a new editing area, or be helped to modify the editing instruction.
  • Embodiment 3 solves the problem of editing effects that defy common-sense logic caused by the user entering arbitrary editing instructions, thereby reducing the number of trial-and-error attempts when the user edits a picture.
  • the specific implementation of determining whether the first editing instruction and the editing area are reasonably matched may include the following process:
  • the editing types in the first editing instruction may be finite and enumerable. Different editing types may be mapped to different identification numbers (IDs) and may thus be represented by corresponding IDs.
  • the computing module may convert the IDs corresponding to the editing types using models such as deep learning algorithms to obtain feature vectors corresponding to the editing types. For example, the ID corresponding to "increase" is "001", which is converted into the following feature vector: [0.5250, 0.7937, 0.1356, 1.4893, -3.9651, 1.5068].
  • the editing parameters in the first editing instruction can also be mapped to an ID first and then converted into a feature vector.
  • the calculation module can perform word segmentation on the phrase representing the editing parameter, and then find the ID corresponding to each word segmentation result from the preset word list.
  • the editing parameter "quiet lake” can be segmented into: ["quiet", “quiet”, “of”, “lake”, “park”], and the ID array corresponding to the word segmentation result is: [40496,3152,2099,8024,3563,8024,40497,0,0,0].
  • 40496 and 40497 respectively represent the start and end of the description phrase of the editing parameter, and 0 represents supplement.
  • the length of the ID array that is, the number of elements it contains, can be preset to constrain the maximum length of the description phrase of the editing parameter. Then, the calculation module can generate a corresponding feature vector for each ID in the ID array through a model such as a deep learning algorithm.
  • the ID array in the previous example can be converted into the following array of feature vectors: [[0.8838, 0.1570, 0.5249, ..., 0.4278, 0.1725, 0.4225], [1.8143, -0.5514, 0.0995, ..., -4.7141, -1.3811, -1.1166], [0.0186, 3.5949, 1.1780, ..., 1.1433, 2.7235, -0.5069], ..., [1.1674, 0.9497, 1.8264, ..., 1.3671, 0.5551, -0.4302]], where [0.8838, 0.1570, 0.5249, ..., 0.4278, 0.1725, 0.4225] is the feature vector of 40496, [1.8143, -0.5514, 0.0995, ..., -4.7141, -1.3811, -1.1166] is the feature vector of 3152, [0.0186, 3.5949, 1.1780, ..., 1.1433, 2.7235, -0.5069] is the feature vector of 2099, and so on for the remaining IDs.
  • the deep learning algorithm model may be, for example, a word vector (Word2Vec) model, a Transformer model, etc.
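As an illustration, a learned embedding table is one common way to convert IDs into feature vectors; the sketch below uses PyTorch, and the vocabulary size, shared table, and 6-dimensional output are assumptions.

```python
import torch
import torch.nn as nn

# A sketch of converting IDs into feature vectors via a learned embedding table,
# one common realization of the Word2Vec/Transformer-style models named above.
embedding = nn.Embedding(num_embeddings=50000, embedding_dim=6)

edit_type_id = torch.tensor([1])  # e.g., "add" mapped to 1
param_ids = torch.tensor([40496, 3152, 2099, 8024, 3563, 8024, 40497, 0, 0, 0])

edit_type_vec = embedding(edit_type_id)   # shape (1, 6)
param_vecs = embedding(param_ids)         # shape (10, 6), one vector per ID
```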
  • the fused feature vector corresponding to the edited area can be expressed as a two-dimensional array.
  • the fused feature vector of the "sky” area is: [[0.2611, 1.8726, ..., -0.9721], ..., [1.6888, 2.6287, ..., -5.5910]].
  • the feature vector corresponding to a feature can be calculated by a deep learning algorithm model, and the deep learning algorithm model can be, for example, a convolutional neural network model (CNN).
  • the fusion of multiple features can be achieved through multiple CNN models and weighted summation.
  • the mask feature shown in Figure 13 is obtained by CNN model 1
  • the deep feature shown in Figure 14 is obtained by CNN model 2
  • the contour feature shown in Figure 15 is obtained by CNN model 3, etc.
  • the feature vectors of these features have the same length, so a fused feature vector can be obtained by weighted summation of these feature vectors.
  • S52: Use machine learning or deep learning models to compare the feature vector corresponding to the first editing instruction with the fused feature vector of the editing area selected by the user, and determine whether it is reasonable to apply the first editing instruction to the editing area selected by the user.
  • the calculation module can use the feature vector corresponding to the editing type in the first editing instruction as the first input of the machine learning or deep learning model, the feature vector corresponding to the editing parameters in the first editing instruction as the second input, and the fused feature vector of the editing area as the third input. Finally, through the model's calculations, the calculation module can obtain a judgment result on whether it is reasonable to apply the first editing instruction to the editing area selected by the user.
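A minimal sketch of such a three-input judgment model is shown below; the pooling, layer sizes, and binary-classification head are assumptions, not the patent's concrete model.

```python
import torch
import torch.nn as nn

# A sketch of the reasonableness judgment: the edit-type vector, the pooled
# edit-parameter vectors, and the pooled region feature rows are concatenated
# and fed to a small binary classifier. All dimensions are illustrative.
class MatchJudge(nn.Module):
    def __init__(self, type_dim=6, param_dim=6, region_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(type_dim + param_dim + region_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, type_vec, param_vecs, region_vecs):
        param_pooled = param_vecs.mean(dim=0)    # pool per-ID vectors
        region_pooled = region_vecs.mean(dim=0)  # pool region feature rows
        x = torch.cat([type_vec, param_pooled, region_pooled])
        return torch.sigmoid(self.mlp(x))        # probability the match is reasonable

judge = MatchJudge()
score = judge(torch.randn(6), torch.randn(10, 6), torch.randn(32, 64))
```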
  • the human-computer interaction module can output an error prompt, which can be a visual prompt displayed on the screen, a vibration prompt perceptible by touch, a voice prompt, and so on.
  • the human-computer interaction module may display an error prompt 311 to remind the user that adding "quiet lake" to the "sky" does not conform to common sense, and the execution of the editing instruction may not be allowed.
  • the calculation module can determine that the editing instruction is applicable to the editing area, and if the editing instruction becomes "replace with 'trees'", the calculation module can determine that the new editing instruction is not applicable to the editing area.
  • the error prompt 311 shown in FIG. 27 is also only an example. In actual applications, the error prompt may also be in other styles, such as flashing erroneous editing instructions or flashing editing areas. The embodiments of the present application do not limit this.
  • the first editing instruction can be modified according to the editing area selected by the user, such as modifying the editing type and/or editing parameters, so that the modified editing instruction adapts to the editing area selected by the user.
  • the other area refers to the area outside the editing area selected by the user in the first picture.
  • the calculation module can traverse the editing types and/or editing parameters in the recommendation pool, and use the method in S52 to compare the feature vectors corresponding to the traversed editing types and editing parameters with the fused feature vector of the editing area selected by the user, so as to find an editing type and/or editing parameters that reasonably match the area selected by the user; the editing instruction is then modified accordingly, and the modified editing instruction is finally recommended.
  • the calculation module can determine that the editing instruction is not applicable to the "sky" area 312, modify the editing area by using the "forest" area 313 as the new editing area, and finally recommend that the user apply the editing instruction to the "forest" area 313.
  • the calculation module can determine that the editing instruction is not applicable to the "sky", modify the editing instruction to "add 'flying geese'" as a new editing instruction, and finally suggest that the user apply the new editing instruction to the "sky" area.
  • Step 1: Determine the perspective relationship, i.e., the front-to-back relationship, between the new object and the original object in the first picture according to the image features of the first picture and the editing parameters in the first editing instruction; in the data, this relationship is reflected as different depth values.
  • otherwise, the perspective relationship between the objects will be unreasonable, and the new object will not be reasonably integrated into the first picture; for example, the original object in the editing area may be completely blocked by the new object.
  • if the new object "quiet lake" shown in Figure 32 directly replaces the original objects "tree" and "valley" in the editing area 317 of the picture 315, the "quiet lake" will block the original object "tree", resulting in an unreasonable perspective relationship.
  • the embodiments of the present application may use artificial intelligence algorithms such as image semantic understanding to determine the perspective relationship between the new object and the original object, and based on this, correct the depth features of the image to present a reasonable perspective relationship between objects.
  • the computing module can determine based on the mask features of the picture 315: the original objects in the editing area 317 are "tree" and "valley", and the new object "quiet lake" will block the original objects "tree" and "valley".
  • Step 2: According to the perspective relationship between the new object and the original object, and the depth value of the original object, determine the reference depth of the new object, and then correct the depth feature of the new object using the reference depth.
  • the reference depth of the new object can be determined by interpolation or other methods.
  • the reference depth of the "quiet lake” can be determined to be 0.8 through interpolation algorithms, which is greater than 0.6 and less than 1.0.
  • the depth feature of the new object is corrected by using the baseline depth of the new object and the depth difference between various regions on the new object.
  • the average depth of the new object can be close to or equal to the reference depth, and the depth difference between each area remains unchanged.
  • closeness can mean that the difference between the average depth and the reference depth does not exceed a specific depth value, such as 0.05. In this way, it can be ensured that the new object as a whole forms a reasonable perspective relationship with the original object, and the perspective relationship between each element in the new object can be preserved.
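A sketch of this correction is shown below: the new object's depth map is shifted so its average matches the reference depth (0.8 in the example above) while the relative differences are preserved; the clipping to [0, 1] is an added assumption.

```python
import numpy as np

# A sketch of correcting the new object's depth feature: shift the depth map so
# its average equals the reference depth, preserving relative differences.
def correct_depth(new_object_depth: np.ndarray, reference_depth: float) -> np.ndarray:
    shift = reference_depth - new_object_depth.mean()
    corrected = new_object_depth + shift   # depth differences stay unchanged
    return np.clip(corrected, 0.0, 1.0)    # keep depths in the [0, 1] convention

lake_depth = np.array([[0.50, 0.55], [0.60, 0.55]])  # illustrative values
corrected = correct_depth(lake_depth, reference_depth=0.8)
print(corrected.mean())  # ~0.8
```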
  • Step 3: Replace the original depth features of the editing area with the corrected depth features of the new object to correct the depth features of the first picture.
  • the essence of the replacement can be: traverse each pixel area in the editing area; if the new object is in front of the original object in a pixel area, use the depth data of that pixel area on the new object to replace the original depth data of the pixel area, so as to realize the perspective relationship in which the new object is in front of the original object; if the original object is in front of the new object in a pixel area, keep the original depth data of the pixel area, so as to realize the perspective relationship in which the original object is in front of the new object.
  • the original depth data of a pixel area refers to the depth data of the pixel area on the original object.
  • the editing area can be divided into a first pixel area and a second pixel area, wherein the perspective relationship of the first pixel area is that the new object is in front of the original object, and the perspective relationship of the second pixel area is that the original object is in front of the new object.
  • the depth features of the first pixel area can be replaced with the depth features of the new object, and the depth features of the second pixel area can maintain the depth features of the original object.
  • the new object may also be entirely in front of the original object, excluding the second pixel area. In this case, the depth features of the entire editing area can be directly replaced with the depth features of the new object.
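The per-pixel replacement of Step 3 can be sketched as follows, using the convention above that a larger depth value means closer to the camera; the array-based representation is an assumption.

```python
import numpy as np

# A sketch of Step 3: within the editing area, use the new object's depth where
# it is in front of the original object, and keep the original depth elsewhere.
def composite_depth(original_depth: np.ndarray,
                    new_depth: np.ndarray,
                    edit_mask: np.ndarray):
    """edit_mask is True inside the editing area."""
    new_in_front = edit_mask & (new_depth > original_depth)  # closer to camera
    composited = np.where(new_in_front, new_depth, original_depth)
    return composited, new_in_front  # the replaced-pixel mask is reused later
```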
  • FIG. 34 exemplarily shows a reasonable perspective relationship formed between the new object “quiet lake” and the original objects “tree” and “valley” through correction of depth features.
  • the correction of the mask feature may not directly use the new object mask data to replace the original mask data of the edited area, but only use the new object mask data to replace the original mask data in the area where the depth data has been replaced. In this way, it can ensure that the correction of the mask feature is consistent with the correction of the depth feature, so that the corrected mask feature and depth feature both point to the same object, avoiding conflicts between the two, and ensuring that the semantics of the regenerated image are reasonable.
  • the depth data of pixel area 319 in the editing area 317 is not replaced, that is, the depth data of the original object "tree" is still used, so that in the perspective relationship the original object "tree" is in front of the new object "quiet lake".
  • the mask data of the new object "quiet lake" (such as "8") is not used to replace the original mask data of area 319 (such as "5", that is, the mask data of "tree"); instead, the mask data of the original object "tree" is retained. Otherwise, a contradiction arises: from the perspective of the depth feature, area 319 is the original object "tree", but from the perspective of the mask feature, area 319 is instead the new object "quiet lake".
  • the mask data of the pixel area on the new object can be used to replace the original mask data of the pixel area.
  • the original mask data of a pixel area refers to the mask data of the pixel area on the original object. Therefore, in actual applications, the area where the mask feature is replaced may only be part of the pixel area in the editing area, and the depth features of this part of the pixel area are replaced with the depth features of the new object.
  • in FIG. 34, the area where the mask feature replacement occurs is smaller than the area of the new object "quiet lake".
  • FIG. 35 shows a comparison between the two.
  • the correction of the contour feature may not directly use the contour data of the new object to replace the original contour data of the edited area, but only use the contour data of the new object to replace the original contour data in the area where the depth data has been replaced. In this way, it can ensure that the correction of the contour feature is consistent with the correction of the depth feature, so that the corrected contour feature and depth feature both point to the same object, avoiding conflicts between the two, and ensuring that the semantics of the regenerated image are reasonable.
  • the contour data of the pixel area on the new object can be used to replace the original contour data of the pixel area.
  • the original contour data of a pixel area refers to the contour data of the pixel area on the original object.
  • FIG. 35 can also be used to illustrate the comparison between the two.
  • the correction of other features follows the same principle, that is: in the editing area, if the depth data of a pixel area belongs to the depth data of the new object and is no longer the original depth data of the pixel area, then the color features and other features of that pixel area on the new object can be used to replace the original color features and other features of the pixel area.
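The consistency rule for the mask, contour, color, and other features can be sketched as one generic replacement keyed on where the depth data was replaced (the new_in_front mask from the previous sketch); the representation is an assumption.

```python
import numpy as np

# A sketch of keeping the mask/contour/color corrections consistent with the
# depth correction: replace data only where the new object's depth was used,
# so all corrected features point to the same object.
def composite_feature(original: np.ndarray,
                      new_object: np.ndarray,
                      depth_replaced: np.ndarray) -> np.ndarray:
    """depth_replaced is True where the new object's depth data was used."""
    return np.where(depth_replaced, new_object, original)

# e.g., for mask data: 5 = "tree", 8 = "quiet lake" (values from the example above)
```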
  • an artificial intelligence algorithm model can be used to regenerate images, for example, by combining a Stable Diffusion model with a ControlNet model.
  • Stable Diffusion is a diffusion model that can generate images based on text or images.
  • the ControlNet model can introduce additional control conditions, such as depth features, mask features, and contour features.
  • the regenerated image can be shown in the right picture of Figure 34, which realizes the reasonable insertion of the new object (such as "quiet lake") between the original objects (such as "tree" and "valley") in the first picture, presenting a reasonable perspective relationship between objects.
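For illustration, a regeneration pipeline of this kind can be assembled with the diffusers library; the checkpoint names below are common public ones and are assumptions, since the patent does not name concrete models.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# A sketch of regeneration with Stable Diffusion plus a depth-conditioned
# ControlNet; checkpoint names are illustrative public models, not the patent's.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

# depth_image: the corrected depth features rendered as a PIL image.
# image = pipe("a quiet lake between trees and a valley",
#              image=depth_image).images[0]
```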
  • new objects such as "quiet lake”
  • new objects such as "tree” and "valley
  • An embodiment of the present application also provides a computer-readable storage medium, which stores a computer program.
  • when the computer program is executed by a processor, it can implement the steps performed by the human-computer interaction module in the above-mentioned method embodiments, or the steps performed by the human-computer interaction module and the computing module.
  • the embodiment of the present application further provides a computer-readable storage medium, which stores a computer program.
  • when the computer program is executed by a processor, the steps performed by the computing module in the above-mentioned method embodiments can be implemented.
  • the embodiment of the present application also provides a computer program product. When the computer program product runs on a terminal device, the terminal device can implement the steps performed by the human-computer interaction module in the above-mentioned method embodiments, or the steps performed by the human-computer interaction module and the computing module.
  • the embodiment of the present application also provides a computer program product. When the computer program product runs on a server, the server can implement the steps performed by the computing module in the above-mentioned method embodiments.
  • the embodiment of the present application also provides a chip system, which includes a processor, the processor is coupled to a memory, and the processor executes a computer program stored in the memory to implement the steps performed by the human-computer interaction module in any method embodiment of the present application, or the steps performed by the human-computer interaction module and the computing module.
  • the chip system can be a single chip, or a chip module composed of multiple chips.
  • the embodiment of the present application also provides a chip system, which includes a processor, the processor is coupled to a memory, and the processor executes a computer program stored in the memory to implement the steps performed by the computing module in any method embodiment of the present application.
  • the chip system can be a single chip or a chip module composed of multiple chips.
  • the term "user interface" (UI, or interface for short) in the specification and drawings of this application refers to the medium interface for interaction and information exchange between an application or operating system and a user, which realizes the conversion between the internal form of information and a form acceptable to the user.
  • the user interface of an application is source code written in a specific computer language, such as Java or the extensible markup language (XML).
  • the interface source code is parsed and rendered on the terminal device, and finally presented as content that the user can recognize, such as pictures, text, buttons and other controls.
  • controls, also known as widgets, are basic elements of the user interface. Typical controls include toolbars, menu bars, text boxes, buttons, scroll bars, pictures, and text.
  • the properties and contents of controls in the interface are defined by tags or nodes.
  • XML specifies the controls contained in the interface through nodes such as <Textview>, <ImgView>, and <VideoView>.
  • a node corresponds to a control or attribute in the interface, and the node is presented as user-visible content after parsing and rendering.
  • many applications, such as hybrid applications, usually include web pages in their interfaces.
  • a web page, also known as a page, can be understood as a special control embedded in the application interface.
  • a web page is a source code written in a specific computer language, such as hypertext markup language (HTML), cascading style sheets (CSS), JavaScript (JS), etc.
  • the web page source code can be loaded and displayed as user-recognizable content by a browser or a web page display component with similar functions to a browser.
  • the specific content contained in a web page is also defined by tags or nodes in the web page source code.
  • HTML defines the elements and attributes of a web page through tags such as <p>, <img>, <video>, and <canvas>.
  • a graphical user interface (GUI) can be an icon, window, control, or other interface element displayed on the display screen of an electronic device, where a control can include icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, widgets, and other visual interface elements.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions can be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave) manner.
  • the computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media.
  • the available medium can be a magnetic medium (e.g., a floppy disk, a hard disk, a tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state drive), etc.
  • the processes can be implemented by a computer program instructing related hardware, and the program can be stored in a computer-readable storage medium.
  • when executed, the program can include the processes of the above-mentioned method embodiments.
  • the aforementioned storage media include: read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Provided in the embodiments of the present application are a picture editing method and system, and a related device. On the basis of the understanding of semantic content of a picture and various features, such as a mask feature, a depth feature, a contour feature and a color feature, of an editing area selected by a user, an editing type (such as deletion, dragging, replacement, addition and color adjustment) and editing parameters (such as replaceable content, the target position of the dragging, newly added content and a color adjustment value) are recommended to the user, thereby providing editing ideas for the user. In addition, it is also possible to determine whether an editing instruction is rationally matched with the editing area selected by the user, and not only can the user be prompted that the editing instruction cannot be executed, but the user can also be recommended to select a new editing area, or the user can be assisted in modifying the editing instruction, such that the problem of an editing effect failing to conform to common sense logic due to random input of editing instructions by the user is solved, thereby making it possible to reduce the number of trials and errors when the user edits pictures, and thus significantly improving the image-making efficiency.

Description

Picture editing method, related equipment and system

This application claims priority to the Chinese patent application with application number 202311795993.1, filed with the State Intellectual Property Office of China on December 22, 2023 and entitled "Picture Editing Method, Related Equipment and System", the entire contents of which are incorporated by reference in this application.

技术领域Technical Field

本申请涉及电子技术领域,尤其涉及图片编辑方法、相关设备及系统。The present application relates to the field of electronic technology, and in particular to a picture editing method, related equipment and system.

背景技术Background Art

目前,许多专业图片编辑工具或者应用程序都可以辅助用户对图片的局部区域进行编辑加工。编辑图片时,用户通常需要先手动利用套索、框选等功能在原图中对想要修改的区域进行标记选择,然后再进行抠图、重绘、上色或添加新元素等操作,最终得到编辑后的图片。整个流程对专业知识和专业工具的使用有较高的要求,并且需要投入大量的人力成本才能完成质量较高的编辑任务,不适合普通用户。Currently, many professional image editing tools or applications can assist users in editing and processing local areas of images. When editing images, users usually need to manually use functions such as lasso and box selection to mark and select the area they want to modify in the original image, and then perform operations such as cutting out, redrawing, coloring, or adding new elements to finally obtain the edited image. The entire process has high requirements for professional knowledge and the use of professional tools, and requires a lot of manpower costs to complete high-quality editing tasks, which is not suitable for ordinary users.

发明内容Summary of the invention

本申请实施例提供了图片编辑方法、相关设备及系统,可降低图片编辑门槛,支持用户自定义编辑指令,并且能够理解用户操作意图,为用户提供编辑思路,避免用户经历大量试错,显著提高了出图效率。The embodiments of the present application provide a picture editing method, related equipment and system, which can lower the threshold for picture editing, support user-defined editing instructions, and understand the user's operating intentions, provide users with editing ideas, avoid users from experiencing a lot of trial and error, and significantly improve the efficiency of picture output.

第一方面,本申请实施例提供了一种图片编辑方法,该方法可应用于图片编辑系统,该方法可以包括:显示第一图片,检测到作用于第一图片上的用于选择编辑区域的用户操作,根据用户操作在第一图片上的操作位置从第一图片中确定出编辑区域,并在第一图片中区别显示编辑区域;然后,根据编辑区域的图像特征生成推荐编辑指令,并显示推荐编辑指令;检测到用户针对编辑区域输入第一编辑指令,第一编辑指令包括:推荐编辑指令;然后,根据第一编辑指令对第一图片进行图像编辑处理;最后,展示图像编辑处理后的第一图片。In a first aspect, an embodiment of the present application provides a picture editing method, which can be applied to a picture editing system, and the method may include: displaying a first picture, detecting a user operation on the first picture for selecting an editing area, determining an editing area from the first picture according to the operation position of the user operation on the first picture, and distinguishing and displaying the editing area in the first picture; then, generating recommended editing instructions according to image features of the editing area, and displaying the recommended editing instructions; detecting that a user inputs a first editing instruction for the editing area, the first editing instruction including: a recommended editing instruction; then, performing image editing processing on the first picture according to the first editing instruction; and finally, displaying the first picture after the image editing processing.

第一方面中,第一图片可以是用户通过相册(又称图库)、修图应用程序、绘图设计程序等用于浏览、管理或处理图片的应用程序打开的图片。第一图片可以存储于手机等终端设备上,也可以存储于网络上。In the first aspect, the first picture can be a picture opened by a user through an application for browsing, managing or processing pictures, such as a photo album (also known as a gallery), a photo editing application, a drawing design program, etc. The first picture can be stored on a terminal device such as a mobile phone, or stored on a network.

第一方面提供的方法基于对原图语义内容的理解,根据用户选择的编辑区域的图像特征为用户推荐编辑指令,为用户提供编辑思路,不仅降低了使用复杂程度,也保证了编辑后图片的内容合理性,避免用户经历大量试错,显著提高了出图效率。The method provided in the first aspect is based on the understanding of the semantic content of the original image. It recommends editing instructions to the user according to the image features of the editing area selected by the user, and provides the user with editing ideas. This not only reduces the complexity of use, but also ensures the rationality of the content of the edited image, avoids users from experiencing a lot of trial and error, and significantly improves the efficiency of image output.

结合第一方面,在一些实施例中,根据编辑区域的图像特征生成推荐编辑指令,具体可以包括:将编辑区域的多种图像特征的融合特征向量作为第一人工智能算法的输入之一;将一种或多种预设编辑类型作为第一人工智能算法的输入之二;通过第一人工智能算法的运算得到推荐编辑指令,推荐编辑指令包括预设编辑类型对应的编辑参数;其中,多种图像特征可以包括蒙层特征、深度特征、轮廓特征、颜色特征中的多项。In combination with the first aspect, in some embodiments, generating recommended editing instructions based on image features of the editing area may specifically include: using a fused feature vector of multiple image features of the editing area as one of the inputs of the first artificial intelligence algorithm; using one or more preset editing types as the second input of the first artificial intelligence algorithm; obtaining recommended editing instructions through the operation of the first artificial intelligence algorithm, the recommended editing instructions including editing parameters corresponding to the preset editing types; wherein the multiple image features may include multiple items of mask features, depth features, contour features, and color features.

结合第一方面,在一些实施例中,预设编辑类型可以包括以下一项或多项:删除、拖动、替换、增加,或者颜色调节。删除对应的编辑参数可包括删除的内容,替换对应的编辑参数可包括可替换的内容,拖动对应的编辑参数可包括拖动的目标位置、增加对应的编辑参数可包括新增的内容、颜色调节对应的编辑参数可包括颜色调节数值等。In conjunction with the first aspect, in some embodiments, the preset editing type may include one or more of the following: delete, drag, replace, add, or color adjustment. The editing parameter corresponding to the deletion may include the deleted content, the editing parameter corresponding to the replacement may include the replaceable content, the editing parameter corresponding to the drag may include the drag target position, the editing parameter corresponding to the addition may include the newly added content, the editing parameter corresponding to the color adjustment may include the color adjustment value, etc.

结合第一方面,在一些实施例中,检测到用户针对编辑区域输入第一编辑指令,具体可以包括:检测到用户选择输入推荐编辑指令。这样用户选择的推荐编辑指令被确定为第一编辑指令。In conjunction with the first aspect, in some embodiments, detecting that the user inputs the first editing instruction for the editing area may specifically include: detecting that the user selects to input a recommended editing instruction, so that the recommended editing instruction selected by the user is determined as the first editing instruction.

结合第一方面,在一些实施例中,在识别出编辑区域之后,不限于显示推荐编辑指令,图片编辑系统还可以通过下述方式提示用户输入编辑指令:显示第一输入框,第一输入框可用于接收语音或文本编辑指令;第一编辑指令还包括通过输入框输入的语音或文本编辑指令。当采用这种方式提示用户输入编辑指令时,检测到用户针对编辑区域输入第一编辑指令,具体可以包括:检测到用户在第一输入框中输入的语音或文本指令,语音或文本指令被确定为第一编辑指令。也即,第一编辑指令可以包括用户在输入框中输入文本指令或按住语音键输入语音指令。In combination with the first aspect, in some embodiments, after identifying the editing area, the picture editing system is not limited to displaying recommended editing instructions, and can also prompt the user to enter editing instructions in the following manner: display a first input box, the first input box can be used to receive voice or text editing instructions; the first editing instructions also include voice or text editing instructions entered through the input box. When prompting the user to enter editing instructions in this manner, detecting that the user enters the first editing instruction for the editing area can specifically include: detecting the voice or text instruction entered by the user in the first input box, and the voice or text instruction being determined as the first editing instruction. That is, the first editing instruction can include the user entering a text instruction in the input box or pressing the voice key to enter a voice instruction.

结合第一方面,在一些实施例中,在识别出编辑区域之后,不限于显示推荐编辑指令,图片编辑系统还可以通过下述方式提示用户输入编辑指令:显示一个或多个预置编辑指令,如常用编辑指令或者用户提前保存的编辑指令。当采用这种方式提示用户输入编辑指令时,检测到用户针对编辑区域输入第一编辑指令,具体可以包括:检测到用户选择输入预置编辑指令,语音或文本指令被确定为第一编辑指令。In conjunction with the first aspect, in some embodiments, after identifying the editing area, the image editing system is not limited to displaying recommended editing instructions, and can also prompt the user to enter editing instructions in the following manner: display one or more preset editing instructions, such as commonly used editing instructions or editing instructions saved in advance by the user. When prompting the user to enter editing instructions in this manner, detecting that the user enters a first editing instruction for the editing area can specifically include: detecting that the user selects to enter a preset editing instruction, and the voice or text instruction is determined to be the first editing instruction.

结合第一方面,在一些实施例中,用于选择编辑区域的用户操作可以包括:在第一图片中选择第一物体的用户操作,用户操作在第一图片上的操作位置落在第一图片中的第一物体上,第一物体所在的图像区域为编辑区域;第一物体所在的图像区域是对第一图片进行图像分割处理确定出的。In combination with the first aspect, in some embodiments, the user operation for selecting an editing area may include: a user operation of selecting a first object in a first picture, the operation position of the user operation on the first picture falls on the first object in the first picture, and the image area where the first object is located is the editing area; the image area where the first object is located is determined by performing image segmentation processing on the first picture.

结合第一方面,在一些实施例中,用于选择编辑区域的用户操作可以包括:在第一图片中绘制编辑区域的用户操作。In combination with the first aspect, in some embodiments, the user operation for selecting the editing area may include: a user operation of drawing the editing area in the first picture.

结合第一方面,在一些实施例中,在第一图片中区别显示编辑区域可以包括以下一项或多项方式:高亮编辑区域的轮廓、高亮整个编辑区域,或沿着编辑区域的轮廓显示虚线框。In combination with the first aspect, in some embodiments, distinctively displaying the editing area in the first picture may include one or more of the following methods: highlighting the outline of the editing area, highlighting the entire editing area, or displaying a dotted frame along the outline of the editing area.

结合第一方面,在一些实施例中,第一方面的方法还可以包括:获取第一图片的预处理信息,并从预处理信息指示的区域中确定出编辑区域。这样,编辑区域的确定可以基于图片的预处理信息,而不再基于图像分割技术,不需要重复在线计算。In conjunction with the first aspect, in some embodiments, the method of the first aspect may further include: obtaining preprocessing information of the first image, and determining the editing area from the area indicated by the preprocessing information. In this way, the editing area can be determined based on the preprocessing information of the image, rather than based on image segmentation technology, and repeated online calculations are not required.

结合第一方面,在一些实施例中,预处理信息可以包括多个区域的指示信息,其中,预处理信息包括多个区域各自的轮廓点的坐标、二值图、灰度图,二值图中多个区域的取值为第一值,灰度图中多个区域的灰度值为第一灰度值或第一灰度范围。这样,从预处理信息指示的区域中确定出编辑区域,具体可以包括:将多个区域中用户操作的操作位置所处的区域确定为编辑区域,或者将多个区域中距离作用位置最近的区域确定为编辑区域。In combination with the first aspect, in some embodiments, the preprocessing information may include indication information of multiple regions, wherein the preprocessing information includes coordinates of contour points of each of the multiple regions, a binary image, and a grayscale image, the values of the multiple regions in the binary image are first values, and the grayscale values of the multiple regions in the grayscale image are first grayscale values or first grayscale ranges. In this way, determining the editing region from the region indicated by the preprocessing information may specifically include: determining the region where the operation position of the user operation is located in the multiple regions as the editing region, or determining the region closest to the operation position in the multiple regions as the editing region.

结合第一方面,在一些实施例中,预处理信息可以仅包含一个区域的指示信息,如一个区域的轮廓点的坐标,指示一个区域的二值图、灰度图等。这样,从预处理信息指示的区域中确定出编辑区域,具体可以包括:直接将该一个区域确定为编辑区域。In conjunction with the first aspect, in some embodiments, the preprocessing information may only include indication information of a region, such as coordinates of contour points of a region, a binary image or a grayscale image indicating a region, etc. In this way, determining the editing region from the region indicated by the preprocessing information may specifically include: directly determining the region as the editing region.

结合第一方面,在一些实施例中,第一图片的预处理信息也可以不直接指示一个或多个区域,而包含其他数据,例如多图层信息、或轮廓信息、或深度信息。这样,可以先利用该其他数据确定出一个或多个区域,然后再从这一个或多个区域中确定出编辑区域。In conjunction with the first aspect, in some embodiments, the preprocessing information of the first image may not directly indicate one or more regions, but may include other data, such as multi-layer information, or contour information, or depth information. In this way, the other data may be used to first determine one or more regions, and then the editing region may be determined from the one or more regions.

确定预处理信息间接指示的一个或多个区域的方法可包括但不限于:Methods for determining one or more regions indirectly indicated by the pre-processing information may include, but are not limited to:

If the preprocessing information contains layer information of multiple layers, and the layer information of each layer includes the coordinates of the opaque pixels in that layer, then before the editing area is determined from the areas indicated by the preprocessing information, each connected patch of opaque pixels in each layer may be determined as one area indicated by the preprocessing information.

If the preprocessing information contains contour information, then before the editing area is determined from the areas indicated by the preprocessing information, the area enclosed by the contour indicated by the contour information may be determined as an area indicated by the preprocessing information.

If the preprocessing information contains depth information, then before the editing area is determined from the areas indicated by the preprocessing information, pixels with identical or similar depth values may be determined, based on the depth information, as one area indicated by the preprocessing information.
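To make the layer and depth cases concrete, here is a minimal sketch (Python; NumPy and SciPy are assumed to be available). The alpha > 0 opacity test and the depth tolerance are illustrative assumptions, not values taken from this application:

```python
import numpy as np
from scipy import ndimage  # connected-component labeling

def areas_from_layer(alpha: np.ndarray) -> np.ndarray:
    """Layer case: each connected patch of opaque pixels becomes one area.
    alpha: H x W alpha channel of one layer; alpha > 0 is treated as opaque."""
    labels, _ = ndimage.label(alpha > 0)
    return labels

def areas_from_depth(depth: np.ndarray, tol: float = 0.05) -> np.ndarray:
    """Depth case: connected pixels with similar depth values are grouped.
    Depth is quantized into bins of width tol, an assumed notion of 'similar'."""
    bins = np.round(depth / tol).astype(int)
    labels = np.zeros_like(bins)
    next_id = 1
    for b in np.unique(bins):
        comp, n = ndimage.label(bins == b)       # label one depth slice
        labels[comp > 0] = comp[comp > 0] + (next_id - 1)  # globally unique ids
        next_id += n
    return labels
```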

In combination with the first aspect, in some embodiments, the method of the first aspect may further include: before image editing processing is performed on the first picture according to the first editing instruction, if it is determined that applying the first editing instruction to the editing area is unreasonable, re-recommending an editing instruction or re-recommending an editing area. This addresses the problem of editing results that defy common sense when users enter arbitrary editing instructions, thereby reducing the amount of trial and error involved in editing a picture.

In combination with the first aspect, in some embodiments, whether applying the first editing instruction to the editing area is reasonable can be determined by comparing the feature vector corresponding to the first editing instruction with a fused feature vector of the individual image features of the editing area.
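One simple way to realize such a comparison is a similarity test between the two vectors. The sketch below uses cosine similarity against a threshold; both the metric and the threshold value are assumptions made for illustration, since the application does not fix a particular comparison function:

```python
import numpy as np

def is_reasonable(instr_vec: np.ndarray, fused_area_vec: np.ndarray,
                  threshold: float = 0.3) -> bool:
    """Treat an instruction/area pairing as reasonable when the cosine
    similarity of their embeddings exceeds a threshold (assumed rule)."""
    a = instr_vec / (np.linalg.norm(instr_vec) + 1e-12)
    b = fused_area_vec / (np.linalg.norm(fused_area_vec) + 1e-12)
    return float(a @ b) >= threshold
```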

In combination with the first aspect, in some embodiments, re-recommending an editing area may specifically include: traversing the areas of the first picture outside the editing area, comparing the fused feature vector of each traversed area with the feature vector corresponding to the first editing instruction, finding an area that reasonably matches the first editing instruction, and re-recommending the found area as the editing area.

In combination with the first aspect, in some embodiments, re-recommending an editing instruction may specifically include: traversing the editing types and/or editing parameters in a recommendation pool, comparing the feature vectors corresponding to the traversed editing types and/or editing parameters with the fused feature vector of the editing area, finding an editing type and/or editing parameter that reasonably matches the editing area, modifying the first editing instruction according to what is found, and re-recommending the modified first editing instruction.
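Both traversals reduce to scoring candidates against a reference vector and keeping the best match. A minimal sketch under that assumption follows; the dictionaries and the cosine scoring function are hypothetical choices, not prescribed by this application:

```python
import numpy as np

def _cos(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def re_recommend_area(instr_vec: np.ndarray, other_areas: dict) -> int:
    """Scan the fused vectors of areas outside the editing area
    ({area_id: vector}) and return the id that best fits the instruction."""
    return max(other_areas, key=lambda aid: _cos(instr_vec, other_areas[aid]))

def re_recommend_instruction(fused_area_vec: np.ndarray, pool: dict) -> str:
    """Scan a recommendation pool of edit types/parameters
    ({name: vector}) and return the entry that best fits the area."""
    return max(pool, key=lambda name: _cos(fused_area_vec, pool[name]))
```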

In combination with the first aspect, in some embodiments, the editing instruction entered by the user for the editing area may involve adding a new object; editing instructions involving a new object may include editing instructions of types such as add, replace, and drag. Here, replacing amounts to deleting one object from the original picture and then adding another object, while dragging amounts to deleting an object from one position in the original picture and then adding that object at another position. In other words, an editing instruction involving a new object may mean that the editing processing corresponding to the instruction includes adding a new object to the editing area.

After such image editing processing, within the editing area of the first picture, the depth features of a first pixel area have been replaced with the depth features of the new object, while the depth features of a second pixel area remain those of the original object, where the perspective relationship of the first pixel area is that the new object is in front of the original object, and the perspective relationship of the second pixel area is that the original object is in front of the new object.

In combination with the first aspect, in some embodiments, performing image editing processing on the first picture according to the first editing instruction may include: correcting the image features of the first picture, and regenerating the first picture using the corrected image features.

In combination with the first aspect, in some embodiments, the image features may include depth features. Correcting the image features of the first picture may specifically include: determining the perspective relationship between the new object and the original object based on the image features of the first picture and the editing parameters in the first editing instruction; determining a reference depth for the new object based on that perspective relationship and the depth value of the original object, and then correcting the depth features of the new object using the reference depth; and replacing the original depth features of the editing area with the corrected depth features of the new object. After the correction, the average depth of the new object is close or equal to the reference depth, and the depth differences between the various areas on the new object remain unchanged.
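A direct reading of this step is a uniform depth shift followed by a per-pixel front-surface test. The following sketch realizes that reading; the uniform shift and the smaller-depth-means-nearer convention are assumptions for illustration:

```python
import numpy as np

def correct_new_object_depth(new_depth: np.ndarray, base_depth: float) -> np.ndarray:
    """Shift the new object's depth map so its mean equals the reference
    depth while the depth differences between its own areas are preserved."""
    return new_depth - new_depth.mean() + base_depth

def composite_depth(area_depth: np.ndarray, new_depth: np.ndarray) -> np.ndarray:
    """Per pixel, keep whichever surface is in front: where the new object
    is nearer (smaller depth, assumed convention) its depth replaces the
    original ('first pixel area'); elsewhere the original depth is kept
    ('second pixel area')."""
    return np.where(new_depth < area_depth, new_depth, area_depth)
```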

In combination with the first aspect, in some embodiments, the image features may further include a first image feature, which is an image feature other than a depth feature and may include one or more of the following: a mask feature, a contour feature, and a color feature.

Correcting the image features of the first picture may further include: within the editing area, if the depth feature of a pixel area is replaced with the corrected depth feature of the new object, replacing the original first image feature of that pixel area in the editing area with the first image feature of the corresponding pixel area on the new object.
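Continuing the sketch above, this amounts to carrying the same per-pixel replacement mask over to the other feature channels; the H x W x C feature layout is an assumption made for illustration:

```python
import numpy as np

def replace_first_image_features(area_feat: np.ndarray, new_feat: np.ndarray,
                                 depth_replaced: np.ndarray) -> np.ndarray:
    """Wherever the depth feature was replaced by the new object's
    (depth_replaced: boolean H x W mask, e.g. new_depth < area_depth),
    also take the new object's mask/contour/color features there."""
    out = area_feat.copy()
    out[depth_replaced] = new_feat[depth_replaced]
    return out
```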

In a second aspect, an embodiment of this application provides a terminal device, which may include: a human-computer interaction module, a processor, and a memory, where the human-computer interaction module is coupled to the processor and the memory is coupled to the processor; the human-computer interaction module may include input/output components such as a touchscreen; the memory may be used to store computer program code, and the computer program code may include computer instructions which, when executed by the processor, cause the terminal device to perform the method described in any one or more embodiments of the foregoing first aspect.

When the terminal device provided in the second aspect has strong computing power, complex computation can be performed directly on the terminal device's computing module (including the processor). In that case, both the human-computer interaction module and the computing module mentioned in later embodiments can be deployed on the terminal device; the steps they perform are the steps performed by the terminal device, and the communication or data exchange between them is intra-device communication.

In a third aspect, an embodiment of this application provides a computer-readable storage medium including instructions which, when run on a terminal device, cause the terminal device to perform the method described in any one or more embodiments of the foregoing first aspect.

In a fourth aspect, an embodiment of this application provides a picture editing method applicable to a human-computer interaction module, where the human-computer interaction module is included in a picture editing system that further includes a computing module.

The method may include: the human-computer interaction module displays a first picture, detects a user operation acting on the first picture for selecting an editing area, and distinctively displays the editing area in the first picture; the human-computer interaction module then receives a recommended editing instruction sent by the computing module and displays it, the recommended editing instruction being generated by the computing module according to the image features of the editing area; the human-computer interaction module detects that the user enters a first editing instruction for the editing area, the first editing instruction including the recommended editing instruction; finally, the human-computer interaction module receives the image-edited first picture sent by the computing module and presents it, the image editing processing having been performed by the computing module according to the first editing instruction.

In the fourth aspect, the first picture may be a picture opened by the user through an application for browsing, managing, or processing pictures, such as an album (also called a gallery), a photo retouching application, or a drawing and design program. The first picture may be stored on a terminal device such as a mobile phone, or on a network. For some technical details of the fourth aspect, reference may be made to any one or more embodiments of the first aspect.

In combination with the fourth aspect, in some embodiments, detecting that the user enters the first editing instruction for the editing area may specifically include: detecting that the user chooses to enter the recommended editing instruction. The recommended editing instruction selected by the user is thereby determined as the first editing instruction.

In combination with the fourth aspect, in some embodiments, after the editing area is identified, the human-computer interaction module is not limited to displaying recommended editing instructions; it may also prompt the user to enter an editing instruction by displaying a first input box that can receive voice or text editing instructions, with the first editing instruction further including a voice or text editing instruction entered through the input box. When the user is prompted in this manner, detecting that the user enters the first editing instruction for the editing area may specifically include: detecting the voice or text instruction entered by the user in the first input box, which is determined as the first editing instruction. That is, the first editing instruction may include a text instruction typed into the input box or a voice instruction entered by holding down a voice key.

In combination with the fourth aspect, in some embodiments, after the editing area is identified, the human-computer interaction module may likewise prompt the user to enter an editing instruction by displaying one or more preset editing instructions, such as commonly used editing instructions or editing instructions saved by the user in advance. When the user is prompted in this manner, detecting that the user enters the first editing instruction for the editing area may specifically include: detecting that the user chooses to enter a preset editing instruction, which is determined as the first editing instruction.

In combination with the fourth aspect, in some embodiments, the user operation for selecting the editing area may include: a user operation of selecting a first object in the first picture, where the operation position of the user operation on the first picture falls on the first object in the first picture, and the image area where the first object is located is the editing area; the image area where the first object is located is determined by performing image segmentation processing on the first picture.

In combination with the fourth aspect, in some embodiments, the user operation for selecting the editing area may include: a user operation of drawing the editing area in the first picture.

In combination with the fourth aspect, in some embodiments, distinctively displaying the editing area in the first picture may include one or more of the following: highlighting the outline of the editing area, highlighting the entire editing area, or displaying a dashed box along the outline of the editing area.

In combination with the fourth aspect, in some embodiments, after the human-computer interaction module detects that the user enters the first editing instruction for the editing area, the method may further include: if applying the first editing instruction to the editing area is unreasonable, the human-computer interaction module presents a re-recommended editing area; the re-recommended editing area is one found by the computing module, according to the feature vector corresponding to the first editing instruction, from areas outside the editing area.

In combination with the fourth aspect, in some embodiments, after the human-computer interaction module detects that the user enters the first editing instruction for the editing area, the method may further include: if applying the first editing instruction to the editing area is unreasonable, the human-computer interaction module recommends a modified first editing instruction; the editing type and/or editing parameter of the modified first editing instruction are found by the computing module from the recommendation pool according to the fused feature vector of the editing area.

In a fifth aspect, an embodiment of this application provides a picture editing method applied to a computing module, where the computing module is included in a picture editing system that further includes a human-computer interaction module.

The method may include: the computing module generates a recommended editing instruction according to the image features of the editing area of a first picture and sends it to the human-computer interaction module, which displays it; the computing module then receives the first editing instruction sent by the human-computer interaction module and performs image editing processing on the first picture according to the first editing instruction; finally, the computing module sends the image-edited first picture to the human-computer interaction module, which displays it.

In combination with the fifth aspect, in some embodiments, the computing module generating the recommended editing instruction according to the image features of the editing area may specifically include: the computing module takes a fused feature vector of multiple image features of the editing area as one input of a first artificial intelligence algorithm, and one or more preset editing types as a second input; the computing module then obtains the recommended editing instruction through the operation of the first artificial intelligence algorithm, the recommended editing instruction including editing parameters corresponding to the preset editing types. The multiple image features may include several of the following: mask features, depth features, contour features, and color features.
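The two-input structure can be sketched as follows; the concatenation-based fusion and the model interface are assumptions made for illustration, since the application does not prescribe a specific fusion operator or model:

```python
import numpy as np

def fuse_features(mask_vec, depth_vec, contour_vec, color_vec) -> np.ndarray:
    """Fuse per-feature embeddings of the editing area into one vector;
    concatenation is an assumed fusion scheme."""
    return np.concatenate([mask_vec, depth_vec, contour_vec, color_vec])

def recommend(model, fused_vec: np.ndarray, preset_types: list):
    """Feed the fused area vector plus each preset edit type to a
    hypothetical trained model and collect (type, parameters) pairs as
    recommended editing instructions. model.predict_params is assumed."""
    return [(t, model.predict_params(fused_vec, t)) for t in preset_types]
```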

In combination with the fifth aspect, in some embodiments, the preset editing types include one or more of the following: delete, drag, replace, add, or color adjustment.

In combination with the fifth aspect, in some embodiments, the method of the fifth aspect may further include: the computing module obtains preprocessing information of the first picture and determines the editing area from the areas indicated by the preprocessing information. In this way, the editing area is determined based on the picture's preprocessing information rather than on image segmentation, so no repeated online computation is needed.

In combination with the fifth aspect, in some embodiments, the preprocessing information may include indication information of multiple areas; for example, the preprocessing information may include the coordinates of the contour points of each of the multiple areas, a binary image in which the multiple areas take a first value, or a grayscale image in which the grayscale values of the multiple areas are a first grayscale value or fall within a first grayscale range. In this case, determining the editing area from the areas indicated by the preprocessing information may specifically include: determining, among the multiple areas, the area in which the operation position of the user operation falls as the editing area, or determining the area closest to the operation position as the editing area.

In combination with the fifth aspect, in some embodiments, the preprocessing information may contain indication information of only one area, such as the coordinates of the contour points of one area, or a binary image or grayscale image indicating one area. In this case, determining the editing area from the area indicated by the preprocessing information may specifically include: directly determining that one area as the editing area.

In combination with the fifth aspect, in some embodiments, the preprocessing information of the first picture may also not directly indicate one or more areas but instead contain other data, such as multi-layer information, contour information, or depth information. In this case, that other data may first be used to determine one or more areas, and the editing area may then be determined from those one or more areas.

Methods for determining the one or more areas indirectly indicated by the preprocessing information may include, but are not limited to, the following:

If the preprocessing information contains layer information of multiple layers, and the layer information of each layer includes the coordinates of the opaque pixels in that layer, then before determining the editing area from the areas indicated by the preprocessing information, the computing module may determine each connected patch of opaque pixels in each layer as one area indicated by the preprocessing information.

If the preprocessing information contains contour information, then before determining the editing area from the areas indicated by the preprocessing information, the computing module may determine the area enclosed by the contour indicated by the contour information as an area indicated by the preprocessing information.

If the preprocessing information contains depth information, then before determining the editing area from the areas indicated by the preprocessing information, the computing module may determine, based on the depth information, pixels with identical or similar depth values as one area indicated by the preprocessing information.

In combination with the fifth aspect, in some embodiments, the method of the fifth aspect may further include: before image editing processing is performed on the first picture according to the first editing instruction, if it is determined that applying the first editing instruction to the editing area is unreasonable, the computing module re-recommends an editing instruction or re-recommends an editing area. This addresses the problem of editing results that defy common sense when users enter arbitrary editing instructions, thereby reducing the amount of trial and error involved in editing a picture.

In combination with the fifth aspect, in some embodiments, the computing module can determine whether applying the first editing instruction to the editing area is reasonable by comparing the feature vector corresponding to the first editing instruction with a fused feature vector of the individual image features of the editing area.

In combination with the fifth aspect, in some embodiments, re-recommending an editing area may specifically include: the computing module traverses the areas of the first picture outside the editing area, compares the fused feature vector of each traversed area with the feature vector corresponding to the first editing instruction, finds an area that reasonably matches the first editing instruction, and re-recommends the found area as the editing area.

In combination with the fifth aspect, in some embodiments, re-recommending an editing instruction may specifically include: the computing module traverses the editing types and/or editing parameters in the recommendation pool, compares the feature vectors corresponding to the traversed editing types and/or editing parameters with the fused feature vector of the editing area, finds an editing type and/or editing parameter that reasonably matches the editing area, modifies the first editing instruction according to what is found, and re-recommends the modified first editing instruction.

In combination with the fifth aspect, in some embodiments, the editing processing corresponding to the first editing instruction may include: adding a new object to the editing area. After such image editing processing, within the editing area of the first picture, the depth features of a first pixel area are replaced with the depth features of the new object, while the depth features of a second pixel area remain those of the original object, where the perspective relationship of the first pixel area is that the new object is in front of the original object, and the perspective relationship of the second pixel area is that the original object is in front of the new object.

In combination with the fifth aspect, in some embodiments, the computing module performing image editing processing on the first picture according to the first editing instruction includes: the computing module corrects the image features of the first picture and regenerates the first picture using the corrected image features.

In combination with the fifth aspect, in some embodiments, the image features may include depth features. The computing module correcting the image features of the first picture specifically includes: the computing module determines the perspective relationship between the new object and the original object based on the image features of the first picture and the editing parameters in the first editing instruction; determines a reference depth for the new object based on that perspective relationship and the depth value of the original object, and then corrects the depth features of the new object using the reference depth; and replaces the original depth features of the editing area with the corrected depth features of the new object. After the correction, the average depth of the new object is close or equal to the reference depth, and the depth differences between the various areas on the new object remain unchanged.

In combination with the fifth aspect, in some embodiments, the image features may further include a first image feature, which is an image feature other than a depth feature and includes one or more of the following: a mask feature, a contour feature, and a color feature.

The computing module correcting the image features of the first picture may further include: within the editing area, if the depth feature of a pixel area is replaced with the corrected depth feature of the new object, the computing module replaces the original first image feature of that pixel area in the editing area with the first image feature of the corresponding pixel area on the new object.

In a sixth aspect, an embodiment of this application provides a terminal device, which may include: a human-computer interaction module, a processor, and a memory, where the human-computer interaction module is coupled to the processor and the memory is coupled to the processor; the human-computer interaction module may include input/output components such as a touchscreen; the memory may be used to store computer program code, and the computer program code includes computer instructions which, when executed by the processor, cause the terminal device to perform the method described in any one or more embodiments of the foregoing fourth aspect.

Unlike the terminal device provided in the second aspect, which has strong computing power, the terminal device provided in the sixth aspect does not; it must submit complex computing tasks (such as editing-instruction recommendation algorithms, image processing algorithms, and image semantic understanding algorithms) over the network to the cloud for execution and wait for the task results returned by the cloud-side server. When the terminal device does not have strong computing power, the aforementioned human-computer interaction module can be deployed on the terminal device and the aforementioned computing module on the server; the steps performed by the human-computer interaction module are the steps performed by the terminal device, the steps performed by the computing module are the steps performed by the server, and the communication or data exchange between the two is inter-device communication.

In a seventh aspect, an embodiment of this application provides a server, which may include a processor and a memory, where the memory is coupled to the processor; the memory is used to store computer program code, and the computer program code includes computer instructions which, when executed by the processor, cause the server to perform the method described in any one or more embodiments of the foregoing fifth aspect.

In an eighth aspect, an embodiment of this application provides a computer-readable storage medium including instructions which, when run on a terminal device, cause the terminal device to perform the method described in any one or more embodiments of the foregoing fourth aspect.

In a ninth aspect, an embodiment of this application provides a computer-readable storage medium including instructions which, when run on a terminal device, cause the terminal device to perform the method described in any one or more embodiments of the foregoing fifth aspect.

In a tenth aspect, an embodiment of this application provides a picture editing system, which may include a terminal device and a server, where the terminal device is the terminal device described in the sixth aspect and the server is the server described in the seventh aspect.

In an eleventh aspect, an embodiment of this application provides a picture editing system, which may include a human-computer interaction module and a computing module, where:

the human-computer interaction module may be used to display a first picture, detect a user operation acting on the first picture for selecting an editing area, and inform the computing module of the operation position of the user operation on the first picture;

the computing module may be used to determine the editing area from the first picture according to the operation position of the user operation on the first picture, and to inform the human-computer interaction module of the editing area;

the human-computer interaction module may further be used to distinctively display the editing area in the first picture;

the computing module may further be used to generate a recommended editing instruction according to the image features of the editing area, and to inform the human-computer interaction module of the recommended editing instruction;

the human-computer interaction module may further be used to display the recommended editing instruction, then detect that the user enters a first editing instruction for the editing area, and send the first editing instruction to the computing module, the first editing instruction including the recommended editing instruction;

the computing module may further be used to perform image editing processing on the first picture according to the first editing instruction, and to send the image-edited first picture to the human-computer interaction module;

finally, the human-computer interaction module may further be used to present the image-edited first picture.
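This division of labor amounts to a simple control loop between the two modules. The following Python sketch summarizes it under assumed module interfaces; every class, method name, and signature here is hypothetical:

```python
class ComputeModule:
    """Hypothetical computing-module interface."""
    def determine_area(self, picture, click_xy): ...
    def recommend(self, picture, area): ...
    def apply_edit(self, picture, area, instruction): ...

def editing_session(hci, compute, picture):
    # 1. Display the picture and wait for the area-selecting operation.
    hci.show(picture)
    click_xy = hci.wait_for_selection()
    # 2. The computing module resolves the editing area; the HCI module
    #    distinctively displays it.
    area = compute.determine_area(picture, click_xy)
    hci.highlight(area)
    # 3. A recommended instruction generated from the area's image
    #    features is displayed; the user confirms it or enters another.
    hci.show_recommendation(compute.recommend(picture, area))
    instruction = hci.wait_for_instruction()
    # 4. The edit is applied and the result presented.
    hci.show(compute.apply_edit(picture, area, instruction))
```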

The picture editing system provided in the eleventh aspect may be deployed in a single device (such as a terminal device, like a mobile phone, with strong computing power) or across two devices (a terminal device and a server).

In the picture editing system provided in the eleventh aspect, the human-computer interaction module may perform the method described in any one or more embodiments of the foregoing fourth aspect, and the computing module may perform the method described in any one or more embodiments of the foregoing fifth aspect.

In a twelfth aspect, an embodiment of this application provides a chip system applied to a terminal device, the chip system including one or more processors configured to invoke computer instructions so that the terminal device can perform the method described in any one or more embodiments of the foregoing first aspect or fourth aspect.

In a thirteenth aspect, an embodiment of this application provides a chip system applied to a server, the chip system including one or more processors configured to invoke computer instructions so that the server can perform the method described in any one or more embodiments of the foregoing fifth aspect.

In a fourteenth aspect, this application provides a computer program product containing instructions which, when run on an electronic device, enable the electronic device to perform the method described in any one or more embodiments of the foregoing first aspect, fourth aspect, or fifth aspect.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a picture editing system provided by an embodiment of this application;

FIG. 2 shows a terminal device provided by an embodiment of this application;

FIG. 3 shows a server provided by an embodiment of this application;

FIG. 4 shows the relationship between the various method embodiments of this application;

FIG. 5 shows a picture editing method provided by an embodiment of this application;

FIG. 6 shows an example of a user opening a picture;

FIG. 7 shows an example of a user selecting the editing area "sky" in a picture;

FIG. 8 shows an example of a user drawing a heart-shaped editing area in a picture;

FIG. 9 shows an example of prompting the user to enter an editing instruction by voice or text;

FIG. 10 shows one example of outputting a recommended editing instruction to the user;

FIG. 11 shows another example of outputting a recommended editing instruction to the user;

FIG. 12 exemplarily shows the data form of a picture's mask features;

FIG. 13 exemplarily shows a visualization of mask features;

FIG. 14 exemplarily shows a visualization of depth features;

FIG. 15 exemplarily shows a visualization of contour features;

FIG. 16 exemplarily shows a visualization of color features;

FIG. 17 exemplarily shows the result of executing the editing instruction "add 'flying geese'" on the editing area "sky";

FIG. 18 exemplarily shows the result of executing the editing instruction "adjust hue: reduce brightness to 82%, reduce saturation to 75%, change hue to red" on the editing area "sky";

FIG. 19 exemplarily shows an image of "flying geese";

FIG. 20 shows another picture editing method provided by an embodiment of this application;

FIG. 21 exemplarily shows a case where a picture's preprocessing information contains only the coordinate data of the contour points of one area;

FIG. 22 exemplarily shows a case where a picture's preprocessing information contains only the binary data of one area;

FIG. 23 exemplarily shows determining the multiple areas indicated by preprocessing information based on preprocessing information containing contour information;

FIG. 24 exemplarily shows determining the multiple areas indicated by preprocessing information based on preprocessing information containing depth information;

FIG. 25 shows another picture editing method provided by an embodiment of this application;

FIG. 26 shows the flow, provided by an embodiment of this application, of the method for determining whether the first editing instruction and the editing area are reasonably matched;

FIG. 27 shows an example of prompting the user that the first editing instruction and the editing area are not reasonably matched;

FIG. 28 shows an example of prompting the user to change the editing area;

FIG. 29 shows an example of prompting the user to change the editing instruction;

FIG. 30 shows the flow, provided by an embodiment of this application, of the method for performing image editing that involves adding a new object;

FIG. 31 exemplarily shows a case where the perspective relationship between a new object and an original object in a picture is unreasonable;

FIG. 32 exemplarily shows a new object image;

FIG. 33 exemplarily shows pixel areas of an editing area where the new object's mask features cannot be used;

FIG. 34 exemplarily shows a case where the perspective relationship between a new object and an original object in a picture is reasonable;

FIG. 35 exemplarily shows a size comparison between the new object image area and the area of the original picture where feature replacement (such as depth feature replacement, mask feature replacement, or contour feature replacement) occurs.

DETAILED DESCRIPTION

The terms used in the following embodiments of this application are intended only to describe particular embodiments, and are not intended to limit this application.

To reduce the complexity of picture editing, some simple picture editing functions are widely used, such as erasing a local area of a picture, moving the position of an object in a picture, and adjusting the color or brightness of an object in a picture. With these functions, the user does not need to perform complex editing such as cutting out, redrawing, or coloring a local area of the picture; the user only needs to select a target area or target object with a tap and enter an editing instruction such as delete, adjust color, or move the object. However, such picture editing functions support only a few built-in editing operations and do not support user-defined editing operations; they only let the user process content already present in the picture and do not let the user add new content to it; moreover, they cannot tell the user whether an editing operation is reasonable, for example, "drag the street bench into the sky" is unreasonable.

Embodiments of this application provide a picture editing method and related devices and systems, which can lower the threshold for picture editing, support user-defined editing instructions, understand the user's operation intent, and offer the user editing ideas, sparing the user extensive trial and error and significantly improving output efficiency.

The picture editing method provided in the embodiments of this application can be implemented based on the picture editing system 10 shown in FIG. 1. As shown in FIG. 1, the picture editing system 10 may include a human-computer interaction module 100 and a computing module 200, which cooperate to implement the picture editing method provided in the embodiments of this application.

The human-computer interaction module 100 can serve as the human-machine interface through which the user uses the picture editing function. Picture editing programs can run on the human-computer interaction module 100, such as an album (also called a gallery), a photo retouching application, or a drawing and design program. The human-computer interaction module 100 can provide input capabilities such as touch input, audio input, and gesture input, as well as output capabilities such as display output and audio output, so that the user can use the picture editing function by tapping, dragging, text input, gestures, and so on, for example adding a new object to a picture, replacing an object in a picture, erasing a local area of a picture, moving the position of an object in a picture, or adjusting the color or brightness of an object in a picture. The human-computer interaction module 100 can also communicate with the computing module 200 to transmit editing parameters (such as the position the user tapped, or the image features of the editing area), so that the computing module performs, according to the editing parameters, the complex computation involved in the picture editing function. What the editing parameters are and what computation the computing module performs on them are described in detail in later embodiments and are not expanded upon here.

The computing module 200 can be responsible for the complex computation involved in the picture editing function, such as editing-instruction recommendation algorithms, image processing algorithms, and image understanding algorithms. The computing module 200 can also communicate with the human-computer interaction module 100 to receive the editing parameters transmitted by the human-computer interaction module 100 and return processing results to it, so that the human-computer interaction module 100 provides editing suggestions to the user or displays the edited picture based on those results.

The human-computer interaction module 100 and the computing module 200 may reside in different devices; for example, the human-computer interaction module 100 may reside in a terminal device such as a mobile phone, tablet computer, smart screen, or smart watch, while the computing module 200 may reside in a device offering stronger computing power, such as a cloud-side server, or in another terminal device with spare computing capacity. In that case, the communication between the two is device-to-device communication, which may be wireless communication such as 2G/3G/4G/5G mobile communication, wireless fidelity (Wi-Fi) communication, or satellite communication, or wired communication such as Ethernet or Universal Serial Bus (USB) communication.

The human-computer interaction module 100 and the computing module 200 may also be integrated into the same device; for example, both may reside in a terminal device, such as a mobile phone, tablet computer, or personal computer, that has both human-computer interaction capability and complex computing capability. In that case, the communication between the two is intra-device communication, such as bus communication or shared-memory communication.

FIG. 2 exemplarily shows a terminal device 300 provided by an embodiment of this application.

The terminal device 300 can have both human-computer interaction capability and computing capability. The device type of the terminal device 300 may be any of the following: a mobile phone, a tablet computer, a handheld computer, a desktop computer, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), a smart home device such as a smart large screen, a wearable device such as a smart watch or smart glasses, an extended reality (XR) device such as an augmented reality (AR), virtual reality (VR), or mixed reality (MR) device, an in-vehicle device, a smart city device, and so on.

As shown in FIG. 2, the terminal device 300 may include: a processor 110, a memory 120, a display 130, a display driver integrated circuit (DDIC) 140, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a subscriber identification module (SIM) card interface 195, and so on. The sensor module 180 may include a gyroscope sensor 180B, an acceleration sensor 180E, a touch sensor 180K, and the like. The various parts of the terminal device 300 may be connected by buses.

The processor 110 can be responsible for providing computing power and can serve as the computing module of the terminal device; input/output components such as the display 130, the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the sensor module 180, the button 190, the motor 191, the indicator 192, and the camera 193 can be responsible for providing human-computer interaction capability and can serve as the human-computer interaction module of the terminal device. When the computing module in the terminal device has strong computing power, the terminal device 300 can by itself implement the picture editing system 10 shown in FIG. 1; in that case, the part of the terminal device 300 responsible for providing computing power (such as the processor) can constitute the computing module 200 of the picture editing system 10, and the part responsible for providing human-computer interaction capability (such as the display and the touch sensor) can constitute the human-computer interaction module 100. When the computing module in the terminal device does not have strong computing power, the terminal device 300 may also implement only the human-computer interaction module 100 of the picture editing system 10, with the computing module 200 implemented by a cloud-side server.

There may be one or more processors 110, and they may be integrated within the integrated circuit of a system on chip (SOC), a system-level chip. The processor 110 may include a central processing unit (CPU), a graphics processing unit (GPU), a neural-network processing unit (NPU), and so on. The CPU may include an application processor (AP), a baseband processor (BP), and the like, where the AP may be responsible for running the operating system, user interface, and applications of the terminal device, and the BP may be responsible for transmitting and receiving wireless signals and managing radio-frequency services. The GPU may be responsible for graphics rendering, performing shading, material filling, rendering, and output according to rendering instructions and data from the CPU. The NPU processes input information rapidly by drawing on the structure of biological neural networks, for example the transfer patterns between neurons of the human brain, and can also continuously self-learn. The NPU can be used to run artificial intelligence algorithms such as editing-instruction recommendation algorithms, image processing algorithms, and image understanding algorithms. The CPU and GPU can be used to render and composite the frames to be sent for display on the display 130.

The processor 110 may include one or more interfaces, such as an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.

A cache may be provided in the processor 110 to hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs the instructions or data again, they can be fetched directly from the cache, reducing the processor 110's wait time and improving program execution efficiency.

The memory 120 may include a program storage area and a user data storage area, where the program storage area can store the operating system and one or more applications (such as a game application), and the data storage area can store data created by the user while using the terminal device 300 (such as photos and contacts). The memory 120 may be high-speed random access memory or non-volatile memory, such as a magnetic disk, flash memory, or universal flash storage (UFS). The memory 120 may also be an external memory card, such as a Micro SD card.

The memory 120 may also store code instructions of the picture editing method provided in the embodiments of this application; when the processor 110 reads these code instructions from the memory 120 and runs them, the terminal device 300 can be caused to perform the steps performed by the human-computer interaction module and/or the computing module in the picture editing method provided in the embodiments of this application.

The memory 120 may also be integrated together with the processor 110 in the integrated circuit of the SOC.

如图2所示,终端设备300可以通过SOC,DDIC 140,以及显示器130等实现显示功能。As shown in Figure 2, the terminal device 300 can realize the display function through SOC, DDIC 140, and display 130.

其中,显示器130具有多个刷新率。刷新率表示显示屏在1秒内刷新显示画面的次数。例如,60赫兹(Hz)刷新率表示显示屏在1秒内刷新显示画面60次。显示器130可以采用LTPO显示面板,允许刷新率降低到低刷新率,如10Hz、1Hz,从而支持降低显示屏功耗。The display 130 has multiple refresh rates. The refresh rate indicates the number of times the display screen refreshes the display screen in 1 second. For example, a 60 Hz refresh rate indicates that the display screen refreshes the display screen 60 times in 1 second. The display 130 may use an LTPO display panel, allowing the refresh rate to be reduced to a low refresh rate, such as 10 Hz or 1 Hz, thereby supporting the reduction of the power consumption of the display screen.

其中,显示驱动集成电路(DDIC)140可用作显示器130的控制核心,驱动显示器130工作,并接收来自SOC(处理器110)的数据,如图像数据以及一些指令。DDIC 140可通过电信号的形式向显示器130的显示面板发送驱动信号和数据,继而实现对屏幕亮度和色彩的控制,使得诸如字母、图片等图像信息得以在屏幕上显现,完成屏幕刷新。Among them, the display driver integrated circuit (DDIC) 140 can be used as the control core of the display 130 to drive the display 130 to work and receive data from the SOC (processor 110), such as image data and some instructions. The DDIC 140 can send driving signals and data to the display panel of the display 130 in the form of electrical signals, and then realize the control of the screen brightness and color, so that image information such as letters and pictures can be displayed on the screen, completing the screen refresh.

SOC发送给DDIC 140的待显示画面的图像数据可送到帧缓存(Frame Buffer)中存放,以完成送显(或称送图)。然后,DDIC 140从帧缓存中取出图像数据并驱动显示器130进行显示。The image data of the screen to be displayed sent by the SOC to the DDIC 140 can be sent to the frame buffer for storage to complete the display (or image sending). Then, the DDIC 140 takes out the image data from the frame buffer and drives the display 130 for display.

The wireless communication function of the terminal device 300 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.

The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the terminal device 300 may be used to cover one or more communication frequency bands. Different antennas may also be multiplexed to improve antenna utilization. For example, the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In some other embodiments, an antenna may be used in combination with a tuning switch.

The mobile communication module 150 can provide solutions for wireless communication, including 2G/3G/4G/5G, applied to the terminal device 300. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 150 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 can also amplify a signal modulated by the modem processor and convert it into electromagnetic waves radiated through the antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 may be disposed in the same device as at least some modules of the processor 110.

The modem processor may include a modulator and a demodulator. The modulator is used to modulate a low-frequency baseband signal to be sent into a medium- or high-frequency signal. The demodulator is used to demodulate a received electromagnetic wave signal into a low-frequency baseband signal, which it then transmits to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A and the receiver 170B), or displays an image or video through the display 130. In some embodiments, the modem processor may be an independent device. In other embodiments, the modem processor may be independent of the processor 110 and disposed in the same device as the mobile communication module 150 or other functional modules.

The wireless communication module 160 can provide solutions for wireless communication applied to the terminal device 300, including wireless local area networks (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite systems (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR). The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signal, and sends the processed signal to the processor 110. The wireless communication module 160 can also receive a to-be-sent signal from the processor 110, perform frequency modulation and amplification on it, and convert it into electromagnetic waves radiated through the antenna 2.

In some embodiments, the antenna 1 of the terminal device 300 is coupled to the mobile communication module 150 and the antenna 2 is coupled to the wireless communication module 160, so that the terminal device 300 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).

The terminal device 300 can implement the shooting function through the ISP, the camera 193, the video codec, the GPU, the display 130, the application processor, and the like.

The ISP is used to process data fed back by the camera 193. For example, when a photo is taken, the shutter opens and light is transmitted through the lens to the camera's photosensitive element, where the light signal is converted into an electrical signal; the photosensitive element passes the electrical signal to the ISP for processing, converting it into an image visible to the naked eye. The ISP can also run algorithm optimizations on the image's noise, brightness, and skin tone, and can optimize parameters such as the exposure and color temperature of the shooting scene. In some embodiments, the ISP may be disposed in the camera 193.

The camera 193 is used to capture still images or video. An object forms an optical image through the lens, which is projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and passes it to the ISP, which converts it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the terminal device 300 may include 1 or N cameras 193, where N is a positive integer greater than 1.

The video codec is used to compress or decompress digital video. The terminal device 300 may support one or more video codecs, so that it can play or record video in multiple coding formats, for example moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.

The terminal device 300 can implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, the application processor, and the like.

The audio module 170 is used to convert digital audio information into an analog audio signal for output, and also to convert an analog audio input into a digital audio signal. The audio module 170 can also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.

The speaker 170A, also referred to as a "loudspeaker", is used to convert an audio electrical signal into a sound signal. The terminal device 300 can play music or conduct a hands-free call through the speaker 170A.

The receiver 170B, also referred to as an "earpiece", is used to convert an audio electrical signal into a sound signal. When the terminal device 300 answers a call or a voice message, the voice can be heard by holding the receiver 170B close to the ear.

The microphone 170C, also referred to as a "mic" or "sound transducer", is used to convert a sound signal into an electrical signal. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal into it. The terminal device 300 may be provided with at least one microphone 170C. In other embodiments, the terminal device 300 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In still other embodiments, the terminal device 300 may be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement directional recording, and the like.

The earphone interface 170D is used to connect a wired earphone. The earphone interface 170D may be a USB interface, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.

The keys 190 include a power key, volume keys, and the like. The keys 190 may be mechanical keys or touch keys. The terminal device 300 can receive key input and generate key signal input related to user settings and function control of the terminal device 300. The motor 191 can generate a vibration prompt. The SIM card interface 195 is used to connect a SIM card. A SIM card can be brought into contact with or separated from the terminal device 300 by inserting it into or removing it from the SIM card interface 195.

The structure shown in Figure 2 does not constitute a specific limitation on the terminal device 300. The terminal device 300 may include more or fewer components than shown in the figure, combine some components, split some components, or arrange the components differently. The components shown may be implemented in hardware, software, or a combination of software and hardware.

Figure 3 exemplarily shows a server 400 provided in an embodiment of this application.

The server 400 can provide complex computing capabilities for performing the steps performed by the computing module in the picture editing method provided in the embodiments of this application.

As shown in Figure 3, the server 400 may include a processor 210, a memory 220, input/output devices 230, a communication module 240, and the like. These components may be coupled through a bus.

The server 400 may have powerful computing resources, and its processor 210 may include one or more processors with strong computing power, such as a central processing unit (CPU), a neural-network processing unit (NPU), or a graphics processing unit (GPU).

The processor 210 may include one or more interfaces, for example an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.

The processor 210 may be provided with a cache for storing instructions or data that the processor 210 has just used or uses cyclically. If the processor 210 needs the instructions or data again, it can fetch them directly from the cache, which reduces the waiting time of the processor 210 and improves program execution efficiency.

The processor 210 may also be connected to an external memory. The memory may be a high-speed random access memory or a non-volatile memory, for example a magnetic disk, a flash memory, or a universal flash storage (UFS). The memory may also be an external memory card, for example a Micro SD card.

The processor 210 is the computing core of the server 400 and has strong computing capabilities. It is coupled to the memory 220 and can be used to read and execute computer-readable instructions in the memory 220 and to run the operating system and various programs. Specifically, the processor 210 can be used to invoke a program stored in the memory 220, for example the implementation program of the picture editing method provided in the embodiments of this application, and execute the instructions contained in the program.

The memory 220 may include a high-speed random access memory and a non-volatile memory, for example a magnetic disk, a flash memory, or another non-volatile solid-state storage device. The memory 220 can be used to store various software programs and multiple sets of instructions. The memory 220 may store an operating system, for example Linux. The memory 220 may also store one or more programs, for example programs involved in patch making, such as a compiler and a linker. The memory 220 may further store code instructions of the picture editing method provided in the embodiments of this application. When the processor 210 reads the code instructions from the memory 220 and runs them, the server 400 can perform the steps performed by the computing module in the picture editing method provided in the embodiments of this application.

The input/output devices 230 may include a display screen, a keyboard, a mouse, and the like, and can be used to receive user input and output program execution results to the user.

The communication module 240 may include a wired communication module and a wireless communication module. The wired communication module may support wired communication protocols, such as universal serial bus (USB), serial port, and Ethernet, to communicate with other devices over physical communication cables. The wireless communication module may include 2G/3G/4G/5G wireless communication modules, a Wi-Fi communication module, and the like. The wireless communication module receives electromagnetic waves via an antenna, performs frequency modulation and filtering on the electromagnetic wave signals, and sends the processed signals to the processor 210; the wireless communication module can also receive to-be-sent signals from the processor 210, perform frequency modulation and amplification on them, and convert them into electromagnetic waves radiated through the antenna.

The structure shown in Figure 3 does not constitute a limitation on the server 400. The server 400 may include more or fewer components than shown in the figure, combine some components, split some components, or arrange the components differently. The components shown may be implemented in hardware, software, or a combination of software and hardware.

Based on the products introduced in the foregoing embodiments, the picture editing method provided in the embodiments of this application is described in detail below through four embodiments, referred to as Embodiment 1, Embodiment 2, Embodiment 3, and Embodiment 4. As shown in Figure 4, Embodiment 1 introduces the overall flow of the picture editing method provided in this application, and Embodiment 2 is an alternative to Embodiment 1. Embodiments 1 and 2 introduce two user operation modes, both of which can predict the user's editing intention and give recommended editing operations. Embodiment 3 introduces how to judge whether an editing operation input by the user is reasonable. Embodiment 4 solves the problem of how to insert a new or replacement element among the existing objects of the original picture. Embodiments 1 and 2 are parallel alternatives; Embodiments 3 and 4 supplement Embodiments 1 and 2 by expanding on certain steps of the overall flow.

Embodiment 1

Embodiment 1 introduces the overall flow by which a user edits a picture through a terminal device. When the terminal device has strong computing power, complex image processing can be completed directly on the computing module of the terminal device; when it does not, the terminal device can submit complex computing tasks (such as complex image processing tasks) over the network to the cloud for execution and wait for the task execution results returned by the cloud-side server. In the following method embodiments, the technical solutions are described with the human-computer interaction module and the computing module as the execution subjects. Thus, when the terminal device has strong computing power, the human-computer interaction module and the computing module may both be located on the terminal device; the steps they perform are the steps performed by the terminal device, and the communication or data interaction between them is intra-device communication. When the terminal device does not have strong computing power, the human-computer interaction module may be located on the terminal device and the computing module on the server; the steps performed by the human-computer interaction module are the steps performed by the terminal device, the steps performed by the computing module are the steps performed by the server, and the communication or data interaction between them is inter-device communication.

As shown in Figure 5, the overall flow of the picture editing method provided in Embodiment 1 may include:

S10-S13: Open the first picture.

Specifically, as described in S10, the human-computer interaction module may detect a user operation of opening the first picture. In response, as described in S11-S12, the human-computer interaction module may obtain the first picture from the storage module. As described in S13, after obtaining the first picture, the human-computer interaction module may display it.

The terminal device may be installed with applications for browsing, managing, or processing pictures, such as an album (also called a gallery), a photo editing application, or a drawing and design program. The user can open the first picture through such an application.

In the embodiments of this application, the user operation of opening the first picture may be referred to as the first user operation.

As shown in Figure 6, the first user operation may, for example, be an operation of the user tapping the thumbnail 61 of the first picture in the album; in response to this operation, the terminal device displays the original picture 63 of the first picture. Relative to the thumbnail, the original picture may also be called the full-size picture. Not limited to what is shown in Figure 6, the user operation of opening the first picture may also be a user operation of tapping a thumbnail in a folder, in a web page, or in a chat interface, and so on. The first user operation may also be a user operation of tapping a link to the first picture, tapping the file icon of the first picture, and so on.

The first picture may be stored in a local storage module of the terminal device or in a network storage module. When the first picture is stored on the network, the human-computer interaction module can download it from the network storage module.

S14-S18: Identify the editing area.

Specifically, as described in S14, the human-computer interaction module may detect the user's operation of selecting an editing area in the first picture. In response, as described in S15, the human-computer interaction module may transmit the position of this user operation on the first picture to the computing module. Then, as described in S16, the computing module may determine, according to the operation position, which area of the picture the user has selected as the editing area, and, as described in S17, inform the human-computer interaction module of the contour information of the editing area selected by the user. In this way, as described in S18, the human-computer interaction module can display that editing area distinctly in the first picture according to the contour information. The information returned by the computing module to the human-computer interaction module in S17 is not limited to contour information; it may also be a binary map, a grayscale map, or the like of the editing area, which, like contour information, can uniquely indicate the editing area in the first picture. The embodiments of this application collectively refer to such information as the indication information of the editing area.

Here, "distinctly" means distinguishing the editing area from the other areas of the first picture, which may be achieved by means including but not limited to: highlighting the outline of the editing area, highlighting the entire editing area, or displaying a dashed frame along the outline of the editing area.
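As an illustrative sketch only, and not a limitation of the display means, one way a prototype might render the editing area distinctly is to draw its outline with OpenCV; the function below is an assumption for illustration.

```python
import cv2
import numpy as np

def highlight_editing_area(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Draw the editing area's outline on a copy of the first picture,
    one possible way to display the area distinctly."""
    contours, _ = cv2.findContours(
        mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )
    shown = image.copy()
    cv2.drawContours(shown, contours, -1, (255, 255, 255), 2)  # white outline
    return shown
```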

In the embodiments of this application, the user's operation of selecting an editing area in the first picture may be referred to as the second user operation, and the editing area selected by the user may be referred to as the first editing area.

The second user operation may be an operation of selecting an object in the first picture, such as tapping or long-pressing the object, and the first editing area may be the image area where the selected object is located. For example, as shown in Figure 7, in the first picture 71, the user taps the object "sky", so the first editing area is the image area of the "sky"; in response, the human-computer interaction module may display a dashed frame along the outline of the "sky" to distinguish the "sky", as the first editing area, from the other areas.

The first editing area may specifically be identified by the computing module according to the position at which the second user operation acts on the first picture. Specifically, the human-computer interaction module may send the on-screen position of the second user operation to the computing module, and the computing module identifies the first editing area according to that position. The identification of the first editing area may be based on existing image segmentation technology, which can identify and delineate each object instance in an image and mark each pixel with the corresponding object category label. After identifying and delineating every object in the picture through image segmentation, the computing module can determine, from the position of the second user operation on the picture, which object the user has selected, and determine the image area of that object as the first editing area. Based on an understanding of the picture's semantic content, the embodiments of this application predict, from the user operation, the extent of the editing area the user intends to select. This allows the user to select an editing area in the to-be-edited picture simply and efficiently, without resorting to a series of complex operations such as lassoing, matting, or separating layers, which not only reduces the complexity of use but also improves editing efficiency.

One existing image segmentation technology is, for example, the segment anything model (SAM), a deep learning model for image segmentation tasks. SAM combines a convolutional neural network (CNN) with a Transformer-based architecture to process images in a hierarchical, multi-scale manner. Its working principle can be summarized as follows. SAM uses a pretrained vision transformer (ViT) as its backbone network, which extracts features from the input image. SAM uses a feature pyramid network (FPN) to generate feature maps at multiple scales; the FPN is a series of convolutional layers operating at different scales to extract features from the backbone's output, ensuring that SAM can recognize objects and boundaries at different levels of detail. SAM uses a decoder network to generate segmentation masks for the input image: the decoder takes the FPN's output and upsamples it to the original image size, enabling the model to produce segmentation masks at the same resolution as the input image. SAM also uses a Transformer-based architecture to refine the segmentation results; the Transformer is a neural network architecture that handles sequential data such as text or images very effectively, and it improves segmentation results by capturing contextual information from the input image. SAM exploits self-supervised learning to learn from unlabeled data, which involves training the model on large unlabeled image datasets to learn common patterns and features in images; the learned features can improve the model's performance on specific image segmentation tasks. SAM can perform panoptic segmentation, which combines instance segmentation and semantic segmentation: instance segmentation identifies and delineates each object instance in an image, while semantic segmentation labels each pixel with a corresponding category label. Panoptic segmentation combines the two to provide a more comprehensive understanding of the image.

Another existing image segmentation technology is, for example, the segment everything everywhere all at once (SEEM) technology. In terms of model architecture, the SEEM model adopts a common encoder-decoder architecture. It not only performs image segmentation but also supports multimodal input, enabling one-click segmentation of objects of any category in any image through different kinds of visual prompts and text prompts. A visual prompt may be a point selected by the user on the image, a box, randomly drawn scribbles, a mask, or a reference region of another image; a text prompt may be the class the user wants to segment or a sentence describing the task. In other words, for an image, the user can tap a point, draw a box, or scribble a few strokes by hand to complete the segmentation of the corresponding object. The user can also tell the model, in text form, the name of the object to be segmented, or describe it in a sentence, and likewise complete one-click segmentation.
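For illustration, the mapping from the second user operation's click position to a segmented editing area could be prototyped with the open-source segment-anything package; the model variant, checkpoint path, and function name below are assumptions, not the claimed implementation.

```python
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load a pretrained SAM model (the checkpoint path is an assumption).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
predictor = SamPredictor(sam)

def editing_area_from_click(image: np.ndarray, x: int, y: int) -> np.ndarray:
    """Return a binary mask of the object under the click position (x, y)."""
    predictor.set_image(image)             # RGB uint8 array of shape (H, W, 3)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),   # position of the second user operation
        point_labels=np.array([1]),        # 1 marks a foreground click
        multimask_output=True,
    )
    return masks[np.argmax(scores)]        # keep the highest-scoring mask
```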

The second user operation is not limited to directly selecting an object; it may also be a user operation of drawing the editing area in the first picture. For example, as shown in Figure 8, in the first picture 81, the user has drawn a heart-shaped outline 82, and the image area inside the heart outline is the editing area selected by the user. Not limited to drawing an outline, the second user operation may also be an operation of smearing out the editing area, and so on.

S19: Receive the editing instruction input by the user.

Specifically, the human-computer interaction module may detect that the user inputs a first editing instruction for the first editing area.

In the embodiments of this application, after the editing area is identified and before S19, the human-computer interaction module may also prompt the user to input an editing instruction.

The ways of prompting the user to input an editing instruction may include, but are not limited to, the following:

Method 1. As shown in Figure 9, the human-computer interaction module pops up an input box 121 for the user to enter an editing instruction, such as a text or voice instruction. For example, the user may enter the following text or voice instruction in the input box: "add 'soaring geese'". In this example instruction, "add" is the editing type and "soaring geese" is the editing parameter. Figure 9 is only an example; the embodiments of this application do not limit the appearance, display position, or other attributes of the input box 121.

When Method 1 is used to prompt the user, the first editing instruction may be a text instruction entered by the user in the input box or a voice instruction entered by pressing and holding the voice key.

Method 2. The human-computer interaction module may display one or more recommended editing instructions, which are generated by the computing module according to the image features of the first editing area.

For example, as shown in Figure 10, the editing area selected by the user is a heart-shaped area 131 in the sky, and the recommended editing instruction generated for this editing area is "recommended new picture content: 'soaring geese'", where "add" is the editing type and "soaring geese" is the editing parameter.

For another example, as shown in Figure 11, the editing area selected by the user is the sky area, and the recommended editing instruction generated for this editing area is "recommended tone adjustment: reduce brightness to 82%, reduce saturation to 75%, change hue to red", where "adjust tone" is the editing type and "reduce brightness to 82%, reduce saturation to 75%, change hue to red" are the editing parameters.

Compared with the prior art, in which the user must determine the editing type and editing parameters independently, the embodiments of this application, based on an understanding of the original picture's semantic content, recommend editing types and editing parameters according to the image features of the editing area selected by the user, providing the user with editing ideas. This not only reduces the complexity of use but also ensures that the content of the edited picture is reasonable, sparing the user extensive trial and error and significantly improving output efficiency.

The image features of the first editing area may include, but are not limited to, mask features, depth features, contour features, and color features. These features may be vector features extracted from the first editing area using various existing technologies.

Specifically, the computing module may use an image segmentation algorithm (such as the SAM or SEEM algorithm) to extract the mask features of the picture. The mask features can carry the category information corresponding to each pixel in the picture. Figure 12 exemplarily shows the matrix storage form of the mask features of a picture containing sky, a castle, mountains, a valley, a forest, and trees. The matrix is the same size as the original picture, its coordinates correspond one-to-one with those of the original picture, and the values in the matrix represent the category of the pixel at the corresponding position in the original picture; for example, "0" represents "sky" and "1" represents "forest". The mask features may be stored in many forms, as long as the category information of each pixel is reflected; the embodiments of this application do not limit this. Whatever the storage form, the mask features can be converted to obtain the visualization shown in Figure 13.

Taking Figure 10 as an example, when the editing area selected by the user is the heart-shaped area 131, the computing module can determine from the mask features that the category of the pixels in the heart-shaped area 131 is "sky", and then determine, according to a specific recommendation rule, that the recommended editing type is "add" and the matching editing parameter is "soaring geese". In this example, the specific recommendation rule includes: when the first editing area is sky, the editing type is "add" and the editing parameter is "soaring geese". In practical applications, the recommendation rules may be set otherwise; the embodiments of this application do not limit this.
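A minimal sketch of this lookup, assuming the mask features are stored as the category matrix of Figure 12; the label mapping and rule table below are illustrative assumptions.

```python
import numpy as np

CATEGORY_NAMES = {0: "sky", 1: "forest"}                  # assumed label mapping (cf. Figure 12)
RECOMMENDATION_RULES = {"sky": ("add", "soaring geese")}  # assumed rule table

def recommend_for_region(mask_features: np.ndarray, region: np.ndarray):
    """mask_features: (H, W) category matrix; region: (H, W) boolean selection.
    Returns the (editing type, editing parameter) recommended for the region."""
    labels, counts = np.unique(mask_features[region], return_counts=True)
    category = CATEGORY_NAMES.get(int(labels[np.argmax(counts)]))  # majority category
    return RECOMMENDATION_RULES.get(category)                      # e.g. ("add", "soaring geese")
```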

The computing module may also extract depth features using a depth estimation algorithm, extract contour features using edge detection and contour extraction algorithms, extract color features using a dominant color extraction algorithm (such as the Kmeans clustering algorithm), and so on. Taking the same picture of sky, castle, mountains, valley, forest, and trees as an example, the depth features may be visualized as shown in Figure 14, the contour features as shown in Figure 15, and the color features as shown in Figure 16. Figure 16 visualizes the picture's color moments and intuitively describes the color distribution in the picture in second-order moment form. The depth, contour, and color features may also be stored in matrix form, similar to Figure 12, except that the values in the matrix carry different meanings: the matrix values of the depth features represent each pixel's depth value, those of the contour features represent whether each element is a contour, and those of the color features represent each element's color.
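As one hedged example, the dominant color extraction mentioned above (Kmeans clustering) might be sketched as follows; the cluster count is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def dominant_colors(region_pixels: np.ndarray, k: int = 3) -> np.ndarray:
    """region_pixels: (N, 3) RGB values of the editing area's pixels.
    Returns the k cluster centers as the region's dominant colors."""
    km = KMeans(n_clusters=k, n_init=10).fit(region_pixels)
    return km.cluster_centers_.astype(np.uint8)
```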

After extracting the mask features, depth features, contour features, color features, and other features of the first editing area, the computing module may fuse these features using techniques such as weighted summation to obtain a fused feature vector, and use the fused feature vector as one input of a specific artificial intelligence algorithm. In addition, the computing module may map certain editing types to numbers (for example, "add" maps to 1 and "delete" maps to 2) as the second input of the algorithm. Finally, through the computation of this algorithm, the computing module can obtain the editing parameters corresponding to the various editing types. For a given editing type, if the algorithm cannot output a corresponding editing parameter, or the confidence of the output parameter is too low (for example, below 60%), it can be determined that the editing type is unsuitable for the editing area, and it is not recommended. The specific artificial intelligence algorithm here may be a trusted model trained on a large number of samples. In its training sample set, each training sample may include an input sample and an output sample, where the input sample includes the fused feature vector of an image area and the mapped ID of an editing type, and the output sample includes the editing parameter used when that editing type is applied to that image area. The more reasonable the training samples and the larger the sample set, the more reliable the trained model.
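A minimal sketch of the two model inputs described above, assuming weighted summation over equal-length feature vectors; the weights and type IDs are illustrative.

```python
import numpy as np

EDIT_TYPE_IDS = {"add": 1, "delete": 2}   # numeric mapping of editing types

def fuse_features(features: list[np.ndarray], weights: list[float]) -> np.ndarray:
    """Weighted sum of equal-length feature vectors (mask, depth, contour,
    color), producing the fused feature vector."""
    return sum(w * f for w, f in zip(weights, features))

def build_model_input(features, weights, edit_type: str) -> np.ndarray:
    """Concatenate the fused feature vector with the numeric editing-type ID,
    forming the two inputs of the recommendation model as one array."""
    fused = fuse_features(features, weights)
    return np.concatenate([fused, [EDIT_TYPE_IDS[edit_type]]])
```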

In some embodiments, the computing module may also generate an editing question based on the image features of the editing area and a preset editing type, where the answer to the question is the editing parameter that should accompany the editing type, and then feed the question to a large model such as chatGPT to obtain the answer. For example, the editing question generated from the editing area "sky" and the editing type "add" is "What should be added to the sky?", to which the chatGPT model may output an answer such as "add 'clouds' to the sky", where "clouds" is the large model's answer to the question. This example is only used to explain the embodiments of this application; in practical applications, the questions posed to chatGPT may be more complex and carry more detail, such as an additional description of the scene.
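A sketch of how such an editing question might be composed before being sent to a large model; the question templates are assumptions, and real prompts could carry far more scene detail.

```python
def build_editing_question(region_label: str, edit_type: str) -> str:
    """Compose the editing question for a region and a preset editing type."""
    templates = {
        "add": "What should be added to the {region} in this picture?",
        "delete": "What should be removed from the {region} in this picture?",
    }
    return templates[edit_type].format(region=region_label)

# build_editing_question("sky", "add")
# -> "What should be added to the sky in this picture?"
# The question text would then be submitted to a large model such as chatGPT,
# whose answer (e.g. "clouds") becomes the recommended editing parameter.
```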

In some embodiments, the computing module may also traverse the editing instructions in a recommendation pool and compare the feature vector corresponding to each traversed editing instruction with the fused feature vector of the editing area (for details, refer to S52 in Embodiment 3), in order to find editing instructions that reasonably match the editing area, and finally recommend those editing instructions.

When Method 2 is used to prompt the user, the first editing instruction input by the user may be selected from the recommended editing instructions generated by the computing module; for example, the user taps a recommended editing instruction to confirm inputting it as the first editing instruction, or drags a recommended editing instruction onto the first editing area to confirm inputting it as the first editing instruction. These examples are only used to explain the embodiments of this application and may differ in practical applications; the embodiments of this application do not limit the manner in which the user inputs the first editing instruction.

Method 3. The human-computer interaction module may display one or more preset editing instructions, such as commonly used editing instructions or editing instructions saved in advance by the user.

When Method 3 is used to prompt the user, the first editing instruction input by the user may be selected from the preset editing instructions displayed by the human-computer interaction module; for example, the user taps a preset editing instruction to confirm inputting it as the first editing instruction, or drags a preset editing instruction onto the first editing area to confirm inputting it as the first editing instruction. These examples are only used to explain the embodiments of this application and may differ in practical applications; the embodiments of this application do not limit the manner in which the user inputs the first editing instruction.

In the embodiments of this application, the above ways of prompting the user to input an editing instruction may also be implemented in combination. For example, the human-computer interaction module may pop up an input box as in Method 1, display recommended editing instructions as in Method 2, and display preset editing instructions as in Method 3. The user can see prompts of multiple kinds on the interface at the same time and choose to input the editing instruction according to any one of them.

S20-S23: Edit the picture.

In response to the user inputting the first editing instruction for the first editing area, as described in S20, the human-computer interaction module may send the first editing instruction to the computing module. Then, as described in S21, the computing module may perform the corresponding image editing processing on the first picture according to the first editing instruction, and, as described in S22, return the edited first picture to the human-computer interaction module. In this way, as described in S23, the human-computer interaction module can display the first picture after the image editing processing.

Compared with before the image editing processing, the image of the first editing area is changed after the processing, and the change is determined by the first editing instruction. An editing instruction may include an editing type and its corresponding editing parameters. The editing type may be, for example, delete, drag, replace, add, or color adjustment, and the corresponding editing parameters may be, for example, the replacement content, the drag target position, the newly added content, or color adjustment values.

For example, if the editing area selected by the user is the "sky" and the first editing instruction is "delete 'dark clouds'", then compared with before the image editing processing, the "dark clouds" in the "sky" of the first picture are deleted after the processing. For another example, if the editing area selected by the user is the "sky" and the first editing instruction is "add 'soaring geese'", then compared with before the image editing processing, "soaring geese" are added to the "sky" of the first picture after the processing.
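For illustration only, the editing type plus editing parameter structure described above might be represented in memory as follows; the field names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class EditInstruction:
    """An editing instruction: an editing type plus its editing parameter."""
    edit_type: str   # e.g. "add", "delete", "replace", "drag", "adjust_tone"
    parameter: str   # e.g. content to add, replacement content, tone values

# The two examples above, expressed in this form:
add_geese = EditInstruction(edit_type="add", parameter="soaring geese")
delete_clouds = EditInstruction(edit_type="delete", parameter="dark clouds")
```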

Assuming the first picture is the picture 71 shown in Figure 7 or the picture 81 shown in Figure 8, Figure 17 exemplarily shows the first picture after editing when the first editing instruction is "add 'soaring geese'", and Figure 18 exemplarily shows the first picture after editing when the first editing instruction is "adjust tone: reduce brightness to 82%, reduce saturation to 75%, change hue to red".

When the editing type in the first editing instruction is "add", the computing module may also obtain, according to the editing parameter in the first editing instruction (such as "soaring geese"), the image content to be added to the first picture, and add that image content to the first picture to complete the image editing processing corresponding to the first editing instruction (such as "add 'soaring geese'"). The image content may be carried in a material picture, which may come from the network or the terminal device, or may be generated by the computing module using artificial intelligence. Figure 19 exemplarily shows a material picture obtained according to the editing parameter ("soaring geese") of the editing instruction "add 'soaring geese'".

Embodiment 2

实施例二是实施例一的替代方案,也介绍了图片编辑方法的总体流程,但在实施例二中,编辑区域的确定可以基于图片的预处理信息,而不再基于图像分割技术,不需要重复在线计算。Embodiment 2 is an alternative to Embodiment 1 and also introduces the overall process of the image editing method. However, in Embodiment 2, the editing area can be determined based on the preprocessing information of the image instead of image segmentation technology, and repeated online calculations are not required.

如图20所示,实施例二提供的图片编辑方法的总体流程可包括:As shown in FIG. 20 , the overall process of the picture editing method provided in the second embodiment may include:

S30-S33:打开第一图片。S30-S33: Open the first picture.

具体的,可以参考实施例一中的S10-S13,此处不再赘述。For details, please refer to S10-S13 in the first embodiment, which will not be described in detail here.

S34-S39:识别编辑区域。S34-S39: Identify the editing area.

具体的,如S34所述,人机交互模块可检测到用户在第一图片中选择编辑区域的操作,例如点击图片中某一个物体。响应于此,如S35所述,人机交互模块可触发计算模块获取第一图片的预处理信息;另外,如S36所述,人机交互模块还可以将该用户操作在第一图片上的操作位置传输给计算模块。这样,如S37所述,计算模块便可以根据该预处理信息,或者再进一步结合该操作位置,确定用户选择的编辑区域是图片中的哪一个区域,并如S38所述将用户选中的编辑区域的指示信息(如轮廓信息、二值图、灰度图)告知给人机交互模块。然后,如S39所述,人机交互模块便可以根据用户选中的编辑区域的轮廓信息在第一图片中区别显示出该编辑区域。Specifically, as described in S34, the human-computer interaction module can detect the operation of the user selecting the editing area in the first picture, such as clicking on an object in the picture. In response to this, as described in S35, the human-computer interaction module can trigger the calculation module to obtain the preprocessing information of the first picture; in addition, as described in S36, the human-computer interaction module can also transmit the operation position of the user's operation on the first picture to the calculation module. In this way, as described in S37, the calculation module can determine which area of the picture the editing area selected by the user is based on the preprocessing information, or further combined with the operation position, and inform the human-computer interaction module of the indication information (such as contour information, binary image, grayscale image) of the editing area selected by the user as described in S38. Then, as described in S39, the human-computer interaction module can distinguish and display the editing area in the first picture according to the contour information of the editing area selected by the user.

For some implementation details of "identifying the editing area", refer also to S14-S18 in Embodiment 1 and their related description. For example, the user operation of selecting an editing area in the first picture may be tapping or long-pressing an object in the first picture; as shown in Figure 7, in the first picture 71 the user selects "sky" as the editing area by tapping the object "sky". As another example, ways of distinctly displaying the editing area in the first picture may include, without limitation: highlighting the outline of the editing area, highlighting the entire editing area, or displaying a dashed box along the outline of the editing area.

In Embodiment 2, the preprocessing information may indicate one or more areas. Here, the one or more areas are image areas in the first picture; the image areas of the first picture may be divided by object, with all the pixels of one object forming one image area. The editing area can be determined from these one or more areas, which may be areas the user prefers to edit, areas recommended for editing, areas allowed to be edited, and so on. The editing area can thus be determined quickly from the preprocessing information, without performing image segmentation on the first picture.

After the human-computer interaction module detects the user operation of selecting an editing area in the first picture, the computing module may first read the preprocessing information:

1. If the preprocessing information includes indication information of only one area, that area is determined as the editing area.

The preprocessing information may contain only the coordinates of the contour points of one area. As shown in Figure 21, the preprocessing information may be a JSON-format array: one array represents the contour of one area, each element of the array is a tuple, the first value of the tuple is the x coordinate of a contour point of the area, and the second value is the y coordinate of that contour point. The area enclosed by the contour recorded in the array can therefore be determined and taken as the editing area.
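
As an illustration, the enclosed area can be recovered from such a contour array with a few lines of Python; this is a minimal sketch assuming the JSON layout described above (the coordinate values are invented), using matplotlib's Path for the point-in-polygon test:

```python
import json
from matplotlib.path import Path

# Hypothetical payload in the layout described for Figure 21: one array per
# area, each element a (x, y) contour point.
payload = "[[120, 40], [480, 40], [480, 210], [120, 210]]"

contour = json.loads(payload)   # list of [x, y] contour points
region = Path(contour)          # the polygon enclosed by the contour

# Membership tests on the enclosed area let the module decide whether a
# pixel or a tap position falls inside the editing area.
print(region.contains_point((300, 100)))   # True: inside the editing area
print(region.contains_point((10, 10)))     # False: outside
```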

The preprocessing information is not limited to the JSON format and may adopt other data formats.

Nor is its data content limited to the coordinates of an area's contour points; the preprocessing information may also be a binary map, a grayscale map, and so on.

The preprocessing information may also be a binary map containing only one area, in which only that area takes a first value. Such a binary map is shown by way of example in Figure 22: each element corresponds to one pixel, the element's coordinates give the pixel's position, and the element's binarized value gives the pixel's class; for example, "1" may indicate that the pixel at that position belongs to the "sky" and "0" that it does not. Based on the binary map in Figure 22, the area whose value is "1" (the "sky" area, with the first value being "1") or the area whose value is "0" (the non-"sky" area, with the first value being "0") can be determined as the editing area.
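
For such a binary map the selection logic reduces to collecting the pixels holding the first value; a small sketch with a toy map (the values are invented for illustration):

```python
import numpy as np

# Toy binary map in the spirit of Figure 22: 1 = "sky" pixel, 0 = not "sky".
binary_map = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
])

first_value = 1                            # the value marking the indicated area
ys, xs = np.nonzero(binary_map == first_value)
editing_area = list(zip(xs.tolist(), ys.tolist()))   # pixel coordinates of the area
print(editing_area)   # [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
```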

The preprocessing information may also be a grayscale map in which only one area has a specific grayscale value or falls within a specific grayscale range; that single area can then be determined as the editing area from the grayscale map.

2. If the preprocessing information includes indication information of multiple areas, the editing area may be determined further in combination with the position at which the second user operation acts on the first picture. Specifically, it can be determined which area of the picture the action position falls in, or which area it is closest to, and that area is determined as the editing area. In other words, among the multiple areas indicated by the preprocessing information, the area containing the action position may be determined as the area selected by the user, or the area closest to the action position may be so determined; the user-selected area may then be determined as the editing area.

Further, after the user-selected area is determined, it may be checked whether any remaining area overlaps the selected area substantially; if so, the selected area and the substantially overlapping area are merged into one area, and the merged area is finally determined as the editing area. In some embodiments, a non-maximum suppression (NMS) algorithm may be used to merge multiple overlapping areas. In the embodiments of this application, substantial overlap may mean that the area of the overlapping part exceeds a specific value, for example 10 pixels (px).
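
A simplified sketch of this merging step, assuming each area is represented as a boolean pixel mask and using the 10-pixel overlap threshold mentioned above (the full NMS machinery is omitted):

```python
import numpy as np

OVERLAP_THRESHOLD = 10   # pixels; the "specific value" mentioned above

def merge_overlapping(selected: np.ndarray, others: list) -> np.ndarray:
    """Merge into the selected boolean mask every remaining mask whose
    overlap with it exceeds OVERLAP_THRESHOLD pixels; the merged mask is
    the final editing area (a simplified stand-in for NMS-style merging)."""
    merged = selected.copy()
    for mask in others:
        overlap = np.logical_and(merged, mask).sum()
        if overlap > OVERLAP_THRESHOLD:
            merged = np.logical_or(merged, mask)
    return merged
```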

To indicate multiple areas directly, the preprocessing information may include the coordinates of the contour points of each of the multiple areas, where the contour coordinates of each area can be expressed as one array. The preprocessing information may also be a binary map containing multiple areas, in which multiple areas take the first value (for example "1"); for instance, "1" indicates that the pixel at that position belongs to a "flower" and "0" that it does not. The preprocessing information may also be a grayscale map in which multiple areas have a specific grayscale value or fall within a specific grayscale range, for example multiple areas whose grayscale values lie in the range 0-50.

Preprocessing information that directly indicates multiple areas is not limited to contour-point coordinates, binary maps, or grayscale maps; other forms of data content are possible, and the embodiments of this application place no restriction on this.

3. The preprocessing information of the first picture may also contain other data instead of directly indicating one or more areas, for example multi-layer information, contour information, or depth information. The computing module can use such data to determine one or more areas and then determine the editing area from them using the methods described in 1 or 2 above. That is, the preprocessing information of the first picture may indicate one or more areas indirectly.

Methods for determining the one or more areas indirectly indicated by the preprocessing information may include, without limitation:

3.1 If the preprocessing information contains layer information for multiple layers, where the layer information of each layer may include the coordinates of the opaque pixels in that layer, then for each layer the connected opaque pixels may be taken as one area. Further, layer areas that overlap substantially may be merged into one area, for example using the NMS algorithm, and the editing area determined from the merged area or areas.

3.2 If the preprocessing information contains contour information (see Figure 18), which may record the pixels lying on contours, then dilation and erosion techniques can be used to filter useless internal contours out of the contour information, and graph-theoretic techniques can be used to obtain one or more fully closed areas from it. Dilation and erosion can be used to remove noise, separate independent image elements, and connect adjacent elements; graph-theoretic techniques can be used for connected-component analysis, which finds mutually independent connected components in an image and labels them. A connected component of an image is a region of adjacent pixels sharing the same pixel value. In general a connected component contains only one pixel value, so to prevent pixel-value fluctuations from affecting the extraction of different components, connected-component analysis usually operates on a binarized image.
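
By way of illustration, this pipeline maps naturally onto OpenCV primitives; the sketch below assumes the contour information has been rendered as a binarized image of contour lines (the file name is hypothetical):

```python
import cv2
import numpy as np

# Assume `contours.png` holds the contour information as white lines on a
# black background (file name hypothetical).
contour_map = cv2.imread("contours.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(contour_map, 127, 255, cv2.THRESH_BINARY)

# Dilation followed by erosion (morphological closing) removes noise and
# seals small gaps, filtering out useless internal contours.
kernel = np.ones((5, 5), np.uint8)
closed = cv2.erode(cv2.dilate(binary, kernel, iterations=1), kernel, iterations=1)

# Connected-component analysis labels each fully enclosed region; every
# label then corresponds to one candidate area of the picture.
num_labels, labels = cv2.connectedComponents(255 - closed)  # invert: regions, not lines
print(f"{num_labels - 1} candidate areas found")            # label 0 is the background
```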

Taking again the picture containing a sky, a castle, mountains, a valley, a forest, and a tree as an example, Figure 23 shows by way of example multiple areas obtained from the contour information of that picture.

3.3 If the preprocessing information contains depth information (see Figure 19), which may record the depth values of multiple pixels, the pixels can be divided into different areas according to their depth values in the depth information. Specifically, pixels with the same or similar depth values may be assigned to the same area.

Taking again the picture containing a sky, a castle, mountains, a valley, a forest, and a tree as an example, Figure 24 shows by way of example multiple areas obtained from the depth information of that picture, where the "castle" area 241 may consist of pixels with the same or similar depth values, as may the "tower" area 242.

In the embodiments of this application, similar depth values may mean that the depth values differ by no more than a specific value, for example 0.1. In a picture, the depth value of each pixel may be expressed in the range 0 to 1, where 0 represents the depth value of the pixel nearest to the camera that captured the first picture and 1 represents the depth value of the pixel farthest from it. Depth values may also be expressed in other ways; the embodiments of this application place no restriction on this.
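
A minimal sketch of depth-based division under these conventions, simplified to quantizing depth values into bins one tolerance wide (real implementations would cluster more carefully):

```python
import numpy as np

DEPTH_TOLERANCE = 0.1   # "similar" = depth difference of no more than this

def group_by_depth(depth: np.ndarray) -> np.ndarray:
    """Divide pixels into areas by quantizing their depth values: pixels
    landing in the same tolerance-wide bin are treated as one candidate
    area (a deliberate simplification of the division described above)."""
    return np.floor(depth / DEPTH_TOLERANCE).astype(int)

depth = np.array([[0.61, 0.64],    # two foreground pixels, similar depths
                  [0.95, 0.98]])   # two background pixels, similar depths
print(group_by_depth(depth))
# [[6 6]
#  [9 9]]  -> two areas: labels 6 and 9
```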

The preprocessing information may contain several of the kinds of content introduced in 3.1 to 3.3 above at the same time, for example layer information, depth information, and contour information together. After the areas indicated by the preprocessing information are determined separately from these kinds of content, the results can be cross-checked against each other to improve recognition accuracy. For example, if the layer information and the depth information both indicate the same area, the recognition of that area is usually accurate; conversely, if the areas indicated by the layer information and the depth information conflict, the area recognition based on the depth information or the layer information is inaccurate, and recognition can be rerun after the algorithm is tuned.

It can be seen that Embodiment 2 can determine the editing area from one or more areas directly or indirectly indicated by the preprocessing information, without first applying image segmentation to identify and delimit the image area of every object in the first picture and then identifying the editing area from the user's operation position. The area the user wants to edit can therefore be predicted more quickly, and image segmentation computation can be avoided.

S40: Receive an editing instruction input by the user.

For details, refer to S19 in Embodiment 1; it is not repeated here.

S41-S44: Edit the picture.

For details, refer to S20-S23 in Embodiment 1; they are not repeated here.

Embodiment 3

Embodiment 3 supplements and refines Embodiments 1 and 2. Before the "edit the picture" step is executed, a step may be added that judges whether the editing instruction input by the user and the editing area selected by the user form a reasonable match (see S41b in Figure 25). When they do not match, the user is not only prompted that the editing instruction cannot be executed; the user may also be recommended a new editing area, or helped to modify the editing instruction. Embodiment 3 addresses the problem that arbitrarily entered editing instructions produce editing results that defy common sense, and can thus reduce the user's trial and error when editing a picture.

Specifically, as shown in Figure 26, judging whether the first editing instruction and the editing area form a reasonable match may be implemented by the following flow:

S50. Compute feature vectors for the editing parameter and editing type in the first editing instruction.

The editing types in the first editing instruction may be finite and enumerable. Different editing types can be mapped to different identification numbers (IDs) and thus be represented by the corresponding IDs. The computing module may use a model such as a deep learning algorithm to convert the ID of an editing type into a feature vector. For example, the ID corresponding to "add" is "001", which is converted into the following feature vector: [0.5250, 0.7937, 0.1356, 1.4893, -3.9651, 1.5068].

The editing parameter in the first editing instruction (for example "quiet lake") can likewise be mapped to IDs first and then converted into feature vectors. The difference is that the text content of an editing parameter is unpredictable, so it cannot conveniently be mapped to an ID directly. In this embodiment, the computing module may tokenize the phrase expressing the editing parameter and then look up the ID of each token in a preset vocabulary. For example, the editing parameter "静谧的湖泊" ("quiet lake") may be tokenized as ["静", "谧", "的", "湖", "泊"], whose corresponding ID array is [40496, 3152, 2099, 8024, 3563, 8024, 40497, 0, 0, 0], where 40496 and 40497 mark the start and end of the parameter's descriptive phrase respectively and 0 is padding. The length of the ID array, i.e. the number of elements it contains, may be preset, constraining the maximum length of the descriptive phrase. The computing module may then generate a feature vector for each ID in the array through a model such as a deep learning algorithm. For example, the ID array above may be converted into the following array of feature vectors: [[0.8838, 0.1570, 0.5249, ..., 0.4278, 0.1725, 0.4225], [1.8143, -0.5514, 0.0995, ..., -4.7141, -1.3811, -1.1166], [0.0186, 3.5949, 1.1780, ..., 1.1433, 2.7235, -0.5069], ..., [1.1674, 0.9497, 1.8264, ..., 1.3671, 0.5551, -0.4302], [1.1674, 0.9497, 1.8264, ..., 1.3671, 0.5551, -0.4302]], where the first element is the feature vector of 40496, the second that of 3152, the third that of 2099, and the last that of 0. The result is a two-dimensional array in which each element is the feature vector of one ID.
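
The sketch below reproduces this tokenization with a hypothetical vocabulary containing only the IDs from the example; the embedding table stands in for a trained model and is randomly initialized:

```python
import numpy as np

# Hypothetical vocabulary and special IDs matching the example above.
VOCAB = {"静": 3152, "谧": 2099, "的": 8024, "湖": 3563, "泊": 8024}
START, END, PAD, MAX_LEN = 40496, 40497, 0, 10

def encode(phrase: str) -> list:
    """Tokenize an editing parameter character by character, wrap it with
    start/end markers, and pad to the preset maximum length."""
    ids = [START] + [VOCAB[ch] for ch in phrase] + [END]
    return ids + [PAD] * (MAX_LEN - len(ids))

ids = encode("静谧的湖泊")
print(ids)   # [40496, 3152, 2099, 8024, 3563, 8024, 40497, 0, 0, 0]

# Each ID is then mapped to a feature vector, e.g. by an embedding table
# (randomly initialized here in place of a trained Word2Vec/Transformer).
embedding = np.random.randn(50000, 6)
vectors = embedding[ids]          # shape (10, 6): one vector per ID
```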

The deep learning model may be, for example, a word vector (Word2Vec) model or a Transformer model.

S51. Extract features of the editing area selected by the user, such as mask features, depth features, contour features, and color features, and fuse them using techniques such as weighted summation to obtain a fused feature vector.

The fused feature vector corresponding to the editing area can be expressed as a two-dimensional array. For example, the fused feature vector of the "sky" area is: [[0.2611, 1.8726, ..., -0.9721], ..., [1.6888, 2.6287, ..., -5.5910]].

The feature vector of one kind of feature (for example a depth feature) can be computed by a deep learning model, for example a convolutional neural network (CNN). Multiple features can be fused through multiple CNN models plus weighted summation. For example, CNN model 1 yields the mask features shown in Figure 13, CNN model 2 the depth features shown in Figure 14, and CNN model 3 the contour features shown in Figure 15. The feature vectors of these features have the same dimensionality, so a weighted sum of them yields the fused feature vector.
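
A minimal sketch of the weighted-sum fusion, with random arrays standing in for the CNN outputs and the weights assumed to be hyperparameters:

```python
import numpy as np

def fuse(features: list, weights: list) -> np.ndarray:
    """Weighted-sum fusion of same-shaped feature maps: each entry of
    `features` stands for the output of one CNN branch (mask, depth,
    contour, ...), and the weights are assumed hyperparameters."""
    assert all(f.shape == features[0].shape for f in features)
    return sum(w * f for w, f in zip(weights, features))

mask_feat    = np.random.randn(4, 6)   # placeholder CNN outputs
depth_feat   = np.random.randn(4, 6)
contour_feat = np.random.randn(4, 6)

fused = fuse([mask_feat, depth_feat, contour_feat], [0.5, 0.3, 0.2])
print(fused.shape)   # (4, 6): a two-dimensional fused feature vector
```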

S52. Use a model such as a machine learning or deep learning algorithm to compare the feature vectors of the first editing instruction with the fused feature vector of the editing area selected by the user, and judge whether applying the first editing instruction to that editing area is reasonable.

Specifically, the computing module may feed the feature vector of the editing type in the first editing instruction to the model as its first input, the feature vector of the editing parameter in the first editing instruction as its second input, and the fused feature vector of the editing area as its third input; through the model's computation, the computing module obtains a judgment of whether applying the first editing instruction to the selected editing area is reasonable.
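
The following sketch illustrates the shape of such a judgment model, with a tiny randomly initialized MLP standing in for the trained model; the mean-pooling of the variable-length inputs to a fixed size is one possible choice, not the patent's prescribed one:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed pre-trained parameters of a tiny binary classifier; in practice
# these would come from training on (instruction, area, label) examples.
W1, b1 = rng.standard_normal((18, 8)), np.zeros(8)
W2, b2 = rng.standard_normal(8), 0.0

def is_reasonable(type_vec, param_vecs, area_vecs) -> bool:
    """Judge instruction/area compatibility: pool the three inputs to fixed
    size, concatenate, and apply a small MLP as a stand-in for the machine
    learning / deep learning model described in S52."""
    x = np.concatenate([type_vec, param_vecs.mean(axis=0), area_vecs.mean(axis=0)])
    h = np.tanh(x @ W1 + b1)
    score = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # probability of "reasonable"
    return score > 0.5

type_vec   = rng.standard_normal(6)        # editing-type feature vector
param_vecs = rng.standard_normal((10, 6))  # one vector per parameter ID
area_vecs  = rng.standard_normal((4, 6))   # fused feature vector of the area
print(is_reasonable(type_vec, param_vecs, area_vecs))
```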

When it is judged that applying the first editing instruction to the selected editing area is unreasonable, the human-computer interaction module may output an error prompt. The error prompt may be a visual prompt displayed on the screen, a tactually perceptible vibration prompt, a voice prompt, and so on.

For example, as shown in Figure 27, when the editing area selected by the user is the "sky" and the input editing instruction is "add 'quiet lake'", the human-computer interaction module may display an error prompt 311 to inform the user that adding a "quiet lake" to the "sky" defies common sense, and may also refuse to execute the editing instruction.

As another example, for the picture 71 shown in Figure 7, when the editing area selected by the user is the "sky" and the input editing instruction is "delete 'sun'", the computing module can judge that the editing instruction is applicable to the "sky"; if the editing instruction becomes "delete 'mountain'", the computing module can judge that the new editing instruction is not applicable to the "sky".

As another example, for the picture 71 shown in Figure 7, when the editing area selected by the user is the "sun" and the input editing instruction is "replace with 'clouds'", the computing module can judge that the editing instruction is applicable to that editing area; if the editing instruction becomes "replace with 'tree'", the computing module can judge that the new editing instruction is not applicable to that editing area.

As another example, for the picture 71 shown in Figure 7, when the editing area selected by the user is the "tree" and the input editing instruction is "drag into 'the mountain forest'", the computing module can judge that the editing instruction is applicable to that editing area; if the editing instruction becomes "drag into 'the lake'", the computing module can judge that the new editing instruction is not applicable to that editing area.

The above examples merely explain the embodiments of this application and shall not be construed as limiting.

The error prompt 311 shown in Figure 27 is likewise only an example; in practice the error prompt may take other forms, for example flashing the erroneous editing instruction or flashing the editing area, and the embodiments of this application place no restriction on this.

S53. When it is judged that applying the first editing instruction to the editing area selected by the user is unreasonable, recommend a new editing instruction or a new editing area.

Specifically, other areas in the first picture may be recommended to the user as the editing area according to the feature vectors of the editing type and editing parameter in the first editing instruction. The computing module may traverse the other areas in the first picture and, in the manner of S52, compare the fused feature vector of each other area with the feature vectors of the first editing instruction to find an area that reasonably matches the first editing instruction, then recommend the found area as the editing area. Here, the other areas are the areas of the first picture outside the editing area selected by the user.

Alternatively, the first editing instruction may be modified according to the editing area selected by the user, for example by modifying the editing type and/or editing parameter so that the modified editing instruction fits the selected editing area. The computing module may traverse the editing types and/or editing parameters in a recommendation pool and, in the manner of S52, compare the feature vectors of each traversed editing type and editing parameter with the fused feature vector of the selected editing area to find an editing type and/or editing parameter that reasonably matches the selected area; it then modifies the editing instruction and finally recommends the modified instruction.
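
Both recommendation strategies reduce to a traversal guarded by the S52 judgment; a minimal sketch, with the judgment model passed in as a callable and all data structures illustrative:

```python
from typing import Callable, Optional

def recommend_region(judge: Callable, type_vec, param_vecs,
                     other_regions: dict) -> Optional[str]:
    """Traverse the other areas of the first picture and return the name of
    one that the S52-style judgment model deems a reasonable match for the
    first editing instruction."""
    for name, area_vecs in other_regions.items():
        if judge(type_vec, param_vecs, area_vecs):
            return name
    return None

def recommend_instruction(judge: Callable, area_vecs,
                          recommendation_pool: list) -> Optional[tuple]:
    """Traverse a recommendation pool of (editing type, editing parameter)
    feature-vector candidates and return the first instruction that
    reasonably matches the user-selected editing area."""
    for type_vec, param_vecs in recommendation_pool:
        if judge(type_vec, param_vecs, area_vecs):
            return type_vec, param_vecs
    return None
```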

For example, as shown in Figure 28, when the editing area selected by the user is the "sky" and the input editing instruction is "add 'quiet lake'", the computing module can judge that the editing instruction is not applicable to the "sky" area 312, modify the editing area by taking the "forest" area 313 as the new editing area, and finally suggest that the user apply the editing instruction to the "forest" area 313.

As another example, as shown in Figure 29, when the editing area selected by the user is the "sky" and the input editing instruction is "add 'quiet lake'", the computing module can judge that the editing instruction is not applicable to the "sky", modify the editing instruction to "add 'soaring geese'" as the new editing instruction, and finally suggest that the user apply the new editing instruction to the "sky" area.

Embodiment 4

Embodiment 4 supplements and refines the "edit the picture" step of Embodiments 1 and 2. It describes how, by adjusting the perspective relationships (i.e. the front-to-back and depth-of-field relationships) among the objects in the original picture, newly added content can avoid occluding objects in front of it in the original picture, ensuring that the perspective relationships among the objects in the edited picture remain reasonable.

In the embodiments of this application, the editing instruction the user inputs for the editing area involves adding a new object. Editing instructions involving a new object may include instructions of the add, replace, and drag types, among others. Replacing amounts to deleting one object from the original picture and then adding another; dragging amounts to deleting an object from one position in the original picture and adding it at another. In other words, an editing instruction involving a new object may be one whose corresponding editing process includes adding a new object to the editing area.

In this embodiment, the "edit the picture" step may specifically include: in response to the user inputting the first editing instruction for the editing area, the human-computer interaction module may send the first editing instruction to the computing module. Here the first editing instruction involves adding a new object to the editing area, and its editing type may be, for example, add, replace, or drag. The computing module may then perform the corresponding image editing on the first picture according to the first editing instruction and return the edited first picture to the human-computer interaction module, which can then display it.

As shown in Figure 30, the computing module may implement the image editing as follows (S61-S62):

S61. Extract the image features of the new object's image, such as depth features, mask features, contour features, and color features; extract the image features of the first picture; and, combining the image features of the editing area with the editing parameter in the editing instruction, correct the features of the first picture.

Here, "new object" is relative to "original object": the original objects are the objects in the editing area before the new object is added, and the new object is to replace them in the editing area. For the add and replace editing types, the human-computer interaction module may provide one or more images of the new object indicated by the editing parameter for the user to choose from. For the drag editing type, the new object is in essence the object being dragged by the user; its image comes from the first picture itself rather than from outside it.

The correction of the various features is described below.

Correction of depth features

Step 1. Determine, from the image features of the first picture and the editing parameter in the first editing instruction, the perspective relationship between the new object and the original objects in the first picture, i.e. their front-to-back relationship, which in the data manifests as different depth values.

If the new object's image were simply substituted for the original objects' image, the perspective relationships among the objects would become unreasonable and the new object would not blend into the first picture properly; for example, the original objects in the editing area might be completely occluded by the new object. As shown in Figure 31, if the new object "quiet lake" shown in Figure 32 directly replaced the original objects "tree" and "valley" in the editing area 317 of picture 315, the "quiet lake" would stand in front of the original object "tree", producing an unreasonable perspective relationship.

The embodiments of this application may use artificial intelligence algorithms such as image semantic understanding to determine the perspective relationship between the new object and the original objects, and correct the picture's depth features accordingly so as to present reasonable perspective relationships among the objects.

For example, if the first editing instruction for the editing area 317 of picture 315 in Figure 31 is "add 'quiet lake'" and the image of the new object "quiet lake" is as shown in Figure 32, the computing module can determine from the mask features of picture 315 that the original objects in the editing area 317 are the "tree" and the "valley" and that the new object "quiet lake" would occlude them. The computing module can then use artificial intelligence algorithms such as image semantic understanding to determine the front-to-back relationships among the new object "quiet lake" and the original objects "tree" and "valley", for example: the "tree" is in front of the "quiet lake", and the "quiet lake" is in front of the "valley".

Step 2. From the perspective relationship between the new object and the original objects, and the original objects' depth values, determine a base depth for the new object, then use the base depth to correct the new object's depth features.

First, the base depth of the new object can be determined by methods such as interpolation. In the example above, suppose the average depth values of the original objects "tree" and "valley" are 0.6 and 1.0 respectively. Then, from the front-to-back relationships among the new object "quiet lake" and the original objects "tree" and "valley", an interpolation algorithm can determine the base depth of the "quiet lake" to be 0.8, which is greater than 0.6 and less than 1.0. Here, the further back an object is, the greater its depth. This makes the new object's perspective relationship within the overall scene of the first picture reasonable.

Second, the new object's depth features are corrected using its base depth and the depth differences among the regions of the new object.

After the depth-feature correction, the new object's average depth may be close or equal to the base depth, while the depth differences among its regions are preserved. Here, "close" may mean that the difference between the average depth and the base depth does not exceed a specific value, for example 0.05. This both ensures that the new object as a whole forms a reasonable perspective relationship with the original objects and preserves the perspective relationships among the elements within the new object.
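
A minimal sketch of Steps 1-2 under the stated conventions, using simple midpoint interpolation for the base depth and a mean shift that preserves internal depth differences (the toy depth values are invented):

```python
import numpy as np

def correct_depth(new_obj_depth: np.ndarray, front_depth: float,
                  back_depth: float) -> np.ndarray:
    """Determine the new object's base depth by midpoint interpolation
    between the objects in front of and behind it, then shift its depth
    map so the mean matches the base depth while every internal depth
    difference is preserved."""
    base_depth = (front_depth + back_depth) / 2.0
    return new_obj_depth - new_obj_depth.mean() + base_depth

# Toy depth map for the new object "quiet lake"; "tree" = 0.6, "valley" = 1.0.
lake_depth = np.array([[0.40, 0.45], [0.50, 0.65]])
corrected = correct_depth(lake_depth, front_depth=0.6, back_depth=1.0)
print(np.isclose(corrected.mean(), 0.8))               # True: mean equals the base depth
print(np.allclose(corrected - corrected.mean(),
                  lake_depth - lake_depth.mean()))     # True: differences preserved
```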

Step 3. Replace the original depth features of the editing area with the new object's corrected depth features, thereby correcting the depth features of the first picture.

Here the substance of the replacement can be: traverse every pixel region within the editing area; if the new object is in front of the original object in that pixel region, replace the region's original depth data with the new object's depth data for that region, realizing the perspective relationship in which the new object is in front; if the original object is in front of the new object in that pixel region, keep the region's original depth data, realizing the perspective relationship in which the original object is in front. The original depth data of a pixel region means the depth data of that pixel region on the original object.

The editing area can thus be divided into a first pixel region and a second pixel region, where the perspective relationship of the first pixel region is that the new object is in front of the original object, and that of the second pixel region is that the original object is in front of the new object. The depth features of the first pixel region may be replaced with those of the new object, while the second pixel region keeps the original object's depth features. In practice the new object may also lie entirely in front of the original objects, with no second pixel region; in that case the depth features of the whole editing area can be replaced directly with those of the new object.
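
The per-pixel replacement rule can be written as a single vectorized comparison; a sketch under the convention that greater depth means further back, with toy values echoing the "tree"/"quiet lake" example:

```python
import numpy as np

def merge_depth(orig_depth: np.ndarray, new_depth: np.ndarray) -> np.ndarray:
    """Per-pixel depth replacement within the editing area. With greater
    depth = further back, the new object is 'in front' wherever its depth
    value is smaller, and only there does its depth data replace the
    original depth data (Step 3 above)."""
    new_in_front = new_depth < orig_depth      # the first pixel region
    return np.where(new_in_front, new_depth, orig_depth)

orig = np.array([[0.6, 1.0], [1.0, 1.0]])   # one "tree" pixel, three "valley" pixels
new  = np.full_like(orig, 0.8)              # corrected "quiet lake" depth
print(merge_depth(orig, new))
# [[0.6 0.8]
#  [0.8 0.8]]  -> the tree pixel keeps its depth; the lake replaces the valley
```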

Hence, in practice, depth-data replacement may occur in only part of the pixel regions of the editing area, namely those whose perspective relationship is "new object in front of the original object", while the other pixel regions may keep their original depth data because their perspective relationship is "original object in front of the new object". For example, as shown in Figure 33, in pixel region 319, suppose the average depth of the original object "tree" is 0.6 and the base depth of the new object "quiet lake" is 0.8; in pixel region 319 the original object "tree" is therefore in front of the new object "quiet lake", so pixel region 319 keeps its depth data. This ensures that in region 319 the "tree" in front is not occluded by the "quiet lake" behind it, forming a reasonable perspective relationship.

Figure 34 shows by way of example the reasonable perspective relationship formed between the new object "quiet lake" and the original objects "tree" and "valley" through the correction of depth features.

Correction of mask features

After the depth features have been corrected, the mask-feature correction may not simply substitute the new object's mask data for the editing area's original mask data; instead, the new object's mask data replaces the original mask data only in the regions where the depth data was replaced. This keeps the mask-feature correction consistent with the depth-feature correction, so that the corrected mask features and depth features point to the same object, avoiding contradictions between the two and ensuring that the regenerated picture is semantically reasonable.

For example, in Figure 33, no depth-data replacement occurs in pixel region 319 of the editing area 317: the depth data of the original object "tree" is still used, so in terms of perspective the original object "tree" is in front of the new object "quiet lake". Accordingly, when the mask features are corrected, the mask data of the new object "quiet lake" (for example "8") is not substituted in region 319 for the region's original mask data (for example "5", the mask data of the "tree"); the mask data of the original object "tree" is retained. Otherwise a contradiction would arise: judging by the depth features, region 319 would be the original object "tree", while judging by the mask features it would be the new object "quiet lake".

In other words, within the editing area, if the depth data of a pixel region now belongs to the new object rather than being the region's original depth data, the new object's mask data for that region may replace the region's original mask data. The original mask data of a pixel region means the mask data of that pixel region on the original object. In practice, therefore, the mask-feature replacement may cover only the part of the editing area whose depth features were replaced with the new object's.

In the example of Figure 34, the region where the mask-feature replacement occurs is smaller than the region of the new object "quiet lake"; Figure 35 gives a simple comparison of the two.

Correction of contour features

As with the mask features, after the depth features have been corrected, the contour-feature correction may not simply substitute the new object's contour data for the editing area's original contour data; instead, the new object's contour data replaces the original contour data only in the regions where the depth data was replaced. This keeps the contour-feature correction consistent with the depth-feature correction, so that the corrected contour features and depth features point to the same object, avoiding contradictions between the two and ensuring that the regenerated picture is semantically reasonable.

In other words, within the editing area, if the depth data of a pixel region now belongs to the new object rather than being the region's original depth data, the new object's contour data for that region may replace the region's original contour data. The original contour data of a pixel region means the contour data of that pixel region on the original object.

Likewise, in the example of Figure 34, the region where the contour-feature replacement occurs is smaller than the region of the new object "quiet lake"; Figure 35 can equally be used to illustrate this comparison.

The same consideration applies beyond mask and contour features to the correction of other features such as color features: within the editing area, if the depth data of a pixel region now belongs to the new object rather than being the region's original depth data, the new object's color and other features for that region may replace the region's original color and other features.
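
Putting the pieces together, the consistency requirement is that every feature map is replaced on exactly the pixel set where the depth data was taken from the new object; an illustrative sketch:

```python
import numpy as np

def correct_features(orig_depth, new_depth, orig_feats: dict, new_feats: dict):
    """Consistent correction of all feature maps: first merge the depth maps
    per pixel, then replace mask/contour/color (and any other) data only
    where the depth data was actually taken from the new object, so every
    corrected feature points to the same object."""
    new_in_front = new_depth < orig_depth          # greater depth = further back
    merged = {"depth": np.where(new_in_front, new_depth, orig_depth)}
    for name in orig_feats:                        # e.g. "mask", "contour", "color"
        merged[name] = np.where(new_in_front, new_feats[name], orig_feats[name])
    return merged
```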

S62. Regenerate the picture using the corrected depth features, mask features, contour features, and so on.

Specifically, an artificial intelligence model can be used to regenerate the picture, for example a Stable Diffusion model combined with a ControlNet model. Stable Diffusion is a diffusion model that generates images from text or images; during generation, a ControlNet model can introduce additional conditions (such as depth features, mask features, and contour features) to steer the image generation process.
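
As an illustration only (the model identifiers, prompt, and file names are assumptions, not the patent's own setup), a depth-conditioned regeneration can be sketched with the Hugging Face diffusers library:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

# Load a depth-conditioned ControlNet and attach it to a Stable Diffusion
# pipeline; both checkpoints are public examples and require a GPU here.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

# The corrected depth features, rendered as an image, steer the generation.
depth_map = Image.open("corrected_depth.png")
result = pipe("a quiet lake between a tree and a valley",
              image=depth_map, num_inference_steps=30).images[0]
result.save("regenerated.png")
```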

The regenerated picture may be as shown on the right of Figure 34: the new object (the "quiet lake") is inserted sensibly among the original objects of the first picture (the "tree" and the "valley"), presenting reasonable perspective relationships among the objects.

An embodiment of this application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps performed by the human-computer interaction module in the method embodiments above, or the steps performed by the human-computer interaction module and the computing module.

An embodiment of this application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps performed by the computing module in the method embodiments above.

An embodiment of this application further provides a computer program product which, when run on a terminal device, enables the terminal device to implement the steps performed by the human-computer interaction module in the method embodiments above, or the steps performed by the human-computer interaction module and the computing module.

An embodiment of this application further provides a computer program product which, when run on a server, enables the server to implement the steps performed by the computing module in the method embodiments above.

An embodiment of this application further provides a chip system comprising a processor coupled to a memory, the processor executing a computer program stored in the memory to implement the steps performed by the human-computer interaction module, or by the human-computer interaction module and the computing module, in any method embodiment of this application. The chip system may be a single chip or a chip module composed of multiple chips.

An embodiment of this application further provides a chip system comprising a processor coupled to a memory, the processor executing a computer program stored in the memory to implement the steps performed by the computing module in any method embodiment of this application. The chip system may be a single chip or a chip module composed of multiple chips.

The term "user interface (UI)", or simply "interface", in the specification and drawings of this application denotes the medium through which an application or operating system interacts and exchanges information with a user; it converts between the internal form of information and a form the user can accept. The user interface of an application is source code written in a specific computer language such as Java or the extensible markup language (XML); the interface source code is parsed and rendered on the terminal device and finally presented as content the user can recognize, such as pictures, text, and buttons. A control, also called a widget, is a basic element of a user interface; typical controls include toolbars, menu bars, text boxes, buttons, scroll bars, pictures, and text. The attributes and content of the controls in an interface are defined by tags or nodes; for example, XML specifies the controls an interface contains through nodes such as <Textview>, <ImgView>, and <VideoView>. A node corresponds to a control or attribute in the interface, and after parsing and rendering the node is presented as content visible to the user. In addition, the interfaces of many applications, such as hybrid applications, usually contain web pages. A web page, also called a page, can be understood as a special control embedded in an application interface; it is source code written in a specific computer language, for example the hypertext markup language (HTML), cascading style sheets (CSS), or JavaScript (JS). Web page source code can be loaded and displayed as user-recognizable content by a browser or by a web page display component with browser-like functionality. The specific content a web page contains is likewise defined by tags or nodes in its source code; for example, HTML defines the elements and attributes of a web page through <p>, <img>, <video>, and <canvas>.

A common presentation form of a user interface is the graphical user interface (GUI), a user interface related to computer operation that is displayed graphically. It may be an interface element such as an icon, a window, or a control displayed on the display of an electronic device, where controls may include visual interface elements such as icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, and widgets.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wire (for example coaxial cable, optical fiber, or digital subscriber line) or wirelessly (for example infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example a floppy disk, hard disk, or magnetic tape), an optical medium (for example a DVD), or a semiconductor medium (for example a solid-state drive), among others.

Persons of ordinary skill in the art will understand that all or part of the processes of the method embodiments above may be accomplished by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments above. The aforementioned storage medium includes media capable of storing program code, such as a ROM, a random access memory (RAM), a magnetic disk, or an optical disc.

Finally, the above embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features equivalently replaced, without such modifications or replacements causing the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of this application.

Claims (34)

1. A picture editing method, comprising: displaying a first picture; detecting a user operation on the first picture for selecting an editing area; determining the editing area from the first picture according to the position of the user operation on the first picture, and displaying the editing area distinguishably within the first picture; generating a recommended editing instruction according to image features of the editing area; displaying the recommended editing instruction; detecting that a user inputs a first editing instruction for the editing area, the first editing instruction comprising the recommended editing instruction; performing image editing processing on the first picture according to the first editing instruction; and displaying the first picture after the image editing processing.

2. The method according to claim 1, wherein generating the recommended editing instruction according to the image features of the editing area comprises: using a fused feature vector of multiple image features of the editing area as a first input of a first artificial-intelligence algorithm, the multiple image features comprising two or more of a mask feature, a depth feature, a contour feature, and a color feature; using one or more preset editing types as a second input of the first artificial-intelligence algorithm; and obtaining the recommended editing instruction through computation of the first artificial-intelligence algorithm, the recommended editing instruction comprising editing parameters corresponding to the preset editing types.

3. The method according to claim 1 or 2, wherein the preset editing types comprise one or more of: deletion, dragging, replacement, addition, or color adjustment.

4. The method according to any one of claims 1-3, wherein detecting that the user inputs the first editing instruction for the editing area comprises: detecting that the user selects the recommended editing instruction, the recommended editing instruction selected by the user being determined as the first editing instruction.

5. The method according to any one of claims 1-4, further comprising: displaying one or more preset editing instructions, the first editing instruction further comprising a preset editing instruction; wherein detecting that the user inputs the first editing instruction for the editing area comprises: detecting that the user selects a preset editing instruction, the preset editing instruction selected by the user being determined as the first editing instruction.

6. The method according to any one of claims 1-5, further comprising: displaying a first input box for receiving a voice or text editing instruction, the first editing instruction further comprising a voice or text editing instruction input through the input box; wherein detecting that the user inputs the first editing instruction for the editing area comprises: detecting a voice or text instruction input by the user in the first input box, the voice or text instruction being determined as the first editing instruction.

7. The method according to any one of claims 1-6, wherein the user operation for selecting the editing area comprises a user operation of selecting a first object in the first picture, the position of the user operation on the first picture falling on the first object in the first picture, the image area in which the first object is located being the editing area, and the image area in which the first object is located being determined by performing image segmentation on the first picture.

8. The method according to any one of claims 1-7, wherein the user operation for selecting the editing area comprises a user operation of drawing the editing area in the first picture.

9. The method according to any one of claims 1-8, further comprising: obtaining preprocessing information of the first picture, and determining the editing area from areas indicated by the preprocessing information.

10. The method according to claim 9, wherein the preprocessing information comprises indication information of multiple areas, the indication information comprising coordinates of contour points of each of the multiple areas, a binary map, and a grayscale map, the multiple areas taking a first value in the binary map, and the grayscale values of the multiple areas in the grayscale map being a first grayscale value or falling within a first grayscale range; and wherein determining the editing area from the areas indicated by the preprocessing information comprises: determining, as the editing area, the area of the multiple areas in which the position of the user operation is located, or the area of the multiple areas closest to that position.

11. The method according to claim 9, wherein the preprocessing information comprises layer information of multiple layers, the layer information of each layer comprising coordinates of opaque pixels in the layer; and the method further comprises: before determining the editing area from the areas indicated by the preprocessing information, determining each connected group of opaque pixels in each layer as one area indicated by the preprocessing information.

12. The method according to claim 9 or 11, wherein the preprocessing information comprises contour information; and the method further comprises: before determining the editing area from the areas indicated by the preprocessing information, determining the area enclosed by a contour indicated by the contour information as an area indicated by the preprocessing information.

13. The method according to claim 9, 11 or 12, wherein the preprocessing information comprises depth information; and the method further comprises: before determining the editing area from the areas indicated by the preprocessing information, determining, according to the depth information, pixels with identical or similar depth values as one area indicated by the preprocessing information.

14. The method according to any one of claims 1-13, wherein the distinguishable display comprises one or more of: highlighting the contour of the editing area, highlighting the entire editing area, or displaying a dashed frame along the contour of the editing area.

15. The method according to any one of claims 1-14, further comprising: before performing image editing processing on the first picture according to the first editing instruction, if it is determined that applying the first editing instruction to the editing area is unreasonable, re-recommending an editing instruction or re-recommending an editing area.

16. The method according to claim 15, further comprising: determining whether applying the first editing instruction to the editing area is reasonable by comparing the feature vector corresponding to the first editing instruction with the fused feature vector of the image features of the editing area.

17. The method according to claim 15 or 16, wherein re-recommending an editing area comprises: traversing areas of the first picture outside the editing area, comparing the fused feature vector of each traversed area with the feature vector corresponding to the first editing instruction, finding an area that reasonably matches the first editing instruction, and re-recommending the found area as the editing area.

18. The method according to any one of claims 15-17, wherein re-recommending an editing instruction comprises: traversing editing types and/or editing parameters in a recommendation pool, comparing the feature vectors corresponding to the traversed editing types and/or editing parameters with the fused feature vector of the editing area, finding an editing type and/or editing parameter that reasonably matches the editing area, modifying the first editing instruction according to the found editing type and/or editing parameter, and re-recommending the modified first editing instruction.

19. The method according to any one of claims 1-18, wherein the editing processing corresponding to the first editing instruction comprises adding a new object to the editing area; and, in the first picture after the image editing processing, within the editing area, the depth feature of a first pixel area is replaced with the depth feature of the new object while the depth feature of a second pixel area remains the depth feature of the original object, the perspective relationship of the first pixel area being that the new object is in front of the original object, and the perspective relationship of the second pixel area being that the original object is in front of the new object.

20. The method according to claim 19, wherein performing image editing processing on the first picture according to the first editing instruction comprises: correcting the image features of the first picture, and regenerating the first picture using the corrected image features of the first picture.

21. The method according to claim 20, wherein the image features comprise a depth feature, and correcting the image features of the first picture comprises: determining the perspective relationship between the new object and the original object according to the image features of the first picture and the editing parameters in the first editing instruction; determining a reference depth of the new object according to the perspective relationship between the new object and the original object and the depth value of the original object, and correcting the depth feature of the new object using the reference depth; and replacing the original depth feature of the editing area with the corrected depth feature of the new object; wherein, after the depth feature of the new object is corrected, the average depth of the new object is close or equal to the reference depth, and the depth differences between regions of the new object remain unchanged.

22. The method according to claim 20 or 21, wherein the image features further comprise a first image feature, the first image feature being an image feature other than the depth feature and comprising one or more of: a mask feature, a contour feature, or a color feature; and correcting the image features of the first picture further comprises: within the editing area, if the depth feature of a pixel area is replaced with the corrected depth feature of the new object, replacing the original first image feature of that pixel area in the editing area with the first image feature of the corresponding pixel area of the new object.

23. A terminal device, comprising a human-computer interaction module, a processor, and a memory, wherein the human-computer interaction module is coupled to the processor, the memory is coupled to the processor, and the human-computer interaction module comprises a touchscreen; the memory is configured to store computer program code comprising computer instructions which, when executed by the processor, cause the terminal device to perform the method according to any one of claims 1-22.

24. A computer-readable storage medium comprising instructions which, when run on a terminal device, cause the terminal device to perform the method according to any one of claims 1-22.

25. A picture editing method applied to a human-computer interaction module, wherein the human-computer interaction module is included in a picture editing system that further comprises a computing module, the method comprising: displaying, by the human-computer interaction module, a first picture; detecting, by the human-computer interaction module, a user operation on the first picture for selecting an editing area; displaying, by the human-computer interaction module, the editing area distinguishably within the first picture; receiving, by the human-computer interaction module, a recommended editing instruction sent by the computing module and displaying the recommended editing instruction, the recommended editing instruction being generated by the computing module according to image features of the editing area; detecting, by the human-computer interaction module, that a user inputs a first editing instruction for the editing area, the first editing instruction comprising the recommended editing instruction; and receiving, by the human-computer interaction module, the first picture after image editing processing sent by the computing module and displaying the first picture after the image editing processing, the image editing processing being performed by the computing module.

26. The method according to claim 25, further comprising: displaying, by the human-computer interaction module, one or more preset editing instructions; wherein detecting that the user inputs the first editing instruction for the editing area comprises: detecting that the user selects a preset editing instruction, the preset editing instruction selected by the user being determined as the first editing instruction.

27. The method according to claim 25 or 26, further comprising: displaying, by the human-computer interaction module, a first input box for receiving a voice or text editing instruction; wherein detecting that the user inputs the first editing instruction for the editing area comprises: detecting a voice or text instruction input by the user in the first input box, the voice or text instruction being determined as the first editing instruction.

28. The method according to any one of claims 25-27, wherein the user operation for selecting the editing area comprises a user operation of selecting a first object in the first picture, the position of the user operation on the first picture falling on the first object in the first picture, the image area in which the first object is located being the editing area, and the image area in which the first object is located being determined by performing image segmentation on the first picture.

29. The method according to any one of claims 25-28, wherein the user operation for selecting the editing area comprises a user operation of drawing the editing area in the first picture.

30. The method according to any one of claims 25-29, wherein the distinguishable display comprises one or more of: highlighting the contour of the editing area, highlighting the entire editing area, or displaying a dashed frame along the contour of the editing area.

31. The method according to any one of claims 25-30, wherein, after the human-computer interaction module detects that the user inputs the first editing instruction for the editing area, the method further comprises: if applying the first editing instruction to the editing area is unreasonable, displaying a re-recommended editing area, the re-recommended editing area being found by the computing module from areas outside the editing area according to the feature vector corresponding to the first editing instruction.

32. The method according to any one of claims 25-31, wherein, after the human-computer interaction module detects that the user inputs the first editing instruction for the editing area, the method further comprises: if applying the first editing instruction to the editing area is unreasonable, recommending a modified first editing instruction, the editing type and/or editing parameters of the modified first editing instruction being found by the computing module from a recommendation pool according to the fused feature vector of the editing area.

33. A terminal device, comprising a human-computer interaction module, a processor, and a memory, wherein the human-computer interaction module is coupled to the processor, the memory is coupled to the processor, and the human-computer interaction module comprises a touchscreen; the memory is configured to store computer program code comprising computer instructions which, when executed by the processor, cause the terminal device to perform the method according to any one of claims 25-32.

34. A computer-readable storage medium comprising instructions which, when run on a terminal device, cause the terminal device to perform the method according to any one of claims 25-32.
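The recommendation step of claim 2 is stated abstractly, so a small sketch may help readers picture it. The following Python fragment shows one plausible shape: per-feature embeddings of the editing area are fused by concatenation and combined with a one-hot preset editing type, and a model maps them to editing parameters. Every name here (encode, fuse_features, recommend, EDIT_TYPES) is a hypothetical stand-in introduced for illustration, and the random linear projection is a toy in place of the learned "first artificial-intelligence algorithm" the claim refers to.

```python
import numpy as np

rng = np.random.default_rng(0)

EDIT_TYPES = ["delete", "drag", "replace", "add", "color_adjust"]  # cf. claim 3

def encode(feature_map, dim=32):
    """Toy stand-in for a learned per-feature encoder: project the
    flattened feature map to a fixed-length embedding."""
    flat = np.asarray(feature_map, dtype=np.float32).ravel()
    proj = rng.standard_normal((dim, flat.size)).astype(np.float32)
    return proj @ flat

def fuse_features(mask, depth, contour, color):
    """Claim 2, first input: concatenate per-feature embeddings of the
    editing area into a single fused feature vector."""
    return np.concatenate([encode(mask), encode(depth),
                           encode(contour), encode(color)])

def recommend(fused, edit_type, weights, bias):
    """Claim 2, second input and output: a one-hot preset editing type
    joins the fused vector, and a (here linear, toy) model emits the
    editing parameters for that editing type."""
    onehot = np.eye(len(EDIT_TYPES), dtype=np.float32)[EDIT_TYPES.index(edit_type)]
    x = np.concatenate([fused, onehot])
    return weights @ x + bias

# Usage with random stand-in data: a 16x16 editing area, four features.
area = [rng.random((16, 16), dtype=np.float32) for _ in range(4)]
fused = fuse_features(*area)                      # length 4 * 32 = 128
W = rng.standard_normal((3, fused.size + len(EDIT_TYPES))).astype(np.float32)
b = np.zeros(3, dtype=np.float32)
params = recommend(fused, "color_adjust", W, b)   # three toy editing parameters
```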
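Claims 9 to 13 derive candidate areas from preprocessing information (a binary map, per-layer opaque pixels, contours, or depth) and then pick the area at, or nearest to, the operation position. A minimal sketch of the binary-map case, assuming 4-connectivity and NumPy inputs; the helper names are invented for illustration:

```python
import numpy as np
from collections import deque

def regions_from_binary_map(binary_map):
    """Split a binary preprocessing map into connected regions
    (4-connectivity), mirroring claims 10-11: each connected group of
    foreground (or opaque) pixels becomes one candidate area. Returns a
    label map where 0 is background and 1..N identify regions."""
    h, w = binary_map.shape
    labels = np.zeros((h, w), dtype=np.int32)
    current = 0
    for sy in range(h):
        for sx in range(w):
            if binary_map[sy, sx] and labels[sy, sx] == 0:
                current += 1
                labels[sy, sx] = current
                queue = deque([(sy, sx)])
                while queue:  # flood fill from the seed pixel
                    y, x = queue.popleft()
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < h and 0 <= nx < w \
                           and binary_map[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = current
                            queue.append((ny, nx))
    return labels

def pick_editing_area(labels, tap_yx):
    """Claim 10: the region containing the operation position wins;
    otherwise the region whose pixels are nearest to that position."""
    tapped = int(labels[tap_yx])
    if tapped != 0:
        return tapped
    ys, xs = np.nonzero(labels)
    d2 = (ys - tap_yx[0]) ** 2 + (xs - tap_yx[1]) ** 2
    nearest = d2.argmin()
    return int(labels[ys[nearest], xs[nearest]])

# Usage: two small regions; a tap inside one, then a tap on background.
bm = np.zeros((8, 8), dtype=bool)
bm[1:3, 1:3] = True          # becomes region 1
bm[5:7, 4:7] = True          # becomes region 2
labels = regions_from_binary_map(bm)
assert pick_editing_area(labels, (2, 2)) == 1   # tap inside region 1
assert pick_editing_area(labels, (7, 0)) == 2   # nearest region wins
```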
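Claims 16 and 17 compare the instruction's feature vector against fused region vectors to judge whether an instruction suits an area, and to search other areas when it does not. One way to read that, assuming cosine similarity and an arbitrary 0.5 threshold (neither is specified by the claims):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def is_reasonable(instr_vec, region_vec, threshold=0.5):
    """Claim 16: compare the instruction's feature vector with the fused
    feature vector of the editing area; below the (assumed) threshold
    means a poor match."""
    return cosine(instr_vec, region_vec) >= threshold

def rerecommend_area(instr_vec, other_region_vecs, threshold=0.5):
    """Claim 17: traverse regions outside the current editing area and
    return the index of the best reasonable match, or None."""
    scores = [cosine(instr_vec, v) for v in other_region_vecs]
    best = int(np.argmax(scores))
    return best if scores[best] >= threshold else None

# Usage with toy 3-D feature vectors.
instr = np.array([1.0, 0.0, 0.0])
region = np.array([0.9, 0.1, 0.0])
others = [np.array([0.0, 1.0, 0.0]), np.array([0.8, 0.2, 0.1])]
assert is_reasonable(instr, region)
assert rerecommend_area(instr, others) == 1
```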
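Claims 19 and 21 pin down the depth bookkeeping when a new object is added: shift the new object's depth so its average matches a reference depth derived from the original object, keep its internal depth differences unchanged, then resolve occlusion per pixel. A sketch under the assumption that smaller depth values mean closer to the camera (the claims do not fix a convention):

```python
import numpy as np

def correct_new_object_depth(new_depth, base_depth):
    """Claim 21: translate the new object's depth map so its average
    depth equals the reference depth; because this is a pure shift, the
    depth differences between its own regions are preserved."""
    return new_depth - new_depth.mean() + base_depth

def composite_depths(orig_depth, new_depth, new_mask):
    """Claim 19: per pixel inside the editing area, keep whichever
    surface is in front. Pixels where the new object is nearer form the
    'first pixel area'; the rest keep the original object's depth."""
    out = orig_depth.copy()
    in_front = new_mask & (new_depth < orig_depth)
    out[in_front] = new_depth[in_front]
    return out, in_front

# Usage: original surface at depth 5; new object already averaging 5.
orig = np.full((2, 2), 5.0)
new = correct_new_object_depth(np.array([[2.0, 4.0], [6.0, 8.0]]), base_depth=5.0)
mask = np.ones((2, 2), dtype=bool)
out, in_front = composite_depths(orig, new, mask)
# Depths 2 and 4 end up in front of the original surface at depth 5,
# while 6 and 8 stay behind it, so out == [[2, 4], [5, 5]].
```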
PCT/CN2024/140387 2023-12-22 2024-12-18 Picture editing method and system, and related device Pending WO2025130944A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202311795993.1A CN120196252A (en) 2023-12-22 2023-12-22 Image editing method, related equipment and system
CN202311795993.1 2023-12-22

Publications (1)

Publication Number Publication Date
WO2025130944A1 (en)

Family

ID=96072459

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2024/140387 Pending WO2025130944A1 (en) 2023-12-22 2024-12-18 Picture editing method and system, and related device

Country Status (2)

Country Link
CN (1) CN120196252A (en)
WO (1) WO2025130944A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120602745A (en) * 2025-06-25 2025-09-05 Douyin Vision Co., Ltd. Content generation method, device, electronic device, storage medium and program product

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005234912A (en) * 2004-02-19 2005-09-02 Fuji Xerox Co Ltd Image display device
CN112801907A (en) * 2021-02-03 2021-05-14 Beijing ByteDance Network Technology Co., Ltd. Depth image processing method, device, equipment and storage medium
CN114943789A (en) * 2022-03-28 2022-08-26 Huawei Technologies Co., Ltd. Image processing method, model training method and related device
CN116883307A (en) * 2023-08-11 2023-10-13 Vivo Mobile Communication Co., Ltd. Image processing method, device, electronic equipment and readable storage medium
CN116958325A (en) * 2023-07-24 2023-10-27 Tencent Technology (Shenzhen) Co., Ltd. Training method and device for image processing model, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN120196252A (en) 2025-06-24

Similar Documents

Publication Publication Date Title
US11914850B2 (en) User profile picture generation method and electronic device
WO2021027476A1 (en) Method for voice controlling apparatus, and electronic apparatus
CN117351115A (en) Training method of image generation model, image generation method, device and equipment
CN112262563A (en) Image processing method and electronic device
WO2020192692A1 (en) Image processing method and related apparatus
CN116193275B (en) Video processing method and related equipment
CN113011328B (en) Image processing method, device, electronic equipment and storage medium
CN114168128A (en) Method for generating responsive page, graphical user interface and electronic equipment
WO2022088946A1 (en) Method and apparatus for selecting characters from curved text, and terminal device
CN117671473B (en) Underwater target detection model and method based on attention and multi-scale feature fusion
WO2022170982A1 (en) Image processing method and apparatus, image generation method and apparatus, device, and medium
CN114117269B (en) Memo information collection method and device, electronic equipment and storage medium
WO2025130944A1 (en) Picture editing method and system, and related device
CN113888432B (en) Image enhancement method and device for image enhancement
CN116304146B (en) Image processing methods and related devices
CN116468882B (en) Image processing methods, devices, equipment, storage media
EP4227807A1 (en) Method for displaying information on electronic device, and electronic device
WO2023241544A1 (en) Component preview method and electronic device
CN116757963B (en) Image processing method, electronic device, chip system and readable storage medium
CN117690147B (en) Text recognition method and electronic device
CN116522400B (en) Image processing method and terminal equipment
CN117764853B (en) Face image enhancement method and electronic equipment
CN114757955B (en) Target tracking method and electronic device
CN118035390A (en) Method, terminal device and server for generating summary
CN117850731A (en) Automatic reading method and device based on terminal device

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 24906423

Country of ref document: EP

Kind code of ref document: A1