HK1262709A1 - Image processing method, device, mobile terminal and computer readable storage medium
Description
Technical Field
The present application relates to the field of computer applications, and in particular, to an image processing method and apparatus, a mobile terminal, and a computer-readable storage medium.
Background
With the continuous development of mobile terminals, and in particular the emergence of smart phones, almost all mobile terminal devices have a photographing function. However, most users do not have professional shooting skills and cannot make reasonable use of composition or set shooting parameters properly, so they fail to capture visually appealing photos, which degrades the user experience.
Disclosure of Invention
The embodiments of the application provide an image processing method, an image processing apparatus, a mobile terminal and a computer-readable storage medium, which can improve the visual appeal of an image.
An image processing method comprising:
acquiring a preview image to be processed;
identifying scene information of the preview image;
determining a composition mode corresponding to the scene information;
and composing the preview image according to the composition mode.
An image processing apparatus comprising:
the acquisition module is used for acquiring a preview image to be processed;
the identification module is used for identifying scene information of the preview image;
the determining module is used for determining a composition mode corresponding to the scene information;
and the composition module is used for composing the preview image according to the composition mode.
A mobile terminal comprising a memory and a processor, the memory having stored therein a computer program that, when executed by the processor, causes the processor to perform the steps of the image processing method.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the image processing method.
According to the image processing method and apparatus, the mobile terminal and the computer-readable storage medium of the embodiments of the application, a preview image to be processed is acquired, scene information of the preview image is identified, a composition mode corresponding to the scene information is determined, and the preview image is composed according to the composition mode, so that the processed image has greater visual appeal.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a flow diagram of a method of image processing in one embodiment;
FIG. 2 is a schematic diagram of an embodiment of a neural network;
FIG. 3 is a diagram illustrating categories of scenes captured in one embodiment;
FIG. 4 is a flow diagram of a method for identifying scene information of a preview image based on a neural network, according to one embodiment;
FIG. 5 is a schematic diagram of an architecture of a neural network in another embodiment;
FIG. 6 is a flow diagram of another embodiment of a method for identifying scene information of a preview image based on a neural network;
FIG. 7 is a diagram illustrating a bounding box of a foreground object in a preview image in one embodiment;
FIG. 8 is a flow diagram of a method for determining a composition mode for a preview image based on scene information in one embodiment;
FIG. 9 is a flowchart of a method of determining a composition mode of a preview image based on scene information in another embodiment;
FIG. 10 is a flow diagram of a method for composing a preview image based on scene information and a composition mode in one embodiment;
FIG. 11 is a block diagram showing the configuration of an image processing apparatus according to an embodiment;
FIG. 12A is a schematic diagram of an internal structure of a mobile terminal according to one embodiment;
FIG. 12B is a diagram illustrating an internal configuration of a server in one embodiment;
FIG. 13 is a schematic diagram of an image processing circuit in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
FIG. 1 is a flow diagram of a method of image processing in one embodiment. As shown in fig. 1, an image processing method includes steps 102 to 108.
Step 102: and acquiring a preview image to be processed.
In this embodiment, the preview image to be processed may be a continuous multi-frame preview image, and the continuous multi-frame preview image may be a preview image of two or more continuous frames. The continuous multi-frame preview image may refer to a multi-frame preview image acquired by a camera of the computer device within a preset time. For example, if a camera of the computer device acquires 3 preview images within 0.1 second, the 3 preview images can be regarded as a continuous multi-frame preview image.
In one embodiment, the computer device is further provided with a plurality of preview windows, each of which displays a corresponding one of the preview images.
Step 104: scene information of the preview image is identified.
In this embodiment, scene information of the preview image is identified based on a neural network. It should be understood that the neural network may be a convolutional neural network (CNN). A CNN is a neural network model developed from the conventional multi-layer neural network for image classification and recognition, and introduces convolution and pooling operations compared with the conventional multi-layer neural network. The convolution operation is a mathematical operation that performs weighted summation over the data in a local region, and the pooling operation is a mathematical operation that downsamples the data in a local region.
Specifically, the CNN model is composed of alternating convolution layers and pooling layers. As shown in fig. 2, the input layer 210 receives a preview image, the convolution layer 220 extracts image features from each local region of the input image, and the pooling layer 230 downsamples the features output by the convolution layer to reduce their dimensionality. The features are then passed through a plurality of fully connected layers 240, and the output of the last hidden layer 250 is the finally extracted feature. Scene information is identified from this finally extracted feature, where the scene information includes background category information and foreground object category information.
In one embodiment, a softmax classifier is arranged after the last hidden layer 250 of the convolutional neural network. By analyzing the finally extracted feature with the softmax classifier, the probability of each category of the background and the probability of each category of the foreground object in the image can be obtained.
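For illustration only, the following minimal Python sketch shows how the softmax stage described above might map the finally extracted feature to background and foreground class probabilities. The category lists reproduce the examples given in this application, while the linear heads, their weights and the layer dimensions are hypothetical assumptions.

```python
import numpy as np

# Example category lists as enumerated in this application.
BACKGROUND_CLASSES = ["landscape", "beach", "snow", "blue_sky", "green_field",
                      "night", "dark", "backlight", "sunrise_sunset", "indoor",
                      "fireworks", "spotlight"]
FOREGROUND_CLASSES = ["portrait", "baby", "cat", "dog", "food"]

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def classify_scene(feature: np.ndarray, w_bg: np.ndarray, w_fg: np.ndarray):
    """Map the last hidden layer's feature vector to background and foreground
    class probabilities via two hypothetical linear heads followed by softmax,
    and return the most probable categories."""
    bg_probs = softmax(feature @ w_bg)   # shape: (len(BACKGROUND_CLASSES),)
    fg_probs = softmax(feature @ w_fg)   # shape: (len(FOREGROUND_CLASSES),)
    return (BACKGROUND_CLASSES[int(bg_probs.argmax())],
            FOREGROUND_CLASSES[int(fg_probs.argmax())],
            bg_probs, fg_probs)
```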
It should be noted that before the neural network is used to identify the background category and the foreground object of the preview image, the neural network needs to be trained, and the training process includes:
First, a training image containing at least one background training target (for example, landscape, beach, snow scene, blue sky, green field, night scene, darkness, backlight, sunrise/sunset, indoor, fireworks, spotlight, etc.) and a foreground training target (main objects such as portrait, baby, cat, dog, food, etc.) is input into the neural network. The neural network extracts features from the background training target and the foreground training target, such as SIFT (Scale-Invariant Feature Transform) features and Histogram of Oriented Gradients (HOG) features, detects the background training target through a target detection algorithm such as SSD (Single Shot MultiBox Detector) or VGG (Visual Geometry Group) to obtain a first prediction confidence, and detects the foreground training target to obtain a second prediction confidence. The first prediction confidence is the confidence, predicted by the neural network, that a pixel point in the background region of the training image belongs to the background training target. The second prediction confidence is the confidence, predicted by the neural network, that a pixel point in the foreground region of the training image belongs to the foreground training target. The training image may be pre-labeled with the background training target and the foreground training target to obtain a first true confidence and a second true confidence. The first true confidence represents the confidence that a pre-labeled pixel point in the training image belongs to the background training target, and the second true confidence represents the confidence that a pre-labeled pixel point in the training image belongs to the foreground training target. For each pixel point in the image, the true confidence may be represented as 1 (or a positive value) or 0 (or a negative value), indicating that the pixel point belongs or does not belong to the training target, respectively.
Second, the difference between the first prediction confidence and the first true confidence is computed to obtain a first loss function, and the difference between the second prediction confidence and the second true confidence is computed to obtain a second loss function. The first loss function and the second loss function may each adopt a logarithmic function, a hyperbolic function, an absolute-value function, or the like.
Finally, the first loss function and the second loss function are weighted and summed to obtain a target loss function, and the parameters of the neural network are adjusted according to the target loss function to train the neural network.
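As an illustrative sketch of the weighted target loss described above, the following Python/PyTorch snippet uses binary cross-entropy to stand in for the "logarithmic function" option; the 0.5/0.5 branch weights are assumed values, not values specified in this application.

```python
import torch
import torch.nn.functional as F

def target_loss(pred_bg_conf: torch.Tensor, true_bg_conf: torch.Tensor,
                pred_fg_conf: torch.Tensor, true_fg_conf: torch.Tensor,
                w_bg: float = 0.5, w_fg: float = 0.5) -> torch.Tensor:
    """Weighted sum of the two per-pixel confidence losses.
    All tensors hold confidences in [0, 1] and have identical shapes."""
    first_loss = F.binary_cross_entropy(pred_bg_conf, true_bg_conf)    # background branch
    second_loss = F.binary_cross_entropy(pred_fg_conf, true_fg_conf)   # foreground branch
    return w_bg * first_loss + w_fg * second_loss

# Typical training step: compute the target loss and adjust the parameters.
# loss = target_loss(pred_bg, true_bg, pred_fg, true_fg)
# loss.backward()
# optimizer.step()
```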
In one embodiment, as shown in FIG. 3, the captured scene of a training image may include a background region category, a foreground object category, and other categories. The background region categories may include landscape, beach, snow scene, blue sky, green field, night scene, darkness, backlight, sunrise/sunset, indoor, fireworks, spotlight, and the like. The foreground object may be a portrait, a baby, a cat, a dog, food, and the like. The other categories may include text documents, macro shots, and the like.
Step 106: and determining a composition mode corresponding to the scene information.
In one embodiment, the scene information includes background category information and foreground object category information. The background category information includes landscape, beach, snow scene, blue sky, green field, night scene, darkness, backlight, sunrise/sunset, indoor, fireworks, spotlight, and the like; the foreground object category information includes portrait, baby, cat, dog, food, and the like.
In one embodiment, the composition modes include a nine-grid (rule-of-thirds) composition, a cross composition, a triangle composition, a diagonal composition, and the like.
Specifically, different scene information corresponds to different composition modes. At least one composition mode corresponding to each kind of scene information is stored in the computer device in advance, and after the scene information of the preview image is determined, the computer device invokes the composition mode corresponding to that scene information. For example, when the scene information is landscape + portrait (that is, the background category information is landscape and the foreground object category information is portrait), the computer device may invoke the nine-grid composition mode so that the portrait is placed at a golden-section position of the preview image; when the scene information is landscape + food (that is, the background category information is landscape and the foreground object category information is food), the computer device may invoke the triangle composition mode to highlight the foreground object, the food.
In one embodiment, the same scene information may correspond to a plurality of composition modes. For example, the scene information landscape + portrait may correspond to both the nine-grid composition mode and the triangle composition mode. Specifically, the composition mode may be further screened according to the foreground object category information. For example, for the scene information landscape + portrait, if the number of portraits is large (more than three), the nine-grid composition mode may be selected so that each portrait is at a display position required by the nine-grid composition mode; if there is only one portrait, the triangle composition mode may be selected to highlight the portrait.
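The selection logic described above can be illustrated with the following hedged Python sketch; the lookup table contains only the example pairings mentioned in this paragraph, and the count threshold of three follows the example given here rather than a value fixed by this application.

```python
# Hypothetical lookup table of composition modes per (background, foreground) pair.
COMPOSITION_TABLE = {
    ("landscape", "portrait"): ["nine_grid", "triangle"],
    ("landscape", "food"): ["triangle"],
}

def pick_composition_mode(background: str, foreground: str,
                          foreground_count: int = 1) -> str:
    """Return a composition mode for the recognised scene. When several modes
    match, use the number of foreground objects to break the tie, as in the
    landscape + portrait example above."""
    candidates = COMPOSITION_TABLE.get((background, foreground), ["nine_grid"])
    if len(candidates) == 1:
        return candidates[0]
    # Many portraits: spread them over the nine-grid lines;
    # a single portrait: highlight it with a triangle composition.
    return "nine_grid" if foreground_count > 3 else "triangle"

print(pick_composition_mode("landscape", "portrait", foreground_count=1))  # -> "triangle"
```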
Step 108: and composing the preview image according to the composition mode.
In this embodiment, different scene information corresponds to the same or different composition modes, and differentiated composition processing of the preview image can be realized according to the different composition modes. For example, the composition modes include the nine-grid composition, the cross composition, the triangle composition, the diagonal composition, and the like. Among them, the nine-grid composition mode is a form of the golden-section principle: the preview image is divided into nine equal blocks, and the subject target may be arranged at any one of the four corner points of the center block (the intersections of the dividing lines).
It should be noted that different composition modes corresponding to different scene information are stored in the computer device in advance, and the preview image is composed according to the detected scene information and the corresponding composition mode.
The image processing method acquires a preview image to be processed, identifies scene information of the preview image, determines a composition mode corresponding to the scene information, and composes the preview image according to the composition mode. In this way, the scene information of the preview image is identified automatically, different scene information is automatically matched with a corresponding composition mode, and subsequent shooting adjustment prompts are given for the preview image according to the scene information and the corresponding composition mode, so that the processed image has greater visual appeal.
In an embodiment, the image processing method further includes displaying the preview images after composition processing in a plurality of preview windows, respectively. Specifically, a plurality of preview windows are provided on the screen of the computer device, and each preview window displays one frame of preview image. More specifically, the preview images subjected to composition processing may be displayed through the plurality of preview windows, respectively. In one embodiment, the preview images adopt different composition modes; each frame of preview image is displayed in a preview window after composition processing, so that the user can compare the composition effects of the preview images according to the image displayed in each window and save one frame of preview image according to the comparison result.
In an embodiment, the scene information includes background category information and foreground object category information, and as shown in fig. 4, the method for identifying the scene information of the preview image includes steps 402 to 410.
step 402: and extracting the features of the preview image by adopting a basic network of a neural network to obtain feature data.
Step 404: and inputting the characteristic data into a classification network of a neural network to perform classification detection on the background of the preview image, and outputting a first confidence map. Each pixel point in the first confidence map represents the confidence that each pixel point in the preview image belongs to the background detection target.
Step 406: and inputting the characteristic data into a target detection network of the neural network to detect a foreground target of the preview image and outputting a second confidence map. And each pixel point in the second confidence map represents the confidence that each pixel point in the preview image belongs to the foreground detection target.
Step 408: and weighting according to the first confidence map and the second confidence map to obtain a final confidence map of the preview image.
Step 410: and determining the background class information and the foreground object class information of the preview image according to the final confidence map.
In this embodiment, as shown in fig. 5, the neural network includes a base network 510, a classification network 520, and a target detection network 530. The base network 510 is used to extract feature data of the preview image. The feature data is input into the classification network 520 and the target detection network 530 respectively; the classification network 520 performs classification detection on the background of the preview image to obtain a first confidence map, and the target detection network 530 performs target detection on the foreground of the preview image to obtain a second confidence map. A final confidence map of the preview image is obtained by weighting the first confidence map and the second confidence map, and the background category information and foreground object category information of the preview image are determined according to the final confidence map.
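A minimal PyTorch sketch of the three-part structure in fig. 5 follows, for illustration only; the layer sizes, the per-pixel sigmoid confidences and the 0.5/0.5 fusion weights are assumptions rather than details taken from this application.

```python
import torch
import torch.nn as nn

class ScenePerceptionNet(nn.Module):
    """Sketch of the structure in fig. 5: a shared base network 510, a
    classification branch 520 for the background and a detection branch 530
    for the foreground, each producing a per-pixel confidence map."""
    def __init__(self, num_bg_classes: int = 12, num_fg_classes: int = 5):
        super().__init__()
        self.base = nn.Sequential(                       # base network 510
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Conv2d(64, num_bg_classes, 1)  # classification network 520
        self.detector = nn.Conv2d(64, num_fg_classes, 1)    # target detection network 530

    def forward(self, preview_image: torch.Tensor, w1: float = 0.5, w2: float = 0.5):
        features = self.base(preview_image)
        first_conf = torch.sigmoid(self.classifier(features))   # background confidence map
        second_conf = torch.sigmoid(self.detector(features))    # foreground confidence map
        # Weighted fusion into a final confidence map (fusion rule assumed).
        final_conf = (w1 * first_conf.max(dim=1, keepdim=True).values
                      + w2 * second_conf.max(dim=1, keepdim=True).values)
        return first_conf, second_conf, final_conf

# Usage: a 1x3x224x224 preview image yields confidence maps at half resolution.
# first, second, final = ScenePerceptionNet()(torch.rand(1, 3, 224, 224))
```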
It should be noted that, in statistics, a confidence interval of a probability sample is an interval estimate of some population parameter of the sample. The confidence interval indicates the extent to which the true value of this parameter has a certain probability of falling around the measured value, and the confidence is the degree of credibility of the measured value of the measured parameter.
In an embodiment, the scene information further includes foreground object position information, and as shown in fig. 6, the method for identifying scene information of the preview image includes steps 602 to 606.
Step 602: and detecting the foreground target position of the preview image by adopting a target detection network of a neural network, and outputting a detection image of the boundary frame. The detection image of the boundary frame comprises corresponding vectors of all pixel points in the preview image, the corresponding vectors represent the position relation between the corresponding pixel points and the corresponding detection boundary frame, and the detection boundary frame is a boundary frame of a foreground object detected in the preview image by adopting a neural network.
Step 604: and weighting according to the first confidence map, the second confidence map and the boundary box detection map to obtain a final confidence map of the preview image.
Step 606: and determining the background category information, the foreground object category information and the foreground object position information of the preview image according to the final confidence map.
Specifically, referring to fig. 7, the bounding box detection map 710 contains a corresponding vector for each pixel point within the bounding box, where the corresponding vector represents the positional relationship between that pixel point and the corresponding bounding box. The vector of each corresponding pixel point in the bounding box detection map 710 determines a first four-dimensional vector and a second four-dimensional vector. The first four-dimensional vector is x = (x1, x2, x3, x4), whose elements are the distances from the pixel point to the upper, lower, left and right borders of the bounding box 710 of the foreground target; the second four-dimensional vector is x' = (x1', x2', x3', x4'), whose elements are the distances from the pixel point to the upper, lower, left and right borders of the bounding box detection map 700 of the preview image corresponding to that pixel point. It can be understood that the foreground target position can be determined from the second four-dimensional vectors of all the pixel points in the bounding box detection map 710. In an embodiment, the target detection network of the neural network detects the foreground target of the preview image and outputs the second confidence map and the bounding box detection map 710; weighting is performed according to the first confidence map, the second confidence map and the bounding box detection map 710 to obtain the final confidence map of the preview image, and the background category information, foreground object category information and foreground object position information of the preview image are determined according to the final confidence map. Further, the area of the bounding box 710 of the foreground target is X = (x1 + x2) * (x3 + x4). It should be noted that the bounding box detection map 710 in this embodiment is a rectangular box; in other embodiments, the bounding box may have an arbitrary shape, which is not specifically limited herein.
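The following Python sketch illustrates the first four-dimensional vector and the area formula above; the mapping of (x1, x2, x3, x4) to the upper, lower, left and right borders and the pixel-coordinate convention (y growing downward) are assumptions made only for illustration.

```python
def box_area(x):
    """Area of the foreground bounding box from the first four-dimensional
    vector x = (x1, x2, x3, x4): X = (x1 + x2) * (x3 + x4)."""
    x1, x2, x3, x4 = x
    return (x1 + x2) * (x3 + x4)

def box_corners(px, py, x):
    """Recover the box corners (left, top, right, bottom) from a pixel at
    (px, py) and its distances to the box edges; this reconstruction step
    is an assumed convention, not a step specified in this application."""
    x1, x2, x3, x4 = x
    return (px - x3, py - x1, px + x4, py + x2)

# Example: a pixel at (400, 300) whose vector says the box edges are
# 50 px above, 150 px below, 80 px to the left and 120 px to the right.
print(box_area((50, 150, 80, 120)))                # -> 40000
print(box_corners(400, 300, (50, 150, 80, 120)))   # -> (320, 250, 520, 450)
```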
In one embodiment, as shown in fig. 8, determining a composition mode corresponding to scene information includes steps 802 to 804.
Step 802: and generating related composition characteristic data according to the scene information.
Step 802: and when the composition characteristic data accords with the preset composition characteristic data, acquiring a composition mode corresponding to the composition characteristic data from the preset composition mode.
In this embodiment, the scene information includes background category information and foreground object category information, and the composition feature data includes the background category and the size, position, background environment and the like of the foreground object. Specifically, the computer device stores a large number of preset composition modes in advance, and each preset composition mode corresponds to matching preset composition feature data. When the generated composition feature data conforms to the preset composition feature data, the computer device acquires the composition mode corresponding to that composition feature data from the preset composition modes. For example, when the scene information of the preview image is landscape + portrait, composition feature data related to the scene information (size, position, landscape type, and the like) is generated and compared with the pre-stored preset composition feature data; when the composition feature data matches the preset composition feature data, a landscape + portrait composition mode corresponding to the composition feature data is obtained from the preset composition modes. Since a large number of well-composed patterns corresponding to different scene information (for example, landscape + portrait) are pre-stored in the computer device and each composition mode corresponds to a set of composition feature data, an optimal composition mode for the preview image can be determined by comparing and analyzing the composition feature data.
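For illustration, a hedged Python sketch of matching generated composition feature data against preset composition feature data is given below; the preset fields, the size thresholds and the mode names are hypothetical and not taken from this application.

```python
from typing import Optional

# Hypothetical preset library: each entry pairs preset composition feature data
# (background/foreground category plus a foreground size range, as a fraction
# of the frame) with a composition mode.
PRESET_COMPOSITIONS = [
    {"background": "landscape", "foreground": "portrait",
     "min_size": 0.05, "max_size": 0.30, "mode": "nine_grid"},
    {"background": "landscape", "foreground": "portrait",
     "min_size": 0.30, "max_size": 1.00, "mode": "triangle"},
]

def match_preset(feature_data: dict) -> Optional[str]:
    """Compare the composition feature data generated from the scene
    information against the stored presets and return the first matching
    composition mode, or None if nothing matches."""
    for preset in PRESET_COMPOSITIONS:
        if (feature_data["background"] == preset["background"]
                and feature_data["foreground"] == preset["foreground"]
                and preset["min_size"] <= feature_data["size"] < preset["max_size"]):
            return preset["mode"]
    return None

# Example: a portrait covering about 12% of a landscape frame.
print(match_preset({"background": "landscape", "foreground": "portrait", "size": 0.12}))
# -> "nine_grid"
```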
In one embodiment, determining a composition mode corresponding to the scene information includes determining the composition mode of the preview image according to the background category information and the foreground object category information. Specifically, the computer device stores at least one scene type in a memory in advance, and after the scene type is determined, invokes the corresponding composition mode. For example, when the background category information is landscape and the foreground object category information is portrait, i.e. the scene type landscape + portrait, the corresponding composition mode is the nine-grid composition mode, and the composition processing result according to the scene information and the composition mode is to take the one-third positions of the preview image as the composition positions of the portraits. When the background category information is landscape and the foreground object category information is food, i.e. the scene type landscape + food, the corresponding composition mode is the nine-grid composition mode, and the composition processing result according to the scene information and the composition mode is to take the center position of the preview image as the composition position of the food.
In one embodiment, as shown in fig. 9, the scene information includes foreground object category information, and determining a composition mode corresponding to the scene information includes steps 902 to 906.
Step 902: and determining a main object of the preview image according to the foreground object category information.
Step 904: the area of the subject target is obtained.
Step 906: and determining a composition mode of the preview image according to the area of the region.
In this embodiment, the target detection network of the neural network is used to detect the category of the foreground target to determine the subject target of the preview image, and a bounding box detection map is output to obtain the region area of the subject target; the composition position of the subject target is then determined according to this region area. Specifically, referring to fig. 7, the region area of the subject target can be obtained from the bounding box detection map of the subject target. When the region area of the subject target is larger than a preset area, the corresponding preview image can be determined to be a close shot, and the composition mode of the preview image can be determined accordingly. For example, in a triangle composition mode, the subject target is arranged at the center of the preview image to highlight it. In other embodiments, a trisection composition mode may also be employed, with the subject target arranged on a golden-section line of the preview image and other foreground targets arranged near the golden-section line, so that the preview image appears compact and powerful.
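A possible realization of this area-based decision is sketched below in Python; the 0.25 close-shot ratio and the fallback mode are assumed values chosen only for illustration, not thresholds specified in this application.

```python
def composition_mode_from_area(subject_area: float, image_area: float,
                               close_shot_ratio: float = 0.25) -> str:
    """Pick a composition mode from the subject target's region area: a
    subject occupying more than the given share of the frame is treated as a
    close shot and centred with a triangle composition; otherwise this sketch
    falls back to the nine-grid mode."""
    if subject_area / image_area > close_shot_ratio:
        return "triangle"   # close shot: place the subject at the frame centre
    return "nine_grid"      # otherwise: place the subject on the one-third lines

# Example: a 200x200 subject box inside a 1920x1080 preview image.
print(composition_mode_from_area(200 * 200, 1920 * 1080))  # -> "nine_grid"
```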
In one embodiment, the image processing method further includes composing the preview image according to the scene information and the composition mode. Specifically, different scene information corresponds to the same or different composition modes, and the composition processing of the preview image can be realized according to the scene information and the composition mode. For example, when the scene information is landscape + portrait, the corresponding composition mode is the nine-grid composition mode, and the composition processing result is to take the one-third positions of the preview image as the composition positions of the portraits; when the scene information is landscape + food, the corresponding composition mode is the nine-grid composition mode, and the composition processing result is to take the center position of the preview image as the composition position of the food.
It should be noted that different composition modes corresponding to different scene information are stored in the computer device in advance, and the preview image is composed according to the detected scene information and the corresponding composition mode.
In one embodiment, the composition modes include the nine-grid composition, the cross composition, the triangle composition, the diagonal composition, and the like.
In one embodiment, as shown in fig. 10, the scene information includes foreground object category information and foreground object position information, and composing the preview image according to the scene information and the composition mode includes steps 1002 to 1004.
Step 1002: and determining the preset composition position of the foreground target according to the foreground target category information and the composition mode.
Step 1004: and determining the composition position of the foreground target according to the preset composition position and the foreground target position information.
Specifically, the preset composition position differs for different foreground target categories and composition modes. For example, when the foreground target category is a portrait, the preset composition position of the portrait may be at the one-third line of the image frame according to the nine-grid composition mode; when the foreground target category is food, the preset composition position of the food may be at the center of the image frame.
The actual composition position of the foreground target is then determined according to the preset composition position and the foreground target position information. For example, the foreground target position information (x1’, x2’, x3’, x4’) may be obtained from the bounding box detection map (see the second four-dimensional vector in fig. 7). Denoting the preset composition position of the foreground target determined above as (y1’, y2’, y3’, y4’) and the actual composition position of the foreground target as (z1’, z2’, z3’, z4’), the actual composition position of the foreground target can be calculated using the following equations (1)-(4):
z1’=(x1’+y1’)/2; (1)
z2’=(x2’+y2’)/2; (2)
z3’=(x3’+y3’)/2; (3)
z4’=(x4’+y4’)/2; (4)
In this embodiment, the actual composition position of the foreground target is calculated from the foreground target position information (the coordinate positions given by the four-dimensional vector) and the preset composition position of the foreground target, so that the composition guidance schemes of different composition modes and different foreground targets are unified into one scheme. This makes it easier for the photographer to learn and operate, and improves the user experience.
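Equations (1)-(4) amount to an element-wise average of the detected position and the preset position, as in the following Python sketch; the example numbers are hypothetical pixel distances used only for illustration.

```python
def final_composition_position(detected, preset):
    """Average the detected foreground position (x1', x2', x3', x4') with the
    preset composition position (y1', y2', y3', y4') element-wise, following
    equations (1)-(4): zi' = (xi' + yi') / 2."""
    return tuple((x + y) / 2 for x, y in zip(detected, preset))

# Example (hypothetical distances to the image borders, in pixels):
detected_pos = (120.0, 480.0, 200.0, 520.0)
preset_pos = (160.0, 440.0, 240.0, 480.0)
print(final_composition_position(detected_pos, preset_pos))
# -> (140.0, 460.0, 220.0, 500.0)
```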
Fig. 11 is a block diagram showing the configuration of an image processing apparatus according to an embodiment. As shown in fig. 11, an image processing apparatus includes an acquisition module 1110, a recognition module 1120, a determination module 1130, and a composition module 1140. Wherein:
the obtaining module 1110: for obtaining a preview image to be processed.
The identification module 1120: scene information for identifying the preview image.
The determination module 1130: for determining a composition mode corresponding to the scene information.
Patterning module 1140: and the preview image is composed according to the composition mode.
In the embodiment of the present application, the obtaining module 1110 acquires a preview image to be processed; the identifying module 1120 identifies scene information of the preview image; the determining module 1130 determines a composition mode corresponding to the scene information; and the composition module 1140 composes the preview image according to the composition mode. In this way, the scene information of the preview image is identified automatically and different scene information is automatically matched with a corresponding composition mode, so that subsequent shooting adjustment prompts can be given for the preview image according to the different composition modes, and the processed image has greater visual appeal.
In one embodiment, the identification module 1120 further comprises:
and the characteristic extraction unit is used for extracting the characteristics of the preview image by adopting a basic network of the neural network to obtain characteristic data.
A classification unit, used for inputting the feature data into the classification network of the neural network to perform classification detection on the background of the preview image and outputting a first confidence map; each pixel point in the first confidence map represents the confidence that the corresponding pixel point in the preview image belongs to the background detection target.
The target detection unit is used for detecting the foreground target of the preview image by using the target detection network of the neural network and outputting a second confidence map; each pixel point in the second confidence map represents the confidence that the corresponding pixel point in the preview image belongs to the foreground detection target.
A calculation unit, used for weighting according to the first confidence map and the second confidence map to obtain the final confidence map of the preview image.
And the first determining unit is used for determining the background class information and the foreground object class information of the preview image according to the final confidence map.
In one embodiment, the object detection unit further comprises:
a target position detection subunit: the detection method comprises the steps of detecting the foreground target position of a preview image by adopting a target detection network of a neural network, and outputting a boundary frame detection image, wherein the boundary frame detection image comprises corresponding vectors of all pixel points in the preview image, the corresponding vectors represent the position relation between corresponding pixel points and corresponding detection boundary frames, and the detection boundary frames are boundary frames of the foreground target detected in the image to be detected by adopting the neural network.
In an embodiment, the calculation unit is further configured to perform weighting according to the first confidence map, the second confidence map, and the bounding box detection map to obtain a final confidence map of the preview image.
In one embodiment, the first determining unit is further configured to determine background category information, foreground object category information, and foreground object position information of the preview image according to the final confidence map.
In one embodiment, the determining module 1130 further includes:
and the generating unit is used for generating related composition characteristic data according to the scene information.
And the second determining unit is used for acquiring a composition mode corresponding to the composition feature data from a preset composition mode when the composition feature data accords with preset composition feature data.
In one embodiment, the determining module 1130 further includes:
and the third determining unit is used for determining the composition mode of the preview image according to the background class information and the foreground object class information.
In one embodiment, the determining module 1130 further includes:
and the fourth determining unit is used for determining the main object of the preview image according to the foreground object category information.
An area acquisition unit for acquiring a region area of the subject target.
And a fifth determining unit for determining a composition mode of the preview image according to the area of the region.
In one embodiment, the composition module 1140 is further configured to composition the preview image according to the scene information and the composition mode.
In one embodiment, patterning module 1140, further comprises:
and the sixth determining unit is used for determining the preset composition position of the foreground object according to the foreground object category information and the composition mode.
And the seventh determining unit is used for determining the composition position of the foreground target according to the preset composition position and the foreground target position information.
It should be understood that, although the steps in the flowcharts of fig. 1, 4, 6, 8, 9 and 10 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise, the order of execution of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in fig. 1, 4, 6, 8, 9 and 10 may include multiple sub-steps or stages that are not necessarily completed at the same moment but may be performed at different moments, and the order of performing these sub-steps or stages is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The division of the modules in the image processing apparatus is only for illustration, and in other embodiments, the image processing apparatus may be divided into different modules as needed to complete all or part of the functions of the image processing apparatus.
The embodiment of the application also provides the mobile terminal. The mobile terminal comprises a memory and a processor, wherein the memory is stored with a computer program, and the computer program causes the processor to execute the steps of the image processing method when being executed by the processor.
The embodiment of the application also provides a computer readable storage medium. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the image processing method.
Fig. 12A is a schematic diagram of an internal structure of the mobile terminal in one embodiment. As shown in fig. 12A, the mobile terminal includes a processor, a memory, and a network interface connected through a system bus. Wherein, the processor is used for providing calculation and control capability and supporting the operation of the whole mobile terminal. The memory is used for storing data, programs and the like, and the memory stores at least one computer program which can be executed by the processor to realize the wireless network communication method suitable for the mobile terminal provided by the embodiment of the application. The memory may include a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The computer program can be executed by a processor to implement a neural network model processing method or an image processing method provided in the following embodiments. The internal memory provides a cached execution environment for the operating system computer programs in the non-volatile storage medium. The network interface may be an ethernet card or a wireless network card, and is used for communicating with an external mobile terminal. The mobile terminal can be a mobile phone, a tablet computer, a personal digital assistant or a wearable device.
Fig. 12B is a schematic diagram of an internal structure of a server (or a cloud, etc.) in one embodiment. As shown in fig. 12B, the server includes a processor, a nonvolatile storage medium, an internal memory, and a network interface connected through a system bus. Wherein, the processor is used for providing calculation and control capability and supporting the operation of the whole mobile terminal. The memory is used for storing data, programs and the like, and the memory stores at least one computer program which can be executed by the processor to realize the wireless network communication method suitable for the mobile terminal provided by the embodiment of the application. The memory may include a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The computer program can be executed by a processor to implement a neural network processing method or an image processing method provided in the following embodiments. The internal memory provides a cached execution environment for the operating system computer programs in the non-volatile storage medium. The network interface may be an ethernet card or a wireless network card, and is used for communicating with an external mobile terminal. The server may be implemented as a stand-alone server or as a server cluster consisting of a plurality of servers. Those skilled in the art will appreciate that the architecture shown in fig. 12B is a block diagram of only a portion of the architecture associated with the subject application, and does not constitute a limitation on the servers to which the subject application applies, as a particular server may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
The implementation of each module in the neural network model processing apparatus or the image processing apparatus provided in the embodiments of the present application may be in the form of a computer program. The computer program may be run on a mobile terminal or a server. The program modules constituted by the computer program may be stored on the memory of the mobile terminal or the server. Which when executed by a processor, performs the steps of the method described in the embodiments of the present application.
A computer program product comprising instructions which, when run on a computer, cause the computer to perform a neural network model processing method or an image processing method.
The embodiment of the application also provides the mobile terminal. The mobile terminal includes an Image Processing circuit, which may be implemented using hardware and/or software components, and may include various Processing units defining an ISP (Image Signal Processing) pipeline. FIG. 13 is a schematic diagram of an image processing circuit in one embodiment. As shown in fig. 13, for convenience of explanation, only aspects of the image processing technique related to the embodiment of the present application are shown.
As shown in fig. 13, the image processing circuit includes an ISP processor 1340 and a control logic 1350. The image data captured by the imaging device 1310 is first processed by the ISP processor 1340, and the ISP processor 1340 analyzes the image data to capture image statistics that may be used to determine and/or control one or more parameters of the imaging device 1310. The imaging device 1310 may include a camera with one or more lenses 1312 and an image sensor 1314. The image sensor 1314 may include an array of color filters (e.g., Bayer filters), and the image sensor 1314 may acquire light intensity and wavelength information captured with each imaging pixel of the image sensor 1314 and provide a set of raw image data that may be processed by the ISP processor 1340. The sensor 1320 (e.g., a gyroscope) may provide parameters of the acquired image processing (e.g., anti-shake parameters) to the ISP processor 1340 based on the type of interface of the sensor 1320. The sensor 1320 interface may utilize a SMIA (Standard Mobile Imaging Architecture) interface, other serial or parallel camera interfaces, or a combination of the above.
In addition, the image sensor 1314 may also send raw image data to the sensor 1320, the sensor 1320 may provide the raw image data to the ISP processor 1340 based on the sensor 1320 interface type, or the sensor 1320 may store the raw image data in the image memory 1330.
ISP processor 1340 processes the raw image data pixel by pixel in a variety of formats. For example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and ISP processor 1340 may perform one or more image processing operations on the raw image data, collecting statistics about the image data. Wherein the image processing operations may be performed with the same or different bit depth precision.
ISP processor 1340 may also receive image data from image memory 1330. For example, the sensor 1320 interface sends raw image data to the image memory 1330, and the raw image data in the image memory 1330 is then provided to the ISP processor 1340 for processing. The image Memory 1330 may be a portion of a Memory device, a storage device, or a separate dedicated Memory within the mobile terminal, and may include a DMA (Direct Memory Access) feature.
Upon receiving raw image data from image sensor 1314 interface or from sensor 1320 interface or from image memory 1330, ISP processor 1340 may perform one or more image processing operations, such as temporal filtering. The processed image data may be sent to an image memory 1330 for additional processing before being displayed. ISP processor 1340 receives the processed data from image memory 1330 and performs image data processing on the processed data in the raw domain and in the RGB and YCbCr color spaces. The image data processed by ISP processor 1340 may be output to display 1370 for viewing by a user and/or further processed by a Graphics Processing Unit (GPU). Further, the output of the ISP processor 1340 may also be sent to an image memory 1330, and a display 1370 may read image data from the image memory 1330. In one embodiment, image memory 1330 may be configured to implement one or more frame buffers. In addition, the output of the ISP processor 1340 may be transmitted to an encoder/decoder 1360 for encoding/decoding image data. The encoded image data may be saved and decompressed before being displayed on the display 1370 device. The encoder/decoder 1360 may be implemented by a CPU or GPU or coprocessor.
The statistics determined by ISP processor 1340 may be transmitted to control logic 1350 unit. For example, the statistical data may include image sensor 1314 statistics such as auto-exposure, auto-white balance, auto-focus, flicker detection, black level compensation, lens 1312 shading correction, and the like. The control logic 1350 may comprise a processor and/or microcontroller executing one or more routines (e.g., firmware) that determine control parameters of the imaging device 1310 and control parameters of the ISP processor 1340 based on the received statistical data. For example, the control parameters of imaging device 1310 may include sensor 1320 control parameters (e.g., gain, integration time for exposure control, anti-shake parameters, etc.), camera flash control parameters, lens 1312 control parameters (e.g., focal length for focusing or zooming), or a combination of these parameters. The ISP control parameters may include gain levels and color correction matrices for automatic white balance and color adjustment (e.g., during RGB processing), as well as lens 1312 shading correction parameters.
Any reference to memory, storage, database, or other medium used herein may include non-volatile and/or volatile memory. Suitable non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), synchronous Link (Synchlink) DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and bus dynamic RAM (RDRAM).
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (10)
1. An image processing method, comprising:
acquiring a preview image to be processed;
identifying scene information of the preview image; the scene information comprises foreground object category information, background category information and foreground object position information;
determining a composition mode corresponding to the scene information;
composition is carried out on the preview image according to the composition mode;
the determining of the composition mode corresponding to the scene information includes:
determining a main target of the preview image according to the foreground target category information;
acquiring the area of the region of the main target;
determining a composition mode of the preview image according to the area of the region;
the method for identifying the scene information of the preview image comprises the following steps:
extracting features of the preview image by adopting a basic network of a neural network to obtain feature data;
inputting the characteristic data into a classification network of the neural network to perform classification detection on the background of the preview image and outputting a first confidence map; each pixel point in the first confidence map represents the confidence that each pixel point in the preview image belongs to a background detection target;
inputting the characteristic data into a target detection network of the neural network to detect a foreground target of a preview image and outputting a second confidence map; each pixel point in the second confidence map represents the confidence that each pixel point in the preview image belongs to the foreground detection target;
detecting the foreground target position of the preview image by adopting a target detection network of a neural network, and outputting a bounding box detection map, wherein the bounding box detection map comprises corresponding vectors of all pixel points in the preview image, the corresponding vectors represent the position relation between the corresponding pixel points and the corresponding detection bounding boxes, and the detection bounding boxes are the bounding boxes of the foreground target detected in the preview image by adopting the neural network;
weighting according to the first confidence map, the second confidence map and the bounding box detection map to obtain a final confidence map of the preview image;
and determining the background category information, the foreground object category information and the foreground object position information of the preview image according to the final confidence map.
2. The method according to claim 1, wherein the determining a composition mode corresponding to the scene information further comprises:
generating related composition characteristic data according to the scene information;
and when the composition characteristic data accords with preset composition characteristic data, acquiring a composition mode corresponding to the composition characteristic data from a preset composition mode.
3. The method of claim 1, further comprising:
and composing the preview image according to the scene information and the composition mode.
4. The method according to claim 3, wherein the composition of the preview image according to the scene information and the composition mode comprises:
determining a preset composition position of the foreground target according to the foreground target category information and the composition mode;
and determining the composition position of the foreground target according to the preset composition position and the foreground target position information.
5. An image processing apparatus comprising:
the acquisition module is used for acquiring a preview image to be processed;
the identification module is used for identifying scene information of the preview image; the scene information comprises foreground object category information, background category information and foreground object position information;
the determining module is used for determining a composition mode corresponding to the scene information;
the composition module is used for composing the preview image according to the composition mode;
the determining module includes:
a fourth determining unit, configured to determine a main object of the preview image according to the foreground object category information;
an area acquisition unit configured to acquire a region area of the subject target;
a fifth determining unit, configured to determine a composition mode of the preview image according to the area of the region;
further comprising:
the characteristic extraction unit is used for extracting the characteristics of the preview image by adopting a basic network of a neural network to obtain characteristic data;
the classification unit is used for inputting the feature data into a classification network of the neural network to perform classification detection on the background of the preview image and outputting a first confidence map; each pixel point in the first confidence map represents the confidence that each pixel point in the preview image belongs to a background detection target;
the target detection unit is used for inputting the characteristic data into a target detection network of the neural network to detect a foreground target of the preview image and outputting a second confidence map; each pixel point in the second confidence map represents the confidence that each pixel point in the preview image belongs to the foreground detection target;
the calculation unit is used for weighting according to the first confidence map and the second confidence map to obtain a final confidence map of the preview image;
a first determining unit, configured to determine background category information and foreground object category information of the preview image according to the final confidence map;
the object detection unit includes:
the target position detection subunit is configured to detect a foreground target position of the preview image by using a target detection network of a neural network, and output a bounding box detection map, where the bounding box detection map includes a corresponding vector of each pixel point in the preview image, the corresponding vector represents a positional relationship between the corresponding pixel point and a corresponding detection bounding box, and the detection bounding box is a bounding box of the foreground target detected in the preview image by using the neural network;
the calculation unit is further configured to perform weighting according to the first confidence map, the second confidence map and the bounding box detection map to obtain a final confidence map of the preview image;
the first determining unit is further configured to determine background category information, foreground object category information, and foreground object position information of the preview image according to the final confidence map.
6. The apparatus of claim 5, wherein the determining module further comprises:
the generating unit is used for generating related composition characteristic data according to the scene information;
and the second determining unit is used for acquiring a composition mode corresponding to the composition feature data from a preset composition mode when the composition feature data accords with preset composition feature data.
7. The apparatus of claim 5,
the composition module is further used for composing the preview image according to the scene information and the composition mode.
8. The apparatus of claim 7, wherein the patterning module further comprises:
a sixth determining unit, configured to determine a preset composition position of the foreground object according to the foreground object category information and the composition mode;
and the seventh determining unit is used for determining the composition position of the foreground target according to the preset composition position and the foreground target position information.
9. A mobile terminal comprising a memory and a processor, the memory having stored therein a computer program that, when executed by the processor, causes the processor to perform the steps of the image processing method according to any one of claims 1 to 4.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the image processing method according to any one of claims 1 to 4.
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1262709A1 true HK1262709A1 (en) | 2020-01-17 |
| HK1262709B HK1262709B (en) | 2021-04-16 |