
WO2018090355A1 - Method for auto-cropping of images - Google Patents

Method for auto-cropping of images

Info

Publication number
WO2018090355A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
cropped
aesthetic
map
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2016/106548
Other languages
French (fr)
Chinese (zh)
Inventor
黄凯奇 (Kaiqi Huang)
赫然 (Ran He)
考月英 (Yueying Kao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to PCT/CN2016/106548 priority Critical patent/WO2018090355A1/en
Publication of WO2018090355A1 publication Critical patent/WO2018090355A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis

Definitions

  • The invention relates to the fields of pattern recognition, machine learning and computer vision, and in particular to an automatic image cropping method.
  • Since image cropping is a highly subjective task, it is difficult for existing rules to take all influencing factors into account.
  • Conventional automatic image cropping methods typically use saliency maps to identify the main regions or regions of interest in an image, and find the crop region by minimizing an energy function computed from hand-crafted rules or by learning a classifier.
  • However, these hand-crafted rules are not comprehensive enough for the subjective task of image cropping, and their precision can hardly meet user needs.
  • an image automatic cropping method is provided in order to solve the technical problem of how to improve the robustness and precision of image automatic cropping.
  • An image automatic cropping method comprising:
  • the composition score of the selected candidate cropped image is estimated, and the candidate cropped image with the highest score is determined as the cropped image.
  • extracting the aesthetic response map and the gradient energy map of the image to be cropped includes:
  • M(x, y) represents the aesthetic response value at spatial position (x, y); K represents the total number of channels of the feature maps of the last convolutional layer of the deep convolutional neural network;
  • k denotes the k-th channel;
  • f_k(x, y) represents the feature value of the k-th channel at spatial position (x, y);
  • w_k represents the weight from the pooled feature map of the k-th channel to the high-aesthetic-quality category;
  • the deep convolutional neural network is trained by:
  • each feature map is pooled into a point by a global average pooling method
  • filtering the candidate cropped images based on the aesthetic response map includes:
  • the aesthetic retention score of a candidate cropped image is calculated by the following formula:
  • S_a(C) represents the aesthetic retention score of the candidate cropped image
  • C represents the candidate cropped image
  • (i, j) represents the position of a pixel
  • I represents the original image
  • A(i, j) represents the aesthetic response value at position (i, j)
  • determining, based on the aesthetic response map and the gradient energy map, the composition scores of the screened candidate cropped images and determining the candidate cropped image with the highest score as the cropped image specifically includes:
  • establishing a composition model based on the aesthetic response map and the gradient energy map
  • estimating the composition scores of the screened candidate cropped images using the composition model, and determining the candidate cropped image with the highest score as the cropped image.
  • the composition model is obtained by:
  • training a classifier to automatically learn composition rules, thereby obtaining the composition model.
  • Embodiments of the present invention provide an image automatic cropping method.
  • The method comprises: extracting an aesthetic response map and a gradient energy map of the image to be cropped; densely extracting candidate cropped images from the image to be cropped; screening the candidate cropped images based on the aesthetic response map; and estimating, based on the aesthetic response map and the gradient energy map, the composition scores of the screened candidate cropped images, and determining the candidate cropped image with the highest score as the cropped image.
  • This scheme uses the aesthetic response map to explore the aesthetic influence area of the picture, and uses the aesthetic response map to determine the aesthetic retention part, so as to retain the high aesthetic quality of the cropped image to the greatest extent.
  • the scheme also uses the gradient energy map to analyze the gradient distribution rule.
  • Embodiments of the present invention can be applied to a wide variety of fields involving automatic image cropping, including image editing, photography, and image repositioning.
  • FIG. 1 is a schematic flow chart of an image auto-cropping method according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a deep convolutional neural network according to an embodiment of the present invention.
  • FIG. 3a is a schematic diagram of an image to be cropped according to an embodiment of the present invention.
  • Figure 3b is a schematic illustration of a cropped image in accordance with an embodiment of the present invention.
  • Embodiments of the present invention contemplate using deep learning to automatically learn the regions of influence that are important for image cropping, so that the rules are learned automatically and comprehensively and high-aesthetic regions are preserved as much as possible during cropping.
  • FIG. 1 exemplarily shows the flow of an image automatic cropping method.
  • the method can include:
  • S100 Extract an aesthetic response map and a gradient energy map of the image to be cropped.
  • this step may include:
  • M(x, y) represents the aesthetic response value at spatial position (x, y);
  • K represents the total number of channels of the feature maps f of the last convolutional layer of the trained deep convolutional neural network;
  • k denotes the k-th channel;
  • f_k(x, y) represents the feature value of the k-th channel at spatial position (x, y);
  • w_k represents the weight from the pooled feature map of the k-th channel to the high-aesthetic-quality category.
  • the above steps can train the deep convolutional neural network according to actual needs when extracting the aesthetic response map.
  • the training of deep convolutional neural networks can be done in the following ways:
  • Step 1 Set up the convolution layer at the bottom of the deep convolutional neural network structure.
  • Step 2 After the last convolutional layer of the deep convolutional neural network structure, each feature map is pooled into a point by a global average pooling method.
  • Step 3 Connect a fully connected layer and loss function with the same number of aesthetic quality classification categories.
  • Fig. 2 exemplarily shows a deep convolutional neural network structure.
  • a deep convolutional neural network model under the aesthetic quality classification task can be trained through steps 1-3. Then, the deep convolutional neural network and the category response mapping method trained for the aesthetic quality classification task are utilized; and the above formula is used to calculate the aesthetic response map M of the image to be cropped under the high aesthetic category.
  • Sliding windows of all sizes smaller than the image may be employed to densely extract candidate cropping windows from the image to be cropped, and the candidate cropped images are extracted through the candidate cropping windows.
  • this step may include:
  • S_a(C) represents the aesthetic retention score of the candidate cropped image
  • C represents the candidate cropped image
  • (i, j) represents the position of a pixel
  • I represents the original image
  • A(i, j) represents the aesthetic response value at position (i, j).
  • an aesthetic retention model can be constructed.
  • the candidate cropping window is screened by the aesthetic retention model to select candidate windows with higher aesthetic retention scores.
  • S122 Sort all candidate cropped images according to the aesthetic retention scores from large to small.
  • For example, in practical applications, the candidate cropped images in the top 10,000 candidate cropping windows can be retained.
  • S130 Estimating the composition score of the selected candidate cropped image based on the aesthetic response map and the gradient energy map, and determining the candidate cropped image with the highest score as the cropped image.
  • this step can be implemented by step S131 to step S133.
  • This step can train the composition model according to the actual situation when building the composition model.
  • the training data can use the image with better composition as the positive sample, and the image with the composition defect as the negative sample.
  • composition model can be trained in the following ways:
  • Step a Establish a training image set based on the aesthetic response map and the gradient energy map.
  • Step b labeling the training image with an aesthetic quality category.
  • Step c Training the deep convolutional neural network with the labeled training images.
  • For the training process of this step, refer to Step 1 to Step 3 above; details are not repeated here.
  • Step d extracting the spatial pyramid features of the aesthetic response map and the gradient energy map using the trained deep convolutional neural network for the labeled training images.
  • Step e stitching the extracted spatial pyramid features together.
  • Step f using the classifier for training, automatically learning the composition rules, and obtaining the composition model.
  • the classifier may be, for example, a support vector machine classifier.
  • S132 Estimating the composition score of the selected candidate cropped image by using the composition model, and determining the candidate cropped image with the highest score as the cropped image.
  • Fig. 3a exemplarily shows an image to be cropped
  • Fig. 3b exemplarily shows a cropped image.
  • Step A The image data set marked with the aesthetic quality category is sent to the deep convolutional neural network for aesthetic quality category model training.
  • Step B Input the image data set labeled with composition categories into the trained deep convolutional neural network, extract the feature maps of the last convolutional layer, compute the aesthetic response map and the gradient energy map, and then train the composition model with a support vector machine classifier.
  • Step C Extract the aesthetic response map and the gradient energy map from the test image.
  • the extraction method of this step can refer to the method of the training phase.
  • Step D Densely collect candidate cropping windows from the image to be tested.
  • a sliding window with an interval of 30 pixels is used for acquisition or extraction.
  • Step E Filter the candidate cropping window using the aesthetic retention model.
  • The aesthetic retention model is used to calculate the aesthetic retention scores of the densely collected candidate cropping windows, and the part with the highest aesthetic retention scores is selected, for example, 10,000 candidate cropping windows.
  • Step F Using the composition model to evaluate the selected candidate cropping window.
  • The composition model trained in the training phase is used to evaluate the composition scores of the selected candidate cropping windows, and the window with the highest score is taken as the final cropping window, thereby obtaining the cropped image.
  • The method provided by the embodiments of the present invention makes good use of the aesthetic response map and the gradient energy map to capture aesthetic quality and image composition rules, and achieves more robust and accurate automatic image cropping. This performance illustrates the effectiveness of aesthetic response maps and gradient energy maps for automatic image cropping.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a method for automatic cropping of images. The method comprises: extracting an aesthetic response map and a gradient energy map of an image to be cropped; densely extracting candidate cropped images from the image to be cropped; screening the candidate cropped images based on the aesthetic response map; estimating, based on the aesthetic response map and the gradient energy map, the composition scores of the screened candidate cropped images, and determining the candidate cropped image with the highest score as the cropped image. The present invention uses the aesthetic response map to explore the aesthetically influential regions of a picture and to determine the parts whose aesthetics should be retained, thereby preserving the high aesthetic quality of the cropped image to the greatest extent. Moreover, the present invention uses the gradient energy map to analyze the gradient distribution, and estimates the composition scores of candidate crops based on both the aesthetic response map and the gradient energy map. The present invention compensates for the defects of image composition representation, and solves the technical problem of how to improve the robustness and precision of automatic image cropping.

Description

Image automatic cropping method

Technical field

The invention relates to the fields of pattern recognition, machine learning and computer vision, and in particular to an automatic image cropping method.

Background art

With the rapid development of computer technology and digital media technology, people's needs and expectations in fields such as computer vision, artificial intelligence and machine perception keep growing. Automatic image cropping, as a very important and common task in automatic image editing, has also received increasing attention and development. Automatic image cropping aims to remove superfluous regions and emphasize the regions of interest, thereby improving the overall composition and aesthetic quality of the image. An effective and automatic image cropping method not only frees people from tedious work, but also offers professional image editing advice to non-professionals.

Since image cropping is a highly subjective task, it is difficult for existing rules to take all influencing factors into account. Conventional automatic image cropping methods typically use saliency maps to identify the main regions or regions of interest in an image, and find the crop region by minimizing an energy function computed from hand-crafted rules or by learning a classifier. However, these hand-crafted rules are not comprehensive enough for the subjective task of image cropping, and their precision can hardly meet user needs.

In view of this, the present invention is proposed.

Summary of the invention

In order to solve the above problems in the prior art, namely the technical problem of how to improve the robustness and precision of automatic image cropping, an automatic image cropping method is provided.

In order to achieve the above object, the following technical solution is provided:

An automatic image cropping method, the method comprising:

extracting an aesthetic response map and a gradient energy map of an image to be cropped;

densely extracting candidate cropped images from the image to be cropped;

screening the candidate cropped images based on the aesthetic response map;

estimating, based on the aesthetic response map and the gradient energy map, the composition scores of the screened candidate cropped images, and determining the candidate cropped image with the highest score as the cropped image.

Further, the extracting of the aesthetic response map and the gradient energy map of the image to be cropped specifically includes:

extracting the aesthetic response map of the image to be cropped using a deep convolutional neural network and the class response mapping method, according to the following formula:

$$M(x, y) = \sum_{k=1}^{K} w_k \, f_k(x, y)$$

where M(x, y) represents the aesthetic response value at spatial position (x, y); K represents the total number of channels of the feature maps of the last convolutional layer of the deep convolutional neural network; k denotes the k-th channel; f_k(x, y) represents the feature value of the k-th channel at spatial position (x, y); and w_k represents the weight from the pooled feature map of the k-th channel to the high-aesthetic-quality category;

smoothing the image to be cropped and calculating the gradient value of each pixel, thereby obtaining the gradient energy map.

Further, the deep convolutional neural network is trained by:

setting convolutional layers at the bottom of the deep convolutional neural network structure;

after the last convolutional layer of the deep convolutional neural network structure, pooling each feature map into a single point by global average pooling;

connecting a fully connected layer, with as many outputs as there are aesthetic quality categories, and a loss function.

Further, the screening of the candidate cropped images based on the aesthetic response map specifically includes:

calculating the aesthetic retention score of each candidate cropped image by the following formula:

$$S_a(C) = \frac{\sum_{(i, j) \in C} A(i, j)}{\sum_{(i, j) \in I} A(i, j)}$$

where S_a(C) represents the aesthetic retention score of the candidate cropped image; C represents the candidate cropped image; (i, j) represents the position of a pixel; I represents the original image; and A(i, j) represents the aesthetic response value at position (i, j);

sorting all candidate cropped images by their aesthetic retention scores in descending order;

selecting the portion of candidate cropped images with the highest scores.

Further, the estimating, based on the aesthetic response map and the gradient energy map, of the composition scores of the screened candidate cropped images and the determining of the candidate cropped image with the highest score as the cropped image specifically includes:

establishing a composition model based on the aesthetic response map and the gradient energy map;

estimating the composition scores of the screened candidate cropped images using the composition model, and determining the candidate cropped image with the highest score as the cropped image.

Further, the composition model is obtained by:

establishing a training image set based on the aesthetic response map and the gradient energy map;

labeling the training images with aesthetic quality categories;

training a deep convolutional neural network with the labeled training images;

extracting, for the labeled training images, the spatial pyramid features of the aesthetic response map and the gradient energy map using the trained deep convolutional neural network;

concatenating the extracted spatial pyramid features;

training a classifier to automatically learn composition rules, thereby obtaining the composition model.

Embodiments of the present invention provide an automatic image cropping method. The method comprises: extracting an aesthetic response map and a gradient energy map of the image to be cropped; densely extracting candidate cropped images from the image to be cropped; screening the candidate cropped images based on the aesthetic response map; and estimating, based on the aesthetic response map and the gradient energy map, the composition scores of the screened candidate cropped images, and determining the candidate cropped image with the highest score as the cropped image. This scheme uses the aesthetic response map to explore the aesthetically influential regions of a picture and to determine the parts whose aesthetics should be retained, so that the high aesthetic quality of the cropped image is preserved to the greatest extent. At the same time, the scheme uses the gradient energy map to analyze the gradient distribution, and evaluates the composition scores of candidate crops based on both the aesthetic response map and the gradient energy map. The embodiments of the present invention compensate for the defects of image composition representation, and solve the technical problem of how to improve the robustness and precision of automatic image cropping. Embodiments of the present invention can be applied to a wide variety of fields involving automatic image cropping, including image editing, photography and image retargeting.

Brief description of the drawings

FIG. 1 is a schematic flow chart of an automatic image cropping method according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a deep convolutional neural network according to an embodiment of the present invention;

FIG. 3a is a schematic diagram of an image to be cropped according to an embodiment of the present invention;

FIG. 3b is a schematic diagram of a cropped image according to an embodiment of the present invention.

Detailed description

The technical problems solved by, the technical solutions adopted by, and the technical effects achieved by the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments of the present application, all other equivalent or obviously modified embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention. The embodiments of the invention may be embodied in many different ways as defined and covered by the claims.

Deep learning has developed rapidly and achieved good results in various fields. Embodiments of the present invention consider using deep learning to automatically learn the regions of influence that are important for image cropping, so that the rules are learned automatically and comprehensively and high-aesthetic regions are preserved as much as possible during cropping.

To this end, an embodiment of the present invention provides an automatic image cropping method. FIG. 1 exemplarily shows the flow of the automatic image cropping method. As shown in FIG. 1, the method may include:

S100: Extract an aesthetic response map and a gradient energy map of the image to be cropped.

Specifically, this step may include:

S101: Extract the aesthetic response map of the image to be cropped using a deep convolutional neural network and the class response mapping method, according to the following formula:

$$M(x, y) = \sum_{k=1}^{K} w_k \, f_k(x, y)$$

where M(x, y) represents the aesthetic response value at spatial position (x, y); K represents the total number of channels of the feature maps f of the last convolutional layer of the trained deep convolutional neural network; k denotes the k-th channel; f_k(x, y) represents the feature value of the k-th channel at spatial position (x, y); and w_k represents the weight from the pooled feature map of the k-th channel to the high-aesthetic-quality category.
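For illustration, the class response mapping formula above amounts to a weighted sum over channels, which can be sketched in a few lines of NumPy; the array shapes and toy values below are illustrative assumptions, not part of the disclosure:

```python
import numpy as np

def aesthetic_response_map(feature_maps, weights):
    """Weighted sum of the last convolutional layer's feature maps,
    using the weights w_k that connect each globally pooled channel
    to the high-aesthetic-quality category.

    feature_maps: array of shape (K, H, W) -- the channels f_k(x, y)
    weights:      array of shape (K,)      -- the weights w_k
    """
    # M(x, y) = sum_k w_k * f_k(x, y)
    return np.tensordot(weights, feature_maps, axes=([0], [0]))

# Toy example: K=3 channels on a 2x2 spatial grid
f = np.arange(12, dtype=float).reshape(3, 2, 2)
w = np.array([0.5, -0.2, 1.0])
M = aesthetic_response_map(f, w)
print(M.shape)  # (2, 2)
```

In practice the feature maps would come from the trained network described below, and M would be upsampled back to the input image's resolution before use.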

When extracting the aesthetic response map, the deep convolutional neural network can be trained according to actual needs. The training of the deep convolutional neural network can proceed as follows:

Step 1: Set convolutional layers at the bottom of the deep convolutional neural network structure.

Step 2: After the last convolutional layer of the deep convolutional neural network structure, pool each feature map into a single point by global average pooling.

Step 3: Connect a fully connected layer, with as many outputs as there are aesthetic quality categories, and a loss function.

FIG. 2 exemplarily shows a deep convolutional neural network structure.

Through steps 1-3, a deep convolutional neural network model can be trained on the aesthetic quality classification task. Then, using the network trained for this task together with the class response mapping method, the above formula yields the aesthetic response map M of the image to be cropped for the high-aesthetic-quality category.
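The head attached in steps 2-3 can be sketched as a plain NumPy forward pass; the two-category setup and toy weights are illustrative assumptions (a real implementation would use a deep learning framework and learn the weights):

```python
import numpy as np

def gap_head_forward(conv_features, fc_weights, fc_bias):
    """Steps 2-3 of the training setup: global average pooling reduces
    each feature map to a single point, then a fully connected layer
    outputs one score per aesthetic quality category.

    conv_features: (K, H, W) output of the last convolutional layer
    fc_weights:    (num_classes, K) fully connected layer weights
    fc_bias:       (num_classes,)
    """
    pooled = conv_features.mean(axis=(1, 2))   # global average pooling -> (K,)
    logits = fc_weights @ pooled + fc_bias     # fully connected layer
    # softmax over categories, as fed to the classification loss
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Toy example: K=4 channels, 2 categories (high / low aesthetic quality)
feats = np.ones((4, 5, 5))
W = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
b = np.zeros(2)
probs = gap_head_forward(feats, W, b)
print(probs)  # two equal probabilities for this symmetric toy input
```

The rows of fc_weights are exactly the per-channel weights w_k used by the class response mapping formula above, which is why the trained classifier directly provides the aesthetic response map.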

S102: Smooth the image to be cropped and calculate the gradient value of each pixel, thereby obtaining the gradient energy map.
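A minimal sketch of S102, assuming a grayscale input and a simple box filter for the smoothing step (the patent does not fix a particular smoothing kernel):

```python
import numpy as np

def gradient_energy_map(gray, blur_radius=1):
    """Smooth the image, then take the gradient magnitude at every
    pixel as the gradient energy.

    gray: 2-D float array (grayscale image to be cropped)
    """
    # Box-filter smoothing with edge padding (a Gaussian would also do)
    k = 2 * blur_radius + 1
    padded = np.pad(gray, blur_radius, mode="edge")
    smooth = sum(padded[i:i + gray.shape[0], j:j + gray.shape[1]]
                 for i in range(k) for j in range(k)) / (k * k)
    gy, gx = np.gradient(smooth)   # per-pixel gradients along rows, columns
    return np.hypot(gx, gy)        # gradient magnitude per pixel

img = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))  # horizontal intensity ramp
E = gradient_energy_map(img)
print(E.shape)  # (8, 8)
```

For this ramp the energy is roughly constant in the interior, reflecting its uniform horizontal gradient.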

S110: Densely extract candidate cropped images from the image to be cropped.

Here, sliding windows of all sizes smaller than the image may be used to densely extract candidate cropping windows from the image to be cropped, and the candidate cropped images are then extracted through the candidate cropping windows.
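Dense window extraction can be sketched as nested sliding-window loops; the particular scale set and stride below are illustrative choices (the later test-phase example mentions a 30-pixel stride):

```python
def candidate_windows(width, height, stride=30,
                      scales=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Densely generate candidate cropping windows at several window
    sizes smaller than the image, sliding each size over the image.

    Yields (x, y, w, h) tuples in image coordinates.
    """
    for s in scales:
        w, h = int(width * s), int(height * s)
        for y in range(0, height - h + 1, stride):
            for x in range(0, width - w + 1, stride):
                yield (x, y, w, h)

wins = list(candidate_windows(300, 200, stride=50))
print(len(wins))  # 25
```

Each window defines one candidate cropped image; in practice the scale set is much denser, which is why the later screening step is needed.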

S120: Screen the candidate cropped images based on the aesthetic response map.

Specifically, this step may include:

S121: Calculate the aesthetic retention score of each candidate cropped image by the following formula:

$$S_a(C) = \frac{\sum_{(i, j) \in C} A(i, j)}{\sum_{(i, j) \in I} A(i, j)}$$

where S_a(C) represents the aesthetic retention score of the candidate cropped image; C represents the candidate cropped image; (i, j) represents the position of a pixel; I represents the original image; and A(i, j) represents the aesthetic response value at position (i, j).

Through this step, an aesthetic retention model can be constructed. The candidate cropping windows are passed through the aesthetic retention model to screen out the candidate windows with higher aesthetic retention scores.

S122: Sort all candidate cropped images by their aesthetic retention scores in descending order.

S123: Select the portion of candidate cropped images with the highest scores.

For example, in practical applications, the candidate cropped images in the top 10,000 candidate cropping windows may be retained.
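Steps S121-S123 can be sketched directly from the retention-score formula; the toy aesthetic response map and candidate list are illustrative:

```python
import numpy as np

def retention_score(A, window):
    """S_a(C): sum of aesthetic response values inside the candidate
    window C divided by the sum over the whole original image I."""
    x, y, w, h = window
    return A[y:y + h, x:x + w].sum() / A.sum()

def filter_candidates(A, windows, keep=10000):
    """Rank all candidates by retention score, descending, and keep
    the highest-scoring ones (the example above keeps the top 10,000)."""
    ranked = sorted(windows, key=lambda c: retention_score(A, c),
                    reverse=True)
    return ranked[:keep]

# Toy response map: all aesthetic mass in the top-left 2x2 quadrant
A = np.zeros((4, 4))
A[:2, :2] = 1.0
cands = [(0, 0, 2, 2), (2, 2, 2, 2), (1, 1, 2, 2)]
best = filter_candidates(A, cands, keep=2)
print(best[0])  # (0, 0, 2, 2)
```

The window covering the whole high-response quadrant scores 1.0 and survives the screening, while the window over the empty quadrant is discarded.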

S130: Estimate the composition scores of the screened candidate cropped images based on the aesthetic response map and the gradient energy map, and determine the candidate cropped image with the highest score as the cropped image.

Specifically, this step can be implemented by steps S131 to S133.

S131: Establish a composition model based on the aesthetic response map and the gradient energy map.

When establishing the composition model, this step can train the composition model according to the actual situation. In training the composition model, images with good composition may be used as positive samples, and images with composition defects as negative samples.

The composition model can be trained as follows:

Step a: Establish a training image set based on the aesthetic response map and the gradient energy map.

Step b: Label the training images with aesthetic quality categories.

Step c: Train a deep convolutional neural network with the labeled training images.

For the training process of this step, refer to steps 1 to 3 above; details are not repeated here.

步骤d:针对已标注的训练图像,利用训练好的深度卷积神经网络,提取美感响应图和梯度能量图的空间金字塔特征。Step d: extracting the spatial pyramid features of the aesthetic response map and the gradient energy map using the trained deep convolutional neural network for the labeled training images.

步骤e:将提取的空间金字塔特征拼接在一起。Step e: stitching the extracted spatial pyramid features together.

步骤f:利用分类器进行训练,自动学习构图规则,得到构图模型。Step f: using the classifier for training, automatically learning the composition rules, and obtaining the composition model.

其中,分类器例如可以采用支持向量机分类器。The classifier may be, for example, a support vector machine classifier.
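Steps d and e can be sketched as follows. The pyramid levels (1, 2, 4) and the use of average pooling per cell are assumptions for illustration; the document does not fix either choice:

```python
import numpy as np

def spatial_pyramid_features(response_map, levels=(1, 2, 4)):
    """Step d: pool a 2-D map into a spatial pyramid feature vector.

    At level L the map is split into an L x L grid and each cell is
    average-pooled; levels (1, 2, 4) give 1 + 4 + 16 = 21 values per map.
    """
    h, w = response_map.shape
    feats = []
    for L in levels:
        ys = np.linspace(0, h, L + 1, dtype=int)
        xs = np.linspace(0, w, L + 1, dtype=int)
        for a in range(L):
            for b in range(L):
                cell = response_map[ys[a]:ys[a + 1], xs[b]:xs[b + 1]]
                feats.append(cell.mean())
    return np.asarray(feats)

def composition_features(aesthetic_map, gradient_map):
    """Step e: concatenate the pyramid features of both maps; the result
    would then be fed to the classifier of step f (e.g. an SVM)."""
    return np.concatenate([spatial_pyramid_features(aesthetic_map),
                           spatial_pyramid_features(gradient_map)])
```

The concatenated vector encodes coarse-to-fine spatial layout of both maps, which is what lets a linear classifier pick up composition rules.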

S132: Use the composition model to estimate the composition scores of the screened candidate cropped images, and determine the candidate cropped image with the highest score as the cropped image.

Fig. 3a shows an example of an image to be cropped; Fig. 3b shows the cropped image.

A preferred embodiment is described below to further illustrate the invention.

Step A: Feed an image data set annotated with aesthetic quality categories into a deep convolutional neural network to train the aesthetic quality classification model.

Step B: Input an image data set annotated with composition categories into the trained deep convolutional neural network, extract the feature maps of the last convolutional layer, compute the aesthetic response map and the gradient energy map, and then train the composition model with a support vector machine classifier.

Step C: Extract the aesthetic response map and the gradient energy map from the test image.

The extraction method of this step may follow that of the training phase.
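The gradient energy map half of this extraction, smoothing followed by per-pixel gradient magnitude, can be sketched as below. The 3x3 box blur and central-difference gradient are stand-in choices; the document does not specify the smoothing kernel or the gradient operator:

```python
import numpy as np

def gradient_energy_map(gray):
    """Smooth a grayscale image, then take the per-pixel gradient magnitude.

    Smoothing here is a 3x3 box blur over an edge-padded copy; a Gaussian
    would fit the text equally well.
    """
    g = gray.astype(float)
    p = np.pad(g, 1, mode='edge')
    # average the nine shifted copies of the padded image (3x3 box blur)
    smooth = sum(p[i:i + g.shape[0], j:j + g.shape[1]]
                 for i in range(3) for j in range(3)) / 9.0
    gy, gx = np.gradient(smooth)          # central differences per axis
    return np.hypot(gx, gy)               # gradient magnitude per pixel
```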

Step D: Densely sample candidate cropping windows from the image to be tested.

For example, on a 1000×1000 test image, a sliding window with a stride of 30 pixels can be used for sampling.
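The dense sampling of step D can be sketched as a grid enumeration. A single window size is used here for simplicity; in practice multiple window sizes would be enumerated, which the document leaves open:

```python
def candidate_windows(img_h, img_w, win_h, win_w, stride=30):
    """Step D: densely sample crop windows (top, left, height, width)
    on a regular grid with the given stride."""
    return [(top, left, win_h, win_w)
            for top in range(0, img_h - win_h + 1, stride)
            for left in range(0, img_w - win_w + 1, stride)]
```

On a 1000×1000 image with 500×500 windows and a 30-pixel stride, this yields 17×17 = 289 candidate windows per window size.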

Step E: Screen the candidate cropping windows with the aesthetic retention model.

In this step, the aesthetic retention model computes the aesthetic retention scores of the densely sampled candidate cropping windows and retains the subset with the highest scores, for example the top 10,000 candidate cropping windows.

Step F: Evaluate the screened candidate cropping windows with the composition model.

In this step, the composition model trained in the training phase evaluates the composition scores of the screened candidate cropping windows, and the window with the highest score is taken as the final cropping window, thereby yielding the cropped image.
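Steps D to F fit together as below. Here `composition_score` is a hypothetical callable, for instance the decision function of the trained SVM applied to spatial pyramid features of a crop; its exact form is an assumption:

```python
import numpy as np

def auto_crop(aesthetic_map, candidates, composition_score, keep=10000):
    """Glue steps D-F: retention filtering, then composition ranking.

    aesthetic_map     -- 2-D aesthetic response map of the test image
    candidates        -- (top, left, height, width) windows from step D
    composition_score -- hypothetical callable mapping a crop to a score
    """
    total = aesthetic_map.sum()

    def retention(c):                                    # step E
        top, left, h, w = c
        return aesthetic_map[top:top + h, left:left + w].sum() / total

    shortlist = sorted(candidates, key=retention, reverse=True)[:keep]
    return max(shortlist, key=composition_score)         # step F
```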

In summary, the method provided by the embodiments of the present invention makes full use of the aesthetic response map and the gradient energy map to preserve aesthetic quality and the composition rules of the image to the greatest extent, achieving more robust and more accurate automatic image cropping, and thereby demonstrating the effectiveness of the aesthetic response map and the gradient energy map for automatic image cropping.

Although the method provided by the embodiments of the present invention is described above in the stated order, those skilled in the art will understand that, to achieve the effects of the embodiments, the steps may also be performed in a different order, for example in parallel or in reverse; such simple variations all fall within the scope of protection of the present invention.

The above are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art can conceive within the technical scope disclosed by the present invention shall be covered by the present invention. Accordingly, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims (6)

1. An automatic image cropping method, characterized in that the method comprises: extracting an aesthetic response map and a gradient energy map of an image to be cropped; densely extracting candidate cropped images from the image to be cropped; screening the candidate cropped images based on the aesthetic response map; and estimating, based on the aesthetic response map and the gradient energy map, composition scores of the screened candidate cropped images, and determining the candidate cropped image with the highest score as the cropped image.

2. The method according to claim 1, wherein extracting the aesthetic response map and the gradient energy map of the image to be cropped comprises: extracting the aesthetic response map of the image to be cropped using a deep convolutional neural network and a class response mapping method with the following formula:

M(x, y) = Σ_{k=1}^{K} w_k · f_k(x, y)

wherein M(x, y) denotes the aesthetic response value at spatial position (x, y); K denotes the total number of channels of the feature maps of the last convolutional layer of the deep convolutional neural network; k denotes the k-th channel; f_k(x, y) denotes the value of the feature map of the k-th channel at spatial position (x, y); and w_k denotes the weight from the pooled feature map of the k-th channel to the high aesthetic quality category; and smoothing the image to be cropped and computing the gradient value of each pixel to obtain the gradient energy map.

3. The method according to claim 2, wherein the deep convolutional neural network is trained by: arranging convolutional layers at the bottom of the deep convolutional neural network structure; pooling each feature map into a single value by global average pooling after the last convolutional layer; and connecting a fully connected layer with as many outputs as aesthetic quality classification categories, followed by a loss function.

4. The method according to claim 1, wherein screening the candidate cropped images based on the aesthetic response map comprises: computing the aesthetic retention score of each candidate cropped image by the following formula:

S_a(C) = Σ_{(i,j)∈C} A(i, j) / Σ_{(i,j)∈I} A(i, j)

wherein S_a(C) denotes the aesthetic retention score of the candidate cropped image; C denotes the candidate cropped image; (i, j) denotes a pixel position; I denotes the original image; and A(i, j) denotes the aesthetic response value at position (i, j); sorting all candidate cropped images by aesthetic retention score in descending order; and selecting the top-scoring subset of candidate cropped images.

5. The method according to claim 1, wherein estimating, based on the aesthetic response map and the gradient energy map, the composition scores of the screened candidate cropped images and determining the candidate cropped image with the highest score as the cropped image comprises: building a composition model based on the aesthetic response map and the gradient energy map; and estimating the composition scores of the screened candidate cropped images with the composition model, and determining the candidate cropped image with the highest score as the cropped image.

6. The method according to claim 5, wherein the composition model is obtained by: building a training image set based on the aesthetic response map and the gradient energy map; annotating the training images with aesthetic quality categories; training a deep convolutional neural network with the annotated training images; extracting, for the annotated training images, spatial pyramid features of the aesthetic response map and the gradient energy map with the trained deep convolutional neural network; concatenating the extracted spatial pyramid features; and training a classifier to automatically learn composition rules, thereby obtaining the composition model.
PCT/CN2016/106548 2016-11-21 2016-11-21 Method for auto-cropping of images Ceased WO2018090355A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/106548 WO2018090355A1 (en) 2016-11-21 2016-11-21 Method for auto-cropping of images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/106548 WO2018090355A1 (en) 2016-11-21 2016-11-21 Method for auto-cropping of images


Publications (1)

Publication Number Publication Date
WO2018090355A1 true WO2018090355A1 (en) 2018-05-24

Family

ID=62145070

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/106548 Ceased WO2018090355A1 (en) 2016-11-21 2016-11-21 Method for auto-cropping of images

Country Status (1)

Country Link
WO (1) WO2018090355A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110782021A (en) * 2019-10-25 2020-02-11 浪潮电子信息产业股份有限公司 Image classification method, device, equipment and computer readable storage medium
CN113297514A (en) * 2020-04-13 2021-08-24 阿里巴巴集团控股有限公司 Image processing method, image processing device, electronic equipment and computer storage medium
CN113379749A (en) * 2021-06-10 2021-09-10 北京房江湖科技有限公司 Image processing method, readable storage medium, and computer program product
CN113763391A (en) * 2021-09-24 2021-12-07 华中科技大学 Intelligent image clipping method and system based on visual element relationship
WO2022160222A1 (en) * 2021-01-28 2022-08-04 京东方科技集团股份有限公司 Defect detection method and apparatus, model training method and apparatus, and electronic device
CN114882560A (en) * 2022-05-10 2022-08-09 福州大学 Intelligent image clipping method based on lightweight portrait detection
WO2023093683A1 (en) * 2021-11-24 2023-06-01 北京字节跳动网络技术有限公司 Image cropping method and apparatus, model training method and apparatus, electronic device, and medium
CN116309627A (en) * 2022-12-15 2023-06-23 北京航空航天大学 Image cropping method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100329588A1 (en) * 2009-06-24 2010-12-30 Stephen Philip Cheatle Autocropping and autolayout method for digital images
US20150213609A1 (en) * 2014-01-30 2015-07-30 Adobe Systems Incorporated Image Cropping Suggestion
CN105528786A (en) * 2015-12-04 2016-04-27 小米科技有限责任公司 Image processing method and device
CN105787966A (en) * 2016-03-21 2016-07-20 复旦大学 An aesthetic evaluation method for computer pictures

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100329588A1 (en) * 2009-06-24 2010-12-30 Stephen Philip Cheatle Autocropping and autolayout method for digital images
US20150213609A1 (en) * 2014-01-30 2015-07-30 Adobe Systems Incorporated Image Cropping Suggestion
CN105528786A (en) * 2015-12-04 2016-04-27 小米科技有限责任公司 Image processing method and device
CN105787966A (en) * 2016-03-21 2016-07-20 复旦大学 An aesthetic evaluation method for computer pictures

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
NISHIYAMA, M. ET AL.: "Sensation-based Photo Cropping", PROCEEDINGS OF THE 17TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'09, 24 October 2009 (2009-10-24), XP058271494 *
YAN, J. ET AL.: "Learning the Change for Automatic Image Cropping", IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 31 December 2013 (2013-12-31), XP032492828 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110782021A (en) * 2019-10-25 2020-02-11 浪潮电子信息产业股份有限公司 Image classification method, device, equipment and computer readable storage medium
CN110782021B (en) * 2019-10-25 2023-07-14 浪潮电子信息产业股份有限公司 Image classification method, device, equipment and computer-readable storage medium
CN113297514A (en) * 2020-04-13 2021-08-24 阿里巴巴集团控股有限公司 Image processing method, image processing device, electronic equipment and computer storage medium
WO2022160222A1 (en) * 2021-01-28 2022-08-04 京东方科技集团股份有限公司 Defect detection method and apparatus, model training method and apparatus, and electronic device
CN113379749A (en) * 2021-06-10 2021-09-10 北京房江湖科技有限公司 Image processing method, readable storage medium, and computer program product
CN113763391A (en) * 2021-09-24 2021-12-07 华中科技大学 Intelligent image clipping method and system based on visual element relationship
CN113763391B (en) * 2021-09-24 2024-03-19 华中科技大学 An intelligent image cropping method and system based on visual element relationships
WO2023093683A1 (en) * 2021-11-24 2023-06-01 北京字节跳动网络技术有限公司 Image cropping method and apparatus, model training method and apparatus, electronic device, and medium
CN114882560A (en) * 2022-05-10 2022-08-09 福州大学 Intelligent image clipping method based on lightweight portrait detection
CN116309627A (en) * 2022-12-15 2023-06-23 北京航空航天大学 Image cropping method and device
CN116309627B (en) * 2022-12-15 2023-09-15 北京航空航天大学 Image cropping method and device

Similar Documents

Publication Publication Date Title
CN106650737B (en) Image automatic cropping method
WO2018090355A1 (en) Method for auto-cropping of images
CN107665492B (en) A deep network-based tissue segmentation method for colorectal panoramic digital pathological images
CN105096259B (en) The depth value restoration methods and system of depth image
CN106846344B (en) A kind of image segmentation optimal identification method based on the complete degree in edge
WO2020007307A1 (en) Sky filter method for panoramic images and portable terminal
CN111027547A (en) An automatic detection method for multi-scale and polymorphic objects in two-dimensional images
CN110570435B (en) Method and device for carrying out damage segmentation on vehicle damage image
CN110569747A (en) A method to quickly count the number of rice ears in field rice using image pyramid and Faster-RCNN
CN108960404B (en) Image-based crowd counting method and device
CN104851086A (en) Image detection method for cable rope surface defect
CN108537751B (en) Thyroid ultrasound image automatic segmentation method based on radial basis function neural network
CN113554638A (en) Method and system for establishing chip surface defect detection model
CN108830856B (en) GA automatic segmentation method based on time series SD-OCT retina image
CN113313107A (en) Intelligent detection and identification method for multiple types of diseases on cable surface of cable-stayed bridge
CN109670501B (en) Object identification and grasping position detection method based on deep convolutional neural network
CN108615239A (en) Tongue image dividing method based on threshold technology and Gray Projection
CN115661187B (en) Image enhancement method for analysis of traditional Chinese medicine preparation
WO2021057395A1 (en) Heel type identification method, device, and storage medium
CN104933723B (en) Tongue Image Segmentation Method Based on Sparse Representation
CN111738931B (en) Shadow Removal Algorithm for Photovoltaic Array UAV Aerial Imagery
CN111882555A (en) Net clothes detection method, device, equipment and storage medium based on deep learning
CN109712116B (en) Fault identification method for power transmission line and accessories thereof
CN107315999A (en) A kind of tobacco plant recognition methods based on depth convolutional neural networks
CN117576121A (en) Automatic segmentation method, system, equipment and medium for microscope scanning area

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16922036

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16922036

Country of ref document: EP

Kind code of ref document: A1