
WO2018090355A1 - Method for auto-cropping of images - Google Patents

Method for auto-cropping of images

Info

Publication number
WO2018090355A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
cropped
aesthetic
map
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2016/106548
Other languages
French (fr)
Chinese (zh)
Inventor
黄凯奇 (Kaiqi Huang)
赫然 (Ran He)
考月英 (Yueying Kao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to PCT/CN2016/106548 priority Critical patent/WO2018090355A1/en
Publication of WO2018090355A1 publication Critical patent/WO2018090355A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis

Definitions

  • The invention relates to the fields of pattern recognition, machine learning and computer vision, and in particular to an automatic image cropping method.
  • Since image cropping is a highly subjective task, it is difficult for existing rules to take all influencing factors into account.
  • Conventional automatic image cropping methods typically use saliency maps to identify the main regions or regions of interest in an image, and find the crop region by minimizing an energy function computed from hand-crafted rules or by learning a classifier.
  • However, these hand-crafted rules are not comprehensive enough for the subjective task of image cropping, and their precision can hardly meet user needs.
  • an image automatic cropping method is provided in order to solve the technical problem of how to improve the robustness and precision of image automatic cropping.
  • An image automatic cropping method comprising:
  • the composition score of the selected candidate cropped image is estimated, and the candidate cropped image with the highest score is determined as the cropped image.
  • extracting the aesthetic response map and the gradient energy map of the image to be cropped includes:
  • M(x, y) represents the aesthetic response value at spatial position (x, y); K represents the total number of channels of the feature maps of the last convolutional layer of the deep convolutional neural network;
  • k denotes the k-th channel;
  • f_k(x, y) represents the feature value of the k-th channel at spatial position (x, y);
  • w_k represents the weight from the pooled feature map of the k-th channel to the high-aesthetic-quality category;
  • the deep convolutional neural network is trained by:
  • each feature map is pooled into a point by a global average pooling method
  • filtering the candidate cropped images based on the aesthetic response map includes:
  • the aesthetic retention score of a candidate cropped image is calculated by the following formula:
  • S_a(C) represents the aesthetic retention score of the candidate cropped image
  • C represents the candidate cropped image
  • (i, j) represents the position of a pixel
  • I represents the original image
  • A(i, j) represents the aesthetic response value at position (i, j)
  • determining, based on the aesthetic response map and the gradient energy map, the composition scores of the screened candidate cropped images and determining the candidate cropped image with the highest score as the cropped image specifically includes:
  • establishing a composition model based on the aesthetic response map and the gradient energy map
  • estimating the composition scores of the screened candidate cropped images using the composition model, and determining the candidate cropped image with the highest score as the cropped image.
  • the composition model is obtained by:
  • training a classifier to automatically learn composition rules, thereby obtaining the composition model.
  • Embodiments of the present invention provide an image automatic cropping method.
  • The method comprises: extracting an aesthetic response map and a gradient energy map of the image to be cropped; densely extracting candidate cropped images from the image to be cropped; screening the candidate cropped images based on the aesthetic response map; and estimating, based on the aesthetic response map and the gradient energy map, the composition scores of the screened candidate cropped images, and determining the candidate cropped image with the highest score as the cropped image.
  • This scheme uses the aesthetic response map to explore the aesthetic influence area of the picture, and uses the aesthetic response map to determine the aesthetic retention part, so as to retain the high aesthetic quality of the cropped image to the greatest extent.
  • the scheme also uses the gradient energy map to analyze the gradient distribution rule.
  • Embodiments of the present invention can be applied to a wide variety of fields involving automatic image cropping, including image editing, photography, and image repositioning.
  • FIG. 1 is a schematic flow chart of an image auto-cropping method according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a deep convolutional neural network according to an embodiment of the present invention.
  • FIG. 3a is a schematic diagram of an image to be cropped according to an embodiment of the present invention.
  • Figure 3b is a schematic illustration of a cropped image in accordance with an embodiment of the present invention.
  • Embodiments of the present invention contemplate using deep learning to automatically learn the regions of influence that are important for image cropping, so that the rules are learned automatically and comprehensively and high-aesthetic regions are preserved as much as possible during cropping.
  • FIG. 1 exemplarily shows the flow of an image automatic cropping method.
  • the method can include:
  • S100 Extract an aesthetic response map and a gradient energy map of the image to be cropped.
  • this step may include:
  • M(x, y) represents the aesthetic response value at spatial position (x, y);
  • K represents the total number of channels of the feature maps f of the last convolutional layer of the trained deep convolutional neural network;
  • k denotes the k-th channel;
  • f_k(x, y) represents the feature value of the k-th channel at spatial position (x, y);
  • w_k represents the weight from the pooled feature map of the k-th channel to the high-aesthetic-quality category.
  • the above steps can train the deep convolutional neural network according to actual needs when extracting the aesthetic response map.
  • the training of deep convolutional neural networks can be done in the following ways:
  • Step 1 Set up the convolution layer at the bottom of the deep convolutional neural network structure.
  • Step 2 After the last convolutional layer of the deep convolutional neural network structure, each feature map is pooled into a point by a global average pooling method.
  • Step 3 Connect a fully connected layer and loss function with the same number of aesthetic quality classification categories.
  • Fig. 2 exemplarily shows a deep convolutional neural network structure.
  • a deep convolutional neural network model under the aesthetic quality classification task can be trained through steps 1-3. Then, the deep convolutional neural network and the category response mapping method trained for the aesthetic quality classification task are utilized; and the above formula is used to calculate the aesthetic response map M of the image to be cropped under the high aesthetic category.
  • Sliding windows of all sizes smaller than the image may be employed to densely extract candidate cropping windows from the image to be cropped, and the candidate cropped images are extracted through the candidate cropping windows.
  • this step may include:
  • S_a(C) represents the aesthetic retention score of the candidate cropped image
  • C represents the candidate cropped image
  • (i, j) represents the position of a pixel
  • I represents the original image
  • A(i, j) represents the aesthetic response value at position (i, j).
  • an aesthetic retention model can be constructed.
  • the candidate cropping window is screened by the aesthetic retention model to select candidate windows with higher aesthetic retention scores.
  • S122 Sort all candidate cropped images according to the aesthetic retention scores from large to small.
  • For example, in practical applications, the candidate cropped images in the top 10,000 candidate cropping windows can be retained.
  • S130 Estimating the composition score of the selected candidate cropped image based on the aesthetic response map and the gradient energy map, and determining the candidate cropped image with the highest score as the cropped image.
  • this step can be implemented by step S131 to step S133.
  • This step can train the composition model according to the actual situation when building the composition model.
  • the training data can use the image with better composition as the positive sample, and the image with the composition defect as the negative sample.
  • composition model can be trained in the following ways:
  • Step a Establish a training image set based on the aesthetic response map and the gradient energy map.
  • Step b labeling the training image with an aesthetic quality category.
  • Step c Training the deep convolutional neural network with the labeled training images.
  • For the training process of this step, refer to Step 1 to Step 3 above; details are not repeated here.
  • Step d extracting the spatial pyramid features of the aesthetic response map and the gradient energy map using the trained deep convolutional neural network for the labeled training images.
  • Step e stitching the extracted spatial pyramid features together.
  • Step f using the classifier for training, automatically learning the composition rules, and obtaining the composition model.
  • the classifier may be, for example, a support vector machine classifier.
  • S132 Estimating the composition score of the selected candidate cropped image by using the composition model, and determining the candidate cropped image with the highest score as the cropped image.
  • Fig. 3a exemplarily shows an image to be cropped
  • Fig. 3b exemplarily shows a cropped image.
  • Step A The image data set marked with the aesthetic quality category is sent to the deep convolutional neural network for aesthetic quality category model training.
  • Step B Input the image data set labeled with composition categories into the trained deep convolutional neural network, extract the feature maps of the last convolutional layer, compute the aesthetic response map and the gradient energy map, and then train the composition model with a support vector machine classifier.
  • Step C Extract the aesthetic response map and the gradient energy map from the test image.
  • the extraction method of this step can refer to the method of the training phase.
  • Step D Densely collect candidate cropping windows from the image to be tested.
  • a sliding window with an interval of 30 pixels is used for acquisition or extraction.
  • Step E Filter the candidate cropping window using the aesthetic retention model.
  • The aesthetic retention model is used to calculate the aesthetic retention scores of the densely collected candidate cropping windows, and the part with the highest aesthetic retention scores is selected, for example, 10,000 candidate cropping windows.
  • Step F Using the composition model to evaluate the selected candidate cropping window.
  • The composition model trained in the training phase is used to evaluate the composition scores of the selected candidate cropping windows, and the window with the highest score is taken as the final cropping window, thereby obtaining the cropped image.
  • The method provided by the embodiments of the present invention makes good use of the aesthetic response map and the gradient energy map to capture aesthetic quality and image composition rules, and achieves more robust and accurate automatic image cropping. This performance illustrates the effectiveness of aesthetic response maps and gradient energy maps for automatic image cropping.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a method for automatic cropping of images. The method comprises: extracting an aesthetic response map and a gradient energy map of an image to be cropped; densely extracting candidate cropped images from the image to be cropped; screening the candidate cropped images based on the aesthetic response map; estimating, based on the aesthetic response map and the gradient energy map, the composition scores of the screened candidate cropped images, and determining the candidate cropped image with the highest score as the cropped image. The present invention uses the aesthetic response map to explore the aesthetically influential regions of a picture and to determine the parts whose aesthetics should be retained, thereby preserving the high aesthetic quality of the cropped image to the greatest extent. Moreover, the present invention uses the gradient energy map to analyze the gradient distribution, and estimates the composition scores of candidate crops based on both the aesthetic response map and the gradient energy map. The present invention compensates for the defects of image composition representation, and solves the technical problem of how to improve the robustness and precision of automatic image cropping.

Description

Image automatic cropping method

Technical field

The invention relates to the fields of pattern recognition, machine learning and computer vision, and in particular to an automatic image cropping method.

Background art

With the rapid development of computer technology and digital media technology, people's needs and expectations in fields such as computer vision, artificial intelligence and machine perception keep growing. Automatic image cropping, as a very important and common task in automatic image editing, has also received increasing attention and development. Automatic image cropping aims to remove superfluous regions and emphasize the regions of interest, thereby improving the overall composition and aesthetic quality of the image. An effective and automatic image cropping method not only frees people from tedious work, but also offers professional image editing advice to non-professionals.

Since image cropping is a highly subjective task, it is difficult for existing rules to take all influencing factors into account. Conventional automatic image cropping methods typically use saliency maps to identify the main regions or regions of interest in an image, and find the crop region by minimizing an energy function computed from hand-crafted rules or by learning a classifier. However, these hand-crafted rules are not comprehensive enough for the subjective task of image cropping, and their precision can hardly meet user needs.

In view of this, the present invention is proposed.

Summary of the invention

In order to solve the above problems in the prior art, namely the technical problem of how to improve the robustness and precision of automatic image cropping, an automatic image cropping method is provided.

In order to achieve the above object, the following technical solution is provided:

An automatic image cropping method, the method comprising:

extracting an aesthetic response map and a gradient energy map of an image to be cropped;

densely extracting candidate cropped images from the image to be cropped;

screening the candidate cropped images based on the aesthetic response map;

estimating, based on the aesthetic response map and the gradient energy map, the composition scores of the screened candidate cropped images, and determining the candidate cropped image with the highest score as the cropped image.

Further, the extracting of the aesthetic response map and the gradient energy map of the image to be cropped specifically includes:

extracting the aesthetic response map of the image to be cropped using a deep convolutional neural network and the class response mapping method, according to the following formula:

$$M(x, y) = \sum_{k=1}^{K} w_k \, f_k(x, y)$$

where M(x, y) represents the aesthetic response value at spatial position (x, y); K represents the total number of channels of the feature maps of the last convolutional layer of the deep convolutional neural network; k denotes the k-th channel; f_k(x, y) represents the feature value of the k-th channel at spatial position (x, y); and w_k represents the weight from the pooled feature map of the k-th channel to the high-aesthetic-quality category;

smoothing the image to be cropped and calculating the gradient value of each pixel, thereby obtaining the gradient energy map.

Further, the deep convolutional neural network is trained by:

setting convolutional layers at the bottom of the deep convolutional neural network structure;

after the last convolutional layer of the deep convolutional neural network structure, pooling each feature map into a single point by global average pooling;

connecting a fully connected layer, with as many outputs as there are aesthetic quality categories, and a loss function.

Further, the screening of the candidate cropped images based on the aesthetic response map specifically includes:

calculating the aesthetic retention score of each candidate cropped image by the following formula:

$$S_a(C) = \frac{\sum_{(i, j) \in C} A(i, j)}{\sum_{(i, j) \in I} A(i, j)}$$

where S_a(C) represents the aesthetic retention score of the candidate cropped image; C represents the candidate cropped image; (i, j) represents the position of a pixel; I represents the original image; and A(i, j) represents the aesthetic response value at position (i, j);

sorting all candidate cropped images by their aesthetic retention scores in descending order;

selecting the portion of candidate cropped images with the highest scores.

Further, the estimating, based on the aesthetic response map and the gradient energy map, of the composition scores of the screened candidate cropped images and the determining of the candidate cropped image with the highest score as the cropped image specifically includes:

establishing a composition model based on the aesthetic response map and the gradient energy map;

estimating the composition scores of the screened candidate cropped images using the composition model, and determining the candidate cropped image with the highest score as the cropped image.

Further, the composition model is obtained by:

establishing a training image set based on the aesthetic response map and the gradient energy map;

labeling the training images with aesthetic quality categories;

training a deep convolutional neural network with the labeled training images;

extracting, for the labeled training images, the spatial pyramid features of the aesthetic response map and the gradient energy map using the trained deep convolutional neural network;

concatenating the extracted spatial pyramid features;

training a classifier to automatically learn composition rules, thereby obtaining the composition model.

Embodiments of the present invention provide an automatic image cropping method. The method comprises: extracting an aesthetic response map and a gradient energy map of the image to be cropped; densely extracting candidate cropped images from the image to be cropped; screening the candidate cropped images based on the aesthetic response map; and estimating, based on the aesthetic response map and the gradient energy map, the composition scores of the screened candidate cropped images, and determining the candidate cropped image with the highest score as the cropped image. This scheme uses the aesthetic response map to explore the aesthetically influential regions of a picture and to determine the parts whose aesthetics should be retained, so that the high aesthetic quality of the cropped image is preserved to the greatest extent. At the same time, the scheme uses the gradient energy map to analyze the gradient distribution, and evaluates the composition scores of candidate crops based on both the aesthetic response map and the gradient energy map. The embodiments of the present invention compensate for the defects of image composition representation, and solve the technical problem of how to improve the robustness and precision of automatic image cropping. Embodiments of the present invention can be applied to a wide variety of fields involving automatic image cropping, including image editing, photography and image retargeting.

Brief description of the drawings

FIG. 1 is a schematic flow chart of an automatic image cropping method according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a deep convolutional neural network according to an embodiment of the present invention;

FIG. 3a is a schematic diagram of an image to be cropped according to an embodiment of the present invention;

FIG. 3b is a schematic diagram of a cropped image according to an embodiment of the present invention.

Detailed description

The technical problems solved by, the technical solutions adopted by, and the technical effects achieved by the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments of the present application, all other equivalent or obviously modified embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention. The embodiments of the invention may be embodied in many different ways as defined and covered by the claims.

Deep learning has developed rapidly and achieved good results in various fields. Embodiments of the present invention consider using deep learning to automatically learn the regions of influence that are important for image cropping, so that the rules are learned automatically and comprehensively and high-aesthetic regions are preserved as much as possible during cropping.

To this end, an embodiment of the present invention provides an automatic image cropping method. FIG. 1 exemplarily shows the flow of the automatic image cropping method. As shown in FIG. 1, the method may include:

S100: Extract an aesthetic response map and a gradient energy map of the image to be cropped.

Specifically, this step may include:

S101: Extract the aesthetic response map of the image to be cropped using a deep convolutional neural network and the class response mapping method, according to the following formula:

$$M(x, y) = \sum_{k=1}^{K} w_k \, f_k(x, y)$$

where M(x, y) represents the aesthetic response value at spatial position (x, y); K represents the total number of channels of the feature maps f of the last convolutional layer of the trained deep convolutional neural network; k denotes the k-th channel; f_k(x, y) represents the feature value of the k-th channel at spatial position (x, y); and w_k represents the weight from the pooled feature map of the k-th channel to the high-aesthetic-quality category.
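For illustration, the class response mapping formula above amounts to a weighted sum over channels, which can be sketched in a few lines of NumPy; the array shapes and toy values below are illustrative assumptions, not part of the disclosure:

```python
import numpy as np

def aesthetic_response_map(feature_maps, weights):
    """Weighted sum of the last convolutional layer's feature maps,
    using the weights w_k that connect each globally pooled channel
    to the high-aesthetic-quality category.

    feature_maps: array of shape (K, H, W) -- the channels f_k(x, y)
    weights:      array of shape (K,)      -- the weights w_k
    """
    # M(x, y) = sum_k w_k * f_k(x, y)
    return np.tensordot(weights, feature_maps, axes=([0], [0]))

# Toy example: K=3 channels on a 2x2 spatial grid
f = np.arange(12, dtype=float).reshape(3, 2, 2)
w = np.array([0.5, -0.2, 1.0])
M = aesthetic_response_map(f, w)
print(M.shape)  # (2, 2)
```

In practice the feature maps would come from the trained network described below, and M would be upsampled back to the input image's resolution before use.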

When extracting the aesthetic response map, the deep convolutional neural network can be trained according to actual needs. The training of the deep convolutional neural network can proceed as follows:

Step 1: Set convolutional layers at the bottom of the deep convolutional neural network structure.

Step 2: After the last convolutional layer of the deep convolutional neural network structure, pool each feature map into a single point by global average pooling.

Step 3: Connect a fully connected layer, with as many outputs as there are aesthetic quality categories, and a loss function.

FIG. 2 exemplarily shows a deep convolutional neural network structure.

Through steps 1-3, a deep convolutional neural network model can be trained on the aesthetic quality classification task. Then, using the network trained for this task together with the class response mapping method, the above formula yields the aesthetic response map M of the image to be cropped for the high-aesthetic-quality category.
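The head attached in steps 2-3 can be sketched as a plain NumPy forward pass; the two-category setup and toy weights are illustrative assumptions (a real implementation would use a deep learning framework and learn the weights):

```python
import numpy as np

def gap_head_forward(conv_features, fc_weights, fc_bias):
    """Steps 2-3 of the training setup: global average pooling reduces
    each feature map to a single point, then a fully connected layer
    outputs one score per aesthetic quality category.

    conv_features: (K, H, W) output of the last convolutional layer
    fc_weights:    (num_classes, K) fully connected layer weights
    fc_bias:       (num_classes,)
    """
    pooled = conv_features.mean(axis=(1, 2))   # global average pooling -> (K,)
    logits = fc_weights @ pooled + fc_bias     # fully connected layer
    # softmax over categories, as fed to the classification loss
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Toy example: K=4 channels, 2 categories (high / low aesthetic quality)
feats = np.ones((4, 5, 5))
W = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
b = np.zeros(2)
probs = gap_head_forward(feats, W, b)
print(probs)  # two equal probabilities for this symmetric toy input
```

The rows of fc_weights are exactly the per-channel weights w_k used by the class response mapping formula above, which is why the trained classifier directly provides the aesthetic response map.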

S102: Smooth the image to be cropped and calculate the gradient value of each pixel, thereby obtaining the gradient energy map.
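A minimal sketch of S102, assuming a grayscale input and a simple box filter for the smoothing step (the patent does not fix a particular smoothing kernel):

```python
import numpy as np

def gradient_energy_map(gray, blur_radius=1):
    """Smooth the image, then take the gradient magnitude at every
    pixel as the gradient energy.

    gray: 2-D float array (grayscale image to be cropped)
    """
    # Box-filter smoothing with edge padding (a Gaussian would also do)
    k = 2 * blur_radius + 1
    padded = np.pad(gray, blur_radius, mode="edge")
    smooth = sum(padded[i:i + gray.shape[0], j:j + gray.shape[1]]
                 for i in range(k) for j in range(k)) / (k * k)
    gy, gx = np.gradient(smooth)   # per-pixel gradients along rows, columns
    return np.hypot(gx, gy)        # gradient magnitude per pixel

img = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))  # horizontal intensity ramp
E = gradient_energy_map(img)
print(E.shape)  # (8, 8)
```

For this ramp the energy is roughly constant in the interior, reflecting its uniform horizontal gradient.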

S110: Densely extract candidate cropped images from the image to be cropped.

Here, sliding windows of all sizes smaller than the image may be used to densely extract candidate cropping windows from the image to be cropped, and the candidate cropped images are then extracted through the candidate cropping windows.
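Dense window extraction can be sketched as nested sliding-window loops; the particular scale set and stride below are illustrative choices (the later test-phase example mentions a 30-pixel stride):

```python
def candidate_windows(width, height, stride=30,
                      scales=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Densely generate candidate cropping windows at several window
    sizes smaller than the image, sliding each size over the image.

    Yields (x, y, w, h) tuples in image coordinates.
    """
    for s in scales:
        w, h = int(width * s), int(height * s)
        for y in range(0, height - h + 1, stride):
            for x in range(0, width - w + 1, stride):
                yield (x, y, w, h)

wins = list(candidate_windows(300, 200, stride=50))
print(len(wins))  # 25
```

Each window defines one candidate cropped image; in practice the scale set is much denser, which is why the later screening step is needed.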

S120: Screen the candidate cropped images based on the aesthetic response map.

Specifically, this step may include:

S121: Calculate the aesthetic retention score of each candidate cropped image by the following formula:

$$S_a(C) = \frac{\sum_{(i, j) \in C} A(i, j)}{\sum_{(i, j) \in I} A(i, j)}$$

where S_a(C) represents the aesthetic retention score of the candidate cropped image; C represents the candidate cropped image; (i, j) represents the position of a pixel; I represents the original image; and A(i, j) represents the aesthetic response value at position (i, j).

Through this step, an aesthetic retention model can be constructed. The candidate cropping windows are passed through the aesthetic retention model to screen out the candidate windows with higher aesthetic retention scores.

S122: Sort all candidate cropped images by their aesthetic retention scores in descending order.

S123: Select the portion of candidate cropped images with the highest scores.

For example, in practical applications, the candidate cropped images in the top 10,000 candidate cropping windows may be retained.
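Steps S121-S123 can be sketched directly from the retention-score formula; the toy aesthetic response map and candidate list are illustrative:

```python
import numpy as np

def retention_score(A, window):
    """S_a(C): sum of aesthetic response values inside the candidate
    window C divided by the sum over the whole original image I."""
    x, y, w, h = window
    return A[y:y + h, x:x + w].sum() / A.sum()

def filter_candidates(A, windows, keep=10000):
    """Rank all candidates by retention score, descending, and keep
    the highest-scoring ones (the example above keeps the top 10,000)."""
    ranked = sorted(windows, key=lambda c: retention_score(A, c),
                    reverse=True)
    return ranked[:keep]

# Toy response map: all aesthetic mass in the top-left 2x2 quadrant
A = np.zeros((4, 4))
A[:2, :2] = 1.0
cands = [(0, 0, 2, 2), (2, 2, 2, 2), (1, 1, 2, 2)]
best = filter_candidates(A, cands, keep=2)
print(best[0])  # (0, 0, 2, 2)
```

The window covering the whole high-response quadrant scores 1.0 and survives the screening, while the window over the empty quadrant is discarded.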

S130: Estimate the composition scores of the screened candidate cropped images based on the aesthetic response map and the gradient energy map, and determine the candidate cropped image with the highest score as the cropped image.

Specifically, this step can be implemented by steps S131 to S133.

S131: Establish a composition model based on the aesthetic response map and the gradient energy map.

When establishing the composition model, this step can train the composition model according to the actual situation. In training the composition model, images with good composition may be used as positive samples, and images with composition defects as negative samples.

The composition model can be trained as follows:

Step a: Establish a training image set based on the aesthetic response map and the gradient energy map.

Step b: Label the training images with aesthetic quality categories.

Step c: Train a deep convolutional neural network with the labeled training images.

For the training process of this step, refer to steps 1 to 3 above; details are not repeated here.

步骤d:针对已标注的训练图像,利用训练好的深度卷积神经网络,提取美感响应图和梯度能量图的空间金字塔特征。Step d: extracting the spatial pyramid features of the aesthetic response map and the gradient energy map using the trained deep convolutional neural network for the labeled training images.

步骤e:将提取的空间金字塔特征拼接在一起。Step e: stitching the extracted spatial pyramid features together.

步骤f:利用分类器进行训练,自动学习构图规则,得到构图模型。Step f: using the classifier for training, automatically learning the composition rules, and obtaining the composition model.

其中,分类器例如可以采用支持向量机分类器。The classifier may be, for example, a support vector machine classifier.
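Steps d and e can be sketched as follows. The pyramid levels (1, 2, 4) and the use of average pooling per cell are assumptions for illustration; the document does not fix either choice:

```python
import numpy as np

def spatial_pyramid_features(response_map, levels=(1, 2, 4)):
    """Step d: pool a 2-D map into a spatial pyramid feature vector.

    At level L the map is split into an L x L grid and each cell is
    average-pooled; levels (1, 2, 4) give 1 + 4 + 16 = 21 values per map.
    """
    h, w = response_map.shape
    feats = []
    for L in levels:
        ys = np.linspace(0, h, L + 1, dtype=int)
        xs = np.linspace(0, w, L + 1, dtype=int)
        for a in range(L):
            for b in range(L):
                cell = response_map[ys[a]:ys[a + 1], xs[b]:xs[b + 1]]
                feats.append(cell.mean())
    return np.asarray(feats)

def composition_features(aesthetic_map, gradient_map):
    """Step e: concatenate the pyramid features of both maps; the result
    would then be fed to the classifier of step f (e.g. an SVM)."""
    return np.concatenate([spatial_pyramid_features(aesthetic_map),
                           spatial_pyramid_features(gradient_map)])
```

The concatenated vector encodes coarse-to-fine spatial layout of both maps, which is what lets a linear classifier pick up composition rules.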

S132: Use the composition model to estimate the composition scores of the screened candidate cropped images, and determine the candidate cropped image with the highest score as the cropped image.

Fig. 3a shows an example of an image to be cropped; Fig. 3b shows the cropped image.

A preferred embodiment is described below to further illustrate the invention.

Step A: Feed an image data set annotated with aesthetic quality categories into a deep convolutional neural network to train the aesthetic quality classification model.

Step B: Input an image data set annotated with composition categories into the trained deep convolutional neural network, extract the feature maps of the last convolutional layer, compute the aesthetic response map and the gradient energy map, and then train the composition model with a support vector machine classifier.

Step C: Extract the aesthetic response map and the gradient energy map from the test image.

The extraction method of this step may follow that of the training phase.
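The gradient energy map half of this extraction, smoothing followed by per-pixel gradient magnitude, can be sketched as below. The 3x3 box blur and central-difference gradient are stand-in choices; the document does not specify the smoothing kernel or the gradient operator:

```python
import numpy as np

def gradient_energy_map(gray):
    """Smooth a grayscale image, then take the per-pixel gradient magnitude.

    Smoothing here is a 3x3 box blur over an edge-padded copy; a Gaussian
    would fit the text equally well.
    """
    g = gray.astype(float)
    p = np.pad(g, 1, mode='edge')
    # average the nine shifted copies of the padded image (3x3 box blur)
    smooth = sum(p[i:i + g.shape[0], j:j + g.shape[1]]
                 for i in range(3) for j in range(3)) / 9.0
    gy, gx = np.gradient(smooth)          # central differences per axis
    return np.hypot(gx, gy)               # gradient magnitude per pixel
```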

Step D: Densely sample candidate cropping windows from the image to be tested.

For example, on a 1000×1000 test image, a sliding window with a stride of 30 pixels can be used for sampling.
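The dense sampling of step D can be sketched as a grid enumeration. A single window size is used here for simplicity; in practice multiple window sizes would be enumerated, which the document leaves open:

```python
def candidate_windows(img_h, img_w, win_h, win_w, stride=30):
    """Step D: densely sample crop windows (top, left, height, width)
    on a regular grid with the given stride."""
    return [(top, left, win_h, win_w)
            for top in range(0, img_h - win_h + 1, stride)
            for left in range(0, img_w - win_w + 1, stride)]
```

On a 1000×1000 image with 500×500 windows and a 30-pixel stride, this yields 17×17 = 289 candidate windows per window size.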

Step E: Screen the candidate cropping windows with the aesthetic retention model.

In this step, the aesthetic retention model computes the aesthetic retention scores of the densely sampled candidate cropping windows and retains the subset with the highest scores, for example the top 10,000 candidate cropping windows.

Step F: Evaluate the screened candidate cropping windows with the composition model.

In this step, the composition model trained in the training phase evaluates the composition scores of the screened candidate cropping windows, and the window with the highest score is taken as the final cropping window, thereby yielding the cropped image.
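Steps D to F fit together as below. Here `composition_score` is a hypothetical callable, for instance the decision function of the trained SVM applied to spatial pyramid features of a crop; its exact form is an assumption:

```python
import numpy as np

def auto_crop(aesthetic_map, candidates, composition_score, keep=10000):
    """Glue steps D-F: retention filtering, then composition ranking.

    aesthetic_map     -- 2-D aesthetic response map of the test image
    candidates        -- (top, left, height, width) windows from step D
    composition_score -- hypothetical callable mapping a crop to a score
    """
    total = aesthetic_map.sum()

    def retention(c):                                    # step E
        top, left, h, w = c
        return aesthetic_map[top:top + h, left:left + w].sum() / total

    shortlist = sorted(candidates, key=retention, reverse=True)[:keep]
    return max(shortlist, key=composition_score)         # step F
```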

In summary, the method provided by the embodiments of the present invention makes full use of the aesthetic response map and the gradient energy map to preserve aesthetic quality and the composition rules of the image to the greatest extent, achieving more robust and more accurate automatic image cropping, and thereby demonstrating the effectiveness of the aesthetic response map and the gradient energy map for automatic image cropping.

Although the method provided by the embodiments of the present invention is described above in the stated order, those skilled in the art will understand that, to achieve the effects of the embodiments, the steps may also be performed in a different order, for example in parallel or in reverse; such simple variations all fall within the scope of protection of the present invention.

The above are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art can conceive within the technical scope disclosed by the present invention shall be covered by the present invention. Accordingly, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims (6)

1. An automatic image cropping method, characterized in that the method comprises: extracting an aesthetic response map and a gradient energy map of an image to be cropped; densely extracting candidate cropped images from the image to be cropped; screening the candidate cropped images based on the aesthetic response map; and estimating, based on the aesthetic response map and the gradient energy map, composition scores of the screened candidate cropped images, and determining the candidate cropped image with the highest score as the cropped image.

2. The method according to claim 1, wherein extracting the aesthetic response map and the gradient energy map of the image to be cropped comprises: extracting the aesthetic response map of the image to be cropped using a deep convolutional neural network and a class response mapping method with the following formula:

M(x, y) = Σ_{k=1}^{K} w_k · f_k(x, y)

wherein M(x, y) denotes the aesthetic response value at spatial position (x, y); K denotes the total number of channels of the feature maps of the last convolutional layer of the deep convolutional neural network; k denotes the k-th channel; f_k(x, y) denotes the value of the feature map of the k-th channel at spatial position (x, y); and w_k denotes the weight from the pooled feature map of the k-th channel to the high aesthetic quality category; and smoothing the image to be cropped and computing the gradient value of each pixel to obtain the gradient energy map.

3. The method according to claim 2, wherein the deep convolutional neural network is trained by: arranging convolutional layers at the bottom of the deep convolutional neural network structure; pooling each feature map into a single value by global average pooling after the last convolutional layer; and connecting a fully connected layer with as many outputs as aesthetic quality classification categories, followed by a loss function.

4. The method according to claim 1, wherein screening the candidate cropped images based on the aesthetic response map comprises: computing the aesthetic retention score of each candidate cropped image by the following formula:

S_a(C) = Σ_{(i,j)∈C} A(i, j) / Σ_{(i,j)∈I} A(i, j)

wherein S_a(C) denotes the aesthetic retention score of the candidate cropped image; C denotes the candidate cropped image; (i, j) denotes a pixel position; I denotes the original image; and A(i, j) denotes the aesthetic response value at position (i, j); sorting all candidate cropped images by aesthetic retention score in descending order; and selecting the top-scoring subset of candidate cropped images.

5. The method according to claim 1, wherein estimating, based on the aesthetic response map and the gradient energy map, the composition scores of the screened candidate cropped images and determining the candidate cropped image with the highest score as the cropped image comprises: building a composition model based on the aesthetic response map and the gradient energy map; and estimating the composition scores of the screened candidate cropped images with the composition model, and determining the candidate cropped image with the highest score as the cropped image.

6. The method according to claim 5, wherein the composition model is obtained by: building a training image set based on the aesthetic response map and the gradient energy map; annotating the training images with aesthetic quality categories; training a deep convolutional neural network with the annotated training images; extracting, for the annotated training images, spatial pyramid features of the aesthetic response map and the gradient energy map with the trained deep convolutional neural network; concatenating the extracted spatial pyramid features; and training a classifier to automatically learn composition rules, thereby obtaining the composition model.
PCT/CN2016/106548 2016-11-21 2016-11-21 Method for auto-cropping of images Ceased WO2018090355A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/106548 WO2018090355A1 (en) 2016-11-21 2016-11-21 Method for auto-cropping of images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/106548 WO2018090355A1 (en) 2016-11-21 2016-11-21 Method for auto-cropping of images


Publications (1)

Publication Number Publication Date
WO2018090355A1 true WO2018090355A1 (en) 2018-05-24

Family

ID=62145070

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/106548 Ceased WO2018090355A1 (en) 2016-11-21 2016-11-21 Method for auto-cropping of images

Country Status (1)

Country Link
WO (1) WO2018090355A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110782021A (en) * 2019-10-25 2020-02-11 浪潮电子信息产业股份有限公司 Image classification method, device, equipment and computer readable storage medium
CN113297514A (en) * 2020-04-13 2021-08-24 阿里巴巴集团控股有限公司 Image processing method, image processing device, electronic equipment and computer storage medium
CN113379749A (en) * 2021-06-10 2021-09-10 北京房江湖科技有限公司 Image processing method, readable storage medium, and computer program product
CN113763391A (en) * 2021-09-24 2021-12-07 华中科技大学 Intelligent image clipping method and system based on visual element relationship
WO2022160222A1 (en) * 2021-01-28 2022-08-04 京东方科技集团股份有限公司 Defect detection method and apparatus, model training method and apparatus, and electronic device
CN114882560A (en) * 2022-05-10 2022-08-09 福州大学 Intelligent image clipping method based on lightweight portrait detection
WO2023093683A1 (en) * 2021-11-24 2023-06-01 北京字节跳动网络技术有限公司 Image cropping method and apparatus, model training method and apparatus, electronic device, and medium
CN116309627A (en) * 2022-12-15 2023-06-23 北京航空航天大学 Image cropping method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100329588A1 (en) * 2009-06-24 2010-12-30 Stephen Philip Cheatle Autocropping and autolayout method for digital images
US20150213609A1 (en) * 2014-01-30 2015-07-30 Adobe Systems Incorporated Image Cropping Suggestion
CN105528786A (en) * 2015-12-04 2016-04-27 小米科技有限责任公司 Image processing method and device
CN105787966A (en) * 2016-03-21 2016-07-20 复旦大学 An aesthetic evaluation method for computer pictures

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100329588A1 (en) * 2009-06-24 2010-12-30 Stephen Philip Cheatle Autocropping and autolayout method for digital images
US20150213609A1 (en) * 2014-01-30 2015-07-30 Adobe Systems Incorporated Image Cropping Suggestion
CN105528786A (en) * 2015-12-04 2016-04-27 小米科技有限责任公司 Image processing method and device
CN105787966A (en) * 2016-03-21 2016-07-20 复旦大学 An aesthetic evaluation method for computer pictures

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
NISHIYAMA, M. ET AL.: "Sensation-based Photo Cropping", PROCEEDINGS OF THE 17TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'09, 24 October 2009 (2009-10-24), XP058271494 *
YAN, J. ET AL.: "Learning the Change for Automatic Image Cropping", IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 31 December 2013 (2013-12-31), XP032492828 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110782021A (en) * 2019-10-25 2020-02-11 浪潮电子信息产业股份有限公司 Image classification method, device, equipment and computer readable storage medium
CN110782021B (en) * 2019-10-25 2023-07-14 浪潮电子信息产业股份有限公司 Image classification method, device, equipment and computer-readable storage medium
CN113297514A (en) * 2020-04-13 2021-08-24 阿里巴巴集团控股有限公司 Image processing method, image processing device, electronic equipment and computer storage medium
WO2022160222A1 (en) * 2021-01-28 2022-08-04 京东方科技集团股份有限公司 Defect detection method and apparatus, model training method and apparatus, and electronic device
CN113379749A (en) * 2021-06-10 2021-09-10 北京房江湖科技有限公司 Image processing method, readable storage medium, and computer program product
CN113763391A (en) * 2021-09-24 2021-12-07 华中科技大学 Intelligent image clipping method and system based on visual element relationship
CN113763391B (en) * 2021-09-24 2024-03-19 华中科技大学 An intelligent image cropping method and system based on visual element relationships
WO2023093683A1 (en) * 2021-11-24 2023-06-01 北京字节跳动网络技术有限公司 Image cropping method and apparatus, model training method and apparatus, electronic device, and medium
CN114882560A (en) * 2022-05-10 2022-08-09 福州大学 Intelligent image clipping method based on lightweight portrait detection
CN116309627A (en) * 2022-12-15 2023-06-23 北京航空航天大学 Image cropping method and device
CN116309627B (en) * 2022-12-15 2023-09-15 北京航空航天大学 Image cropping method and device

Similar Documents

Publication Publication Date Title
CN106650737B (en) Image automatic cropping method
WO2018090355A1 (en) Method for auto-cropping of images
CN107665492B (en) A deep network-based tissue segmentation method for colorectal panoramic digital pathological images
CN105096259B (en) The depth value restoration methods and system of depth image
CN106846344B (en) A kind of image segmentation optimal identification method based on the complete degree in edge
WO2020007307A1 (en) Sky filter method for panoramic images and portable terminal
CN111027547A (en) An automatic detection method for multi-scale and polymorphic objects in two-dimensional images
CN110570435B (en) Method and device for carrying out damage segmentation on vehicle damage image
CN110569747A (en) A method to quickly count the number of rice ears in field rice using image pyramid and Faster-RCNN
CN108960404B (en) Image-based crowd counting method and device
CN104851086A (en) Image detection method for cable rope surface defect
CN108537751B (en) Thyroid ultrasound image automatic segmentation method based on radial basis function neural network
CN113554638A (en) Method and system for establishing chip surface defect detection model
CN108830856B (en) GA automatic segmentation method based on time series SD-OCT retina image
CN113313107A (en) Intelligent detection and identification method for multiple types of diseases on cable surface of cable-stayed bridge
CN109670501B (en) Object identification and grasping position detection method based on deep convolutional neural network
CN108615239A (en) Tongue image dividing method based on threshold technology and Gray Projection
CN115661187B (en) Image enhancement method for analysis of traditional Chinese medicine preparation
WO2021057395A1 (en) Heel type identification method, device, and storage medium
CN104933723B (en) Tongue Image Segmentation Method Based on Sparse Representation
CN111738931B (en) Shadow Removal Algorithm for Photovoltaic Array UAV Aerial Imagery
CN111882555A (en) Net clothes detection method, device, equipment and storage medium based on deep learning
CN109712116B (en) Fault identification method for power transmission line and accessories thereof
CN107315999A (en) A kind of tobacco plant recognition methods based on depth convolutional neural networks
CN117576121A (en) Automatic segmentation method, system, equipment and medium for microscope scanning area

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16922036

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16922036

Country of ref document: EP

Kind code of ref document: A1