WO2024230170A1 - Quantization precision selection method and device for deep network model - Google Patents
Quantization precision selection method and device for deep network model
- Publication number
- WO2024230170A1 (PCT application No. PCT/CN2023/139026)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- network model
- quantization
- deep network
- layer
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present application relates to the field of deep learning technology, and in particular to a method and device for selecting quantization accuracy of a deep network model.
- in the existing deep network deployment process, model quantization is an indispensable step. In general, a higher quantization bit width yields a better quantization result, while a lower bit width yields worse accuracy; however, a high bit width means longer inference time, and a low bit width often leaves the quantized model with poor accuracy. This has naturally motivated research into mixed-precision quantization tools, most of which currently use reinforcement learning to automatically determine the quantization precision of each layer.
- an embodiment of the present application provides a method and device for selecting quantization accuracy of a deep network model, which can solve at least one technical problem in the related art.
- an embodiment of the present application provides a method for selecting quantization accuracy of a deep network model, including: S110, obtaining an input image and a deep network model to be processed, quantizing the parameters of the deep network model and setting the quantization accuracy of each layer in the quantized deep network model to a first accuracy; S120, inputting the input image into the quantized deep network model to obtain a feature map output by each layer of the deep network model, and determining the quantization sensitivity of each layer in the deep network model using the feature map output by each layer; S130, traversing the layers in the deep network model whose quantization accuracy has not been changed to select the layer with the smallest quantization sensitivity, and changing the quantization accuracy of that layer to a second accuracy, which is less than the first accuracy; S140, performing inference using the deep network model after the quantization accuracy has been changed; if the inference time is greater than a preset target time, returning to step S130; if the inference time is less than or equal to the preset target time, ending the process to obtain a deep network model that meets expectations.
- determining the quantization sensitivity of each layer in the deep network model includes: obtaining the quantization value and floating-point value of the feature map output by each layer in the deep network model; and determining the quantization sensitivity of the layer according to the quantization value and the floating-point value of each layer. This embodiment calculates the quantization sensitivity of each layer in combination with the quantization value and floating-point value of the feature map output by each layer, and can obtain an accurate value of the quantization sensitivity, further ensuring the accuracy of the network model.
- determining the quantization sensitivity of each layer according to the quantization value and the floating-point value of each layer includes: calculating the error value between the quantized value and the floating-point value of the feature map output by each layer as the first value; calculating the mean of the floating-point values of the feature map output by each layer as the second value; and calculating the ratio of the first value to the second value of each layer as the quantization sensitivity of that layer; wherein the error value includes: mean square error, cosine value, root mean square error, or mean absolute error.
- This embodiment obtains the quantization sensitivity of each layer by calculating the ratio of the error value (or the first value) and the mean value (or the second value), which can obtain the quantization sensitivity relatively quickly, further improving the quantization efficiency of the network model.
- in addition, in this embodiment the quantization sensitivity is obtained from statistics of the feature maps, which yields a relatively accurate value of the quantization sensitivity while keeping the calculation simple and easy to implement.
- the determination module includes a first acquisition submodule and a second determination submodule.
- the first acquisition submodule is used to obtain the quantization value and floating-point value of the feature map output by each layer in the deep network model;
- the second determination submodule is used to calculate the error value between the quantization value and the floating-point value of the feature map output by each layer as the first value; calculate the mean of the floating-point values of the feature map output by each layer as the second value; and calculate the ratio of the first value to the second value of each layer as the quantization sensitivity of that layer; wherein the error value includes: mean square error, cosine value, root mean square error, or mean absolute error.
- an embodiment of the present application provides a transplantation system, which includes a data processing device and an electronic device.
- the data processing device is used to, when executing a computer program, implement a quantization accuracy selection method for a deep network model as in any embodiment of the first aspect to obtain a deep network model that meets expectations.
- the deep network model is transplanted to the electronic device for use in the electronic device.
- an embodiment of the present application provides an electronic device comprising a processor, wherein a transplantation system such as that of the third aspect transplants a deep network model onto the processor of the electronic device, and the processor processes input data through the onboard deep network model to perform a specific task.
- an embodiment of the present application provides a computer-readable storage medium, which stores a computer program.
- when executed by a data processing device, the computer program implements a method for selecting quantization accuracy of a deep network model as in any embodiment of the first aspect.
- an embodiment of the present application provides a computer program product.
- when the computer program product runs on a data processing device, the data processing device is caused to execute a method for selecting quantization accuracy of a deep network model as described in any embodiment of the first aspect.
- FIG1 is a diagram of a transplantation system of a deep network model provided by the present application.
- FIG2 is a schematic diagram of an implementation flow of a method for selecting quantization accuracy of a deep network model provided in an embodiment of the present application.
- FIG3 is a schematic diagram of the process of step S120 provided in an embodiment of the present application.
- FIG4 is a schematic diagram of the process of step S122 provided in an embodiment of the present application.
- FIG5 is a schematic diagram of the structure of a device for selecting quantization accuracy of a deep network model provided by an embodiment of the present application.
- FIG6 is a schematic diagram of the structure of a determination module provided in an embodiment of the present application.
- FIG1 is a diagram of a transplantation system for a deep network model provided in the present application, the system comprising an electronic device and a data processing device, the data processing device is used to process the deep network model according to the quantization accuracy selection method provided in one or more embodiments of the present application to obtain a quantized deep network model; the quantized model is transplanted to an electronic device for the user to perform specific tasks through the electronic device, such as image processing, speech recognition, liveness detection, and face recognition.
- the data processing device includes but is not limited to electronic devices with computing capabilities such as computers, tablet computers, servers, or wearable devices, and the server includes but is not limited to independent servers or cloud servers, etc.; the electronic devices include depth cameras, mobile devices, payment terminals, and robots, etc.
- the deep network model is composed of multiple layers, which can be functionally divided into an input layer, a hidden layer, and an output layer, wherein the input layer is used to receive input data, the hidden layer is used to perform a series of nonlinear transformations and feature extraction on the input data in order to better complete a specific task, and the output layer is used to output the result of performing the specific task. It should be noted that this application does not limit the number of input layers, hidden layers, and output layers, which can be one or more, depending on the specific task performed by the deep network model.
- the depth of the deep network model refers to the number of layers of the network model.
- compared with a traditional neural network, it has more layers and thus powerful feature extraction and representation capabilities, but this also means that the deep network model requires higher inference cost and larger storage.
- in order to reduce the inference cost and storage of the deep network model, it needs to be quantized during deployment, and quantization loses model accuracy.
- based on this, the present application provides a method for selecting the quantization accuracy of a deep network model, which reduces the inference cost and storage consumption of the deep network model while ensuring high quantization accuracy, so that the deep network model can also run on devices with limited resources and capabilities, such as those with low computing power and low memory.
- FIG2 is a schematic diagram of an implementation flow of a method for selecting quantization accuracy of a deep network model provided in an embodiment of the present application, the method comprising steps S110 to S140.
- the input image may include one or more images, which may be face images, body images, gesture images, or other images, and the specific content of the image depends on the specific task performed by the deep network model.
- the parameters in the deep network model are high-precision floating-point numbers; in order to reduce the size of the model, speed up inference, and reduce memory and power consumption, they need to be converted into low-precision fixed-point numbers.
- the parameters of the deep network model are quantized and the quantization accuracy of each layer of the quantized deep network model is the first accuracy (denoted as Q1).
- the quantization bit width of each layer of the quantized deep network model is the first bit width.
- the first bit width can be 32 bits, for example.
- quantization sensitivity indicates the sensitivity of each layer in the deep network model to the approximation of the original floating-point operations with fixed-point operations; the greater the quantization sensitivity of a layer in the deep network model, the more sensitive the layer is to quantization; conversely, the smaller the quantization sensitivity of a layer in the deep network model, the less sensitive the layer is to quantization.
- step S120 may include: S121, obtaining the quantization value and floating-point value of the feature map output by each layer in the deep network model; S122, determining the quantization sensitivity of the layer based on the quantization value and floating-point value of each layer.
- assuming that the feature map output by the j-th layer in the deep network model is Fj, the quantized value Fquantj and the floating-point value Ffloatj of Fj are obtained based on the feature map, and the quantization sensitivity of the j-th layer is calculated from Fquantj and Ffloatj.
- This embodiment combines the quantization value and the floating-point value to determine the quantization sensitivity, and subsequently selects a layer with a smaller quantization sensitivity to use a low bit width for quantization, so as to reduce the inference time of the network model, and repeats the cycle until the inference time meets the target value, thereby improving the efficiency of quantization while ensuring the network accuracy.
- step S122 may include: S1221, calculating the error value between the quantization value and the floating-point value of the feature map output by each layer as the first numerical value; S1222, calculating the mean of the floating-point values of the feature map output by each layer as the second numerical value; S1223, calculating the ratio of the first numerical value to the second numerical value of each layer as the quantization sensitivity of the layer.
- the error value between the quantized value and the floating point value may include but is not limited to the mean square error, cosine value, root mean square error, or mean absolute error, etc., preferably the mean square error is used as the error value.
- the quantized value of the feature map Fj output by the j-th layer of the deep network model is Fquantj, and the floating-point value is Ffloatj.
- the first value is equal to mse(Fquantj, Ffloatj) = (1/n)Σi(Fquantj,i − Ffloatj,i)², where n is the number of elements in the feature map; the second value is equal to mean(Ffloatj) = (1/n)Σi Ffloatj,i, that is, the mean value of Ffloatj.
- the quantization sensitivity of the j-th layer is mse(Fquantj, Ffloatj) / mean(Ffloatj). This embodiment uses the statistical mean square error to measure the quantization sensitivity, which can obtain a more accurate value of the quantization sensitivity and further ensure the network model accuracy.
- the second precision can be recorded as Q2, Q2 < Q1, and correspondingly, its quantization bit width is the second bit width.
- the second bit width can be, for example, 16 bits or 8 bits. This embodiment selects layers with less quantization sensitivity to use low bit width for quantization, that is, changes these layers with less quantization sensitivity from the first precision to the second precision, thereby reducing the inference time of the network model.
- in step S140, the deep network model with the changed quantization accuracy is used to perform inference. If the inference time is greater than the preset target time, return to step S130; if the inference time is less than or equal to the preset target time, the process ends and a deep network model that meets expectations is obtained.
- the quantization sensitivity of each layer is first determined, and then the layer with the smallest quantization sensitivity is selected from the layers whose quantization accuracy has not been changed, and quantization is performed using a low bit width. This cycle is repeated until the inference time meets the preset target time. The process is terminated to obtain a deep network model that meets the expectations, thereby improving the efficiency of quantization while ensuring the accuracy of the network model.
- in some other embodiments, when the inference time is equal to the preset target time, the step of returning to step S130 may also be executed.
- Those skilled in the art may make a choice based on actual conditions, and the embodiments of the present application do not impose specific restrictions on this.
- An embodiment of the present application provides a quantization accuracy selection device for a deep network model.
- for details not described in the quantization accuracy selection device, please refer to the relevant description in the above-mentioned quantization accuracy selection method embodiment, which will not be repeated here.
- FIG5 is a schematic diagram of the structure of a quantization precision selection device for a deep network model provided by an embodiment of the present application.
- the quantization precision selection device includes: an acquisition module 41, which is used to acquire an input image and a deep network model to be processed, quantize the parameters of the deep network model and set the quantization precision of each layer in the quantized deep network model to the first precision; a determination module 42, which is used to input the input image into the quantized deep network model to obtain the feature map output by each layer of the deep network model, and determine the quantization sensitivity of each layer in the deep network model using the feature map output by each layer; a selection module 43, which is used to traverse the layers in the deep network model whose quantization precision has not been changed to select the layer with the smallest quantization sensitivity, and change the quantization precision of that layer to the second precision, which is less than the first precision; and an inference module 44, which is used to perform inference using the deep network model after the quantization precision is changed; if the inference time is greater than the preset target time, control returns to the selection module 43; if the inference time is less than or equal to the preset target time, the process ends and a deep network model that meets expectations is obtained.
- the determination module 42 includes a first acquisition submodule 421 and a second determination submodule 422.
- the first acquisition submodule 421 is used to obtain the quantization value and floating point value of the feature map output by each layer in the deep network model.
- the second determination submodule 422 is used to determine the quantization sensitivity of each layer according to the quantization value and floating point value of the layer.
- the second determination submodule is specifically used to: calculate the error value between the quantization value and the floating-point value of the feature map output by each layer as the first numerical value; calculate the mean of the floating-point values of the feature map output by each layer as the second numerical value; calculate the ratio of the first numerical value to the second numerical value of each layer as the quantization sensitivity of the layer.
- An embodiment of the present application also provides an electronic device, which may include one or more processors.
- a deep network model is transplanted to the processor of the electronic device through a transplantation system as shown in FIG1 , so that the processor can process input data through the carried deep network model to perform a specific task.
- the electronic device further includes a camera, which is used to capture a two-dimensional image or a depth image and transmit it to a processor; the processor processes the input image based on a deep network model to perform specific tasks, such as face recognition, liveness detection, or payment tasks.
- the processor may be a central processing unit (CPU), or other general-purpose processors, neural network processing chips, digital signal processors (DSP), application specific integrated circuits (ASIC), field programmable gate arrays (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
- the general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
- Electronic devices may include more or fewer components than shown in the figure, or combine certain components, or different components.
- electronic devices may also include input and output devices, network access devices, buses, etc.
- those skilled in the relevant field can clearly understand that, for convenience and simplicity of description, the division into the above-mentioned functional units and modules is used only as an example for illustration.
- the above-mentioned function allocation can be completed by different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above.
- the functional units and modules in the embodiment can be integrated in a processing unit, or each unit can exist physically separately, or two or more units can be integrated in one unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or in the form of software functional units.
- An embodiment of the present application further provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed, the steps in the embodiment of the method for selecting quantization accuracy of a deep network model can be implemented.
- An embodiment of the present application provides a computer program product.
- when the computer program product runs on a data processing device, the steps in the embodiment of the method for selecting quantization accuracy of a deep network model can be implemented.
- the disclosed devices/electronic devices and methods can be implemented in other ways.
- the device/electronic device embodiments described above are merely schematic.
- the division of modules or units is only a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed.
- In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
- if the integrated modules/units are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
- based on this understanding, the present application implements all or part of the processes in the above method embodiments by instructing the relevant hardware through a computer program; the computer program can be stored in a computer-readable storage medium, and when executed by the processor, the computer program can implement the steps of the above method embodiments.
- the computer program includes computer program code, and the computer program code can be in source code form, object code form, executable file or some intermediate form, etc.
- the computer-readable medium may include: any entity or device that can carry computer program code, recording medium, U disk, mobile hard disk, disk, optical disk, computer memory, read-only memory (ROM), random access memory (RAM), electric carrier signal, telecommunication signal and software distribution medium, etc. It should be noted that the content contained in the computer-readable medium can be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electric carrier signals and telecommunication signals.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Multimedia (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Image Analysis (AREA)
Description
This application claims priority to the Chinese patent application filed with the China Patent Office on May 11, 2023, with application number 2023105342377 and the invention name "A method and device for selecting quantization accuracy of a deep network model", the entire contents of which are incorporated by reference into this application.
The present application relates to the field of deep learning technology, and in particular to a method and device for selecting quantization accuracy of a deep network model.
In the existing deep network deployment process, model quantization is an indispensable step. In general, a higher quantization bit width yields a better quantization result, while a lower bit width yields worse accuracy. However, a high bit width means longer inference time, and a low bit width often leaves the quantized model with poor accuracy. Naturally, this has motivated research into mixed-precision quantization tools; most current mixed-precision quantization uses reinforcement learning to automatically determine the quantization precision of each layer.
However, existing tools with mixed-precision quantization functions pay too little attention to mixed precision. These tools often focus on the selection of quantization parameters and only then determine the quantization bit width, or the bit width is selected together with the quantization parameters, which clearly does not give the selection of precision enough attention. In addition, methods based on deep learning also face the problems of long quantization time and complex tool structure.
Summary of the invention
In view of this, an embodiment of the present application provides a method and device for selecting quantization accuracy of a deep network model, which can solve at least one technical problem in the related art.
In a first aspect, an embodiment of the present application provides a method for selecting quantization accuracy of a deep network model, including: S110, obtaining an input image and a deep network model to be processed, quantizing the parameters of the deep network model and setting the quantization accuracy of each layer in the quantized deep network model to a first accuracy; S120, inputting the input image into the quantized deep network model to obtain a feature map output by each layer of the deep network model, and determining the quantization sensitivity of each layer in the deep network model using the feature map output by each layer; S130, traversing the layers in the deep network model whose quantization accuracy has not been changed to select the layer with the smallest quantization sensitivity, and changing the quantization accuracy of that layer to a second accuracy, which is less than the first accuracy; S140, performing inference using the deep network model after the quantization accuracy has been changed; if the inference time is greater than a preset target time, returning to step S130; if the inference time is less than or equal to the preset target time, ending the process to obtain a deep network model that meets expectations. This embodiment adjusts the quantization accuracy of the network model in combination with the quantization sensitivity of each layer of the deep network model, improving model quantization efficiency while ensuring the accuracy of the network model.
In some embodiments, determining the quantization sensitivity of each layer in the deep network model includes: obtaining the quantized value and the floating-point value of the feature map output by each layer of the deep network model; and determining the quantization sensitivity of each layer according to its quantized value and floating-point value. This embodiment calculates the quantization sensitivity of each layer from the quantized value and the floating-point value of the feature map output by that layer, which yields an accurate value of the quantization sensitivity and further ensures the accuracy of the network model.
In some embodiments, determining the quantization sensitivity of each layer according to its quantized value and floating-point value includes: calculating the error value between the quantized value and the floating-point value of the feature map output by each layer as a first value; calculating the mean of the floating-point values of the feature map output by each layer as a second value; and calculating the ratio of the first value to the second value of each layer as the quantization sensitivity of that layer; wherein the error value includes the mean square error, cosine value, root mean square error, or mean absolute error.
This embodiment obtains the quantization sensitivity of each layer by calculating the ratio of the error value (the first value) to the mean value (the second value), which allows the quantization sensitivity to be obtained relatively quickly and further improves the quantization efficiency of the network model. In addition, in this implementation the quantization sensitivity is obtained from statistical data, so a relatively accurate value of the quantization sensitivity can be obtained while the calculation remains simple and easy to implement.
In a second aspect, an embodiment of the present application provides a quantization accuracy selection device for a deep network model, comprising: an acquisition module, used to acquire an input image and a deep network model to be processed, quantize the parameters of the deep network model, and set the quantization accuracy of each layer in the quantized deep network model to a first accuracy; a determination module, used to input the input image into the quantized deep network model to obtain the feature map output by each layer of the deep network model, and determine the quantization sensitivity of each layer in the deep network model using the feature map output by each layer; a selection module, used to traverse the layers in the deep network model whose quantization accuracy has not been changed to select the layer with the smallest quantization sensitivity, and change the quantization accuracy of that layer to a second accuracy, which is less than the first accuracy; and an inference module, used to perform inference using the deep network model after the quantization accuracy has been changed; if the inference time is greater than the preset target time, control returns to the selection module; if the inference time is less than or equal to the preset target time, the process ends and a deep network model that meets expectations is obtained.
In some embodiments, the determination module includes a first acquisition submodule and a second determination submodule. The first acquisition submodule is used to obtain the quantized value and the floating-point value of the feature map output by each layer of the deep network model; the second determination submodule is used to calculate the error value between the quantized value and the floating-point value of the feature map output by each layer as a first value, calculate the mean of the floating-point values of the feature map output by each layer as a second value, and calculate the ratio of the first value to the second value of each layer as the quantization sensitivity of that layer; wherein the error value includes the mean square error, cosine value, root mean square error, or mean absolute error.
In a third aspect, an embodiment of the present application provides a transplantation system, which includes a data processing device and an electronic device. The data processing device is used to, when executing a computer program, implement the method for selecting quantization accuracy of a deep network model of any embodiment of the first aspect to obtain a deep network model that meets expectations, and the deep network model is transplanted to the electronic device for use by the electronic device.
In a fourth aspect, an embodiment of the present application provides an electronic device comprising a processor, wherein the transplantation system of the third aspect transplants a deep network model onto the processor of the electronic device, and the processor processes input data through the onboard deep network model to perform a specific task.
In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium that stores a computer program; when the computer program is executed by a data processing device, it implements the method for selecting quantization accuracy of a deep network model of any embodiment of the first aspect.
In a sixth aspect, an embodiment of the present application provides a computer program product; when the computer program product runs on a data processing device, the data processing device is caused to execute the method for selecting quantization accuracy of a deep network model of any embodiment of the first aspect.
It should be understood that the beneficial effects of the second to sixth aspects can be found in the relevant description of the embodiments of the first aspect and will not be repeated here.
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.
FIG1 is a diagram of a transplantation system of a deep network model provided by the present application;
FIG2 is a schematic diagram of an implementation flow of a method for selecting quantization accuracy of a deep network model provided in an embodiment of the present application;
FIG3 is a schematic diagram of the process of step S120 provided in an embodiment of the present application;
FIG4 is a schematic diagram of the process of step S122 provided in an embodiment of the present application;
FIG5 is a schematic diagram of the structure of a device for selecting quantization accuracy of a deep network model provided by an embodiment of the present application;
FIG6 is a schematic diagram of the structure of a determination module provided in an embodiment of the present application.
In the following description, specific details such as particular system structures and technologies are set forth for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application may also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary details do not obscure the description of the present application.
The term "and/or" as used in the specification of this application and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
"One embodiment" or "some embodiments" described in the specification of this application means that one or more embodiments of the present application include a particular feature, structure, or characteristic described in connection with that embodiment. Therefore, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments", and the like appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless otherwise specifically emphasized. The terms "including", "comprising", "having", and their variations all mean "including but not limited to", unless otherwise specifically emphasized.
In addition, in the description of the present application, "plurality" means two or more. The terms "first" and "second" are used only to distinguish descriptions and should not be understood as indicating or implying relative importance.
In order to illustrate the technical solution described in this application, specific embodiments are described below.
FIG1 is a diagram of a transplantation system for a deep network model provided in the present application. The system comprises an electronic device and a data processing device. The data processing device is used to process the deep network model according to the quantization accuracy selection method provided in one or more embodiments of the present application to obtain a quantized deep network model; the quantized model is transplanted to the electronic device so that a user can perform specific tasks through the electronic device, such as image processing, speech recognition, liveness detection, and face recognition. Specifically, the data processing device includes but is not limited to electronic devices with computing capabilities such as computers, tablet computers, servers, or wearable devices, where the server includes but is not limited to an independent server or a cloud server; the electronic device includes depth cameras, mobile devices, payment terminals, robots, and the like.
In one embodiment, the deep network model is composed of multiple layers, which can be functionally divided into an input layer, hidden layers, and an output layer. The input layer is used to receive input data, the hidden layers are used to perform a series of nonlinear transformations and feature extraction on the input data in order to better complete a specific task, and the output layer is used to output the result of performing the specific task. It should be noted that this application does not limit the number of input, hidden, and output layers, which can be one or more, depending on the specific task performed by the deep network model.
In this embodiment, the depth of the deep network model refers to the number of layers of the network model. Compared with a traditional neural network, it has more layers and thus powerful feature extraction and representation capabilities, but this also means that the deep network model requires higher inference cost and larger storage. In order to reduce the inference cost and storage of the deep network model, it needs to be quantized during deployment, and quantization loses model accuracy. Based on this, the present application provides a method for selecting the quantization accuracy of a deep network model, which reduces the inference cost and storage consumption of the deep network model while ensuring high quantization accuracy, so that the deep network model can also run on devices with limited resources and capabilities, such as those with low computing power and low memory.
FIG2 is a schematic diagram of an implementation flow of a method for selecting quantization accuracy of a deep network model provided in an embodiment of the present application. The method comprises steps S110 to S140.
S110, obtaining an input image and a deep network model to be processed, quantizing the parameters of the deep network model, and setting the quantization accuracy of each layer in the quantized deep network model to a first accuracy.
In one embodiment, the input image may include one or more images, which may be face images, body images, gesture images, or other images; the specific content of the images depends on the specific task performed by the deep network model.
In one embodiment, the parameters of the deep network model are high-precision floating-point numbers. In order to reduce the size of the model, increase the speed of inference, and reduce memory and power consumption, they need to be converted into low-precision fixed-point numbers; that is, the parameters of the deep network model are quantized, and the quantization accuracy of each layer of the quantized deep network model is set to the first accuracy (denoted as Q1). Correspondingly, the quantization bit width of each layer of the quantized deep network model is the first bit width; in some possible implementations, the first bit width can be, for example, 32 bits.
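The embodiment above does not prescribe a particular quantization scheme for converting the floating-point parameters into fixed-point form. As a minimal illustrative sketch only (not the application's own implementation), a simple uniform symmetric fake-quantization to a given bit width could look as follows; the function name, the per-tensor scale, and the use of NumPy are assumptions made for illustration:

```python
import numpy as np

def fake_quantize(x: np.ndarray, bit_width: int) -> np.ndarray:
    """Uniform symmetric quantization of a float tensor to `bit_width` bits,
    returned in floating point ("fake quantization") so the quantized values
    can be compared directly against the original floating-point values."""
    qmax = 2 ** (bit_width - 1) - 1                  # e.g. 127 for 8-bit
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0   # illustrative per-tensor scale
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale
```

Under this sketch, setting every layer to the first accuracy Q1 would correspond to calling fake_quantize with the first bit width (for example 32), and later lowering a layer to the second accuracy Q2 would simply re-run it with 16 or 8 bits.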
S120, inputting the input image into the quantized deep network model to obtain the feature map output by each layer of the deep network model, and using the feature map output by each layer to determine the quantization sensitivity of each layer in the deep network model.
Here, quantization sensitivity indicates how sensitive each layer of the deep network model is to approximating the original floating-point operations with fixed-point operations; the greater the quantization sensitivity of a layer, the more sensitive that layer is to quantization; conversely, the smaller the quantization sensitivity of a layer, the less sensitive that layer is to quantization.
In some embodiments of the present application, as shown in FIG3, step S120 may include: S121, obtaining the quantized value and the floating-point value of the feature map output by each layer of the deep network model; S122, determining the quantization sensitivity of each layer according to its quantized value and floating-point value.
Assume that the feature map output by the j-th layer of the deep network model is Fj. Based on this feature map, the quantized value Fquantj and the floating-point value Ffloatj of Fj are obtained, and the quantization sensitivity of the j-th layer is calculated according to Fquantj and Ffloatj.
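The description leaves open how the quantized value Fquantj and the floating-point value Ffloatj of each feature map are collected in practice. One plausible approach, sketched below under the assumption of a PyTorch model (the hook mechanism and the model variable names are illustrative, not part of the original disclosure), is to register forward hooks on the layers and run the same input image once through the floating-point model and once through its quantized counterpart:

```python
import torch

def collect_feature_maps(model: torch.nn.Module, image: torch.Tensor) -> list:
    """Run `image` through `model` and return the output feature map of each
    top-level layer (assumes each layer returns a single tensor; nested models
    would need model.modules() instead of model.children())."""
    outputs = []
    hooks = [m.register_forward_hook(lambda _m, _in, out: outputs.append(out.detach()))
             for m in model.children()]
    with torch.no_grad():
        model(image)
    for h in hooks:
        h.remove()
    return outputs

# float_maps = collect_feature_maps(float_model, image)      # Ffloatj for each layer j
# quant_maps = collect_feature_maps(quantized_model, image)  # Fquantj for each layer j
```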
This embodiment combines the quantized value and the floating-point value to determine the quantization sensitivity, and subsequently selects layers with smaller quantization sensitivity to be quantized with a low bit width, so as to reduce the inference time of the network model; this is repeated until the inference time meets the target value, thereby improving the efficiency of quantization while ensuring network accuracy.
As a possible implementation of an embodiment of the present application, as shown in FIG4, step S122 may include: S1221, calculating the error value between the quantized value and the floating-point value of the feature map output by each layer as a first value; S1222, calculating the mean of the floating-point values of the feature map output by each layer as a second value; S1223, calculating the ratio of the first value to the second value of each layer as the quantization sensitivity of that layer.
In some examples of the present application, the error value between the quantized value and the floating-point value may include but is not limited to the mean square error, cosine value, root mean square error, or mean absolute error; preferably, the mean square error is used as the error value. Specifically, if the quantized value of the feature map Fj output by the j-th layer of the deep network model is Fquantj and the floating-point value is Ffloatj, the first value is equal to mse(Fquantj, Ffloatj) = (1/n)Σi(Fquantj,i − Ffloatj,i)², where n is the number of elements in the feature map; the second value is equal to mean(Ffloatj) = (1/n)Σi Ffloatj,i, that is, the mean value of Ffloatj; and the quantization sensitivity of the j-th layer is mse(Fquantj, Ffloatj) / mean(Ffloatj). This embodiment uses the statistical mean square error to measure the quantization sensitivity, which can give a more accurate value of the quantization sensitivity and further ensure the accuracy of the network model.
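As a compact sketch of this calculation (illustrative only, not code from the application), the per-layer quantization sensitivity under the preferred MSE variant could be computed as follows; the worked example in the comments uses made-up numbers:

```python
import numpy as np

def quantization_sensitivity(f_quant: np.ndarray, f_float: np.ndarray) -> float:
    """Sensitivity of one layer: MSE between the quantized and floating-point
    feature maps, normalized by the mean of the floating-point feature map
    (assumes that mean is nonzero)."""
    mse = float(np.mean((f_quant - f_float) ** 2))
    return mse / float(np.mean(f_float))

# Worked example with made-up values:
#   f_float = [1.0, 2.0, 3.0, 4.0], f_quant = [1.0, 2.0, 3.0, 4.5]
#   mse  = (0 + 0 + 0 + 0.25) / 4 = 0.0625
#   mean = 2.5, so sensitivity = 0.0625 / 2.5 = 0.025
```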
S130, traversing the layers in the deep network model whose quantization accuracy has not been changed to select the layer with the smallest quantization sensitivity, and changing the quantization accuracy of that layer to a second accuracy, where the second accuracy is less than the first accuracy.
In some embodiments of the present application, the second accuracy can be denoted as Q2, with Q2 < Q1; correspondingly, its quantization bit width is the second bit width. In some possible implementations, the second bit width can be, for example, 16 bits or 8 bits. This embodiment selects layers with smaller quantization sensitivity to be quantized with a low bit width, i.e., changes these layers from the first accuracy to the second accuracy, thereby reducing the inference time of the network model.
S140, performing inference using the deep network model after the quantization accuracy has been changed. If the inference time is greater than the preset target time, return to step S130; if the inference time is less than or equal to the preset target time, the process ends and a deep network model that meets expectations is obtained.
In the embodiment of the present application, the quantization sensitivity of each layer is determined first, and then, from the layers whose quantization accuracy has not been changed, the layer with the smallest quantization sensitivity is selected and quantized with a low bit width; this is repeated until the inference time meets the preset target time, at which point the process ends and a deep network model that meets expectations is obtained, improving the efficiency of quantization while ensuring the accuracy of the network model.
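Pulling steps S110 to S140 together, a condensed sketch of the selection loop might read as follows. The callables `quantize_layer` and `measure_inference_time` are hypothetical placeholders for the re-quantization and timed inference described above, and the guard for the case where every layer has already been lowered is an added safeguard, not part of the original description:

```python
def select_mixed_precision(layers, sensitivities, target_time,
                           quantize_layer, measure_inference_time,
                           first_bits=32, second_bits=8):
    """Greedy mixed-precision selection following steps S110-S140.

    `sensitivities` maps each layer to the value computed in step S120;
    `quantize_layer(layer, bits)` and `measure_inference_time()` are supplied
    by the caller and stand in for model re-quantization and timed inference.
    """
    for layer in layers:
        quantize_layer(layer, first_bits)                   # S110: every layer at Q1
    unchanged = sorted(layers, key=lambda l: sensitivities[l])  # least sensitive first
    while measure_inference_time() > target_time:           # S140: compare with target
        if not unchanged:                                    # safeguard: nothing left to lower
            break
        layer = unchanged.pop(0)                             # S130: smallest sensitivity
        quantize_layer(layer, second_bits)                   # change that layer to Q2
    return layers                                            # model meeting expectations
```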
It should be noted that, in some other embodiments, when the inference time is equal to the preset target time, the step of returning to step S130 may also be executed; those skilled in the art may make a choice based on the actual situation, and the embodiments of the present application do not impose specific restrictions on this.
It should be understood that the numbering of the steps in the above embodiments does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
An embodiment of the present application provides a quantization accuracy selection device for a deep network model. For details not described in the quantization accuracy selection device, please refer to the relevant description in the above quantization accuracy selection method embodiments, which will not be repeated here.
FIG5 is a schematic diagram of the structure of a quantization accuracy selection device for a deep network model provided by an embodiment of the present application. The quantization accuracy selection device includes: an acquisition module 41, used to acquire an input image and a deep network model to be processed, quantize the parameters of the deep network model, and set the quantization accuracy of each layer in the quantized deep network model to the first accuracy; a determination module 42, used to input the input image into the quantized deep network model to obtain the feature map output by each layer of the deep network model, and determine the quantization sensitivity of each layer in the deep network model using the feature map output by each layer; a selection module 43, used to traverse the layers in the deep network model whose quantization accuracy has not been changed to select the layer with the smallest quantization sensitivity, and change the quantization accuracy of that layer to the second accuracy, which is less than the first accuracy; and an inference module 44, used to perform inference using the deep network model after the quantization accuracy has been changed; if the inference time is greater than the preset target time, control returns to the selection module 43; if the inference time is less than or equal to the preset target time, the process ends and a deep network model that meets expectations is obtained.
In some embodiments, as shown in FIG6, the determination module 42 includes a first acquisition submodule 421 and a second determination submodule 422. The first acquisition submodule 421 is used to obtain the quantized value and the floating-point value of the feature map output by each layer of the deep network model. The second determination submodule 422 is used to determine the quantization sensitivity of each layer according to its quantized value and floating-point value.
In some embodiments, the second determination submodule is specifically used to: calculate the error value between the quantized value and the floating-point value of the feature map output by each layer as a first value; calculate the mean of the floating-point values of the feature map output by each layer as a second value; and calculate the ratio of the first value to the second value of each layer as the quantization sensitivity of that layer.
An embodiment of the present application further provides an electronic device, which may include one or more processors. A deep network model is ported onto a processor of the electronic device through the porting system shown in FIG. 1, so that the processor can process input data with the deployed deep network model to perform a specific task.
In one embodiment, the electronic device further includes a camera configured to capture two-dimensional images or depth images and transmit them to the processor; the processor processes the input images with the deep network model to perform a specific task, such as face recognition, liveness detection, or payment. It should be noted that the camera and the electronic device may be designed as an integrated whole or as separate units, and the two communicate data in a wired or wireless manner, which is not limited here.
In one embodiment, the processor may be a central processing unit (CPU), or another general-purpose processor, a neural network processing chip, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or any conventional processor.
Those skilled in the art will appreciate that the above is merely an example of an electronic device and does not constitute a limitation on the electronic device. The electronic device may include more or fewer components than shown in the figure, combine certain components, or have different components; for example, it may further include input and output devices, network access devices, buses, and the like.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the functional units and modules described above is used only as an example. In practical applications, the above functions may be allocated to different functional units and modules as needed; that is, the internal structure of the device may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program. When the computer program is executed, the steps in the embodiments of the quantization precision selection method for a deep network model can be implemented.
An embodiment of the present application provides a computer program product. When the computer program product is run, the steps in the embodiments of the quantization precision selection method for a deep network model can be implemented.
In the above embodiments, each embodiment is described with its own emphasis. For parts that are not detailed or recorded in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or by software depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed device/electronic device and method may be implemented in other ways. For example, the device/electronic device embodiments described above are merely illustrative; the division into modules or units is only a division by logical function, and there may be other ways of division in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
If the integrated modules/units are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, all or part of the processes in the methods of the above embodiments of the present application may also be accomplished by instructing relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program can implement the steps of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately added to or removed according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements for some of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be included within the protection scope of the present application.
Claims (10)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310534237.7A CN116579399A (en) | 2023-05-11 | 2023-05-11 | Quantization precision selection method and device for depth network model |
| CN202310534237.7 | | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024230170A1 (en) | 2024-11-14 |
Family
ID=87542675
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2023/139026 Pending WO2024230170A1 (en) | 2023-05-11 | 2023-12-15 | Quantization precision selection method and device for deep network model |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN116579399A (en) |
| WO (1) | WO2024230170A1 (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN120805435A (en) * | 2025-07-01 | 2025-10-17 | 北京数字航宇科技有限公司 | Multi-algorithm collaborative industrial simulation proxy model construction method |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116579399A (en) * | 2023-05-11 | 2023-08-11 | 奥比中光科技集团股份有限公司 | Quantization precision selection method and device for depth network model |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20210266565A1 (en) * | 2019-05-24 | 2021-08-26 | NextVPU (Shanghai) Co., Ltd. | Compression for deep neural network |
| US20230037498A1 (en) * | 2020-04-22 | 2023-02-09 | Huawei Technologies Co., Ltd. | Method and system for generating a predictive model |
| WO2023029349A1 (en) * | 2021-09-03 | 2023-03-09 | 上海商汤智能科技有限公司 | Model quantization method and apparatus, device, storage medium, computer program product, and computer program |
| WO2023040626A1 (en) * | 2021-09-18 | 2023-03-23 | 上海商汤智能科技有限公司 | Neural network training method and apparatus, character recognition method and apparatus, device, storage medium, and computer program product |
| CN116579399A (en) * | 2023-05-11 | 2023-08-11 | 奥比中光科技集团股份有限公司 | Quantization precision selection method and device for depth network model |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11507823B2 (en) * | 2019-01-22 | 2022-11-22 | Black Sesame Technologies Inc. | Adaptive quantization and mixed precision in a network |
| US12131258B2 (en) * | 2019-09-24 | 2024-10-29 | Qualcomm Incorporated | Joint pruning and quantization scheme for deep neural networks |
| RU2734579C1 (en) * | 2019-12-30 | 2020-10-20 | Автономная некоммерческая образовательная организация высшего образования "Сколковский институт науки и технологий" | Artificial neural networks compression system based on iterative application of tensor approximations |
| CN113222148B (en) * | 2021-05-20 | 2022-01-11 | 浙江大学 | A neural network inference acceleration method for material recognition |
| CN114676825A (en) * | 2022-04-15 | 2022-06-28 | 上海云从企业发展有限公司 | A neural network model quantification method, system, device and medium |
Also Published As
| Publication number | Publication date |
|---|---|
| CN116579399A (en) | 2023-08-11 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN111833430B (en) | | Neural network-based illumination data prediction method, system, terminal and medium |
| Pestana et al. | | A full featured configurable accelerator for object detection with YOLO |
| WO2024230170A1 (en) | | Quantization precision selection method and device for deep network model |
| TWI701612B (en) | | Circuit system and processing method for neural network activation function |
| WO2020233010A1 (en) | | Image recognition method and apparatus based on segmentable convolutional network, and computer device |
| CN111144511B (en) | | Image processing method, system, medium and electronic terminal based on neural network |
| CN112488297B (en) | | A neural network pruning method, model generation method and device |
| CN114612996B (en) | | Neural network model operation method, medium, program product and electronic device |
| WO2019214193A1 (en) | | Point cloud sampling method, image processing device, and device having storage function |
| WO2019165989A1 (en) | | Data processing circuit for use in neural network |
| CN111523642A (en) | | Data reuse method, operation method and device, and chip for convolution operation |
| CN117215646A (en) | | Floating point operation method, processor, electronic equipment and storage medium |
| CN114676825A (en) | | A neural network model quantification method, system, device and medium |
| CN112633299A (en) | | Target detection method, network, device, terminal equipment and storage medium |
| CN112825199A (en) | | Collision detection method, device, equipment and storage medium |
| CN110288646A (en) | | An image-based human body size calculation method and device |
| CN114882024B (en) | | Target object defect detection method and device, electronic equipment and storage medium |
| CN114779209B (en) | | Laser radar point cloud voxelization method and device |
| CN115082539B (en) | | Image depth prediction method and electronic equipment |
| CN116933713A (en) | | Module arrangement method of chip and related equipment |
| EP4345692A1 (en) | 2024-04-03 | Methods and systems for online selection of number formats for network parameters of a neural network |
| WO2020224118A1 (en) | | Lesion determination method and apparatus based on picture conversion, and computer device |
| CN118151885A (en) | | Model conversion method, electronic device and storage medium |
| TWI674579B (en) | | Circuit and method for memory operation |
| CN117739809B (en) | | Bottleneck size measuring method, computer equipment and medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23936474; Country of ref document: EP; Kind code of ref document: A1 |