
CN106250103A - A system for data reuse in convolutional neural network cyclic convolution computation - Google Patents

A system for data reuse in convolutional neural network cyclic convolution computation

Info

Publication number
CN106250103A
Authority
CN
China
Prior art keywords
data
convolution
calculation
array
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610633040.9A
Other languages
Chinese (zh)
Inventor
刘波
朱智洋
陈壮
阮星
龚宇
曹鹏
杨军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201610633040.9A priority Critical patent/CN106250103A/en
Publication of CN106250103A publication Critical patent/CN106250103A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F 9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/15 Correlation function computation including computation of convolution operations
    • G06F 17/153 Multidimensional correlation or convolution
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30098 Register arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a system for data reuse in convolutional neural network cyclic convolution computation, oriented toward coarse-grained reconfigurable systems, comprising four parts: a main controller and connection control module, an input data reuse module, a cyclic convolution operation processing array, and a data transmission path. A cyclic convolution operation is, in essence, the multiplication of multiple two-dimensional input data matrices by multiple two-dimensional weight matrices; these matrices are generally large, and the multiplications account for most of the total convolution time. The invention uses a coarse-grained reconfigurable array to carry out the convolution computation. After a convolution request instruction is received, register rotation is used to fully exploit the reusability of the input data across loop iterations, which raises data utilization and reduces memory-bandwidth pressure. The array units are configurable and can perform cyclic convolutions of different sizes and strides.

Description

A System for Data Reuse in Convolutional Neural Network Cyclic Convolution Computation

Technical Field

The invention relates to the field of embedded reconfigurable design, and in particular to a system, oriented toward coarse-grained reconfigurable systems, for reusing data in convolutional neural network cyclic convolution computation. It can be used in high-performance reconfigurable systems to perform large numbers of cyclic convolution operations in a convolutional neural network, reusing data already on hand wherever possible so as to raise the computation rate and reduce the pressure on data-read bandwidth.

Background Art

A reconfigurable processor architecture is an ideal platform for application acceleration. Because the hardware structure can be reorganized according to the data-flow graph of a program, reconfigurable arrays have been shown to offer good performance-improvement potential for scientific computing and multimedia applications.

Convolution is widely used in image processing, for example in image filtering, image enhancement, and image analysis. Image convolution is essentially a matrix operation, characterized by a large amount of computation and a high data-reuse rate; computing image convolutions in software alone can hardly meet real-time requirements.

As a feed-forward multi-layer neural network, a convolutional neural network can learn automatically from large amounts of labeled data and extract complex features from it. Its advantage is that visual patterns can be recognized from pixel images with little preprocessing, recognition remains good for objects with considerable variation, and its recognition ability is not easily affected by image distortion or simple geometric transformations. As an important direction in multi-layer artificial neural network research, the convolutional neural network has been a research hotspot for many years.

Place the convolution template at the upper-left corner of the image lattice, so that it coincides with the upper-left sub-matrix of the image. Multiply the coinciding elements pairwise and sum all the products to obtain the first result point. Then shift the template one column to the right to obtain the second result point. By traversing the template over the whole image lattice in this way, the convolution of one image frame can be computed in full. The data-reuse rate is very high, but with a traditional cache, or with direct reads from external memory, the operation is inefficient because of the data-read bandwidth limit and the lack of a configurable array for multi-layer convolution loops.
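The sliding-template procedure described above can be sketched as follows. This is a minimal software illustration of the arithmetic, not the patented hardware; the function and variable names are chosen for this example only:

```python
import numpy as np

def convolve2d_valid(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Slide a K*K template over the image; each output point is the
    elementwise product of the template with the sub-matrix it covers,
    summed (no kernel flip, i.e. cross-correlation, as is usual in CNNs)."""
    K = kernel.shape[0]
    H, W = image.shape
    out_h = (H - K) // stride + 1
    out_w = (W - K) // stride + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # The "coinciding sub-matrix" under the template at this position.
            window = image[i * stride:i * stride + K, j * stride:j * stride + K]
            out[i, j] = np.sum(window * kernel)
    return out
```

Note that the naive loop above re-reads the full K*K window for every output point, which is exactly the redundant traffic the data reuse structure of the invention is meant to avoid.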

Summary of the Invention

Purpose of the invention: in view of the problems and deficiencies of the prior art, the invention provides a system, oriented toward coarse-grained reconfigurable systems, for reusing data in convolutional neural network cyclic convolution computation. It can accelerate large numbers of convolution computations and reduce bandwidth pressure, and its convolution operation array is configurable. Computation performance and hardware-resource occupancy are the two aspects that must be traded off when a convolutional neural network is implemented on a coarse-grained reconfigurable architecture. The design goal of a convolutional neural network based on a reconfigurable processing array is, on the premise of meeting application performance requirements, to make full use of the computing and storage resources that the reconfigurable array provides, to exploit the input-image data reuse structure and the high reuse rate inherent in cyclic convolution, and, with the configurability of the coarse-grained reconfigurable array, to complete the convolution computation under data-read bandwidth and computing-resource limits and so reach a good compromise.

Technical solution: a system for data reuse in convolutional neural network cyclic convolution computation, oriented toward coarse-grained reconfigurable systems, comprising a main controller and connection control module, an input data reuse module, a cyclic convolution operation processing array, and a data transmission path.

The main controller and connection control module receives external convolution requests, loads the configuration information of the computing array, returns computation results, monitors the loop execution state, and controls data transfers between the external memory and the input data reuse module.

The input data reuse module is the data reuse module connecting the external input data memory and the cyclic convolution operation processing array, and it carries out the input data reuse. The upper half of the module is a bank of FIFOs, one per image-matrix column (image-matrix-width many); the lower half is the same number of shift registers. The FIFOs continuously load input data from external memory, each corresponding to one column of the convolution computation. When the shift registers advance by the convolution stride, the FIFOs replace one of their columns, after which one convolution operation completes, achieving data reuse. The shift registers supply the updated neighborhood data provided by the FIFO half. Because the shift registers use ring addressing, data from the FIFOs always replaces the oldest data in the ring shift registers, and the data is then sent to the operation array to complete the convolution.

The module operates in the following concrete steps:

S 32-bit data words (1 <= S < maximum image-matrix width) are fed into the FIFOs at a time. Once the convolution has consumed the data in a register, the FIFO transfers its own data to the shift register. The shift register only needs to update one column of K 32-bit words (1 <= K < maximum image-matrix width, where K is the kernel-matrix width of the current convolution); together with the existing K-1 columns, the shift register transfers K*K data words to the convolution computing matrix and then continues to advance by the stride, again updating only one column, so the input data is reused.
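The column-rotation step can be modeled in software as follows. This is an illustrative sketch of the ring of columns, not the hardware itself, and all names are chosen for this example. For each stride step, only the new column(s) are fetched from "external memory"; the other K-1 columns are reused from the ring:

```python
from collections import deque
import numpy as np

def convolve_row_with_reuse(image, kernel, stride=1):
    """Compute one output row of a valid convolution, fetching each image
    column at most once: a ring of K columns is kept, and each stride step
    only replaces the oldest column(s) instead of re-reading the K*K window."""
    K = kernel.shape[0]
    # Prime the ring with the first K columns (one "FIFO load" per column).
    ring = deque((image[:K, c].copy() for c in range(K)), maxlen=K)
    outputs = []
    c = K  # index of the next column to fetch
    while True:
        window = np.stack(list(ring), axis=1)  # K x K window from reused columns
        outputs.append(int(np.sum(window * kernel)))
        if c + stride > image.shape[1]:
            break
        for s in range(stride):                # fetch only 'stride' new columns;
            ring.append(image[:K, c + s].copy())  # ring drops the oldest column
        c += stride
    return outputs
```

The `deque` with `maxlen=K` mimics the ring addressing: an appended column always evicts the oldest one, just as the text describes for the ring shift registers.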

The cyclic convolution operation processing array obtains the required input data from the input data reuse module, performs the convolution computation, and sends the data out when the computation completes.

The data transmission path is the channel that carries data among the main controller and interface control module, the cyclic convolution operation processing array, and the input data reuse module.

Further, the main controller and connection control module comprises the main controller and the connection controller. The connection controller performs prefetch judgment and data reuse configuration control: the prefetch judgment checks whether the data needed for the next convolution is in place; if it is, the cyclic convolution operation processing array executes the convolution loop computation, and if not, the array waits for the data. The data in the buffers is read from external memory; the invention reads it by direct memory access. When external data input is needed, the main controller issues a read command to external memory and thereafter no longer controls the transfer: the connection controller sends a halt signal to the main controller, which relinquishes the address bus, data bus, and associated control buses, and whenever the data in the input data reuse module must be updated, the connection controller reads the data in external memory directly.

The cyclic convolution operation processing array comprises an array configuration module, storage processing units, and computation processing units. When matched with the data reuse module, the array configuration module configures the computing array according to the convolution size and stride, using the computing resources the array makes available; after each computation completes, the array is reconfigured and the computation processing units are adjusted to the computation size for the next convolution.

As for the configuration controller of the convolution operation processing array: after the interface control module has loaded the configuration information, the operation array, based on the cyclic convolution size and the stride, can handle convolution image-matrix sizes ranging from 1 up to the maximum image-matrix width. The operation array can be reconfigured for every convolution, and even when the kernel is small, the convolution array can still use the entire convolution computing matrix, thereby shortening the total convolution time.

The storage processing unit's stored instructions are closely tied to the data reuse module. Driven by the loop-control component, the unit takes an address from the address queue, or computes one directly in the address-generation component, issues a read request to the data reuse module, and writes the returned data into the data queue; under the control of the loop-end component, it reads the data in the shift registers.

The computation processing units implement the computation and selection functions of the data flow: the loop index continually fetches data from the register file and passes it to the computation processing unit array, which operates according to a fixed interconnection, and the results of the operations are stored to designated locations.

The cyclic convolution operation processing array applies continuous pipelining. The loop is mapped onto the array configuration module, which configures the initial, final, and step values of the loop-control variables; execution of the loop program needs no external control, and the computing array units are chained into a pipeline that carries out the scheduling of the cyclic convolution on the pipeline.
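The loop-to-pipeline mapping can be illustrated in software with chained generator stages. This is purely a conceptual sketch of pipelined loop scheduling under configured initial/final/step values; the stage names and data are invented for this example and do not come from the patent:

```python
def index_stage(start, stop, step):
    """Loop-control stage: emits loop indices from the configured
    initial, final, and step values, with no external control."""
    i = start
    while i < stop:
        yield i
        i += step

def fetch_stage(indices, data):
    """Storage-processing stage: turns each index into a data word."""
    for i in indices:
        yield data[i]

def mac_stage(values, weight, acc=0):
    """Computation stage: multiply-accumulate along the stream."""
    for v in values:
        acc += v * weight
        yield acc

# Chain the stages into a pipeline: each element flows through every
# stage in turn, without the whole loop being materialized at once.
data = [1, 2, 3, 4, 5]
pipeline = mac_stage(fetch_stage(index_stage(0, len(data), 1), data), weight=2)
result = list(pipeline)[-1]  # 2*(1+2+3+4+5) = 30
```

Each generator plays the role of one array unit; values stream from stage to stage just as the text describes data flowing through the chained computing array units.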

Brief Description of the Drawings

Fig. 1 is the architecture diagram of the coarse-grained reconfigurable array for convolution computation in an embodiment of the invention;

Fig. 2 is the hardware structure diagram of the data-rotation scheduling of the input data reuse module in an embodiment of the invention;

Fig. 3 is the structural block diagram of a storage processing unit in the coarse-grained reconfigurable convolution computing array in an embodiment of the invention;

Fig. 4 is the structural block diagram of a computation processing unit in the coarse-grained reconfigurable convolution computing array in an embodiment of the invention;

Fig. 5 is the flow chart of implementing cyclic convolution in the reconfigurable array in an embodiment of the invention.

Detailed Description

The invention is further described below in connection with specific embodiments. It should be understood that these embodiments serve only to illustrate the invention and not to limit its scope; after reading the invention, those skilled in the art may make modifications to its various equivalent forms, all of which fall within the scope defined by the claims appended to this application.

The system for data reuse in convolutional neural network cyclic convolution computation, oriented toward coarse-grained reconfigurable systems, comprises a main controller and connection control module, an input data reuse module, a cyclic convolution operation processing array, and a data transmission path.

The main controller and connection control module receives external convolution requests, loads the configuration information of the computing array, returns computation results, monitors the loop execution state, and controls data transfers between the external memory and the input data reuse module.

The input data reuse module is the data reuse module connecting the external input data memory and the cyclic convolution operation processing array; the upper half of the module is a bank of image-matrix-width FIFOs, and the lower half is the same number of shift registers.

The cyclic convolution operation processing array obtains the required input data from the input data reuse module, performs the convolution computation, and sends the data out when the computation completes.

The data transmission path is the channel that carries data among the main controller and interface control module, the cyclic convolution operation processing array, and the input data reuse module.

The main controller and connection control module comprises the main controller and the connection controller. The connection controller performs prefetch judgment and data reuse configuration control: the prefetch judgment checks whether the data needed for the next convolution is in place; if it is, the cyclic convolution operation processing array executes the convolution loop computation, and if not, the array waits for the data. The data in the buffers is read from external memory; the invention reads it by direct memory access. When external data input is needed, the main controller issues a read command to external memory and thereafter no longer controls the transfer: the connection controller sends a halt signal to the main controller, which relinquishes the address bus, data bus, and associated control buses, and whenever the data in the input data reuse module must be updated, the connection controller reads the data in external memory directly.

Fig. 1 shows the coarse-grained reconfigurable array diagram of the concrete computing array and its data flow. The configurable PE units occupy the largest part, because the reconfigurable array is the part that actually performs the convolution; the remaining parts mainly carry the start and end instructions in. As Fig. 1 shows, the storage processing units in the configurable array connect directly to the input data reuse module (Fig. 2). Based on the stride and kernel-size information, the input data reuse module streams the data needed for the convolution to the computation processing units, and the routers route the configuration data stream through the interconnection network to each computation processing unit; meanwhile, once a convolution completes, the connection controller sends the data out and reconfigures the computation processing units to start the next operation.

Fig. 2 shows the data-rotation scheduling hardware of the input data reuse module, taking a K*K kernel (K being the kernel width) as an example. FIFOs are inserted between the external memory and the shift registers. S 32-bit data words are fed into the FIFOs at a time; once the convolution has consumed the data in a register, the FIFO transfers its own data to the shift register, which updates one column of K 32-bit words and, together with the existing K-1 columns, transfers K*K data words to the convolution computing matrix. This input-image data reuse structure underpins high-efficiency convolution.

Fig. 3 is the structural block diagram of a storage processing unit. An address signal arriving on the input channel corresponds to the unit's position in the array; the storage processing units generate the addresses of the corresponding data, each generated address maps to data in the input-image data reuse module, and that data is then output to the computation processing units. The loop controls the generation of the operand addresses and the end of the convolution, and the computed data is transferred synchronously to external memory. Moreover, when data is wrong or insufficient, the loop-judgment structure ends the current operation and passes the information to external memory for a data update.

Fig. 4 is the structure diagram of a computation processing unit. On receiving input data, the unit performs the convolution with its internal multipliers and adders; after each operation, the configuration controller reconfigures the computation processing units the next operation needs, completing the configurable control, so that the operation can still be carried out well when the outer-loop size or stride changes.

With reference to Figs. 1 and 2, the concrete steps of the convolution loop computation, shown in Fig. 5, are as follows:

1) When the coarse-grained reconfigurable array is to perform a large number of convolutions, a request is first issued to the convolution control system; on receiving the request, the main processor sends instructions to the connection processing unit;

2) The connection processing unit first checks whether the required data is already in place in the input data reuse module; if not, it issues a wait signal while transferring data into the buffers by direct memory access;

3) Once the data is ready, the waiting operation instructions are notified and the control loop starts; the configuration control unit in the cyclic convolution operation processing array configures the array, the memory-access configuration module in the computing array computes the position of the data, and the computing array then convolves the data at that position, with subsequent positions following in pipeline order.

4) Y FIFO buffers (Y being the maximum image-matrix width) continually refresh the already-used data in the registers by direct memory reads; by the time a position is revisited, its data has been updated, so the operations run without interruption and without going to external memory for every convolution.

5) The connection controller controls completion of the loop; when the computation finishes, the final data is output to external memory, and this round of the convolution operation array is done.

When performing large numbers of cyclic convolutions under limited computing resources, applying the data reuse method, together with the configurable reconfigurable array and pipelined convolution, improves operation efficiency and speed. A comparative experiment was set up with two systems: comparison system A, a traditional reconfigurable system that does not support array configuration and reuse, and comparison system B, the reconfigurable system proposed by the invention that supports data prefetch and reuse. A 16x16 input data matrix, a 3x3 convolution matrix, and a stride of 1 were selected, with ten input data sets and ten convolution weight matrices convolved simultaneously. The experimental results show that comparison system B achieves an average 1.76x performance improvement over comparison system A.
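As a back-of-the-envelope check of why column reuse relieves bandwidth (an illustration derived from the reuse scheme described above, not a figure reported in the patent): without reuse, every output point re-reads its full K*K window, while with column rotation each later window in a row fetches only stride*K new words.

```python
def window_fetches(W: int, K: int, stride: int = 1) -> tuple:
    """External-memory words fetched for one output row of a valid
    convolution: naively (a full K*K window per output point) versus
    with column rotation, where only the first window loads K columns
    and each later window loads 'stride' new columns of K words."""
    n_out = (W - K) // stride + 1
    naive = n_out * K * K
    reused = K * K + (n_out - 1) * stride * K
    return naive, reused

# The experiment in the text uses a 16x16 input, 3x3 kernel, stride 1:
# 14 windows per row, so 14*9 = 126 words naively vs 9 + 13*3 = 48 with reuse.
naive, reused = window_fetches(W=16, K=3, stride=1)
```

Under these illustrative assumptions, the reuse path fetches well under half the words of the naive path, which is consistent in direction with the measured speedup, though the 1.76x figure itself also reflects pipelining and reconfiguration effects.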

Claims (5)

1. A system for data reuse in convolutional neural network cyclic convolution computation, oriented to coarse-grained reconfigurable systems, characterized by comprising: a main controller and connection control module, an input data reuse module, a cyclic convolution operation processing array, and a data transmission path; the main controller and connection control module receives external convolution operation requests, loads configuration information for the computing array, returns computation results, monitors the loop running state, and controls data transfer between the external memory and the input data reuse module; the input data reuse module is a data reuse module connecting the external input data memory and the cyclic convolution operation processing array, in which the upper half of the module is a set of FIFOs equal in number to the image matrix width and the lower half is a set of shift registers equal in number to the image matrix width; the cyclic convolution operation processing array obtains the required input data from the input data reuse module, completes the convolution computation, and sends the data out after the computation is finished.
2. The data transmission path is the channel for data transfer among the main controller and interface control module, the cyclic convolution operation processing array, and the input data reuse module.
3. The system for data reuse in convolutional neural network cyclic convolution computation oriented to coarse-grained reconfigurable systems as claimed in claim 1, characterized in that: the main controller and connection control module comprises a main controller and a connection controller; the connection controller performs prefetch judgment and data-reuse configuration control; the prefetch judgment determines whether the data required for a convolution operation are in place: if the data are in place, the cyclic convolution operation processing array executes the convolution loop computation, and if not, it waits for the data to arrive; the data in the buffer are read from the external memory by direct memory access; when external data input is required, the main controller issues a command to read data from the external memory, after which the main controller no longer controls the memory read: the connection controller sends a halt signal to the main controller, the main controller relinquishes the right to use the address bus, the data bus, and the related control buses, and when the data in the input data reuse module need updating, the connection controller reads the data in the external memory directly.
4. The system for data reuse in convolutional neural network cyclic convolution computation oriented to coarse-grained reconfigurable systems as claimed in claim 1, characterized in that: the cyclic convolution operation processing array comprises an array configuration module, storage processing units, and computation processing units; when matching the input data reuse module, the array configuration module configures the computing array according to the convolution computation scale and stride, making use of the computing resources available in the array; after each computation is completed, the array is reconfigured and the computation processing units are adjusted according to the computation scale for the next convolution operation; the cyclic convolution operation processing array applies continuous pipelined operation, the loop being mapped onto the array configuration module, which configures the initial value, final value, and step value of the loop control variables; execution of the loop program requires no external control, and pipeline links are formed among the computing array units, completing the scheduling of the cyclic convolution on the pipeline.
5. The system for data reuse in convolutional neural network cyclic convolution computation oriented to coarse-grained reconfigurable systems as claimed in claim 1, characterized in that the input data reuse module is implemented in the following steps: S 32-bit data are input to the FIFOs at a time; once the convolution operation has consumed the data in a register, the FIFO transfers its own data to the shift register; the shift register updates one column of K 32-bit data, which together with the existing K-1 columns yields the K*K data that the shift register transmits to the convolution computation matrix; the window then continues to move by the stride, again updating only one column, thereby achieving reuse of the input data.
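The column-update scheme of claim 5 can be sketched in software as follows. This is an illustrative stride-1 model only (function and variable names are my own, not from the patent): each step keeps K-1 columns of the K x K window and loads just one new column, instead of refetching all K*K inputs.

```python
import numpy as np

def conv2d_column_reuse(image, kernel):
    """Stride-1 sliding-window correlation mimicking the claim-5 scheme:
    at each horizontal step the K x K window shifts left, reusing K-1
    existing columns and fetching only one new column of K values."""
    K = kernel.shape[0]
    H, W = image.shape
    out_h, out_w = H - K + 1, W - K + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        window = image[r:r + K, 0:K].copy()        # initial K x K window
        out[r, 0] = float(np.sum(window * kernel))
        for c in range(1, out_w):
            window[:, :-1] = window[:, 1:]          # reuse K-1 columns
            window[:, -1] = image[r:r + K, c + K - 1]  # fetch one new column
            out[r, c] = float(np.sum(window * kernel))
    return out
```

Run against the experiment's 16x16/3x3 shapes, the output is 14x14 and matches a direct per-window computation, while each inner step fetches only K new input words rather than K*K.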
CN201610633040.9A 2016-08-04 2016-08-04 A kind of convolutional neural networks cyclic convolution calculates the system of data reusing Pending CN106250103A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610633040.9A CN106250103A (en) 2016-08-04 2016-08-04 A kind of convolutional neural networks cyclic convolution calculates the system of data reusing


Publications (1)

Publication Number Publication Date
CN106250103A true CN106250103A (en) 2016-12-21

Family

ID=58079364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610633040.9A Pending CN106250103A (en) 2016-08-04 2016-08-04 A kind of convolutional neural networks cyclic convolution calculates the system of data reusing

Country Status (1)

Country Link
CN (1) CN106250103A (en)

Cited By (61)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106775599A (en) * 2017-01-09 2017-05-31 南京工业大学 Multi-computing-unit coarse-grained reconfigurable system and method for recurrent neural network
CN106844294A (en) * 2016-12-29 2017-06-13 华为机器有限公司 Convolution operation chip and communication equipment
CN107103754A (en) * 2017-05-10 2017-08-29 华南师范大学 A kind of road traffic condition Forecasting Methodology and system
CN107229598A (en) * 2017-04-21 2017-10-03 东南大学 A kind of low power consumption voltage towards convolutional neural networks is adjustable convolution computing module
CN107590085A (en) * 2017-08-18 2018-01-16 浙江大学 A kind of dynamic reconfigurable array data path and its control method with multi-level buffer
CN107635138A (en) * 2017-10-19 2018-01-26 珠海格力电器股份有限公司 Image processing apparatus
CN107832262A (en) * 2017-10-19 2018-03-23 珠海格力电器股份有限公司 Convolution operation method and device
CN107862650A (en) * 2017-11-29 2018-03-30 中科亿海微电子科技(苏州)有限公司 The method of speed-up computation two dimensional image CNN convolution
CN108009126A (en) * 2017-12-15 2018-05-08 北京中科寒武纪科技有限公司 A kind of computational methods and Related product
CN108182471A (en) * 2018-01-24 2018-06-19 上海岳芯电子科技有限公司 A kind of convolutional neural networks reasoning accelerator and method
CN108198125A (en) * 2017-12-29 2018-06-22 深圳云天励飞技术有限公司 A kind of image processing method and device
CN108241890A (en) * 2018-01-29 2018-07-03 清华大学 A reconfigurable neural network acceleration method and architecture
WO2018137177A1 (en) * 2017-01-25 2018-08-02 北京大学 Method for convolution operation based on nor flash array
CN108564524A (en) * 2018-04-24 2018-09-21 开放智能机器(上海)有限公司 A kind of convolutional calculation optimization method of visual pattern
CN108595379A (en) * 2018-05-08 2018-09-28 济南浪潮高新科技投资发展有限公司 A kind of parallelization convolution algorithm method and system based on multi-level buffer
CN108596331A (en) * 2018-04-16 2018-09-28 浙江大学 A kind of optimization method of cell neural network hardware structure
CN108665063A (en) * 2018-05-18 2018-10-16 南京大学 Two-way simultaneous for BNN hardware accelerators handles convolution acceleration system
CN108681984A (en) * 2018-07-26 2018-10-19 珠海市微半导体有限公司 A kind of accelerating circuit of 3*3 convolution algorithms
CN108701015A (en) * 2017-11-30 2018-10-23 深圳市大疆创新科技有限公司 Computing device, chip, device and related method for neural network
CN108717571A (en) * 2018-06-01 2018-10-30 阿依瓦(北京)技术有限公司 A kind of acceleration method and device for artificial intelligence
CN108764182A (en) * 2018-06-01 2018-11-06 阿依瓦(北京)技术有限公司 A kind of acceleration method and device for artificial intelligence of optimization
WO2018232615A1 (en) * 2017-06-21 2018-12-27 华为技术有限公司 METHOD AND DEVICE FOR PROCESSING SIGNALS
CN109272112A (en) * 2018-07-03 2019-01-25 北京中科睿芯科技有限公司 A kind of data reusing command mappings method, system and device towards neural network
CN109284475A (en) * 2018-09-20 2019-01-29 郑州云海信息技术有限公司 A matrix convolution calculation module and matrix convolution calculation method
CN109375952A (en) * 2018-09-29 2019-02-22 北京字节跳动网络技术有限公司 Method and apparatus for storing data
CN109460813A (en) * 2018-09-10 2019-03-12 中国科学院深圳先进技术研究院 Accelerated method, device, equipment and the storage medium that convolutional neural networks calculate
CN109711533A (en) * 2018-12-20 2019-05-03 西安电子科技大学 FPGA-based convolutional neural network module
CN109754359A (en) * 2017-11-01 2019-05-14 腾讯科技(深圳)有限公司 A method and system for pooling processing applied to convolutional neural networks
CN109816093A (en) * 2018-12-17 2019-05-28 北京理工大学 A One-way Convolution Implementation Method
CN109992541A (en) * 2017-12-29 2019-07-09 深圳云天励飞技术有限公司 A data handling method, related product and computer storage medium
CN110069444A (en) * 2019-06-03 2019-07-30 南京宁麒智能计算芯片研究院有限公司 A kind of computing unit, array, module, hardware system and implementation method
CN110325963A (en) * 2017-02-28 2019-10-11 微软技术许可有限责任公司 Multi-function unit for programmable hardware nodes for neural network processing
CN110383237A (en) * 2017-02-28 2019-10-25 德克萨斯仪器股份有限公司 Reconfigurable matrix multiplier system and method
CN110377874A (en) * 2019-07-23 2019-10-25 江苏鼎速网络科技有限公司 Convolution algorithm method and system
CN110413561A (en) * 2018-04-28 2019-11-05 北京中科寒武纪科技有限公司 Data accelerate processing system
WO2019231254A1 (en) * 2018-05-30 2019-12-05 Samsung Electronics Co., Ltd. Processor, electronics apparatus and control method thereof
CN110705687A (en) * 2019-09-05 2020-01-17 北京三快在线科技有限公司 Convolution neural network hardware computing device and method
WO2020051751A1 (en) * 2018-09-10 2020-03-19 中国科学院深圳先进技术研究院 Convolution neural network computing acceleration method and apparatus, device, and storage medium
CN111045958A (en) * 2018-10-11 2020-04-21 展讯通信(上海)有限公司 Acceleration engine and processor
WO2020077565A1 (en) * 2018-10-17 2020-04-23 北京比特大陆科技有限公司 Data processing method and apparatus, electronic device, and computer readable storage medium
CN111095242A (en) * 2017-07-24 2020-05-01 特斯拉公司 Vector calculation unit
CN111176727A (en) * 2017-07-20 2020-05-19 上海寒武纪信息科技有限公司 Computing device and computing method
CN111291880A (en) * 2017-10-30 2020-06-16 上海寒武纪信息科技有限公司 Computing device and computing method
CN111465924A (en) * 2017-12-12 2020-07-28 特斯拉公司 System and method for converting matrix input to vectorized input of a matrix processor
US10733742B2 (en) 2018-09-26 2020-08-04 International Business Machines Corporation Image labeling
CN111523642A (en) * 2020-04-10 2020-08-11 厦门星宸科技有限公司 Data reuse method, operation method and device, and chip for convolution operation
CN109800867B (en) * 2018-12-17 2020-09-29 北京理工大学 Data calling method based on FPGA off-chip memory
CN111859797A (en) * 2020-07-14 2020-10-30 Oppo广东移动通信有限公司 A data processing method and device, and storage medium
CN112204585A (en) * 2018-05-30 2021-01-08 三星电子株式会社 Processor, electronic device and control method thereof
WO2021007037A1 (en) * 2019-07-09 2021-01-14 MemryX Inc. Matrix data reuse techniques in processing systems
US10928456B2 (en) 2017-08-17 2021-02-23 Samsung Electronics Co., Ltd. Method and apparatus for estimating state of battery
CN112992248A (en) * 2021-03-12 2021-06-18 西安交通大学深圳研究院 PE (provider edge) calculation unit structure of FIFO (first in first out) -based variable-length cyclic shift register
US11176427B2 (en) 2018-09-26 2021-11-16 International Business Machines Corporation Overlapping CNN cache reuse in high resolution and streaming-based deep learning inference engines
CN114780910A (en) * 2022-06-16 2022-07-22 千芯半导体科技(北京)有限公司 Hardware system and calculation method for sparse convolution calculation
WO2022179075A1 (en) * 2021-02-26 2022-09-01 成都商汤科技有限公司 Data processing method and apparatus, computer device and storage medium
CN115168284A (en) * 2022-07-06 2022-10-11 中国科学技术大学 Coarse-grained reconfigurable array system and computing method for deep learning
US11694074B2 (en) 2018-09-07 2023-07-04 Samsung Electronics Co., Ltd. Integrated circuit that extracts data, neural network processor including the integrated circuit, and neural network device
CN116842307A (en) * 2023-08-28 2023-10-03 腾讯科技(深圳)有限公司 Data processing method, device, equipment, chip and storage medium
US11893393B2 (en) 2017-07-24 2024-02-06 Tesla, Inc. Computational array microprocessor system with hardware arbiter managing memory requests
CN118093018A (en) * 2023-12-19 2024-05-28 北京理工大学 In-memory computing core, in-memory computing method, in-memory processor and processing method
US12216610B2 (en) 2017-07-24 2025-02-04 Tesla, Inc. Computational array microprocessor system using non-consecutive data formatting

Citations (4)

Publication number Priority date Publication date Assignee Title
WO2001090927A1 (en) * 2000-05-19 2001-11-29 Philipson Lars H G Method and device in a convolution process
CN102208005A (en) * 2011-05-30 2011-10-05 华中科技大学 2-dimensional (2-D) convolver
CN104077233A (en) * 2014-06-18 2014-10-01 百度在线网络技术(北京)有限公司 Single-channel convolution layer and multi-channel convolution layer handling method and device
CN105681628A (en) * 2016-01-05 2016-06-15 西安交通大学 Convolution network arithmetic unit, reconfigurable convolution neural network processor and image de-noising method of reconfigurable convolution neural network processor


Non-Patent Citations (2)

Title
DOU Yong et al.: "Coarse-grained reconfigurable array architecture supporting automatic loop pipelining", Science in China Series E: Information Sciences *
LU Zhijian: "Research on FPGA-based parallel architectures for convolutional neural networks", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (103)

Publication number Priority date Publication date Assignee Title
CN106844294A (en) * 2016-12-29 2017-06-13 华为机器有限公司 Convolution operation chip and communication equipment
CN106844294B (en) * 2016-12-29 2019-05-03 华为机器有限公司 Convolution arithmetic chips and communication equipment
CN106775599A (en) * 2017-01-09 2017-05-31 南京工业大学 Multi-computing-unit coarse-grained reconfigurable system and method for recurrent neural network
US11309026B2 (en) 2017-01-25 2022-04-19 Peking University Convolution operation method based on NOR flash array
WO2018137177A1 (en) * 2017-01-25 2018-08-02 北京大学 Method for convolution operation based on nor flash array
US12307355B2 (en) 2017-02-28 2025-05-20 Microsoft Technology Licensing, Llc Neural network processing with chained instructions
CN110383237B (en) * 2017-02-28 2023-05-26 德克萨斯仪器股份有限公司 Reconfigurable matrix multiplier system and method
CN110325963B (en) * 2017-02-28 2023-05-23 微软技术许可有限责任公司 Multifunctional unit for programmable hardware nodes for neural network processing
CN110383237A (en) * 2017-02-28 2019-10-25 德克萨斯仪器股份有限公司 Reconfigurable matrix multiplier system and method
CN110325963A (en) * 2017-02-28 2019-10-11 微软技术许可有限责任公司 Multi-function unit for programmable hardware nodes for neural network processing
US11663450B2 (en) 2017-02-28 2023-05-30 Microsoft Technology Licensing, Llc Neural network processing with chained instructions
CN107229598A (en) * 2017-04-21 2017-10-03 东南大学 A kind of low power consumption voltage towards convolutional neural networks is adjustable convolution computing module
CN107103754A (en) * 2017-05-10 2017-08-29 华南师范大学 A kind of road traffic condition Forecasting Methodology and system
WO2018232615A1 (en) * 2017-06-21 2018-12-27 华为技术有限公司 METHOD AND DEVICE FOR PROCESSING SIGNALS
CN111176727A (en) * 2017-07-20 2020-05-19 上海寒武纪信息科技有限公司 Computing device and computing method
CN111221578A (en) * 2017-07-20 2020-06-02 上海寒武纪信息科技有限公司 Computing device and computing method
CN111176727B (en) * 2017-07-20 2022-05-31 上海寒武纪信息科技有限公司 Computing device and computing method
CN111221578B (en) * 2017-07-20 2022-07-15 上海寒武纪信息科技有限公司 Computing device and computing method
US12216610B2 (en) 2017-07-24 2025-02-04 Tesla, Inc. Computational array microprocessor system using non-consecutive data formatting
CN111095242A (en) * 2017-07-24 2020-05-01 特斯拉公司 Vector calculation unit
US12086097B2 (en) 2017-07-24 2024-09-10 Tesla, Inc. Vector computational unit
CN111095242B (en) * 2017-07-24 2024-03-22 特斯拉公司 vector calculation unit
US11893393B2 (en) 2017-07-24 2024-02-06 Tesla, Inc. Computational array microprocessor system with hardware arbiter managing memory requests
US10928456B2 (en) 2017-08-17 2021-02-23 Samsung Electronics Co., Ltd. Method and apparatus for estimating state of battery
CN107590085B (en) * 2017-08-18 2018-05-29 浙江大学 A kind of dynamic reconfigurable array data path and its control method with multi-level buffer
CN107590085A (en) * 2017-08-18 2018-01-16 浙江大学 A kind of dynamic reconfigurable array data path and its control method with multi-level buffer
CN107635138A (en) * 2017-10-19 2018-01-26 珠海格力电器股份有限公司 Image processing apparatus
CN107832262A (en) * 2017-10-19 2018-03-23 珠海格力电器股份有限公司 Convolution operation method and device
CN111291880B (en) * 2017-10-30 2024-05-14 上海寒武纪信息科技有限公司 Computing device and computing method
CN111291880A (en) * 2017-10-30 2020-06-16 上海寒武纪信息科技有限公司 Computing device and computing method
CN109754359A (en) * 2017-11-01 2019-05-14 腾讯科技(深圳)有限公司 A method and system for pooling processing applied to convolutional neural networks
US11734554B2 (en) 2017-11-01 2023-08-22 Tencent Technology (Shenzhen) Company Limited Pooling processing method and system applied to convolutional neural network
US11537857B2 (en) 2017-11-01 2022-12-27 Tencent Technology (Shenzhen) Company Limited Pooling processing method and system applied to convolutional neural network
CN107862650B (en) * 2017-11-29 2021-07-06 中科亿海微电子科技(苏州)有限公司 Method for accelerating calculation of CNN convolution of two-dimensional image
CN107862650A (en) * 2017-11-29 2018-03-30 中科亿海微电子科技(苏州)有限公司 The method of speed-up computation two dimensional image CNN convolution
CN108701015A (en) * 2017-11-30 2018-10-23 深圳市大疆创新科技有限公司 Computing device, chip, device and related method for neural network
CN111465924B (en) * 2017-12-12 2023-11-17 特斯拉公司 System and method for converting matrix input into vectorized input for matrix processor
CN111465924A (en) * 2017-12-12 2020-07-28 特斯拉公司 System and method for converting matrix input to vectorized input of a matrix processor
CN108009126A (en) * 2017-12-15 2018-05-08 北京中科寒武纪科技有限公司 A kind of computational methods and Related product
CN108198125B (en) * 2017-12-29 2021-10-08 深圳云天励飞技术有限公司 Image processing method and device
CN108198125A (en) * 2017-12-29 2018-06-22 深圳云天励飞技术有限公司 A kind of image processing method and device
CN109992541A (en) * 2017-12-29 2019-07-09 深圳云天励飞技术有限公司 A data handling method, related product and computer storage medium
CN108182471A (en) * 2018-01-24 2018-06-19 上海岳芯电子科技有限公司 A kind of convolutional neural networks reasoning accelerator and method
CN108182471B (en) * 2018-01-24 2022-02-15 上海岳芯电子科技有限公司 Convolutional neural network reasoning accelerator and method
CN108241890B (en) * 2018-01-29 2021-11-23 清华大学 Reconfigurable neural network acceleration method and architecture
CN108241890A (en) * 2018-01-29 2018-07-03 清华大学 A reconfigurable neural network acceleration method and architecture
CN108596331A (en) * 2018-04-16 2018-09-28 浙江大学 A kind of optimization method of cell neural network hardware structure
CN108564524A (en) * 2018-04-24 2018-09-21 开放智能机器(上海)有限公司 A kind of convolutional calculation optimization method of visual pattern
CN110413561B (en) * 2018-04-28 2021-03-30 中科寒武纪科技股份有限公司 Data acceleration processing system
CN110413561A (en) * 2018-04-28 2019-11-05 北京中科寒武纪科技有限公司 Data accelerate processing system
CN108595379A (en) * 2018-05-08 2018-09-28 济南浪潮高新科技投资发展有限公司 A kind of parallelization convolution algorithm method and system based on multi-level buffer
CN108665063A (en) * 2018-05-18 2018-10-16 南京大学 Two-way simultaneous for BNN hardware accelerators handles convolution acceleration system
CN108665063B (en) * 2018-05-18 2022-03-18 南京大学 Bidirectional parallel processing convolution acceleration system for BNN hardware accelerator
WO2019231254A1 (en) * 2018-05-30 2019-12-05 Samsung Electronics Co., Ltd. Processor, electronics apparatus and control method thereof
US11244027B2 (en) 2018-05-30 2022-02-08 Samsung Electronics Co., Ltd. Processor, electronics apparatus and control method thereof
CN112204585A (en) * 2018-05-30 2021-01-08 三星电子株式会社 Processor, electronic device and control method thereof
CN112204585B (en) * 2018-05-30 2024-09-17 三星电子株式会社 Processor, electronic device and control method thereof
CN108764182B (en) * 2018-06-01 2020-12-08 阿依瓦(北京)技术有限公司 Optimized acceleration method and device for artificial intelligence
CN108717571A (en) * 2018-06-01 2018-10-30 阿依瓦(北京)技术有限公司 A kind of acceleration method and device for artificial intelligence
CN108717571B (en) * 2018-06-01 2020-09-15 阿依瓦(北京)技术有限公司 Acceleration method and device for artificial intelligence
CN108764182A (en) * 2018-06-01 2018-11-06 阿依瓦(北京)技术有限公司 A kind of acceleration method and device for artificial intelligence of optimization
CN109272112A (en) * 2018-07-03 2019-01-25 北京中科睿芯科技有限公司 A kind of data reusing command mappings method, system and device towards neural network
CN109272112B (en) * 2018-07-03 2021-08-27 北京中科睿芯科技集团有限公司 Data reuse instruction mapping method, system and device for neural network
CN108681984A (en) * 2018-07-26 2018-10-19 珠海市微半导体有限公司 A kind of accelerating circuit of 3*3 convolution algorithms
CN108681984B (en) * 2018-07-26 2023-08-15 珠海一微半导体股份有限公司 Acceleration circuit of 3*3 convolution algorithm
US11694074B2 (en) 2018-09-07 2023-07-04 Samsung Electronics Co., Ltd. Integrated circuit that extracts data, neural network processor including the integrated circuit, and neural network device
US12198053B2 (en) 2018-09-07 2025-01-14 Samsung Electronics Co., Ltd. Integrated circuit that extracts data, neural network processor including the integrated circuit, and neural network device
CN109460813A (en) * 2018-09-10 2019-03-12 中国科学院深圳先进技术研究院 Accelerated method, device, equipment and the storage medium that convolutional neural networks calculate
WO2020051751A1 (en) * 2018-09-10 2020-03-19 中国科学院深圳先进技术研究院 Convolution neural network computing acceleration method and apparatus, device, and storage medium
CN109284475A (en) * 2018-09-20 2019-01-29 郑州云海信息技术有限公司 A matrix convolution calculation module and matrix convolution calculation method
CN109284475B (en) * 2018-09-20 2021-10-29 郑州云海信息技术有限公司 A matrix convolution computing device and matrix convolution computing method
US10733742B2 (en) 2018-09-26 2020-08-04 International Business Machines Corporation Image labeling
US11176427B2 (en) 2018-09-26 2021-11-16 International Business Machines Corporation Overlapping CNN cache reuse in high resolution and streaming-based deep learning inference engines
US12039769B2 (en) 2018-09-26 2024-07-16 International Business Machines Corporation Identifying a type of object in a digital image based on overlapping areas of sub-images
CN109375952A (en) * 2018-09-29 2019-02-22 北京字节跳动网络技术有限公司 Method and apparatus for storing data
CN109375952B (en) * 2018-09-29 2021-01-26 北京字节跳动网络技术有限公司 Method and apparatus for storing data
CN111045958A (en) * 2018-10-11 2020-04-21 展讯通信(上海)有限公司 Acceleration engine and processor
CN111045958B (en) * 2018-10-11 2022-09-16 展讯通信(上海)有限公司 Acceleration engine and processor
WO2020077565A1 (en) * 2018-10-17 2020-04-23 北京比特大陆科技有限公司 Data processing method and apparatus, electronic device, and computer readable storage medium
CN109800867B (en) * 2018-12-17 2020-09-29 北京理工大学 Data calling method based on FPGA off-chip memory
CN109816093B (en) * 2018-12-17 2020-12-04 北京理工大学 A One-way Convolution Implementation Method
CN109816093A (en) * 2018-12-17 2019-05-28 北京理工大学 A One-way Convolution Implementation Method
CN109711533B (en) * 2018-12-20 2023-04-28 西安电子科技大学 FPGA-based Convolutional Neural Network Acceleration System
CN109711533A (en) * 2018-12-20 2019-05-03 西安电子科技大学 FPGA-based convolutional neural network module
CN110069444A (en) * 2019-06-03 2019-07-30 南京宁麒智能计算芯片研究院有限公司 A kind of computing unit, array, module, hardware system and implementation method
WO2021007037A1 (en) * 2019-07-09 2021-01-14 MemryX Inc. Matrix data reuse techniques in processing systems
US12353846B2 (en) 2019-07-09 2025-07-08 MemryX Matrix data reuse techniques in multiply and accumulate units of processing system
US11537535B2 (en) 2019-07-09 2022-12-27 Memryx Incorporated Non-volatile memory based processors and dataflow techniques
CN110377874B (en) * 2019-07-23 2023-05-02 江苏鼎速网络科技有限公司 Convolution operation method and system
CN110377874A (en) * 2019-07-23 2019-10-25 江苏鼎速网络科技有限公司 Convolution algorithm method and system
CN110705687A (en) * 2019-09-05 2020-01-17 北京三快在线科技有限公司 Convolution neural network hardware computing device and method
CN111523642B (en) * 2020-04-10 2023-03-28 星宸科技股份有限公司 Data reuse method, operation method and device and chip for convolution operation
CN111523642A (en) * 2020-04-10 2020-08-11 厦门星宸科技有限公司 Data reuse method, operation method and device, and chip for convolution operation
CN111859797A (en) * 2020-07-14 2020-10-30 Oppo广东移动通信有限公司 A data processing method and device, and storage medium
WO2022179075A1 (en) * 2021-02-26 2022-09-01 成都商汤科技有限公司 Data processing method and apparatus, computer device and storage medium
CN112992248A (en) * 2021-03-12 2021-06-18 西安交通大学深圳研究院 PE (provider edge) calculation unit structure of FIFO (first in first out) -based variable-length cyclic shift register
CN114780910B (en) * 2022-06-16 2022-09-06 千芯半导体科技(北京)有限公司 Hardware system and calculation method for sparse convolution calculation
CN114780910A (en) * 2022-06-16 2022-07-22 千芯半导体科技(北京)有限公司 Hardware system and calculation method for sparse convolution calculation
CN115168284A (en) * 2022-07-06 2022-10-11 中国科学技术大学 Coarse-grained reconfigurable array system and computing method for deep learning
CN116842307B (en) * 2023-08-28 2023-11-28 腾讯科技(深圳)有限公司 Data processing methods, devices, equipment, chips and storage media
CN116842307A (en) * 2023-08-28 2023-10-03 腾讯科技(深圳)有限公司 Data processing method, device, equipment, chip and storage medium
CN118093018A (en) * 2023-12-19 2024-05-28 北京理工大学 In-memory computing core, in-memory computing method, in-memory processor and processing method
CN118093018B (en) * 2023-12-19 2025-04-11 北京理工大学 In-memory computing core, in-memory computing method, in-memory processor and processing method

Similar Documents

Publication Publication Date Title
CN106250103A (en) A kind of convolutional neural networks cyclic convolution calculates the system of data reusing
CN104915322B (en) A kind of hardware-accelerated method of convolutional neural networks
CN107657581B (en) A convolutional neural network CNN hardware accelerator and acceleration method
CN109284817B (en) Deep separable convolutional neural network processing architecture/method/system and medium
CN110516801A (en) A High Throughput Dynamically Reconfigurable Convolutional Neural Network Accelerator Architecture
CN110007961B (en) RISC-V-based edge computing hardware architecture
CN107688853B (en) Device and method for executing neural network operation
CN108537331A (en) A kind of restructural convolutional neural networks accelerating circuit based on asynchronous logic
CN103345461B (en) Based on the polycaryon processor network-on-a-chip with accelerator of FPGA
CN109711533B (en) FPGA-based Convolutional Neural Network Acceleration System
CN108388537B (en) A convolutional neural network acceleration device and method
CN109447241B (en) A Dynamic Reconfigurable Convolutional Neural Network Accelerator Architecture for the Internet of Things
CN110210610A (en) Convolutional calculation accelerator, convolutional calculation method and convolutional calculation equipment
CN101089840A (en) Multi-FPGA-based Parallel Computing System for Matrix Multiplication
CN104899182A (en) Matrix multiplication acceleration method for supporting variable blocks
WO2021115163A1 (en) Neural network processor, chip and electronic device
US11789733B2 (en) Instruction processing apparatus, acceleration unit, and server
CN107403117A (en) Three dimensional convolution device based on FPGA
WO2022001550A1 (en) Address generation method, related device and storage medium
CN111797982A (en) Image processing system based on convolutional neural network
CN112905530A (en) On-chip architecture, pooled computational accelerator array, unit and control method
CN102306371A (en) Hierarchical parallel modular sequence image real-time processing device
CN117632844A (en) Reconfigurable AI algorithm hardware accelerator
CN105955896B (en) A reconfigurable DBF algorithm hardware accelerator and control method
CN111047035B (en) Neural network processor, chip and electronic equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20161221