WO2019136751A1 - Artificial intelligence parallel processing method and apparatus, computer-readable storage medium, and terminal - Google Patents
- Publication number
- WO2019136751A1 (PCT/CN2018/072663)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- module
- artificial intelligence
- storage module
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Definitions
- the present invention relates to the field of artificial intelligence, and in particular to an artificial intelligence parallel processing method, apparatus, readable storage medium, and terminal.
- AI: Artificial Intelligence
- artificial intelligence algorithms are neural network models that simulate the human brain, and their computational cost is very large. AlphaGo, for example, requires thousands of traditional processors (CPUs) and hundreds of graphics processors (GPUs). Clearly, as artificial intelligence ushers in a new wave of revival, traditional processors are becoming a bottleneck that hinders its spread.
- the object of the present invention is to provide an artificial intelligence parallel processing method and an artificial intelligence processing apparatus that solve technical problems in the prior art, such as insufficient parallelism in artificial intelligence algorithm processing.
- the present invention provides an artificial intelligence parallel processing method applied to a processing module. The method includes: causing a data transmission module to fetch multiple channel data from an external storage module according to a preset data size; and causing the data transmission module to transmit the fetched channel data to a convolution operation module, where the convolution operation module includes a plurality of convolution kernel matrices for performing parallel convolution operations on the channel data.
- fetching the plurality of channel data from the external storage module according to the preset data size specifically includes: fetching each channel data from the external storage module to a first storage module according to a 1*1 data size; fetching each channel data from the first storage module to a second storage module according to a pv*1 data size, where pv is the data transmission parallelism and the number of columns of the channel data is an integer multiple of pv; fetching each channel data from the second storage module to a matrix module according to a pv*k data size, where k is the size of the convolution kernel matrix; and fetching each channel data from the matrix module according to a pv*k*k data size to perform parallel convolution operations with the plurality of convolution kernel matrices.
- fetching each channel data from the second storage module to the matrix module according to a pv*k data size specifically includes: grouping the channel data into sets of k rows each, and causing the data transmission module to perform the following operation on each group in turn: in each clock cycle, sequentially fetch first to-be-processed data of size pv*k from the group until all data in the group has been fetched.
- fetching each channel data from the matrix module according to a pv*k*k data size specifically includes: starting with the second first-to-be-processed data fetched from each group, combining each first to-be-processed data with the last two columns of the preceding first to-be-processed data to form second to-be-processed data of size (pv+2)*k; and performing matrix extraction on each second to-be-processed data with a step size of 1, obtaining pv third to-be-processed data of size k*k, where each third to-be-processed data is used for parallel convolution operations with the plurality of convolution kernel matrices.
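The window-combination step described above can be sketched in software. This is an illustrative model only, not the patent's hardware implementation; the function name, the list-based tile layout, and the values pv=8, k=3 are assumptions taken from the embodiment.

```python
pv, k = 8, 3  # data transmission parallelism and kernel size (assumed from the embodiment)

def combine_and_extract(prev_tile, cur_tile):
    """Join the last two columns of the previous pv*k tile with the current
    pv*k tile, then slide a k*k window across the result with step size 1."""
    # Each tile is k rows by pv columns (one pv*k "first to-be-processed" block).
    merged = [prev[-2:] + cur for prev, cur in zip(prev_tile, cur_tile)]
    # merged now has k rows and pv + 2 columns, i.e. the (pv+2)*k block.
    assert all(len(row) == pv + 2 for row in merged)
    # Stride-1 extraction yields exactly pv windows of size k*k (for k = 3).
    return [[row[i:i + k] for row in merged] for i in range(pv)]

# Dummy tiles with distinct values so the column overlap is visible.
prev_tile = [[r * pv + c for c in range(pv)] for r in range(k)]
cur_tile = [[100 + v for v in row] for row in prev_tile]
windows = combine_and_extract(prev_tile, cur_tile)
assert len(windows) == pv
assert all(len(w) == k and len(w[0]) == k for w in windows)
```

The first window's top row starts with the two overlapped columns of the previous tile, which is why no convolution result is lost at the tile boundary.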
- the plurality of convolution kernel matrices comprises weight matrices with different weights, which simultaneously perform convolution operations with the third to-be-processed data.
- an artificial intelligence parallel processing apparatus includes: an external storage module that stores a plurality of channel data; a processing module communicatively connected to the external storage module; a data transmission module that fetches the plurality of channel data from the external storage module according to a preset data size and transmits it; and a convolution operation module that includes a plurality of convolution kernel matrices for performing parallel convolution operations on the channel data fetched according to the preset data size.
- the artificial intelligence parallel processing device includes a first storage module for storing the channel data from the external storage module.
- the artificial intelligence parallel processing device includes a second storage module for storing the channel data from the first storage module.
- the artificial intelligence parallel processing device includes a matrix module for storing the channel data from the second storage module.
- the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the artificial intelligence parallel processing method.
- an artificial intelligence processing terminal includes a processor and a memory; the memory stores a computer program, and the processor is configured to execute the computer program stored in the memory, causing the terminal to perform the artificial intelligence parallel processing method.
- the artificial intelligence parallel processing method, apparatus, readable storage medium, and terminal of the present invention have the following advantageous effects: the present invention does not need to wait for the convolution operation of one convolution kernel matrix to complete before starting the convolution operation of the next convolution kernel matrix, and it realizes parallel convolution through hardware such as a convolution operation circuit. Especially when facing a large amount of data, convolution efficiency is greatly improved compared with software computation. The artificial intelligence parallel processing method therefore greatly improves processing parallelism and calculation efficiency.
- FIG. 1 is a flow chart showing a method for parallel processing of artificial intelligence according to an embodiment of the present invention.
- FIG. 2 is a schematic diagram showing a data matrix to be processed in an embodiment of the present invention.
- FIG. 3 is a schematic diagram showing data to be processed by a data transmission module according to an embodiment of the present invention.
- FIG. 4 is a schematic diagram showing data to be processed by a data transmission module according to an embodiment of the present invention.
- FIG. 5 is a schematic diagram showing an artificial intelligence parallel processing apparatus according to an embodiment of the present invention.
- the artificial intelligence parallel processing method is applied to a processing module, which may be, for example, an ARM module, an MCU module, or an SoC module.
- the artificial intelligence parallel processing method specifically includes:
- the data transmission module is configured to take out multiple channel data from the external storage module according to a preset data size.
- the data transmission module can transmit data by means of DMA.
- DMA (Direct Memory Access) is used for data transmission between the external memory and the Programmable Logic (PL) terminal.
- DMA transfer is a high-speed data transfer operation that allows direct read and write operations between external devices and memory without the need for CPU intervention.
- the external storage module may be, for example, a DDR memory, and is disposed outside the Programmable Logic terminal for storing a plurality of channel data.
- the channel data is data to be processed, and is usually stored in a memory in the form of a data matrix.
- the data transmission module is configured to transmit the extracted channel data to a convolution operation module for parallel convolution operation with multiple convolution kernel matrices.
- the convolution operation module is a convolution operation circuit, and may be a circuit composed of a multiplier and an adder.
- the convolution operation module includes a plurality of convolution kernel matrices, and each of the convolution kernel matrices has different weights.
- for example, an image has three channel data, R, G, and B, i.e. three two-dimensional matrices. The convolution kernel matrix has length and width k*k, with k assumed to be the odd number 3; further, the data transmission module is assumed to fetch channel data according to an 8*3*3 data size, that is, it fetches eight 3*3 matrices at a time.
- if the three two-dimensional matrices of R, G, and B are not convolved in parallel, three consecutive calculations are needed to complete the operation, which is time-consuming and computationally inefficient.
- in the present invention, the three two-dimensional matrices of R, G, and B are convolved in parallel with the eight 3*3 matrices, so that each set of eight 3*3 matrices yields 8*3 convolution result values.
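The parallel multi-kernel convolution can be modelled in software as follows. This is illustrative only: in the patent the multiply-accumulate is done by a hardware multiplier/adder circuit, and the dummy window and kernel values here are assumptions.

```python
pv, k, n_kernels = 8, 3, 3  # assumed from the embodiment: 8 windows, 3*3 kernels, 3 channels

def conv_value(window, kernel):
    # One k*k multiply-accumulate: the core operation of the convolution circuit.
    return sum(window[r][c] * kernel[r][c] for r in range(k) for c in range(k))

# Dummy data: pv windows of all-ones, and one constant-weight kernel per channel.
windows = [[[1] * k for _ in range(k)] for _ in range(pv)]
kernels = [[[w] * k for _ in range(k)] for w in range(1, n_kernels + 1)]

# Every (window, kernel) pair is independent, so all pv * n_kernels results
# can be produced simultaneously by replicated circuits; the Python loop is
# only a sequential model of that parallelism.
results = [[conv_value(win, ker) for ker in kernels] for win in windows]
assert len(results) == pv and len(results[0]) == n_kernels  # 8*3 = 24 values per set
```

With all-ones windows and constant kernels, each result is simply 9 times the kernel weight, which makes the 8*3 output layout easy to check.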
- the present invention does not need to wait for the convolution operation of one convolution kernel matrix to complete before starting the convolution operation of the next convolution kernel matrix, and it realizes parallel convolution through hardware such as a convolution operation circuit. Especially when facing a large amount of data, convolution efficiency is greatly improved compared with software computation. The artificial intelligence parallel processing method therefore greatly improves processing parallelism and calculation efficiency.
- the data transmission module fetches the channel data from the external storage module to the first storage module according to a 1*1 data size.
- the first storage module may be a RAM or ROM memory, such as DDR3 or DDR4 SDRAM.
- FIG. 2 shows a schematic diagram of channel data in an embodiment of the present invention.
- the data transmission module fetches the channel data from the first storage module to the second storage module according to a pv*1 data size.
- pv is the data transmission parallelism, indicating the number of columns the data transmission module processes each time; its size affects the efficiency of the artificial intelligence parallel processing method. The number of columns of the channel data is an integer multiple of pv.
- the following describes, with a specific illustration, how the data transmission module fetches channel data according to an 8*1 data size.
- FIG. 3 shows a schematic diagram of the data transmission module fetching channel data in an embodiment of the present invention.
- the data transmission module starts from the leftmost side of the first row of data to be processed and fetches 8*1 data each time until all the data to be processed in the first row has been fetched. On the same principle, the data transmission module continues with the second row, the third row, and so on, until the entire 34*40 matrix has been fetched.
- after the data transmission module has stored the 34*40 matrix, it fetches data according to the pv*k data size, where k is the size of the convolution kernel matrix, a weight matrix used for the convolution operation. The convolution kernel matrix may be set as an odd-order matrix; in the present embodiment it is set to a 3*3 matrix. That is, the data transmission module fetches the 34*40 matrix from the second storage module in batches of 8*3 matrices and puts them into the matrix module for data combination.
- in each clock cycle, the data transmission module sequentially fetches 8*3 matrices from the first three rows of the 34*40 matrix, from left to right; a total of five 8*3 matrices can thus be fetched from the first three rows. On the same principle, the data transmission module continues to fetch the to-be-processed data of subsequent rows after the first three rows have been fetched.
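The tile count stated above follows directly from the matrix width and the parallelism. A minimal sketch, assuming the 34*40 matrix and pv=8, k=3 of this embodiment (the variable names are illustrative):

```python
rows, cols = 34, 40  # size of the channel data matrix in the embodiment
pv, k = 8, 3         # data transmission parallelism and kernel size

# The column count must be an integer multiple of pv, as required above.
assert cols % pv == 0

# Each k-row band is split into contiguous pv-wide tiles of size pv*k.
tiles_per_band = cols // pv
assert tiles_per_band == 5  # five 8*3 matrices per 3-row band, matching FIG. 2
```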
- the rectangular dotted frames R1 to R5 in FIG. 2 represent the five 8*3 matrices in the first three rows.
- FIG. 4 shows a schematic diagram of the data transmission module fetching data in an embodiment of the present invention.
- the first 8*3 matrix fetched from each row group, such as the matrix M1 fetched by the data transmission module from the second storage module, has no preceding columns to combine with and can therefore yield fewer than 8 convolution result values. Accordingly, the first 8*3 matrix fetched per row group is treated as invalid data in order to improve the pipelining of the artificial intelligence processing; the convolution result of the 8*3 matrix M1 is an invalid value.
- the data transmission module then fetches a second 8*3 matrix M2; the 8*3 matrix M2 and the last two columns of the 8*3 matrix M1 are combined into a 10*3 matrix M12.
- in FIG. 4, a line L1 marks the matrix data that are combined with each other.
- the data matrix M2 is combined with the last two columns of the data matrix M1 to obtain a data matrix M12 with (pv+2), i.e. 10, columns.
- the 10*3 matrix M12 can perform matrix extraction according to the step size 1, thereby obtaining eight 3*3 matrices.
- the rectangular dotted frame R6 starts at the position shown in FIG. 4, moves to the right column by column with a step size of 1, and yields a 3*3 matrix at each position.
- the rectangular dashed box R6 can be moved a total of 7 times within the 10*3 matrix M12, yielding a total of eight 3*3 matrices, that is, pv k*k matrices.
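The counts above can be checked with standard sliding-window arithmetic; this sketch just restates them, with pv and k taken from the embodiment:

```python
pv, k = 8, 3

combined_cols = pv + 2            # the 10 columns of the combined matrix M12
positions = combined_cols - k + 1 # number of stride-1 positions of a k-wide window
assert positions == pv == 8       # eight 3*3 matrices per combined tile

moves = positions - 1             # moves after the starting position
assert moves == 7                 # the box R6 moves 7 times
```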
- the eight 3*3 matrices are used for transmission to the convolution operation module to perform parallel convolution operations with the three 3*3 convolution kernel matrices respectively, thereby obtaining 3*8 calculation result values.
- next, the data transmission module fetches a third 8*3 matrix M3; the 8*3 matrix M3 and the last two columns of the 8*3 matrix M2 are combined into a 10*3 matrix M23, where a line L2 marks the matrix data combined with each other.
- the data matrix M3 is combined with the last two columns of the data matrix M2 to obtain a data matrix M23 having a column number of 10.
- the 10*3 matrix M23 can likewise be matrix-extracted with a step size of 1 to obtain eight 3*3 matrices; these eight 3*3 to-be-processed data matrices are transmitted to the convolution operation module, convolved with the three 3*3 convolution kernel matrices, and 3*8 calculation result values are obtained.
- on the same principle, the data transmission module completes the processing of the entire 34*40 matrix after a number of clock cycles.
- as shown in FIG. 5, an artificial intelligence parallel processing apparatus includes: a first storage module 51, a second storage module 52, a data transmission module 53, a processing module 54, a matrix module 55, a convolution operation module 56, and an external storage module 57.
- the first storage module 51, the second storage module 52, the data transmission module 53, the matrix module 55, and the convolution operation module 56 are collectively disposed on the Programmable Logic terminal 50 of the FPGA, which is generally referred to as a PL terminal.
- the data transmission module is specifically configured to transmit the channel data from the external storage module 57 to the first storage module 51 through the system bus according to the 1*1 data size; fetch it from the first storage module 51 and transfer it to the second storage module 52 according to the pv*1 data size; fetch it from the second storage module 52 and transmit it to the matrix module according to the pv*k data size; and then fetch it from the matrix module and transmit it to the convolution operation module 56 according to the pv*k*k data size.
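The staged data path can be summarized as a table of tile shapes per stage. A minimal sketch under the embodiment's assumptions (pv=8, k=3; the stage labels are descriptive names, not the patent's module identifiers):

```python
pv, k = 8, 3

# (stage, (rows, cols)) for the block moved at each hop of the data path.
stages = [
    ("external storage -> first storage", (1, 1)),   # 1*1 elements
    ("first storage -> second storage",   (1, pv)),  # pv*1 column strips
    ("second storage -> matrix module",   (k, pv)),  # pv*k tiles
    ("matrix module -> convolution",      (k, k)),   # pv windows of k*k each
]

# The final stage delivers k*k windows, which match the kernel matrix size.
assert stages[-1][1] == (k, k)
assert len(stages) == 4
```

Each hop widens the block, so the narrow 1*1 DMA reads are progressively assembled into the pv*k*k batches the convolution circuit consumes.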
- the convolution operation module 56 is provided with a plurality of convolution kernel matrices for parallel convolution operations.
- the plurality of convolution kernel matrices are specifically: a convolution kernel matrix 1, a convolution kernel matrix 2, ..., a convolution kernel matrix n.
- the first storage module 51 may be, for example, a BRAM (Block RAM) memory, a RAM storage resource of an FPGA (Field-Programmable Gate Array).
- the processing module 54 may be, for example, an ARM module, an MCU module, or an SoC module.
- the implementation of the artificial intelligence processing device is similar to that of the artificial intelligence parallel processing method and is therefore not described again; those skilled in the art can understand the principle and implementation of the device on the basis of the method.
- the aforementioned computer program can be stored in a computer readable storage medium.
- when executed, the program performs the steps of the foregoing method embodiments; the foregoing storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
- the present invention also provides an artificial intelligence processing terminal, comprising a processor and a memory; the memory stores a computer program, and the processor is configured to execute the computer program stored in the memory, so that the terminal performs the artificial intelligence parallel processing method.
- the above memory may include random access memory (RAM), and may also include non-volatile memory, such as at least one disk storage.
- RAM: Random Access Memory
- the above processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; or a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
- CPU: Central Processing Unit
- NP: Network Processor
- DSP: Digital Signal Processor
- ASIC: Application-Specific Integrated Circuit
- FPGA: Field-Programmable Gate Array
- in summary, the present invention does not need to wait for the convolution operation of one convolution kernel matrix to complete before starting the convolution operation of the next convolution kernel matrix, and it realizes parallel convolution through hardware such as a convolution operation circuit. Especially when facing a large amount of data, convolution efficiency is greatly improved compared with software computation. The artificial intelligence parallel processing method therefore greatly improves processing parallelism and calculation efficiency. The present invention thus effectively overcomes various shortcomings in the prior art and has high industrial utilization value.
Abstract
An artificial intelligence parallel processing method for use in a processing module (54), the method comprising: causing a data transmission module to fetch a plurality of channel data from an external storage module according to a preset data size (S101); and causing the data transmission module to transmit the fetched channel data to a convolution module for parallel convolution operations with a plurality of convolution kernel matrices (S102). The method does not need to wait for the convolution operation of one convolution kernel matrix to finish before carrying out the convolution operation of the next convolution kernel matrix, and it implements parallel convolution operations by means of a hardware device such as a convolution operation circuit; in particular, when facing a large amount of data computation, it considerably improves the efficiency of convolution operations compared with software computation. Processing parallelism and computation efficiency are thus considerably improved by the present artificial intelligence parallel processing method.
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201880002151.7A CN109416755B (zh) | 2018-01-15 | 2018-01-15 | 人工智能并行处理方法、装置、可读存储介质、及终端 |
| PCT/CN2018/072663 WO2019136751A1 (fr) | 2018-01-15 | 2018-01-15 | Procédé et appareil de traitement parallèle d'intelligence artificielle, support d'informations lisible par ordinateur et terminal |
| US16/929,819 US11874898B2 (en) | 2018-01-15 | 2020-07-15 | Streaming-based artificial intelligence convolution processing method and apparatus, readable storage medium and terminal |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2018/072663 WO2019136751A1 (fr) | 2018-01-15 | 2018-01-15 | Procédé et appareil de traitement parallèle d'intelligence artificielle, support d'informations lisible par ordinateur et terminal |
Related Parent Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2018/072665 Continuation-In-Part WO2019136752A1 (fr) | 2018-01-15 | 2018-01-15 | Procédé et dispositif de traitement de convolution d'intelligence artificielle, support de stockage et terminal |
Related Child Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2018/072665 Continuation-In-Part WO2019136752A1 (fr) | 2018-01-15 | 2018-01-15 | Procédé et dispositif de traitement de convolution d'intelligence artificielle, support de stockage et terminal |
| US16/929,819 Continuation-In-Part US11874898B2 (en) | 2018-01-15 | 2020-07-15 | Streaming-based artificial intelligence convolution processing method and apparatus, readable storage medium and terminal |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2019136751A1 true WO2019136751A1 (fr) | 2019-07-18 |
Family
ID=65462117
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2018/072663 Ceased WO2019136751A1 (fr) | 2018-01-15 | 2018-01-15 | Procédé et appareil de traitement parallèle d'intelligence artificielle, support d'informations lisible par ordinateur et terminal |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN109416755B (fr) |
| WO (1) | WO2019136751A1 (fr) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112132275A (zh) * | 2020-09-30 | 2020-12-25 | 南京风兴科技有限公司 | 一种并行计算方法及装置 |
| CN112306949A (zh) * | 2019-07-31 | 2021-02-02 | 中科寒武纪科技股份有限公司 | 数据处理方法及装置以及相关产品 |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110298441B (zh) * | 2019-05-24 | 2022-01-11 | 深圳云天励飞技术有限公司 | 一种数据处理方法、电子装置及计算机可读存储介质 |
| CN110928216B (zh) * | 2019-11-14 | 2020-12-15 | 深圳云天励飞技术有限公司 | 人工智能装置 |
| CN113705795B (zh) * | 2021-09-16 | 2024-12-17 | 深圳思谋信息科技有限公司 | 卷积处理方法、装置、卷积神经网络加速器和存储介质 |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20090132794A1 (en) * | 2007-11-16 | 2009-05-21 | Paul Michael Ebert | Method and apparatus for performing complex calculations in a multiprocessor array |
| CN106530210A (zh) * | 2016-10-31 | 2017-03-22 | 北京大学 | 基于阻变存储器件阵列实现并行卷积计算的设备和方法 |
| CN106875012A (zh) * | 2017-02-09 | 2017-06-20 | 武汉魅瞳科技有限公司 | 一种基于fpga的深度卷积神经网络的流水化加速系统 |
| CN106909970A (zh) * | 2017-01-12 | 2017-06-30 | 南京大学 | 一种基于近似计算的二值权重卷积神经网络硬件加速器计算模块 |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20160328644A1 (en) * | 2015-05-08 | 2016-11-10 | Qualcomm Incorporated | Adaptive selection of artificial neural networks |
| CN106228238B (zh) * | 2016-07-27 | 2019-03-22 | 中国科学技术大学苏州研究院 | 现场可编程门阵列平台上加速深度学习算法的方法和系统 |
| CN106845635A (zh) * | 2017-01-24 | 2017-06-13 | 东南大学 | 基于级联形式的cnn卷积核硬件设计方法 |
| CN106951395B (zh) * | 2017-02-13 | 2018-08-17 | 上海客鹭信息技术有限公司 | 面向压缩卷积神经网络的并行卷积运算方法及装置 |
| CN106970896B (zh) * | 2017-03-30 | 2020-05-12 | 中国人民解放军国防科学技术大学 | 面向向量处理器的二维矩阵卷积的向量化实现方法 |
2018
- 2018-01-15 WO PCT/CN2018/072663 patent/WO2019136751A1/fr not_active Ceased
- 2018-01-15 CN CN201880002151.7A patent/CN109416755B/zh active Active
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20090132794A1 (en) * | 2007-11-16 | 2009-05-21 | Paul Michael Ebert | Method and apparatus for performing complex calculations in a multiprocessor array |
| CN106530210A (zh) * | 2016-10-31 | 2017-03-22 | 北京大学 | 基于阻变存储器件阵列实现并行卷积计算的设备和方法 |
| CN106909970A (zh) * | 2017-01-12 | 2017-06-30 | 南京大学 | 一种基于近似计算的二值权重卷积神经网络硬件加速器计算模块 |
| CN106875012A (zh) * | 2017-02-09 | 2017-06-20 | 武汉魅瞳科技有限公司 | 一种基于fpga的深度卷积神经网络的流水化加速系统 |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112306949A (zh) * | 2019-07-31 | 2021-02-02 | 中科寒武纪科技股份有限公司 | 数据处理方法及装置以及相关产品 |
| CN112306949B (zh) * | 2019-07-31 | 2022-11-01 | 中科寒武纪科技股份有限公司 | 数据处理方法及装置以及相关产品 |
| CN112132275A (zh) * | 2020-09-30 | 2020-12-25 | 南京风兴科技有限公司 | 一种并行计算方法及装置 |
Also Published As
| Publication number | Publication date |
|---|---|
| CN109416755B (zh) | 2021-11-23 |
| CN109416755A (zh) | 2019-03-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN109992743B (zh) | 矩阵乘法器 | |
| CN106445471B (zh) | 处理器和用于在处理器上执行矩阵乘运算的方法 | |
| CN112214726B (zh) | 运算加速器 | |
| WO2019136751A1 (fr) | Procédé et appareil de traitement parallèle d'intelligence artificielle, support d'informations lisible par ordinateur et terminal | |
| CN108388537B (zh) | 一种卷积神经网络加速装置和方法 | |
| US11550586B2 (en) | Method and tensor traversal engine for strided memory access during execution of neural networks | |
| CN108090565A (zh) | 一种卷积神经网络并行化训练加速方法 | |
| WO2017185389A1 (fr) | Dispositif et procédé servant à exécuter des opérations de multiplication de matrices | |
| WO2018107383A1 (fr) | Procédé et dispositif de calcul de convolution d'un réseau de neurones artificiels, et support d'enregistrement lisible par ordinateur | |
| CN102053948A (zh) | 在单指令多数据多核处理器架构上转置矩阵的方法和系统 | |
| WO2019136764A1 (fr) | Convoluteur et dispositif de traitement intelligent artificiel appliqué à celui-ci | |
| WO2017185393A1 (fr) | Appareil et procédé d'exécution d'une opération de produit interne de vecteurs | |
| CN108388527A (zh) | 直接存储器存取引擎及其方法 | |
| CN111353575A (zh) | 用于卷积神经网络的图块化格式 | |
| CN110929854B (zh) | 一种数据处理方法、装置及硬件加速器 | |
| CN114995782B (zh) | 数据处理方法、装置、设备和可读存储介质 | |
| WO2019136750A1 (fr) | Dispositif et procédé de traitement assisté par ordinateur basé sur l'intelligence artificielle, support de stockage, et terminal | |
| CN109313723B (zh) | 人工智能卷积处理方法、装置、可读存储介质、及终端 | |
| WO2021083101A1 (fr) | Procédé et appareil de traitement de données, et produit connexe | |
| KR20210014561A (ko) | 다수 컨벌루션 윈도우 중의 이미지 데이터를 추출하는 방법, 장치, 기기 및 컴퓨터 판독 가능한 저장매체 | |
| US11409840B2 (en) | Dynamically adaptable arrays for vector and matrix operations | |
| WO2020103883A1 (fr) | Procédé d'exécution de multiplication de matrice, circuit et soc | |
| CN110837483A (zh) | 张量维度变换的方法以及装置 | |
| US11874898B2 (en) | Streaming-based artificial intelligence convolution processing method and apparatus, readable storage medium and terminal | |
| CN111047021A (zh) | 一种计算装置及相关产品 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18899322 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.11.2020) |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 18899322 Country of ref document: EP Kind code of ref document: A1 |