Dynamap: Dynamic algorithm mapping framework for low latency CNN inference
Meng et al., 2021 - Google Patents
- Document ID: 3061403050136301784
- Authors: Meng Y, Kuppannagari S, Kannan R, Prasanna V
- Publication year: 2021
- Publication venue: The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
Snippet
Most of the existing work on FPGA acceleration of Convolutional Neural Network (CNN) focuses on employing a single strategy (algorithm, dataflow, etc.) across all the layers. Such an approach does not achieve optimal latency on complex and deep CNNs. Emerging …
Classifications
- G06F15/8007—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
- G06F17/505—Logic synthesis, e.g. technology mapping, optimisation
- G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
- G06F17/5009—Computer-aided design using simulation
- G06F17/12—Simultaneous equations, e.g. systems of linear equations
- G06F17/5072—Floorplanning, e.g. partitioning, placement
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/3897—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros with adaptable data path
- G06F2217/78—Power analysis and optimization
- G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
- G06F8/41—Compilation
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06N3/02—Computer systems based on biological models using neural network models
Similar Documents
| Publication | Title |
|---|---|
| Meng et al. | Dynamap: Dynamic algorithm mapping framework for low latency CNN inference |
| Zhang et al. | DNNExplorer: a framework for modeling and exploring a novel paradigm of FPGA-based DNN accelerator |
| US12468924B2 (en) | Parallel computing scheme generation for neural networks |
| US11347480B2 (en) | Transpose operations using processing element array |
| Shen et al. | Toward an efficient deep pipelined template-based architecture for accelerating the entire 2-D and 3-D CNNs on FPGA |
| Liu et al. | WinoCNN: Kernel sharing Winograd systolic array for efficient convolutional neural network acceleration on FPGAs |
| Kästner et al. | Hardware/software codesign for convolutional neural networks exploiting dynamic partial reconfiguration on PYNQ |
| US20220253683A1 (en) | Implementing Fully-Connected Neural-Network Layers in Hardware |
| Lee et al. | NP-CGRA: Extending CGRAs for efficient processing of light-weight deep neural networks |
| Wang et al. | Systolic cube: A spatial 3D CNN accelerator architecture for low power video analysis |
| Yang et al. | AIM: Accelerating arbitrary-precision integer multiplication on heterogeneous reconfigurable computing platform Versal ACAP |
| US11488066B2 (en) | Efficient convolution of multi-channel input samples with multiple kernels |
| Chen et al. | Exploiting on-chip heterogeneity of Versal architecture for GNN inference acceleration |
| Lee et al. | Specializing CGRAs for light-weight convolutional neural networks |
| Zhang et al. | Low-latency mini-batch GNN inference on CPU-FPGA heterogeneous platform |
| Liang et al. | FCNNLib: A flexible convolution algorithm library for deep learning on FPGAs |
| Roorda et al. | FPGA architecture exploration for DNN acceleration |
| Arredondo-Velazquez et al. | A streaming architecture for Convolutional Neural Networks based on layer operations chaining |
| Meng et al. | PPOAccel: A high-throughput acceleration framework for proximal policy optimization |
| Hamdi et al. | MATCH: Model-Aware TVM-based Compilation for Heterogeneous Edge Devices |
| Meng et al. | How to avoid zero-spacing in fractionally-strided convolution? A hardware-algorithm co-design methodology |
| Meng et al. | Accelerator design and exploration for deformable convolution networks |
| Zhang et al. | VisionAGILE: A Versatile Domain-Specific Accelerator for Computer Vision Tasks |
| Chen et al. | High throughput and low bandwidth demand: Accelerating CNN inference block-by-block on FPGAs |
| Chowdhury et al. | Demystifying the 7-D Convolution Loop Nest for Data and Instruction Streaming in Reconfigurable AI Accelerators |