Stock et al., 2009 - Google Patents
A fast GPU implementation for solving sparse ill-posed linear equation systems (Stock et al., 2009)
- Document ID: 1235304996248000737
- Authors: Stock F; Koch A
- Publication year: 2009
- Publication venue: International Conference on Parallel Processing and Applied Mathematics
Snippet
Image reconstruction, a very compute-intense process in general, can often be reduced to large linear equation systems represented as sparse under-determined matrices. Solvers for these equation systems (not restricted to image reconstruction) spend most of their time in …
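The truncated snippet points at the usual hot spot in solvers for such systems: the sparse matrix-vector product that iterative methods apply repeatedly. As a minimal illustration only, not taken from Stock and Koch's implementation, the sketch below shows a CSR-format matrix-vector product in CUDA with one thread per row; the kernel name, the toy 2x3 under-determined matrix, and all other identifiers are invented for this example.

```cuda
// Illustrative sketch only (assumed, not from the paper): y = A*x for a
// sparse matrix A stored in CSR format, the operation iterative solvers
// for such equation systems typically spend most of their time in.
#include <cstdio>
#include <cuda_runtime.h>

// One thread computes one row of the result; no warp-level or
// format-specific tuning is attempted here.
__global__ void spmv_csr(int num_rows,
                         const int   *row_ptr,  // CSR row offsets, length num_rows + 1
                         const int   *col_idx,  // column index of each nonzero
                         const float *val,      // value of each nonzero
                         const float *x,        // dense input vector
                         float       *y)        // dense output vector
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < num_rows) {
        float sum = 0.0f;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += val[j] * x[col_idx[j]];
        y[row] = sum;
    }
}

int main()
{
    // Toy 2x3 under-determined matrix (more unknowns than equations),
    // chosen arbitrarily for the example:
    //   A = [1 0 2]      x = [1 1 1]^T      expected y = [3 7]^T
    //       [0 3 4]
    const int num_rows = 2;
    int   h_row_ptr[] = {0, 2, 4};
    int   h_col_idx[] = {0, 2, 1, 2};
    float h_val[]     = {1.0f, 2.0f, 3.0f, 4.0f};
    float h_x[]       = {1.0f, 1.0f, 1.0f};
    float h_y[2];

    int *d_row_ptr, *d_col_idx;
    float *d_val, *d_x, *d_y;
    cudaMalloc((void **)&d_row_ptr, sizeof(h_row_ptr));
    cudaMalloc((void **)&d_col_idx, sizeof(h_col_idx));
    cudaMalloc((void **)&d_val,     sizeof(h_val));
    cudaMalloc((void **)&d_x,       sizeof(h_x));
    cudaMalloc((void **)&d_y,       sizeof(h_y));
    cudaMemcpy(d_row_ptr, h_row_ptr, sizeof(h_row_ptr), cudaMemcpyHostToDevice);
    cudaMemcpy(d_col_idx, h_col_idx, sizeof(h_col_idx), cudaMemcpyHostToDevice);
    cudaMemcpy(d_val,     h_val,     sizeof(h_val),     cudaMemcpyHostToDevice);
    cudaMemcpy(d_x,       h_x,       sizeof(h_x),       cudaMemcpyHostToDevice);

    spmv_csr<<<1, 32>>>(num_rows, d_row_ptr, d_col_idx, d_val, d_x, d_y);
    cudaMemcpy(h_y, d_y, sizeof(h_y), cudaMemcpyDeviceToHost);
    printf("y = [%f, %f]\n", h_y[0], h_y[1]);

    cudaFree(d_row_ptr); cudaFree(d_col_idx);
    cudaFree(d_val); cudaFree(d_x); cudaFree(d_y);
    return 0;
}
```

A one-thread-per-row mapping is the simplest possible scheme and tends to leave memory bandwidth unused on irregular matrices; the format- and architecture-specific tuning of exactly this kernel is what several of the similar documents listed below focus on.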
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F9/00—Arrangements for programme control, e.g. control unit
        - G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
          - G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
            - G06F9/30003—Arrangements for executing specific machine instructions
              - G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F12/00—Accessing, addressing or allocating within memory systems or architectures
        - G06F12/02—Addressing or allocation; Relocation
          - G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
            - G06F12/023—Free address space management
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/10—Complex mathematical operations
          - G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/10—Complex mathematical operations
          - G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
            - G06F17/141—Discrete Fourier transforms
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F8/00—Arrangements for software engineering
        - G06F8/40—Transformations of program code
          - G06F8/41—Compilation
            - G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
              - G06F8/456—Parallelism detection
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F9/00—Arrangements for programme control, e.g. control unit
        - G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
          - G06F9/46—Multiprogramming arrangements
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F12/00—Accessing, addressing or allocating within memory systems or architectures
        - G06F12/02—Addressing or allocation; Relocation
          - G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F8/00—Arrangements for software engineering
        - G06F8/40—Transformations of program code
          - G06F8/41—Compilation
            - G06F8/44—Encoding
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F15/00—Digital computers in general; Data processing equipment in general
        - G06F15/76—Architectures of general purpose stored programme computers
          - G06F15/80—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
            - G06F15/8007—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T1/00—General purpose image data processing
        - G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F11/00—Error detection; Error correction; Monitoring
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
Similar Documents
| Publication | Title |
|---|---|
| Monakov et al. | Automatically tuning sparse matrix-vector multiplication for GPU architectures |
| Ashari et al. | On optimizing machine learning workloads via kernel fusion |
| US8364739B2 (en) | Sparse matrix-vector multiplication on graphics processor units |
| Giles | Efficient sparse matrix-vector multiplication on cache-based GPUs |
| Daga et al. | Structural agnostic SpMV: Adapting CSR-adaptive for irregular matrices |
| US12412068B2 | Power-efficient hybrid traversal apparatus and method for convolutional neural network accelerator architecture |
| You et al. | Mic-svm: Designing a highly efficient support vector machine for advanced modern multi-core and many-core architectures |
| Yi et al. | CUDAMicroBench: Microbenchmarks to assist CUDA performance programming |
| Lin et al. | GCN inference acceleration using high-level synthesis |
| Jiang et al. | GLARE: Accelerating Sparse DNN Inference Kernels with Global Memory Access Reduction |
| Chen et al. | tpSpMV: A two-phase large-scale sparse matrix-vector multiplication kernel for manycore architectures |
| US20240127056A1 | Computational storage for an energy-efficient deep neural network training system |
| Krishnan et al. | Multi-stage memory efficient strassen's matrix multiplication on GPU |
| Walden et al. | Memory Optimizations for Sparse Linear Algebra on GPU Hardware |
| Bylina et al. | Performance analysis of multicore and multinodal implementation of SpMV operation |
| Limonova et al. | Special aspects of matrix operation implementations for low-precision neural network model on the elbrus platform |
| Stock et al. | A fast GPU implementation for solving sparse ill-posed linear equation systems |
| Zhai et al. | Lit: A high performance massive data computing framework based on CPU/GPU cluster |
| US20230325464A1 | Hpc framework for accelerating sparse cholesky factorization on fpgas |
| Hupca et al. | Spherical harmonic transform with GPUs |
| Nisa et al. | Optimizing irregular dense operators of heterogeneous gnn models on gpu |
| Zhang et al. | Implementing sparse matrix-vector multiplication with QCSR on GPU |
| Wozniak et al. | Parallel implementation of conjugate gradient method on graphics processors |
| Popescu et al. | Python-based programming framework for a heterogeneous MapReduce architecture |
| Favaro et al. | Evaluation of dense and sparse linear algebra kernels in FPGAs |