Stock et al., 2009 - Google Patents
A fast GPU implementation for solving sparse ill-posed linear equation systems (Stock et al., 2009)
- Document ID: 1235304996248000737
- Authors: Stock F; Koch A
- Publication year: 2009
- Publication venue: International Conference on Parallel Processing and Applied Mathematics
Snippet
Image reconstruction, a very compute-intense process in general, can often be reduced to large linear equation systems represented as sparse under-determined matrices. Solvers for these equation systems (not restricted to image reconstruction) spend most of their time in …
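The truncated snippet points at the usual hot spot in solvers for such systems: the sparse matrix-vector product that iterative methods apply repeatedly. As a minimal illustration only, not taken from Stock and Koch's implementation, the sketch below shows a CSR-format matrix-vector product in CUDA with one thread per row; the kernel name, the toy 2x3 under-determined matrix, and all other identifiers are invented for this example.

```cuda
// Illustrative sketch only (assumed, not from the paper): y = A*x for a
// sparse matrix A stored in CSR format, the operation iterative solvers
// for such equation systems typically spend most of their time in.
#include <cstdio>
#include <cuda_runtime.h>

// One thread computes one row of the result; no warp-level or
// format-specific tuning is attempted here.
__global__ void spmv_csr(int num_rows,
                         const int   *row_ptr,  // CSR row offsets, length num_rows + 1
                         const int   *col_idx,  // column index of each nonzero
                         const float *val,      // value of each nonzero
                         const float *x,        // dense input vector
                         float       *y)        // dense output vector
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < num_rows) {
        float sum = 0.0f;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += val[j] * x[col_idx[j]];
        y[row] = sum;
    }
}

int main()
{
    // Toy 2x3 under-determined matrix (more unknowns than equations),
    // chosen arbitrarily for the example:
    //   A = [1 0 2]      x = [1 1 1]^T      expected y = [3 7]^T
    //       [0 3 4]
    const int num_rows = 2;
    int   h_row_ptr[] = {0, 2, 4};
    int   h_col_idx[] = {0, 2, 1, 2};
    float h_val[]     = {1.0f, 2.0f, 3.0f, 4.0f};
    float h_x[]       = {1.0f, 1.0f, 1.0f};
    float h_y[2];

    int *d_row_ptr, *d_col_idx;
    float *d_val, *d_x, *d_y;
    cudaMalloc((void **)&d_row_ptr, sizeof(h_row_ptr));
    cudaMalloc((void **)&d_col_idx, sizeof(h_col_idx));
    cudaMalloc((void **)&d_val,     sizeof(h_val));
    cudaMalloc((void **)&d_x,       sizeof(h_x));
    cudaMalloc((void **)&d_y,       sizeof(h_y));
    cudaMemcpy(d_row_ptr, h_row_ptr, sizeof(h_row_ptr), cudaMemcpyHostToDevice);
    cudaMemcpy(d_col_idx, h_col_idx, sizeof(h_col_idx), cudaMemcpyHostToDevice);
    cudaMemcpy(d_val,     h_val,     sizeof(h_val),     cudaMemcpyHostToDevice);
    cudaMemcpy(d_x,       h_x,       sizeof(h_x),       cudaMemcpyHostToDevice);

    spmv_csr<<<1, 32>>>(num_rows, d_row_ptr, d_col_idx, d_val, d_x, d_y);
    cudaMemcpy(h_y, d_y, sizeof(h_y), cudaMemcpyDeviceToHost);
    printf("y = [%f, %f]\n", h_y[0], h_y[1]);

    cudaFree(d_row_ptr); cudaFree(d_col_idx);
    cudaFree(d_val); cudaFree(d_x); cudaFree(d_y);
    return 0;
}
```

A one-thread-per-row mapping is the simplest possible scheme and tends to leave memory bandwidth unused on irregular matrices; the format- and architecture-specific tuning of exactly this kernel is what several of the similar documents listed below focus on.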
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F9/00—Arrangements for programme control, e.g. control unit
        - G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
          - G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
            - G06F9/30003—Arrangements for executing specific machine instructions
              - G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F12/00—Accessing, addressing or allocating within memory systems or architectures
        - G06F12/02—Addressing or allocation; Relocation
          - G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
            - G06F12/023—Free address space management
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/10—Complex mathematical operations
          - G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/10—Complex mathematical operations
          - G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
            - G06F17/141—Discrete Fourier transforms
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F8/00—Arrangements for software engineering
        - G06F8/40—Transformations of program code
          - G06F8/41—Compilation
            - G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
              - G06F8/456—Parallelism detection
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F9/00—Arrangements for programme control, e.g. control unit
        - G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
          - G06F9/46—Multiprogramming arrangements
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F12/00—Accessing, addressing or allocating within memory systems or architectures
        - G06F12/02—Addressing or allocation; Relocation
          - G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F8/00—Arrangements for software engineering
        - G06F8/40—Transformations of program code
          - G06F8/41—Compilation
            - G06F8/44—Encoding
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F15/00—Digital computers in general; Data processing equipment in general
        - G06F15/76—Architectures of general purpose stored programme computers
          - G06F15/80—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
            - G06F15/8007—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T1/00—General purpose image data processing
        - G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F11/00—Error detection; Error correction; Monitoring
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
Similar Documents
| Publication | Title |
|---|---|
| Monakov et al. | Automatically tuning sparse matrix-vector multiplication for GPU architectures |
| Ashari et al. | On optimizing machine learning workloads via kernel fusion |
| US8364739B2 (en) | Sparse matrix-vector multiplication on graphics processor units |
| Giles | Efficient sparse matrix-vector multiplication on cache-based GPUs |
| Daga et al. | Structural agnostic SpMV: Adapting CSR-adaptive for irregular matrices |
| US12412068B2 | Power-efficient hybrid traversal apparatus and method for convolutional neural network accelerator architecture |
| You et al. | Mic-svm: Designing a highly efficient support vector machine for advanced modern multi-core and many-core architectures |
| Yi et al. | CUDAMicroBench: Microbenchmarks to assist CUDA performance programming |
| Lin et al. | GCN inference acceleration using high-level synthesis |
| Jiang et al. | GLARE: Accelerating Sparse DNN Inference Kernels with Global Memory Access Reduction |
| Chen et al. | tpSpMV: A two-phase large-scale sparse matrix-vector multiplication kernel for manycore architectures |
| US20240127056A1 | Computational storage for an energy-efficient deep neural network training system |
| Krishnan et al. | Multi-stage memory efficient strassen's matrix multiplication on GPU |
| Walden et al. | Memory Optimizations for Sparse Linear Algebra on GPU Hardware |
| Bylina et al. | Performance analysis of multicore and multinodal implementation of SpMV operation |
| Limonova et al. | Special aspects of matrix operation implementations for low-precision neural network model on the elbrus platform |
| Stock et al. | A fast GPU implementation for solving sparse ill-posed linear equation systems |
| Zhai et al. | Lit: A high performance massive data computing framework based on CPU/GPU cluster |
| US20230325464A1 | Hpc framework for accelerating sparse cholesky factorization on fpgas |
| Hupca et al. | Spherical harmonic transform with GPUs |
| Nisa et al. | Optimizing irregular dense operators of heterogeneous gnn models on gpu |
| Zhang et al. | Implementing sparse matrix-vector multiplication with QCSR on GPU |
| Wozniak et al. | Parallel implementation of conjugate gradient method on graphics processors |
| Popescu et al. | Python-based programming framework for a heterogeneous MapReduce architecture |
| Favaro et al. | Evaluation of dense and sparse linear algebra kernels in FPGAs |