
Stock et al., 2009 - Google Patents

A fast GPU implementation for solving sparse ill-posed linear equation systems


Document ID: 1235304996248000737
Authors: Stock F; Koch A
Publication year: 2009
Publication venue: International Conference on Parallel Processing and Applied Mathematics

Snippet

Image reconstruction, a very compute-intensive process in general, can often be reduced to large linear equation systems represented as sparse under-determined matrices. Solvers for these equation systems (not restricted to image reconstruction) spend most of their time in …
Continue reading at www.esa.informatik.tu-darmstadt.de (PDF)
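The snippet frames image reconstruction as a large sparse, under-determined system A·x = b solved iteratively. As a minimal illustration of that problem class (not the paper's GPU implementation), the sketch below solves a made-up under-determined sparse system with SciPy's LSQR; the matrix size and density are arbitrary assumptions for the example:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import lsqr

# Under-determined system: more unknowns (columns) than equations (rows),
# as in the image-reconstruction setting the snippet describes.
rng = np.random.default_rng(0)
A = sparse_random(200, 500, density=0.02, format="csr", random_state=0)
x_true = rng.standard_normal(500)
b = A @ x_true  # consistent right-hand side, so an exact solution exists

# LSQR finds a minimum-norm least-squares solution iteratively; it touches A
# only through sparse matrix-vector products, which is exactly the kernel
# that GPU implementations of such solvers accelerate.
x, istop, itn, r1norm = lsqr(A, b, atol=1e-10, btol=1e-10)[:4]
print(x.shape, r1norm)
```

Because the system is under-determined, `x` generally differs from `x_true`, but the residual norm `r1norm = ||A·x - b||` is driven toward zero; the solver's cost is dominated by the repeated sparse products `A @ v` and `A.T @ u`.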

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for programme control, e.g. control unit
    • G06F9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/30 Arrangements for executing machine-instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023 Free address space management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/141 Discrete Fourier transforms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformations of program code
    • G06F8/41 Compilation
    • G06F8/45 Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F8/456 Parallelism detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for programme control, e.g. control unit
    • G06F9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46 Multiprogramming arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformations of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored programme computers
    • G06F15/80 Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8007 Single instruction multiple data [SIMD] multiprocessors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled

Similar Documents

Publication | Title
Monakov et al. Automatically tuning sparse matrix-vector multiplication for GPU architectures
Ashari et al. On optimizing machine learning workloads via kernel fusion
US8364739B2 (en) Sparse matrix-vector multiplication on graphics processor units
Giles Efficient sparse matrix-vector multiplication on cache-based GPUs
Daga et al. Structural agnostic SpMV: Adapting CSR-adaptive for irregular matrices
US12412068B2 (en) Power-efficient hybrid traversal apparatus and method for convolutional neural network accelerator architecture
You et al. Mic-svm: Designing a highly efficient support vector machine for advanced modern multi-core and many-core architectures
Yi et al. CUDAMicroBench: Microbenchmarks to assist CUDA performance programming
Lin et al. GCN inference acceleration using high-level synthesis
Jiang et al. GLARE: Accelerating Sparse DNN Inference Kernels with Global Memory Access Reduction
Chen et al. tpSpMV: A two-phase large-scale sparse matrix-vector multiplication kernel for manycore architectures
US20240127056A1 (en) Computational storage for an energy-efficient deep neural network training system
Krishnan et al. Multi-stage memory efficient strassen's matrix multiplication on GPU
Walden et al. Memory Optimizations for Sparse Linear Algebra on GPU Hardware
Bylina et al. Performance analysis of multicore and multinodal implementation of SpMV operation
Limonova et al. Special aspects of matrix operation implementations for low-precision neural network model on the elbrus platform
Stock et al. A fast GPU implementation for solving sparse ill-posed linear equation systems
Zhai et al. Lit: A high performance massive data computing framework based on CPU/GPU cluster
US20230325464A1 (en) Hpc framework for accelerating sparse cholesky factorization on fpgas
Hupca et al. Spherical harmonic transform with GPUs
Nisa et al. Optimizing irregular dense operators of heterogeneous gnn models on gpu
Zhang et al. Implementing sparse matrix-vector multiplication with QCSR on GPU
Wozniak et al. Parallel implementation of conjugate gradient method on graphics processors
Popescu et al. Python-based programming framework for a heterogeneous MapReduce architecture
Favaro et al. Evaluation of dense and sparse linear algebra kernels in FPGAs