CN106681960A - Acceleration method for solution of linear equation set with Monte Carlo method based on shared memory - Google Patents
Acceleration method for solution of linear equation set with Monte Carlo method based on shared memory
- Publication number
- CN106681960A (application number CN201710005292.1A)
- Authority
- CN
- China
- Prior art keywords
- quasi
- shared memory
- random
- vector
- monte carlo
- Prior art date: 2017-01-04
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
- G06F17/12—Simultaneous equations, e.g. systems of linear equations
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Operations Research (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Algebra (AREA)
- Complex Calculations (AREA)
Abstract
Description
Technical Field
The invention relates to the field of numerical computation, and more specifically to an acceleration method for solving systems of linear equations with a Monte Carlo method based on shared memory.
Background Art
For large-scale scientific and engineering computing, many problems ultimately reduce to systems of linear equations, so methods for solving large-scale sparse linear systems are one of its core problems, with important theoretical significance and practical application value. The Monte Carlo method is a numerical method based on probability and statistics that uses random numbers to solve many practical problems. Since the Monte Carlo method was first proposed for solving systems of linear equations in the 1950s, it has undergone many improvements and has been widely applied in financial engineering, computational physics, and other fields.
At present, the following methods are mainly used for systems of linear equations: (1) direct methods, which obtain the exact solution of the system through a fixed sequence of arithmetic operations, such as Gaussian elimination and Cramer's rule; (2) iterative methods, which start from an initial estimate and approach the true solution step by step during the solution process, such as the Jacobi and Gauss-Seidel iterations; (3) the Monte Carlo method, a probabilistic method developed from the Jacobi method, which simulates a process using random numbers and estimates the solution of the problem from statistics of the mathematical expectation of a specific random variable.
At present, solving large-scale sparse linear systems requires an enormous amount of computation, while in practice the full solution is often not needed. The Monte Carlo method can compute a single approximate solution component of a large-scale sparse linear system, and its convergence rate is independent of the size of the system, making it more efficient. At the same time, because the problem is naturally parallel, GPU-based parallel algorithms can be used to accelerate the solution of the system.
Summary of the Invention
The invention provides an acceleration method for solving systems of linear equations with a Monte Carlo method based on shared memory; the method speeds up the parallel algorithm and achieves higher efficiency.
In order to achieve the above technical effect, the technical scheme of the invention is as follows:
An acceleration method for solving a system of linear equations with a Monte Carlo method based on shared memory comprises the following steps:
S1: Preprocess the input data to obtain the data needed to solve the linear system;
S2: From the data obtained in S1, use the quasi-random number generation algorithms provided by the cuRAND library of the CUDA architecture to generate the quasi-random sequences required for simulating the target solution component;
S3: Store the quasi-random sequences and the simulation results of the target solution component in shared memory allocated on the GPU, carry out the simulation in parallel on the GPU, and finally perform a reduction (summation) on the GPU to obtain the required component.
Further, the process of step S1 is as follows:
S11: Input the coefficient matrix A, the vector b, the truncation error δ, the probable error ε, and the number N of quasi-random sequences, i.e. the number of Markov chains,
where a system of linear equations in n unknowns is written as Ax = b; A is an n×n non-singular sparse matrix, b is a given n×1 vector, and x is the n×1 vector of unknowns, i.e. the vector to be solved for,
and a transformation yields the Jacobi iteration form x = Gx + f, where G is an n×n matrix and f is an n×1 vector;
S12: Compute the matrix G, whose entries are g_ij = -a_ij/a_ii for i ≠ j and g_ii = 0, where a_ij are the elements of the coefficient matrix A;
S13: Compute the vector f, whose entries are f_i = b_i/a_ii, where a_ij are the elements of the coefficient matrix A and b_i are the elements of the vector b;
S14: Compute the row norms ‖G‖_i = Σ_j |g_ij|, where g_ij are the elements of the matrix G;
S15: Compute the transition probability matrix P, whose entries are p_ij = |g_ij| / Σ_s |g_is|; a host-side sketch of steps S12–S15 is given below.
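The following is a minimal host-side sketch of the preprocessing in steps S12–S15, assuming the standard Jacobi splitting described above and dense row-major storage for clarity (the patent targets sparse matrices); all function and variable names are illustrative, not from the patent.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Build the Jacobi-form matrix G, the vector f, and the transition matrix P
// from A and b, following S12-S15.
void buildJacobiForm(int n, const std::vector<double>& A,
                     const std::vector<double>& b,
                     std::vector<double>& G, std::vector<double>& f,
                     std::vector<double>& P) {
    G.assign((size_t)n * n, 0.0);
    f.assign(n, 0.0);
    P.assign((size_t)n * n, 0.0);
    for (int i = 0; i < n; ++i) {
        double aii = A[(size_t)i * n + i];     // non-zero since A is non-singular
        for (int j = 0; j < n; ++j)            // S12: g_ij = -a_ij/a_ii, g_ii = 0
            if (j != i) G[(size_t)i * n + j] = -A[(size_t)i * n + j] / aii;
        f[i] = b[i] / aii;                     // S13: f_i = b_i / a_ii
        double rowNorm = 0.0;                  // S14: ||G||_i = sum_j |g_ij|
        for (int j = 0; j < n; ++j) rowNorm += std::fabs(G[(size_t)i * n + j]);
        for (int j = 0; j < n; ++j)            // S15: p_ij = |g_ij| / sum_s |g_is|
            if (rowNorm > 0.0)
                P[(size_t)i * n + j] = std::fabs(G[(size_t)i * n + j]) / rowNorm;
    }
}

int main() {
    // Small diagonally dominant example, so the Jacobi form converges.
    int n = 3;
    std::vector<double> A = {4, 1, 1,  1, 5, 2,  0, 1, 3}, b = {6, 8, 4};
    std::vector<double> G, f, P;
    buildJacobiForm(n, A, b, G, f, P);
    printf("f = (%.3f, %.3f, %.3f)\n", f[0], f[1], f[2]);
    return 0;
}
```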
Further, the process of step S2 is as follows:
S21: Declare the random-number generator pointer and, using the cuRAND library of the CUDA architecture on the GPU, specify a quasi-random number generation algorithm;
S22: Set a random number seed for the quasi-random sequence generation algorithm;
S23: Specify the quasi-random sequence step length, generate the corresponding number of quasi-random sequences, and store them in global memory; a sketch of steps S21–S23 is given below.
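A minimal sketch of steps S21–S23 with the cuRAND host API (compile with nvcc and link with -lcurand). The patent does not name a specific generator; the choice of a 32-bit Sobol generator, the use of the step length k as the Sobol dimension count, and the modeling of the "seed" of S22 by the generator offset are assumptions made here, since cuRAND's quasi-random generators do not take a pseudo-random seed.

```cpp
#include <cuda_runtime.h>
#include <curand.h>
#include <cstdio>

int main() {
    const int N = 1024;   // number of quasi-random sequences (Markov chains)
    const int k = 64;     // step length of each sequence (chain-length bound)

    float* d_rand = nullptr;                       // global-memory buffer (S23)
    cudaMalloc(&d_rand, sizeof(float) * N * k);

    curandGenerator_t gen;                         // S21: declare the generator
    curandCreateGenerator(&gen, CURAND_RNG_QUASI_SOBOL32);
    curandSetQuasiRandomGeneratorDimensions(gen, k);  // one dimension per step
    // S22: quasi generators take no pseudo-random seed; the patent's "seed"
    // step is modeled here by the generator offset (an assumption).
    curandSetGeneratorOffset(gen, 0ULL);

    // S23: N points of dimension k, written to global memory. cuRAND stores
    // quasi-random output dimension-major: all N values of step 0 first,
    // then all N values of step 1, and so on.
    curandGenerateUniform(gen, d_rand, (size_t)N * k);

    cudaDeviceSynchronize();
    curandDestroyGenerator(gen);
    cudaFree(d_rand);
    printf("generated %d quasi-random sequences of length %d\n", N, k);
    return 0;
}
```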
Further, the process of step S3 is as follows:
S31: First request shared memory allocated by the GPU to store the quasi-random sequences generated in step S2 and one simulation result of the target solution component;
S32: Simulate each Markov chain in parallel on the GPU;
S33: Sum the Markov chain simulation results of all blocks and divide the sum by the number N of Markov chains to obtain the value of the target solution component, i.e. x_m ≈ sum/N.
Further, the process of step S32 is specifically as follows:
S321: For each warp, let each of its threads read, in parallel, the quasi-random sequences stored in shared memory; the shared memory of each block holds a portion of the quasi-random sequences;
S322: Set the initial state to the point corresponding to the solution component being computed;
S323: Compare, in order, the quasi-random numbers of the sequence against the state transition probabilities in the transition probability matrix, find the probability interval into which each quasi-random number falls, and take the state associated with that transition probability as the next state of the Markov chain;
S324: Simulate one Markov chain by repeatedly applying step S323, obtain the simulation result for that chain, and store it in shared memory;
S325: Sum the results of all Markov chains within each block and store the sum in shared memory; a kernel sketch covering S321–S325 is given after this list.
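Below is a minimal CUDA kernel sketch of step S32, with one thread per Markov chain and one warp (32 threads) per block, so that the Figure-2 layout fills the 32 shared-memory banks exactly. The dense row-major storage of G and P, and the estimator x_m ≈ E[Σ_t W_t·f(state_t)] with weight update W ← W·g_ij/p_ij (the standard Jacobi-based Monte Carlo scheme), are assumptions; all identifiers are illustrative.

```cpp
// One thread = one Markov chain; blockDim.x is assumed to be 32 (one warp).
__global__ void simulateChains(const float* __restrict__ P,   // n x n
                               const float* __restrict__ G,   // n x n
                               const float* __restrict__ f,   // n
                               const float* __restrict__ rnd, // N x k, dim-major
                               float* blockSums, int n, int N, int k, int Idx) {
    extern __shared__ float smem[];            // [k * 32] sequences + [32] sums
    float* seq = smem;
    float* partial = smem + k * 32;

    int lane  = threadIdx.x;                   // 0..31, one bank per thread
    int chain = blockIdx.x * blockDim.x + lane;

    // S321: stage this block's slice of the quasi-random sequences so that
    // thread `lane` only ever touches bank `lane` (no bank conflicts).
    for (int j = 0; j < k; ++j)
        seq[j * 32 + lane] = (chain < N) ? rnd[j * N + chain] : 1.0f;
    __syncthreads();

    float E = 0.0f, W = 1.0f;
    int state = Idx;                           // S322: start at component Idx
    for (int j = 0; j < k && chain < N; ++j) { // S323/S324: walk the chain
        E += W * f[state];
        float r = seq[j * 32 + lane], acc = 0.0f;
        int next = -1;
        for (int s = 0; s < n; ++s) {          // interval r falls into
            acc += P[state * n + s];
            if (r < acc) { next = s; break; }
        }
        if (next < 0) break;                   // rounding fell off the row
        W *= G[state * n + next] / P[state * n + next];
        state = next;
    }

    // S325: per-block sum of the 32 chain estimates (tree reduction).
    partial[lane] = (chain < N) ? E : 0.0f;
    __syncthreads();
    for (int stride = 16; stride > 0; stride >>= 1) {
        if (lane < stride) partial[lane] += partial[lane + stride];
        __syncthreads();
    }
    if (lane == 0) blockSums[blockIdx.x] = partial[0];
}
```

A launch of the form simulateChains<<<(N + 31)/32, 32, (k*32 + 32)*sizeof(float)>>>(dP, dG, df, dRand, dBlockSums, n, N, k, Idx) allocates the dynamic shared memory for one warp's sequences plus the 32 partial sums; the truncation error δ of S11 determines the chain-length bound k.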
Compared with the prior art, the technical solution of the invention has the following beneficial effect:
In the acceleration method for solving systems of linear equations with a shared-memory-based Monte Carlo method proposed by the invention, the quasi-random sequences are stored in shared memory in an optimized layout, so that accesses to shared memory by different threads produce no bank conflicts; this raises the read speed and accelerates the whole algorithm.
Brief Description of the Drawings
Figure 1 is a flow diagram of the acceleration method of the invention for solving systems of linear equations with a Monte Carlo method based on shared memory;
Figure 2 shows the specific storage layout of the shared memory in the invention;
Figure 3 illustrates shared-memory bank conflicts.
Detailed Description
The accompanying drawings are for illustration only and shall not be construed as limiting the patent;
to better illustrate the embodiment, some parts in the drawings are omitted, enlarged, or reduced and do not represent the dimensions of an actual product;
those skilled in the art will understand that certain well-known structures and their descriptions may be omitted from the drawings.
The technical solution of the invention is further described below with reference to the drawings and an embodiment.
Embodiment 1
As shown in Figure 1, the invention first preprocesses the input data, then launches the GPU through a kernel and allocates the corresponding required resources; it then generates quasi-random sequences of fixed step length and number, stores them in shared memory in an optimized layout, reads them in parallel by warps, and simulates the Markov chains to obtain their simulation results. The simulation results of all Markov chains are summed and divided by the number N of Markov chains to obtain the value x_m of the target component.
As shown in Figure 2, 32 threads form one warp; below it is the shared memory, which consists of interleaved banks. In general, the number of threads in a warp equals the number of banks in shared memory, so the threads of one warp can access data stored in shared memory in parallel. The quasi-random sequences of step length k are stored column-wise, each complete sequence occupying one and the same bank. Each thread of a warp simulates one Markov chain and needs one complete quasi-random sequence; a thread therefore only reads different addresses of a single bank to obtain its whole sequence, while different threads read different banks. Since different banks can be accessed simultaneously on the GPU, every thread of a warp can read the optimally stored quasi-random sequences in parallel, preparing for the parallel Markov chain simulation without bank conflicts; the index arithmetic of this layout is sketched below.
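A small sketch of the Figure-2 index arithmetic, assuming the usual configuration of 32 shared-memory banks of 4-byte words; the helper name readStep is illustrative.

```cpp
// Sequence element j of the thread with lane id `lane` lives at
// smem[j * 32 + lane]. With 32 banks of 4-byte words, the bank of a float
// index i is (i % 32), so this address always maps to bank `lane`: every
// thread owns one bank, and a warp-wide read of step j touches all 32
// banks exactly once, i.e. no bank conflicts.
__device__ __forceinline__ float readStep(const float* smemSeq, int j) {
    int lane = threadIdx.x & 31;           // lane id within the warp
    return smemSeq[j * 32 + lane];         // bank = (j*32 + lane) % 32 = lane
}
```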
As shown in Figure 3, shared memory consists of many interleaved banks. Suppose bank 0 in the figure, the grey column, stores the first quasi-random number of each partial quasi-random sequence. Different threads would then need to access different addresses of bank 0 at the same time, which causes a bank conflict: to read their quasi-random numbers, several threads request the same bank, but only one thread can access bank 0 at a time, so the accesses are serialized and performance is lost.
The specific steps of the acceleration method of the invention for solving systems of linear equations with a shared-memory-based Monte Carlo method are as follows:
1. First, input the data of the linear system to be solved: the coefficient matrix A, the vector b, the truncation error δ of the Markov chains, the probable error ε, and the number N of Markov chains, which is the number of quasi-random sequences; the index Idx of the target solution component x_m must also be known.
2. Data preprocessing: to transform the original input of the form Ax = b into the Jacobi form x = Gx + f, the matrix G and the vector f of the Jacobi form are computed from the input coefficient matrix A and vector b as g_ij = -a_ij/a_ii for i ≠ j, g_ii = 0, and f_i = b_i/a_ii, where a_ij are the elements of A and b_i the elements of b; thus the data of the Jacobi form are obtained from the data in the original form. The row norms of the matrix G can then be computed as ‖G‖_i = Σ_j |g_ij|, where g_ij are the elements of G. With the row norms, the transition probability matrix P needed later is computed as p_ij = |g_ij| / Σ_s |g_is|, where g_is are elements of the matrix G.
3. Generate the quasi-random sequences. First declare the random-number generator pointer; using the cuRAND library (a library of random-number generation algorithms) of the CUDA (Compute Unified Device Architecture) framework on the GPU, specify a quasi-random number generation algorithm as needed, set a random number seed for it, and specify the quasi-random sequence step length k; then generate N quasi-random sequences with the algorithm and store them in global memory.
4. To achieve parallelization, the GPU is launched through a kernel and resources are allocated to the corresponding blocks, e.g. the shared memory owned by each block. Each block contains many threads, of which every 32 form one warp, and work proceeds in units of warps.
5. Store the quasi-random sequences in shared memory in the optimized layout. First read the quasi-random sequences from global memory; together they form a 2D array. As shown in Figure 2, shared memory consists of many interleaved banks, and each quasi-random sequence is stored column-wise in one bank, so that within a warp, e.g. thread 0 can read the quasi-random sequence in the column with bank index 0 without restriction and without a bank conflict. One warp can thus access several different banks at the same time and read out the required quasi-random sequences simultaneously without bank conflicts, improving efficiency and obtaining better parallelism.
6. After the optimized storage of step 5, the threads within a warp can access and read the quasi-random sequences in parallel: for each warp, each of its threads reads, in parallel, the quasi-random sequences stored in shared memory, the shared memory of each block holding a portion of them. Once the quasi-random sequence has been obtained, the Markov chain simulation begins. Given the index Idx of the target component, the initial state of the chain is set to the state corresponding to that component (state ← Idx) and the initial expectation E is set to 0. Then the quasi-random numbers are compared in order against the transition probabilities P[state][j], j = 1, …, n, to find the probability interval into which each quasi-random number falls, and the corresponding state s becomes the next state of the chain (new_state ← s). This loop continues until the simulation of one Markov chain is complete and its result obtained; at the same time the results of all Markov chains within each block are summed and stored in shared memory.
7. Finally, after all N Markov chains have been simulated, all results are summed and averaged, which yields the target solution component x_m; a sketch of this final reduction is given below.
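A minimal sketch of step 7, reducing the per-block partial sums from the chain-simulation kernel above and averaging over the N chains. The patent does not specify how the GPU-side reduction is implemented; a grid-wide atomicAdd is used here as an illustrative stand-in.

```cpp
// Reduce the per-block partial sums on the GPU and average over N chains.
__global__ void finalReduce(const float* blockSums, float* result,
                            int numBlocks, int N) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numBlocks)
        atomicAdd(result, blockSums[i] / (float)N);  // x_m = (1/N) * sum
}
```

Here *result must be zero-initialized (e.g. with cudaMemset) before the launch; afterwards the single float it holds, the estimate of x_m, is copied back to the host with cudaMemcpy.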
Identical or similar reference numerals denote identical or similar parts;
the positional relationships described in the drawings are for illustration only and shall not be construed as limiting the patent.
Obviously, the above embodiment of the invention is merely an example given to illustrate the invention clearly and does not limit its implementation. Those of ordinary skill in the art can make other changes or variations of different forms on the basis of the above description; it is neither necessary nor possible to enumerate all implementations here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the protection scope of the claims of the invention.
Claims (5)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710005292.1A | 2017-01-04 | 2017-01-04 | Acceleration method for solution of linear equation set with Monte Carlo method based on shared memory |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN106681960A (en) | 2017-05-17 |
Family
ID=58850200
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201710005292.1A (published as CN106681960A (en), status Pending) | Acceleration method for solution of linear equation set with Monte Carlo method based on shared memory | 2017-01-04 | 2017-01-04 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN106681960A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107423030A (en) * | 2017-07-28 | 2017-12-01 | Zhengzhou Yunhai Information Technology Co., Ltd. | Markov Monte carlo algorithm accelerated method based on FPGA heterogeneous platforms |
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20070282575A1 (en) * | 2006-06-05 | 2007-12-06 | Cambridge Research & Instrumentation, Inc. | Monte Carlo simulation using GPU units on personal computers |
| CN102819664A (en) * | 2012-07-18 | 2012-12-12 | 中国人民解放军国防科学技术大学 | Influence maximization parallel accelerating method based on graphic processing unit |
| US20140292774A1 (en) * | 2013-03-26 | 2014-10-02 | Nvidia Corporation | System and method for performing sample-based rendering in a parallel processor |
Non-Patent Citations (3)
| Title |
|---|
| MICHAEL MASCAGNI ET AL: "A Parallel Quasi-Monte Carlo Method for Solving Systems of Linear Equations", 《INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE》 * |
| SIYAN LAI ET AL: "GPU-Based Monte Carlo Methods for Solving Linear Algebraic Equations", 《PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CLOUD COMPUTING (ISCC2015)》 * |
| LAI SIYAN ET AL: "Monte Carlo and Quasi-Monte Carlo Methods for Solving Systems of Linear Equations" (in Chinese), Journal of Donghua University * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN108170639B (en) | A Realization Method of Tensor CP Decomposition Based on Distributed Environment | |
| Dang et al. | CUDA-enabled Sparse Matrix–Vector Multiplication on GPUs using atomic operations | |
| CN116128019A (en) | Parallel training method and device for transducer model | |
| Zou et al. | Massively simulating adiabatic bifurcations with FPGA to solve combinatorial optimization | |
| CN114911619B (en) | A GPU-based batch parallel LU decomposition method for small and medium-sized dense matrices for simulation systems | |
| CN109901164B (en) | Distributed back projection imaging method of synthetic aperture radar | |
| Sanfui et al. | A two-kernel based strategy for performing assembly in FEA on the graphics processing unit | |
| Ahamed et al. | Schwarz method with two-sided transmission conditions for the gravity equations on graphics processing unit | |
| Bramas | Optimization and parallelization of the boundary element method for the wave equation in time domain | |
| Tang et al. | A Memory-efficient Simulation Method of Grover's Search Algorithm. | |
| Heien et al. | Understanding long-term earthquake behavior through simulation | |
| CN106681960A (en) | Acceleration method for solution of linear equation set with Monte Carlo method based on shared memory | |
| US9600446B2 (en) | Parallel multicolor incomplete LU factorization preconditioning processor and method of use thereof | |
| Abreu et al. | PIC codes in new processors: A full relativistic PIC code in CUDA-enabled hardware with direct visualization | |
| Capozzoli et al. | The success of GPU computing in applied electromagnetics | |
| Magoulès et al. | Green computing on graphics processing units | |
| Maringanti et al. | Acceleration of conjugate gradient method for circuit simulation using CUDA | |
| Kochurov et al. | GPU implementation of Jacobi Method and Gauss-Seidel Method for Data Arrays that Exceed GPU-dedicated Memory Size | |
| Lai et al. | Parallel computations of local pagerank problem based on graphics processing unit | |
| US20200293331A1 (en) | Systems and methods for simulation of dynamic systems | |
| Aksari et al. | Forward and back substitution algorithms on GPU: a case study on modified incomplete Cholesky Preconditioner for three-dimensional finite difference method | |
| Sun et al. | Cuda based fast implementation of very large matrix computation | |
| Zabelok et al. | GPU accelerated kinetic solvers for rarefied gas dynamics | |
| Chen et al. | Implementation of block algorithm for LU factorization | |
| Islam et al. | A Hierarchical Jacobi Iteration for Structured Matrices on GPUs using Shared Memory |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 2017-05-17 |