CN106681960A - Acceleration method for solution of linear equation set with Monte Carlo method based on shared memory - Google Patents
Acceleration method for solution of linear equation set with Monte Carlo method based on shared memory
- Publication number
- CN106681960A (application number CN201710005292.1A)
- Authority
- CN
- China
- Prior art keywords
- quasi
- shared memory
- random
- vector
- monte carlo
- Prior art date: 2017-01-04
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
- G06F17/12—Simultaneous equations, e.g. systems of linear equations
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Operations Research (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Algebra (AREA)
- Complex Calculations (AREA)
Abstract
Description
Technical Field
The invention relates to the field of numerical computation, and more specifically to an acceleration method for solving systems of linear equations with a Monte Carlo method based on shared memory.
Background Art
For large-scale scientific and engineering computing, many problems ultimately reduce to systems of linear equations, so methods for solving large-scale sparse linear systems are one of its core problems, with important theoretical significance and practical application value. The Monte Carlo method is a numerical method based on probability and statistics that uses random numbers to solve many practical problems. Since the Monte Carlo method was first proposed for solving systems of linear equations in the 1950s, it has undergone many improvements and has been widely applied in financial engineering, computational physics, and other fields.
At present, the following methods are mainly used for systems of linear equations: (1) direct methods, which obtain the exact solution of the system through a fixed sequence of arithmetic operations, such as Gaussian elimination and Cramer's rule; (2) iterative methods, which start from an initial estimate and approach the true solution step by step during the solution process, such as the Jacobi and Gauss-Seidel iterations; (3) the Monte Carlo method, a probabilistic method developed from the Jacobi method, which simulates a process using random numbers and estimates the solution of the problem from statistics of the mathematical expectation of a specific random variable.
At present, solving large-scale sparse linear systems requires an enormous amount of computation, while in practice the full solution is often not needed. The Monte Carlo method can compute a single approximate solution component of a large-scale sparse linear system, and its convergence rate is independent of the size of the system, making it more efficient. At the same time, because the problem is naturally parallel, GPU-based parallel algorithms can be used to accelerate the solution of the system.
Summary of the Invention
The invention provides an acceleration method for solving systems of linear equations with a Monte Carlo method based on shared memory; the method speeds up the parallel algorithm and achieves higher efficiency.
In order to achieve the above technical effect, the technical scheme of the invention is as follows:
An acceleration method for solving a system of linear equations with a Monte Carlo method based on shared memory comprises the following steps:
S1: Preprocess the input data to obtain the data needed to solve the linear system;
S2: From the data obtained in S1, use the quasi-random number generation algorithms provided by the cuRAND library of the CUDA architecture to generate the quasi-random sequences required for simulating the target solution component;
S3: Store the quasi-random sequences and the simulation results of the target solution component in shared memory allocated on the GPU, carry out the simulation in parallel on the GPU, and finally perform a reduction (summation) on the GPU to obtain the required component.
Further, the process of step S1 is as follows:
S11: Input the coefficient matrix A, the vector b, the truncation error δ, the probable error ε, and the number N of quasi-random sequences, i.e. the number of Markov chains,
where a system of linear equations in n unknowns is written as Ax = b; A is an n×n non-singular sparse matrix, b is a given n×1 vector, and x is the n×1 vector of unknowns, i.e. the vector to be solved for,
and a transformation yields the Jacobi iteration form x = Gx + f, where G is an n×n matrix and f is an n×1 vector;
S12: Compute the matrix G, whose entries are g_ij = -a_ij/a_ii for i ≠ j and g_ii = 0, where a_ij are the elements of the coefficient matrix A;
S13: Compute the vector f, whose entries are f_i = b_i/a_ii, where a_ij are the elements of the coefficient matrix A and b_i are the elements of the vector b;
S14: Compute the row norms ‖G‖_i = Σ_j |g_ij|, where g_ij are the elements of the matrix G;
S15: Compute the transition probability matrix P, whose entries are p_ij = |g_ij| / Σ_s |g_is|; a host-side sketch of steps S12–S15 is given below.
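The following is a minimal host-side sketch of the preprocessing in steps S12–S15, assuming the standard Jacobi splitting described above and dense row-major storage for clarity (the patent targets sparse matrices); all function and variable names are illustrative, not from the patent.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Build the Jacobi-form matrix G, the vector f, and the transition matrix P
// from A and b, following S12-S15.
void buildJacobiForm(int n, const std::vector<double>& A,
                     const std::vector<double>& b,
                     std::vector<double>& G, std::vector<double>& f,
                     std::vector<double>& P) {
    G.assign((size_t)n * n, 0.0);
    f.assign(n, 0.0);
    P.assign((size_t)n * n, 0.0);
    for (int i = 0; i < n; ++i) {
        double aii = A[(size_t)i * n + i];     // non-zero since A is non-singular
        for (int j = 0; j < n; ++j)            // S12: g_ij = -a_ij/a_ii, g_ii = 0
            if (j != i) G[(size_t)i * n + j] = -A[(size_t)i * n + j] / aii;
        f[i] = b[i] / aii;                     // S13: f_i = b_i / a_ii
        double rowNorm = 0.0;                  // S14: ||G||_i = sum_j |g_ij|
        for (int j = 0; j < n; ++j) rowNorm += std::fabs(G[(size_t)i * n + j]);
        for (int j = 0; j < n; ++j)            // S15: p_ij = |g_ij| / sum_s |g_is|
            if (rowNorm > 0.0)
                P[(size_t)i * n + j] = std::fabs(G[(size_t)i * n + j]) / rowNorm;
    }
}

int main() {
    // Small diagonally dominant example, so the Jacobi form converges.
    int n = 3;
    std::vector<double> A = {4, 1, 1,  1, 5, 2,  0, 1, 3}, b = {6, 8, 4};
    std::vector<double> G, f, P;
    buildJacobiForm(n, A, b, G, f, P);
    printf("f = (%.3f, %.3f, %.3f)\n", f[0], f[1], f[2]);
    return 0;
}
```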
Further, the process of step S2 is as follows:
S21: Declare the random-number generator pointer and, using the cuRAND library of the CUDA architecture on the GPU, specify a quasi-random number generation algorithm;
S22: Set a random number seed for the quasi-random sequence generation algorithm;
S23: Specify the quasi-random sequence step length, generate the corresponding number of quasi-random sequences, and store them in global memory; a sketch of steps S21–S23 is given below.
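A minimal sketch of steps S21–S23 with the cuRAND host API (compile with nvcc and link with -lcurand). The patent does not name a specific generator; the choice of a 32-bit Sobol generator, the use of the step length k as the Sobol dimension count, and the modeling of the "seed" of S22 by the generator offset are assumptions made here, since cuRAND's quasi-random generators do not take a pseudo-random seed.

```cpp
#include <cuda_runtime.h>
#include <curand.h>
#include <cstdio>

int main() {
    const int N = 1024;   // number of quasi-random sequences (Markov chains)
    const int k = 64;     // step length of each sequence (chain-length bound)

    float* d_rand = nullptr;                       // global-memory buffer (S23)
    cudaMalloc(&d_rand, sizeof(float) * N * k);

    curandGenerator_t gen;                         // S21: declare the generator
    curandCreateGenerator(&gen, CURAND_RNG_QUASI_SOBOL32);
    curandSetQuasiRandomGeneratorDimensions(gen, k);  // one dimension per step
    // S22: quasi generators take no pseudo-random seed; the patent's "seed"
    // step is modeled here by the generator offset (an assumption).
    curandSetGeneratorOffset(gen, 0ULL);

    // S23: N points of dimension k, written to global memory. cuRAND stores
    // quasi-random output dimension-major: all N values of step 0 first,
    // then all N values of step 1, and so on.
    curandGenerateUniform(gen, d_rand, (size_t)N * k);

    cudaDeviceSynchronize();
    curandDestroyGenerator(gen);
    cudaFree(d_rand);
    printf("generated %d quasi-random sequences of length %d\n", N, k);
    return 0;
}
```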
Further, the process of step S3 is as follows:
S31: First request shared memory allocated by the GPU to store the quasi-random sequences generated in step S2 and one simulation result of the target solution component;
S32: Simulate each Markov chain in parallel on the GPU;
S33: Sum the Markov chain simulation results of all blocks and divide the sum by the number N of Markov chains to obtain the value of the target solution component, i.e. x_m ≈ sum/N.
Further, the process of step S32 is specifically as follows:
S321: For each warp, let each of its threads read, in parallel, the quasi-random sequences stored in shared memory; the shared memory of each block holds a portion of the quasi-random sequences;
S322: Set the initial state to the point corresponding to the solution component being computed;
S323: Compare, in order, the quasi-random numbers of the sequence against the state transition probabilities in the transition probability matrix, find the probability interval into which each quasi-random number falls, and take the state associated with that transition probability as the next state of the Markov chain;
S324: Simulate one Markov chain by repeatedly applying step S323, obtain the simulation result for that chain, and store it in shared memory;
S325: Sum the results of all Markov chains within each block and store the sum in shared memory; a kernel sketch covering S321–S325 is given after this list.
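Below is a minimal CUDA kernel sketch of step S32, with one thread per Markov chain and one warp (32 threads) per block, so that the Figure-2 layout fills the 32 shared-memory banks exactly. The dense row-major storage of G and P, and the estimator x_m ≈ E[Σ_t W_t·f(state_t)] with weight update W ← W·g_ij/p_ij (the standard Jacobi-based Monte Carlo scheme), are assumptions; all identifiers are illustrative.

```cpp
// One thread = one Markov chain; blockDim.x is assumed to be 32 (one warp).
__global__ void simulateChains(const float* __restrict__ P,   // n x n
                               const float* __restrict__ G,   // n x n
                               const float* __restrict__ f,   // n
                               const float* __restrict__ rnd, // N x k, dim-major
                               float* blockSums, int n, int N, int k, int Idx) {
    extern __shared__ float smem[];            // [k * 32] sequences + [32] sums
    float* seq = smem;
    float* partial = smem + k * 32;

    int lane  = threadIdx.x;                   // 0..31, one bank per thread
    int chain = blockIdx.x * blockDim.x + lane;

    // S321: stage this block's slice of the quasi-random sequences so that
    // thread `lane` only ever touches bank `lane` (no bank conflicts).
    for (int j = 0; j < k; ++j)
        seq[j * 32 + lane] = (chain < N) ? rnd[j * N + chain] : 1.0f;
    __syncthreads();

    float E = 0.0f, W = 1.0f;
    int state = Idx;                           // S322: start at component Idx
    for (int j = 0; j < k && chain < N; ++j) { // S323/S324: walk the chain
        E += W * f[state];
        float r = seq[j * 32 + lane], acc = 0.0f;
        int next = -1;
        for (int s = 0; s < n; ++s) {          // interval r falls into
            acc += P[state * n + s];
            if (r < acc) { next = s; break; }
        }
        if (next < 0) break;                   // rounding fell off the row
        W *= G[state * n + next] / P[state * n + next];
        state = next;
    }

    // S325: per-block sum of the 32 chain estimates (tree reduction).
    partial[lane] = (chain < N) ? E : 0.0f;
    __syncthreads();
    for (int stride = 16; stride > 0; stride >>= 1) {
        if (lane < stride) partial[lane] += partial[lane + stride];
        __syncthreads();
    }
    if (lane == 0) blockSums[blockIdx.x] = partial[0];
}
```

A launch of the form simulateChains<<<(N + 31)/32, 32, (k*32 + 32)*sizeof(float)>>>(dP, dG, df, dRand, dBlockSums, n, N, k, Idx) allocates the dynamic shared memory for one warp's sequences plus the 32 partial sums; the truncation error δ of S11 determines the chain-length bound k.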
Compared with the prior art, the technical solution of the invention has the following beneficial effect:
In the acceleration method for solving systems of linear equations with a shared-memory-based Monte Carlo method proposed by the invention, the quasi-random sequences are stored in shared memory in an optimized layout, so that accesses to shared memory by different threads produce no bank conflicts; this raises the read speed and accelerates the whole algorithm.
Brief Description of the Drawings
Figure 1 is a flow diagram of the acceleration method of the invention for solving systems of linear equations with a Monte Carlo method based on shared memory;
Figure 2 shows the specific storage layout of the shared memory in the invention;
Figure 3 illustrates shared-memory bank conflicts.
Detailed Description
The accompanying drawings are for illustration only and shall not be construed as limiting the patent;
to better illustrate the embodiment, some parts in the drawings are omitted, enlarged, or reduced and do not represent the dimensions of an actual product;
those skilled in the art will understand that certain well-known structures and their descriptions may be omitted from the drawings.
The technical solution of the invention is further described below with reference to the drawings and an embodiment.
Embodiment 1
As shown in Figure 1, the invention first preprocesses the input data, then launches the GPU through a kernel and allocates the corresponding required resources; it then generates quasi-random sequences of fixed step length and number, stores them in shared memory in an optimized layout, reads them in parallel by warps, and simulates the Markov chains to obtain their simulation results. The simulation results of all Markov chains are summed and divided by the number N of Markov chains to obtain the value x_m of the target component.
As shown in Figure 2, 32 threads form one warp; below it is the shared memory, which consists of interleaved banks. In general, the number of threads in a warp equals the number of banks in shared memory, so the threads of one warp can access data stored in shared memory in parallel. The quasi-random sequences of step length k are stored column-wise, each complete sequence occupying one and the same bank. Each thread of a warp simulates one Markov chain and needs one complete quasi-random sequence; a thread therefore only reads different addresses of a single bank to obtain its whole sequence, while different threads read different banks. Since different banks can be accessed simultaneously on the GPU, every thread of a warp can read the optimally stored quasi-random sequences in parallel, preparing for the parallel Markov chain simulation without bank conflicts; the index arithmetic of this layout is sketched below.
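A small sketch of the Figure-2 index arithmetic, assuming the usual configuration of 32 shared-memory banks of 4-byte words; the helper name readStep is illustrative.

```cpp
// Sequence element j of the thread with lane id `lane` lives at
// smem[j * 32 + lane]. With 32 banks of 4-byte words, the bank of a float
// index i is (i % 32), so this address always maps to bank `lane`: every
// thread owns one bank, and a warp-wide read of step j touches all 32
// banks exactly once, i.e. no bank conflicts.
__device__ __forceinline__ float readStep(const float* smemSeq, int j) {
    int lane = threadIdx.x & 31;           // lane id within the warp
    return smemSeq[j * 32 + lane];         // bank = (j*32 + lane) % 32 = lane
}
```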
As shown in Figure 3, shared memory consists of many interleaved banks. Suppose bank 0 in the figure, the grey column, stores the first quasi-random number of each partial quasi-random sequence. Different threads would then need to access different addresses of bank 0 at the same time, which causes a bank conflict: to read their quasi-random numbers, several threads request the same bank, but only one thread can access bank 0 at a time, so the accesses are serialized and performance is lost.
The specific steps of the acceleration method of the invention for solving systems of linear equations with a shared-memory-based Monte Carlo method are as follows:
1. First, input the data of the linear system to be solved: the coefficient matrix A, the vector b, the truncation error δ of the Markov chains, the probable error ε, and the number N of Markov chains, which is the number of quasi-random sequences; the index Idx of the target solution component x_m must also be known.
2. Data preprocessing: to transform the original input of the form Ax = b into the Jacobi form x = Gx + f, the matrix G and the vector f of the Jacobi form are computed from the input coefficient matrix A and vector b as g_ij = -a_ij/a_ii for i ≠ j, g_ii = 0, and f_i = b_i/a_ii, where a_ij are the elements of A and b_i the elements of b; thus the data of the Jacobi form are obtained from the data in the original form. The row norms of the matrix G can then be computed as ‖G‖_i = Σ_j |g_ij|, where g_ij are the elements of G. With the row norms, the transition probability matrix P needed later is computed as p_ij = |g_ij| / Σ_s |g_is|, where g_is are elements of the matrix G.
3. Generate the quasi-random sequences. First declare the random-number generator pointer; using the cuRAND library (a library of random-number generation algorithms) of the CUDA (Compute Unified Device Architecture) framework on the GPU, specify a quasi-random number generation algorithm as needed, set a random number seed for it, and specify the quasi-random sequence step length k; then generate N quasi-random sequences with the algorithm and store them in global memory.
4. To achieve parallelization, the GPU is launched through a kernel and resources are allocated to the corresponding blocks, e.g. the shared memory owned by each block. Each block contains many threads, of which every 32 form one warp, and work proceeds in units of warps.
5. Store the quasi-random sequences in shared memory in the optimized layout. First read the quasi-random sequences from global memory; together they form a 2D array. As shown in Figure 2, shared memory consists of many interleaved banks, and each quasi-random sequence is stored column-wise in one bank, so that within a warp, e.g. thread 0 can read the quasi-random sequence in the column with bank index 0 without restriction and without a bank conflict. One warp can thus access several different banks at the same time and read out the required quasi-random sequences simultaneously without bank conflicts, improving efficiency and obtaining better parallelism.
6. After the optimized storage of step 5, the threads within a warp can access and read the quasi-random sequences in parallel: for each warp, each of its threads reads, in parallel, the quasi-random sequences stored in shared memory, the shared memory of each block holding a portion of them. Once the quasi-random sequence has been obtained, the Markov chain simulation begins. Given the index Idx of the target component, the initial state of the chain is set to the state corresponding to that component (state ← Idx) and the initial expectation E is set to 0. Then the quasi-random numbers are compared in order against the transition probabilities P[state][j], j = 1, …, n, to find the probability interval into which each quasi-random number falls, and the corresponding state s becomes the next state of the chain (new_state ← s). This loop continues until the simulation of one Markov chain is complete and its result obtained; at the same time the results of all Markov chains within each block are summed and stored in shared memory.
7. Finally, after all N Markov chains have been simulated, all results are summed and averaged, which yields the target solution component x_m; a sketch of this final reduction is given below.
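A minimal sketch of step 7, reducing the per-block partial sums from the chain-simulation kernel above and averaging over the N chains. The patent does not specify how the GPU-side reduction is implemented; a grid-wide atomicAdd is used here as an illustrative stand-in.

```cpp
// Reduce the per-block partial sums on the GPU and average over N chains.
__global__ void finalReduce(const float* blockSums, float* result,
                            int numBlocks, int N) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numBlocks)
        atomicAdd(result, blockSums[i] / (float)N);  // x_m = (1/N) * sum
}
```

Here *result must be zero-initialized (e.g. with cudaMemset) before the launch; afterwards the single float it holds, the estimate of x_m, is copied back to the host with cudaMemcpy.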
Identical or similar reference numerals denote identical or similar parts;
the positional relationships described in the drawings are for illustration only and shall not be construed as limiting the patent.
Obviously, the above embodiment of the invention is merely an example given to illustrate the invention clearly and does not limit its implementation. Those of ordinary skill in the art can make other changes or variations of different forms on the basis of the above description; it is neither necessary nor possible to enumerate all implementations here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the protection scope of the claims of the invention.
Claims (5)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710005292.1A | 2017-01-04 | 2017-01-04 | Acceleration method for solution of linear equation set with Monte Carlo method based on shared memory |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN106681960A (en) | 2017-05-17 |
Family
ID=58850200
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201710005292.1A (published as CN106681960A (en), status Pending) | Acceleration method for solution of linear equation set with Monte Carlo method based on shared memory | 2017-01-04 | 2017-01-04 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN106681960A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107423030A (en) * | 2017-07-28 | 2017-12-01 | Zhengzhou Yunhai Information Technology Co., Ltd. | Markov Monte carlo algorithm accelerated method based on FPGA heterogeneous platforms |
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20070282575A1 (en) * | 2006-06-05 | 2007-12-06 | Cambridge Research & Instrumentation, Inc. | Monte Carlo simulation using GPU units on personal computers |
| CN102819664A (en) * | 2012-07-18 | 2012-12-12 | 中国人民解放军国防科学技术大学 | Influence maximization parallel accelerating method based on graphic processing unit |
| US20140292774A1 (en) * | 2013-03-26 | 2014-10-02 | Nvidia Corporation | System and method for performing sample-based rendering in a parallel processor |
Non-Patent Citations (3)
| Title |
|---|
| MICHAEL MASCAGNI ET AL: "A Parallel Quasi-Monte Carlo Method for Solving Systems of Linear Equations", 《INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE》 * |
| SIYAN LAI ET AL: "GPU-Based Monte Carlo Methods for Solving Linear Algebraic Equations", 《PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CLOUD COMPUTING (ISCC2015)》 * |
| LAI SIYAN ET AL: "Monte Carlo and Quasi-Monte Carlo Methods for Solving Systems of Linear Equations" (in Chinese), Journal of Donghua University * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN108170639B (en) | A Realization Method of Tensor CP Decomposition Based on Distributed Environment | |
| Dang et al. | CUDA-enabled Sparse Matrix–Vector Multiplication on GPUs using atomic operations | |
| CN116128019A (en) | Parallel training method and device for transducer model | |
| Zou et al. | Massively simulating adiabatic bifurcations with FPGA to solve combinatorial optimization | |
| CN114911619B (en) | A GPU-based batch parallel LU decomposition method for small and medium-sized dense matrices for simulation systems | |
| CN109901164B (en) | Distributed back projection imaging method of synthetic aperture radar | |
| Sanfui et al. | A two-kernel based strategy for performing assembly in FEA on the graphics processing unit | |
| Ahamed et al. | Schwarz method with two-sided transmission conditions for the gravity equations on graphics processing unit | |
| Bramas | Optimization and parallelization of the boundary element method for the wave equation in time domain | |
| Tang et al. | A Memory-efficient Simulation Method of Grover's Search Algorithm. | |
| Heien et al. | Understanding long-term earthquake behavior through simulation | |
| CN106681960A (en) | Acceleration method for solution of linear equation set with Monte Carlo method based on shared memory | |
| US9600446B2 (en) | Parallel multicolor incomplete LU factorization preconditioning processor and method of use thereof | |
| Abreu et al. | PIC codes in new processors: A full relativistic PIC code in CUDA-enabled hardware with direct visualization | |
| Capozzoli et al. | The success of GPU computing in applied electromagnetics | |
| Magoulès et al. | Green computing on graphics processing units | |
| Maringanti et al. | Acceleration of conjugate gradient method for circuit simulation using CUDA | |
| Kochurov et al. | GPU implementation of Jacobi Method and Gauss-Seidel Method for Data Arrays that Exceed GPU-dedicated Memory Size | |
| Lai et al. | Parallel computations of local pagerank problem based on graphics processing unit | |
| US20200293331A1 (en) | Systems and methods for simulation of dynamic systems | |
| Aksari et al. | Forward and back substitution algorithms on GPU: a case study on modified incomplete Cholesky Preconditioner for three-dimensional finite difference method | |
| Sun et al. | Cuda based fast implementation of very large matrix computation | |
| Zabelok et al. | GPU accelerated kinetic solvers for rarefied gas dynamics | |
| Chen et al. | Implementation of block algorithm for LU factorization | |
| Islam et al. | A Hierarchical Jacobi Iteration for Structured Matrices on GPUs using Shared Memory |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 2017-05-17 |