Dynamic balanced allocation method for rendering tasks on a GPU parallel ray tracing cluster
Technical field
The invention belongs to the technical field of virtual three-dimensional scene rendering and relates to a dynamic balanced allocation method for rendering tasks on a GPU parallel ray tracing cluster.
Background art
Ray tracing is widely used in three-dimensional scene rendering. To reduce the time cost of ray tracing a three-dimensional scene, parallel computing can be used to accelerate it. At present, ray tracing is commonly parallelized on GPUs, which shortens the ray tracing computation time considerably. If multiple GPU compute nodes are connected through a network to form a GPU computing cluster, parallelism between nodes can further increase the rendering speed. For interactive three-dimensional scene rendering applications accelerated by a GPU computing cluster, good load distribution and balancing are required to fully exploit the potential of every GPU compute node and to avoid the situation in which some nodes are overloaded while others sit idle. The most direct way to allocate ray tracing work is to partition the screen pixels into blocks and assign different blocks to each GPU compute node.
Ray tracing programs on GPUs are usually written in NVIDIA's CUDA. CUDA's parallel computing model is essentially data-parallel: acceleration is best when the data accessed by multiple CUDA threads are spatially contiguous. Specifically, for ray tracing on a GPU, if the ray paths traced by multiple parallel CUDA threads are very close to one another, then during ray-geometry intersection tests the scene acceleration structure and geometric object data accessed by the threads tend to be contiguous, and the instruction branch paths taken by the threads are more likely to be identical. If threads of the same CUDA warp take different branches at a conditional statement, the parallelism of that warp is reduced. Therefore, when running ray tracing on a GPU, the pixels corresponding to the rays traced in parallel should be as close together as possible, so that the rays travel in roughly the same direction. In this sense, partitioning the screen pixels into square blocks is usually better than partitioning them into elongated rectangular blocks. For dynamic scenes, two consecutive frames are usually strongly correlated, so the render time cost of the next frame can be estimated from the measured render time cost of the previous frame.
Summary of the invention
The object of the present invention is to provide a dynamic balanced allocation method for rendering tasks on a GPU parallel ray tracing cluster, so that rendering tasks are distributed evenly among the rendering node computers of the cluster.
The technical scheme of the present invention is as follows. In a dynamic balanced allocation method for rendering tasks on a GPU parallel ray tracing cluster, the cluster consists of 1 control node computer and n rendering node computers interconnected through a network switch, where n is an integer greater than 1; every rendering node computer has the same hardware and software configuration and is equipped with a GPU parallel computing unit. The method first partitions the pixel matrix of the three-dimensional scene picture into blocks on the control node computer and numbers each block. Fig. 1 is a schematic diagram of this partitioning: except for blocks in the last column or the last row, every block contains the same number of pixel rows and the same number of pixel columns. Using the render time cost measured for the previous frame, the method dynamically distributes the work of rendering the next frame among the rendering node computers in an approximately balanced way; that is, it assigns several pixel blocks to each rendering node computer so that the total render time cost of the blocks received by each node is approximately equal. The data structure and the implementation steps required by the method are as follows:
A data structure PIXBLOCK is provided for storing pixel block information. PIXBLOCK has six member variables: the pixel block number NO, the starting row number RowS, the ending row number RowE, the starting column number ColS, the ending column number ColE, and the render time cost COST of the pixel block.
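As an illustration only, the PIXBLOCK record described above could be mirrored in Python as follows. The class name PixBlock, the lower-case field names and the sample values are assumptions made for this sketch; the text fixes only the six members and their meanings.

```python
from dataclasses import dataclass


@dataclass
class PixBlock:
    """Hypothetical Python mirror of the PIXBLOCK record (names are assumptions)."""
    no: int             # NO: pixel block number
    row_s: int          # RowS: starting row number of the block
    row_e: int          # RowE: ending row number of the block
    col_s: int          # ColS: starting column number of the block
    col_e: int          # ColE: ending column number of the block
    cost: float = 1.0   # COST: render time cost; 1 until a frame has been timed


# Sample values chosen for illustration: a 100 x 100 block in the top-left corner.
block = PixBlock(no=1, row_s=1, row_e=100, col_s=1, col_e=100)
```

The default of 1 for cost matches the initial value assigned in Step102-2, which makes every block look equally expensive before the first frame is rendered.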
1) The first part of the method partitions the pixel matrix of the three-dimensional scene picture into blocks and stores the partition parameters on the control node computer of the GPU parallel ray tracing cluster. The specific steps are as follows:
Step101: run the cluster rendering control program on the control node computer. Through the cluster rendering control program, input the number of pixel rows M and the number of pixel columns N of the three-dimensional scene picture to be rendered, and input the number of pixel rows Bs of a pixel block;
Step102: perform the following operations in the cluster rendering control program:
Step102-1: compute M′ and N′ [formulas not reproduced in this text], where M > Bs, N > Bs, and ⌊x⌋ denotes x rounded down. This means that the pixel matrix of the three-dimensional scene picture is divided into M′ × N′ pixel blocks: the 1st pixel block, the 2nd pixel block, and so on up to the (M′ × N′)-th pixel block. In terms of the pixel matrix of the whole picture, the m-th pixel block corresponds to the pixels in rows i_b to i_e and columns j_b to j_e, where [formulas for i_b, i_e, j_b and j_e not reproduced in this text];
Step102-2: create in memory a one-dimensional array A001 with M′ × N′ elements, each of which stores one variable of type PIXBLOCK. In increasing order of pixel block number, from the 1st to the (M′ × N′)-th, do the following for each pixel block A002:
Create in memory a variable A003 of type PIXBLOCK. From the number m of pixel block A002, compute the values of i_b, i_e, j_b and j_e. Assign m to the NO member of A003, 1 to the COST member, i_b to the RowS member, i_e to the RowE member, j_b to the ColS member, and j_e to the ColE member. Assign the value of A003 to the m-th element of array A001, so that the m-th element of A001 corresponds to the m-th pixel block;
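A minimal sketch of the partitioning in Step101 and Step102 follows. Two things in it are assumptions rather than statements from the text: the block counts are taken as ceiling divisions of M and N by Bs, and blocks are numbered row by row from the top-left corner. Both choices are consistent with the description that only blocks in the last row or column may be smaller, but the exact formulas may differ. The names PixBlock and partition are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PixBlock:
    no: int       # NO
    row_s: int    # RowS
    row_e: int    # RowE
    col_s: int    # ColS
    col_e: int    # ColE
    cost: float   # COST


def partition(m_rows, n_cols, bs):
    """Build array A001: one PixBlock per block, with 1-based pixel indices."""
    # ASSUMPTION: block counts are ceil(M / Bs) and ceil(N / Bs).
    m_blocks = (m_rows + bs - 1) // bs
    n_blocks = (n_cols + bs - 1) // bs
    a001 = []
    for m in range(1, m_blocks * n_blocks + 1):
        # ASSUMPTION: blocks are numbered in row-major order.
        block_row, block_col = divmod(m - 1, n_blocks)
        i_b = block_row * bs + 1
        i_e = min((block_row + 1) * bs, m_rows)   # last block row may be shorter
        j_b = block_col * bs + 1
        j_e = min((block_col + 1) * bs, n_cols)   # last block column may be narrower
        a001.append(PixBlock(m, i_b, i_e, j_b, j_e, cost=1.0))
    return a001


# Values taken from the embodiment: M = 1920, N = 1080, Bs = 100.
a001 = partition(1920, 1080, 100)
```

Under these assumptions the embodiment's picture splits into 20 × 11 blocks, and every pixel belongs to exactly one block.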
2) The second part of the method performs the balanced task allocation for the GPU parallel ray tracing cluster. The specific steps are as follows:
Step201: start the pixel block rendering program on all rendering node computers. The cluster rendering control program on the control node computer sends the three-dimensional scene model over the network to the pixel block rendering program running on each rendering node computer, and each pixel block rendering program stores the received model in its own memory. The cluster rendering control program creates in the memory of the control node computer a list B001 for storing variables of type PIXBLOCK and sets B001 to empty. The cluster rendering control program also creates in the memory of the control node computer n queues QBlock whose elements store variables of type PIXBLOCK; the i_q-th queue QBlock stores the pixel block information assigned to the i_q-th rendering node computer, i_q = 1, 2, ..., n. Each queue QBlock is set to empty;
Step202: the cluster rendering control program sets the virtual camera parameters CamParam according to the user's current interactive input and sends CamParam over the network to the pixel block rendering program running on each rendering node computer. Each pixel block rendering program configures the virtual camera used to render the three-dimensional scene picture according to the received CamParam;
Step203: through the cluster rendering control program, add the PIXBLOCK variables stored in all elements of array A001 to list B001. Sort the elements of B001 in descending order, using the COST member of each stored PIXBLOCK variable as the key;
Step204: add the PIXBLOCK variable stored in the 1st element of list B001 to the 1st queue QBlock, the PIXBLOCK variable stored in the 2nd element of B001 to the 2nd queue QBlock, and so on, up to adding the PIXBLOCK variable stored in the n-th element of B001 to the n-th queue QBlock. Set the counter Counter to n + 1. Through the cluster rendering control program, create in the memory of the control node computer a one-dimensional array ARRCOST with n elements;
Step205: set all elements of array ARRCOST to 0. For each i_q = 1, 2, ..., n, compute the sum SUMC of the COST members of the PIXBLOCK variables stored in all elements of the i_q-th queue QBlock, and assign SUMC to the i_q-th element of ARRCOST. Find the index IDA in ARRCOST of the element with the smallest value, and add the PIXBLOCK variable stored in the Counter-th element of list B001 to the IDA-th queue QBlock. Set Counter = Counter + 1;
Step206: if Counter > M′ × N′, go to Step207; otherwise go to Step205;
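Steps 203 to 206 amount to a greedy assignment: blocks are taken in descending order of cost, and each one goes to the queue whose accumulated cost is currently the smallest. The sketch below illustrates this under some simplifications that are not in the text: blocks are represented only by their numbers, costs are passed as a dictionary, running totals are kept instead of recomputing ARRCOST from scratch on every pass (the result is the same), and ties are resolved in favour of the lowest queue index. The function name assign_blocks and the sample costs are hypothetical.

```python
def assign_blocks(costs, n):
    """Distribute blocks over n queues.

    costs maps a block number to the render time cost measured for it.
    Returns the n queues (lists of block numbers) and their total costs.
    """
    # Step203: order the blocks from most to least expensive.
    order = sorted(costs, key=lambda no: costs[no], reverse=True)
    queues = [[] for _ in range(n)]
    totals = [0.0] * n                    # plays the role of ARRCOST
    for k, no in enumerate(order):
        if k < n:
            ida = k                       # Step204: first n blocks, one per queue
        else:
            # Step205: queue with the smallest accumulated cost
            # (ASSUMPTION: the lowest index wins a tie).
            ida = totals.index(min(totals))
        queues[ida].append(no)
        totals[ida] += costs[no]
    return queues, totals


# Made-up costs for eight blocks shared among three rendering nodes.
sample_costs = {1: 8, 2: 7, 3: 6, 4: 5, 5: 4, 6: 3, 7: 2, 8: 1}
queues, totals = assign_blocks(sample_costs, 3)
```

On the first frame every cost is 1, so the same loop simply deals the blocks out in near-equal numbers to the queues.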
Step207: for each i_q = 1, 2, ..., n, the cluster rendering control program sends the i_q-th queue QBlock over the network to the pixel block rendering program on the i_q-th rendering node computer;
Step208: for each i_q = 1, 2, ..., n, the pixel block rendering program on the i_q-th rendering node computer performs the following operations:
1. Count the number of elements Num in the received i_q-th queue QBlock, create in memory a one-dimensional array C002 with Num elements, and assign 0 to every element of C002. The elements of C002 correspond one-to-one to the elements of the i_q-th queue QBlock: the 1st element of C002 corresponds to the 1st element of the queue, the 2nd element of C002 to the 2nd element of the queue, and so on;
2. For the PIXBLOCK variable C001 stored in each element of the received i_q-th queue QBlock, do the following: use ray tracing to render the color values of all pixels of the pixel block C004 determined by the RowS, RowE, ColS and ColE members of C001, and record the corresponding render time cost C003. Assign C003 to the element of C002 that corresponds to the queue element holding C001;
3. After operation 2 has been performed for the PIXBLOCK variables C001 stored in all elements of the received i_q-th queue QBlock, the pixel block rendering program sends the pixel color values of all rendered pixel blocks C004, together with array C002, to the cluster rendering control program on the control node computer;
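The per-block timing in Step208 can be sketched as below. The ray tracer itself is outside the scope of the sketch, so a stub function stands in for it. The names render_queue and trace_block_stub, the tuple layout of a block, and the use of a host-side timer are all assumptions made for illustration.

```python
import time


def trace_block_stub(block):
    """Stand-in for the GPU ray tracer: returns one black RGB value per pixel."""
    row_s, row_e, col_s, col_e = block
    return [(0, 0, 0)] * ((row_e - row_s + 1) * (col_e - col_s + 1))


def render_queue(queue, trace_block):
    """Render every block in the queue and time each one separately.

    queue holds (RowS, RowE, ColS, ColE) tuples; trace_block renders one block.
    Returns the rendered blocks and the timing array C002, in queue order.
    """
    c002 = [0.0] * len(queue)                   # operation 1: one slot per queue element
    rendered = []
    for k, block in enumerate(queue):           # operation 2: render and time each block
        start = time.perf_counter()
        rendered.append(trace_block(block))
        c002[k] = time.perf_counter() - start   # C003 for this block
    return rendered, c002                       # operation 3: both go back to the control node


rendered, c002 = render_queue([(1, 2, 1, 2), (1, 2, 3, 3)], trace_block_stub)
```

In a real CUDA renderer, kernel launches return to the host before the GPU has finished, so the timer would have to be read only after the device has completed the block (for example after a synchronization point or by using GPU-side event timing); otherwise the recorded cost would understate the actual render time.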
Step209: for each i_q = 1, 2, ..., n, the cluster rendering control program on the control node computer receives the pixel color values of all pixel blocks C004 and the array C002 sent by the pixel block rendering program on the i_q-th rendering node computer;
Step210: using the values of the PIXBLOCK variables stored in the elements of the n queues QBlock and the correspondence between each queue QBlock and its rendering node computer, the cluster rendering control program on the control node computer stitches the received pixel color values of all pixel blocks C004, sent by the pixel block rendering programs on all rendering node computers, into one complete three-dimensional scene picture and shows it on the display;
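The stitching in Step210 can be pictured as copying each returned block into its place in a full-size pixel matrix. The sketch below assumes that each block arrives as a list of pixel rows alongside its (RowS, RowE, ColS, ColE) bounds with 1-based indices; the function name stitch, that data layout and the tiny sample picture are assumptions.

```python
def stitch(m_rows, n_cols, results):
    """Assemble a full frame from rendered blocks.

    results is an iterable of ((RowS, RowE, ColS, ColE), tile) pairs, where
    tile is a list of pixel rows covering exactly that block.
    """
    frame = [[None] * n_cols for _ in range(m_rows)]
    for (row_s, row_e, col_s, col_e), tile in results:
        for r in range(row_s, row_e + 1):
            # Convert 1-based block bounds to 0-based slice positions.
            frame[r - 1][col_s - 1:col_e] = tile[r - row_s]
    return frame


# A 2 x 3 picture built from a 2 x 2 block and a 2 x 1 block.
frame = stitch(2, 3, [
    ((1, 2, 1, 2), [["a", "a"], ["a", "a"]]),
    ((1, 2, 3, 3), [["b"], ["b"]]),
])
```

Because the block bounds travel with the PIXBLOCK variables in each queue, the control node does not need the blocks to arrive in any particular order.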
Step211: for each i_q = 1, 2, ..., n, perform the following operations on the control node computer:
For each element D001 of the array C002 that the cluster rendering control program received from the i_q-th rendering node computer, do the following:
The elements of the array C002 sent by the i_q-th rendering node computer correspond one-to-one to the elements of the i_q-th queue QBlock: the 1st element of C002 corresponds to the 1st element of the queue, the 2nd element of C002 to the 2nd element of the queue, and so on. Let BNo denote the value of the NO member of the PIXBLOCK variable stored in the element of the i_q-th queue QBlock that corresponds to D001. Assign the value of D001 to the COST member of the PIXBLOCK variable stored in the BNo-th element of array A001;
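The cost feedback in Step211 pairs each returned timing with the block number at the same position in the corresponding queue and writes it back into A001. In the sketch below, A001 is reduced to a dictionary from block number to COST, and queues hold block numbers only; these simplifications, the function name update_costs and the sample timings are assumptions.

```python
def update_costs(a001_cost, queues, timings):
    """Write measured render times back as the costs used for the next frame.

    a001_cost maps block number -> COST; queues[i] lists the block numbers
    sent to node i; timings[i] is the array C002 returned by node i.
    """
    for queue, c002 in zip(queues, timings):
        for bno, d001 in zip(queue, c002):   # k-th timing belongs to k-th queue element
            a001_cost[bno] = d001


# Four blocks, two nodes, made-up timings in seconds.
a001_cost = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}
update_costs(a001_cost, queues=[[1, 4], [2, 3]], timings=[[0.020, 0.005], [0.012, 0.009]])
```

After this update, sorting the blocks by cost in Step203 of the next frame reflects where the previous frame actually spent its time.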
Step212: set list B001 in the memory of the control node computer to empty, and set every queue QBlock in the memory of the control node computer to empty. If a stop-rendering command has been received, go to Step213; otherwise go to Step202;
Step213: stop rendering.
The positive effect of the present invention is as follows. When rendering the first frame, the invention assumes that all pixel blocks have the same render time cost and assigns pixel blocks to the rendering node computers accordingly. From the second frame onward, it uses the per-block render time costs measured while rendering the previous frame as the basis for reassigning pixel blocks to the rendering node computers. As a result, from the second frame onward the total render time of the pixel blocks received by each rendering node computer is approximately equal, which yields an approximately balanced distribution of rendering tasks across the GPU parallel ray tracing cluster and makes the fullest use of the computing potential of every rendering node computer.
Brief description of the drawings
Fig. 1 is a schematic diagram of the partitioning of the pixel matrix of the three-dimensional scene picture.
Specific embodiment
To make the features and advantages of the method clearer, the method is further described below with reference to a specific embodiment. This embodiment considers the following virtual room scene: a table and a chair are placed in the room; objects such as fruit, a metal teapot and a porcelain cup are placed on the table; and a point light source on the ceiling shines down on the scene. Each rendering node computer is fitted with one Nvidia Quadro K2000 graphics card.
In the present embodiment, M=1920, N=1080, Bs=100, n=4.