
CN109472854A - A Dynamic Balanced Allocation Method of Drawing Tasks for GPU Parallel Ray Tracing Clusters - Google Patents


Info

Publication number
CN109472854A
Authority
CN
China
Prior art keywords
pixels
blocks
node computer
qblock
variable
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811371313.2A
Other languages
Chinese (zh)
Other versions
CN109472854B (en)
Inventor
陈纯毅
杨华民
蒋振刚
曲福恒
李华
潘石
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changchun University of Science and Technology
Original Assignee
Changchun University of Science and Technology
Application filed by Changchun University of Science and Technology
Priority to CN201811371313.2A
Publication of CN109472854A
Application granted
Publication of CN109472854B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00: 3D [Three Dimensional] image rendering
    • G06T15/06: Ray-tracing
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/48: Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806: Task transfer initiation or dispatching
    • G06F9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00: General purpose image data processing
    • G06T1/20: Processor architectures; Processor configuration, e.g. pipelining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Graphics (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Generation (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a dynamic balanced allocation method for the rendering tasks of a GPU parallel ray tracing cluster. When rendering the first frame, the method assumes that every pixel block has the same rendering time cost and allocates pixel blocks to the rendering node computers on that basis. From the second frame onward, it reallocates pixel blocks to the rendering node computers using the per-block rendering time costs measured while rendering the previous frame. From the second frame onward, the allocation ensures that the total rendering time of the pixel blocks assigned to each rendering node computer is approximately equal, which yields an approximately balanced allocation of rendering tasks across the GPU parallel ray tracing cluster and makes the fullest use of each rendering node computer's computing capacity.

Description

Dynamic balanced allocation method of rendering tasks for a GPU parallel ray tracing cluster
Technical field
The invention belongs to the technical field of virtual three-dimensional scene rendering and relates to a dynamic balanced allocation method of rendering tasks for a GPU parallel ray tracing cluster.
Background art
Ray tracing is widely used in three-dimensional scene rendering. To reduce the time cost of ray tracing a three-dimensional scene, parallel computing can be used to accelerate it. At present, ray tracing is commonly parallelized on a GPU, which greatly shortens the computation time. If multiple GPU computing nodes are connected through a network to form a GPU computing cluster, parallelism between nodes can further increase the rendering speed. For interactive three-dimensional rendering applications that use a GPU computing cluster to accelerate ray tracing, good load distribution and balancing are required to exploit the full potential of every GPU computing node and to avoid the situation in which some nodes are overloaded while others sit idle. The most direct way to distribute ray tracing work is to partition the screen pixels into blocks and assign different blocks to different GPU computing nodes. Ray tracing programs on the GPU are usually written in NVIDIA's CUDA. The parallel computing model of CUDA is essentially data parallel, and acceleration is best when the data accessed by multiple CUDA threads is spatially contiguous. Specifically, for ray tracing on a GPU, if the ray paths traced by multiple parallel CUDA threads are very close to one another, then during ray-geometry intersection tests the scene acceleration structure and geometric object data accessed by the threads tracing different rays are more likely to be contiguous, and the threads executing at the same time are more likely to follow the same branch path through the instruction sequence. If threads of the same CUDA warp take different branches at a conditional statement, the parallelism of that warp's threads is reduced. Therefore, when executing a GPU ray tracing task, the pixels corresponding to the rays traced in parallel should be as close together as possible, so that those rays travel in roughly the same direction. In this sense, partitioning the screen pixels into square blocks is usually better than partitioning them into elongated rectangular blocks. For dynamic scenes, two consecutive frames are usually strongly correlated, so the rendering time cost measured for the previous frame can be used to estimate the rendering time cost of the next frame.
Summary of the invention
The object of the invention is to provide a dynamic balanced allocation method of rendering tasks for a GPU parallel ray tracing cluster, so that rendering tasks are allocated in a balanced way to every rendering node computer of the cluster.
The technical solution of the invention is realized as follows. In a dynamic balanced allocation method of rendering tasks for a GPU parallel ray tracing cluster, the cluster consists of 1 control node computer and n rendering node computers interconnected through a network switch, where n is an integer greater than 1; every rendering node computer has the same software and hardware configuration and is equipped with a GPU parallel computing unit. The method first partitions the pixel matrix of the three-dimensional scene picture into blocks on the control node computer and numbers each block. Fig. 1 shows a schematic diagram of partitioning the pixel matrix of the three-dimensional scene picture; except for the last column or the last row of blocks, every block contains the same number of pixel rows and the same number of pixel columns. Based on the rendering time cost of the previous frame, the method dynamically allocates, in an approximately balanced way, the tasks of each rendering node computer for the next frame: each rendering node computer is assigned a number of pixel blocks such that the total rendering time cost of the blocks assigned to each rendering node computer is approximately equal. The required data structure and the specific implementation steps are as follows:
A data structure PIXBLOCK is provided for storing pixel block information. PIXBLOCK contains six member variables: the pixel block number NO, the starting row number RowS, the ending row number RowE, the starting column number ColS, the ending column number ColE, and the rendering time cost COST of the pixel block.
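As a rough illustration, PIXBLOCK could be modelled as a small Python dataclass. The field names follow the text; the class name `PixBlock`, the Python types, and the 1-based inclusive row and column convention are assumptions made for this sketch only.

```python
from dataclasses import dataclass


@dataclass
class PixBlock:
    """Illustrative stand-in for PIXBLOCK (types and indexing are assumed)."""
    NO: int            # pixel block number
    RowS: int          # starting row number of the block
    RowE: int          # ending row number of the block
    ColS: int          # starting column number of the block
    ColE: int          # ending column number of the block
    COST: float = 1.0  # rendering time cost; the text initialises it to 1


# A hypothetical 100 x 100 block in the top-left corner of the picture.
block = PixBlock(NO=1, RowS=1, RowE=100, ColS=1, ColE=100)
rows_in_block = block.RowE - block.RowS + 1
```

Defaulting `COST` to 1 mirrors the first-frame assumption that every block costs the same to render.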
1) The first part of the method partitions the pixel matrix of the three-dimensional scene picture and stores the partition parameters on the control node computer of the GPU parallel ray tracing cluster. The specific steps are as follows:
Step101: Run the cluster rendering control program on the control node computer. Through the cluster rendering control program, input the number of pixel rows M and the number of pixel columns N of the three-dimensional scene picture to be rendered, and input the number of pixel rows Bs of a pixel block;
Step102: Perform the following operations in the cluster rendering control program:
Step102-1: Compute M′ and N′ [formulas not preserved in this text], where M > Bs, N > Bs, and ⌊x⌋ denotes x rounded down. This means that the pixel matrix of the three-dimensional scene picture is divided into M′ × N′ pixel blocks: the 1st pixel block, the 2nd pixel block, and so on up to the (M′ × N′)-th pixel block. In terms of the pixel matrix of the whole three-dimensional scene picture, the m-th pixel block corresponds to the pixels in rows ib through ie and columns jb through je of that matrix, where [formulas for ib, ie, jb, je not preserved in this text];
Step102-2: Create in memory a one-dimensional array A001 containing M′ × N′ elements, each of which stores one variable of type PIXBLOCK. In increasing order of pixel block number, from the 1st to the (M′ × N′)-th, perform the following operations for each pixel block A002 in turn:
Create in memory a variable A003 of type PIXBLOCK. From the number m of pixel block A002, compute the values of ib, ie, jb, and je. Assign m to the NO member of A003, 1 to the COST member, ib to the RowS member, ie to the RowE member, jb to the ColS member, and je to the ColE member. Assign the value of A003 to the m-th element of array A001, so that the m-th element of A001 corresponds to the m-th pixel block;
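The partitioning in Step102 can be sketched as follows. The formulas for M′, N′, ib, ie, jb, and je are not legible in this text, so the sketch assumes ceiling division for the block counts, square blocks of Bs × Bs pixels, row-major block numbering starting at 1, and 1-based inclusive pixel indices. The names `PixBlock` and `partition` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PixBlock:
    NO: int
    RowS: int
    RowE: int
    ColS: int
    ColE: int
    COST: float = 1.0


def partition(M: int, N: int, Bs: int) -> List[PixBlock]:
    """Build array A001 under the assumptions stated in the lead-in."""
    block_rows = -(-M // Bs)  # assumed M' = ceil(M / Bs)
    block_cols = -(-N // Bs)  # assumed N' = ceil(N / Bs)
    A001 = []
    for m in range(1, block_rows * block_cols + 1):
        r, c = divmod(m - 1, block_cols)  # assumed row-major numbering
        ib = r * Bs + 1
        ie = min((r + 1) * Bs, M)         # last row of blocks may be shorter
        jb = c * Bs + 1
        je = min((c + 1) * Bs, N)         # last column of blocks may be narrower
        A001.append(PixBlock(NO=m, RowS=ib, RowE=ie, ColS=jb, ColE=je, COST=1.0))
    return A001


blocks = partition(M=250, N=230, Bs=100)
covered = sum((b.RowE - b.RowS + 1) * (b.ColE - b.ColS + 1) for b in blocks)
```

Under these assumptions every pixel belongs to exactly one block, and only the last row and last column of blocks can be smaller than Bs × Bs, which matches the description of Fig. 1.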
2) The second part of the method performs the balanced task allocation for the GPU parallel ray tracing cluster. The specific steps are as follows:
Step201: Start the pixel block rendering program on every rendering node computer. The cluster rendering control program on the control node computer sends the three-dimensional scene model over the network to the pixel block rendering program running on each rendering node computer, and each pixel block rendering program stores the received model in its own memory. The cluster rendering control program creates a list B001 in the memory of the control node computer for storing variables of type PIXBLOCK, and initializes B001 to empty. The cluster rendering control program also creates n queues QBlock in the memory of the control node computer; the elements of each queue QBlock store variables of type PIXBLOCK, and the iq-th queue QBlock holds the pixel block information assigned to the iq-th rendering node computer, iq = 1, 2, ..., n. Each queue QBlock is initialized to empty;
Step202: The cluster rendering control program sets the virtual camera parameters CamParam according to the user's current interactive input and sends CamParam over the network to the pixel block rendering program running on each rendering node computer. Each pixel block rendering program uses the received CamParam to set up the virtual camera used to render the three-dimensional scene picture;
Step203: The cluster rendering control program adds the PIXBLOCK variables stored in all elements of array A001 to list B001, then sorts the elements of B001 in descending order, using the value of the COST member of each stored PIXBLOCK variable as the key;
Step204: Add the PIXBLOCK variable stored in the 1st element of list B001 to the 1st queue QBlock, the one stored in the 2nd element to the 2nd queue QBlock, and so on, until the one stored in the n-th element has been added to the n-th queue QBlock. Set the counter Counter to n + 1. The cluster rendering control program creates in the memory of the control node computer a one-dimensional array ARRCOST containing n elements;
Step205: Set every element of array ARRCOST to 0. For each iq = 1, 2, ..., n, compute the sum SUMC of the COST members of the PIXBLOCK variables stored in all elements of the iq-th queue QBlock, and assign SUMC to the iq-th element of ARRCOST. Find the index IDA of the smallest element of ARRCOST, and add the PIXBLOCK variable stored in the Counter-th element of list B001 to the IDA-th queue QBlock. Set Counter = Counter + 1;
Step206: If Counter > M′ × N′, go to Step207; otherwise go to Step205;
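Step203 through Step206 amount to a greedy assignment: sort the blocks by COST in descending order, then repeatedly give the next block to the queue with the smallest accumulated cost. The sketch below is one possible reading, not the patented implementation. The function name `allocate`, the use of a dict from block number to COST, and 0-based list positions are assumptions; it also assumes there are at least n blocks.

```python
def allocate(costs, n):
    """Return n queues of block numbers, following Step203 to Step206.

    costs: dict mapping block number NO -> COST from the previous frame.
    """
    # Step203: list B001 sorted by COST, largest first.
    B001 = sorted(costs, key=lambda no: costs[no], reverse=True)

    # Step204: the first n blocks seed the n queues.
    queues = [[B001[k]] for k in range(n)]
    counter = n  # 0-based equivalent of Counter = n + 1

    # Step205 and Step206: hand each remaining block to the cheapest queue.
    while counter < len(B001):
        ARRCOST = [sum(costs[no] for no in q) for q in queues]
        IDA = ARRCOST.index(min(ARRCOST))  # first queue with the smallest total
        queues[IDA].append(B001[counter])
        counter += 1
    return queues


example_costs = {1: 7, 2: 5, 3: 4, 4: 3, 5: 2, 6: 1}  # made-up costs
queues = allocate(example_costs, n=2)
totals = [sum(example_costs[no] for no in q) for q in queues]
print(queues, totals)  # [[1, 4, 6], [2, 3, 5]] [11, 11]
```

Recomputing every queue total on each pass mirrors the wording of Step205; keeping a running total per queue should produce the same assignment with less work.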
Step207: For each iq = 1, 2, ..., n, the cluster rendering control program sends the iq-th queue QBlock over the network to the pixel block rendering program on the iq-th rendering node computer;
Step208: For each iq = 1, 2, ..., n, the pixel block rendering program on the iq-th rendering node computer performs the following operations:
1. Count the number of elements Num in the received iq-th queue QBlock, create in memory a one-dimensional array C002 containing Num elements, and set every element of C002 to 0. The elements of C002 correspond one-to-one to the elements of the iq-th queue QBlock: the 1st element of C002 corresponds to the 1st element of the queue, the 2nd element to the 2nd, and so on;
2. For the PIXBLOCK variable C001 stored in each element of the received iq-th queue QBlock, do the following: use ray tracing to render all pixel color values of the pixel block C004 determined by the RowS, RowE, ColS, and ColE members of C001, record the corresponding rendering time cost C003, and assign C003 to the element of array C002 that corresponds to the queue element holding C001;
3. After the operations of sub-step 2 have been performed for the PIXBLOCK variables C001 stored in all elements of the received iq-th queue QBlock, the pixel block rendering program sends the pixel color values of all rendered pixel blocks C004, together with array C002, to the cluster rendering control program on the control node computer;
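The per-node loop of Step208 can be sketched on the host side as below. The ray tracer itself is out of scope here, so `fake_trace` is a hypothetical placeholder that returns a constant-colored tile; `render_queue`, the tuple layout of a block, and the use of `time.perf_counter` as the clock are likewise assumptions.

```python
import time


def render_queue(queue, trace_block):
    """Render every block in the queue and time each one (Step208, sub-steps 1 to 3).

    queue: list of (RowS, RowE, ColS, ColE) tuples, 1-based inclusive.
    trace_block: callable that renders one block and returns its pixel colors.
    """
    C002 = [0.0] * len(queue)          # sub-step 1: one timing slot per block
    tiles = []
    for k, block in enumerate(queue):  # sub-step 2: render and time each block
        start = time.perf_counter()
        tiles.append(trace_block(block))
        C002[k] = time.perf_counter() - start
    return tiles, C002                 # sub-step 3: both are sent to the control node


def fake_trace(block):
    """Hypothetical stand-in for the GPU ray tracer."""
    rs, re_, cs, ce = block
    return [[(0, 0, 0)] * (ce - cs + 1) for _ in range(re_ - rs + 1)]


tiles, C002 = render_queue([(1, 2, 1, 3), (3, 4, 1, 3)], fake_trace)
```

With a real CUDA renderer, kernel launches are asynchronous with respect to the host, so the timing would need a synchronization point (or GPU-side event timing) before the end timestamp is read.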
Step209: For each iq = 1, 2, ..., n, the cluster rendering control program on the control node computer receives the pixel color values of all pixel blocks C004 and the array C002 sent by the pixel block rendering program on the iq-th rendering node computer;
Step210: Using the values of the PIXBLOCK variables stored in the elements of the n queues QBlock and the correspondence between each queue QBlock and its rendering node computer, the cluster rendering control program on the control node computer stitches the received pixel color values of all pixel blocks C004 from all rendering node computers into one complete three-dimensional scene picture and shows it on the display;
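The stitching in Step210 is essentially a copy of each received tile into its row and column range of the full picture. A minimal sketch, assuming 1-based inclusive block bounds and tiles stored as lists of rows; the function name `stitch` and the data layout are hypothetical.

```python
def stitch(M, N, rendered):
    """Assemble an M x N picture from rendered tiles.

    rendered: iterable of ((RowS, RowE, ColS, ColE), tile), where tile is a
    list of rows and each row is a list of pixel color values.
    """
    frame = [[None] * N for _ in range(M)]
    for (rs, re_, cs, ce), tile in rendered:
        for i in range(rs, re_ + 1):
            # Copy one tile row into columns cs..ce of picture row i.
            frame[i - 1][cs - 1:ce] = tile[i - rs]
    return frame


frame = stitch(2, 3, [
    ((1, 2, 1, 2), [["a", "a"], ["a", "a"]]),
    ((1, 2, 3, 3), [["b"], ["b"]]),
])
print(frame)  # [['a', 'a', 'b'], ['a', 'a', 'b']]
```

Because the block bounds travel with each PIXBLOCK variable, the control node does not need to know which node rendered a tile in order to place it.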
Step211: For each iq = 1, 2, ..., n, perform the following operations on the control node computer:
For each element D001 of the array C002 that the cluster rendering control program received from the iq-th rendering node computer, do the following:
The elements of the array C002 sent by the iq-th rendering node computer correspond one-to-one to the elements of the iq-th queue QBlock: the 1st element of that C002 corresponds to the 1st element of the iq-th queue QBlock, the 2nd element to the 2nd, and so on. Let BNo denote the value of the NO member of the PIXBLOCK variable stored in the element of the iq-th queue QBlock that corresponds to D001. Assign the value of D001 to the COST member of the PIXBLOCK variable stored in the BNo-th element of array A001;
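The cost feedback of Step211 relies only on the positional correspondence between each queue and the timing array returned for it. A sketch under assumed data shapes: `A001_cost` is a dict from block number to COST, `queues` holds block numbers, and `timings` holds one C002 list per node. All three names are hypothetical.

```python
def update_costs(A001_cost, queues, timings):
    """Write measured render times back into the per-block COST values."""
    for queue, C002 in zip(queues, timings):  # one pair per rendering node
        for BNo, D001 in zip(queue, C002):    # k-th timing belongs to k-th block
            A001_cost[BNo] = D001


A001_cost = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}  # first-frame defaults
queues = [[1, 3], [2, 4]]                     # blocks sent to node 1 and node 2
timings = [[0.9, 0.2], [0.4, 0.1]]            # made-up measurements in seconds
update_costs(A001_cost, queues, timings)
print(A001_cost)  # {1: 0.9, 2: 0.4, 3: 0.2, 4: 0.1}
```

The updated costs are what the next frame's allocation sorts on, so a block that was slow in this frame is placed early in the next one.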
Step212: Empty the list B001 in the memory of the control node computer, and empty every queue QBlock in the memory of the control node computer. If a stop-rendering command has been received, go to Step213; otherwise go to Step202;
Step213: Stop rendering.
The positive effect of the invention is as follows. When rendering the first frame, the invention assumes that every pixel block has the same rendering time cost and allocates pixel blocks to the rendering node computers on that basis. From the second frame onward, it reallocates pixel blocks using the per-block rendering time costs measured while rendering the previous frame, which ensures that the total rendering time of the pixel blocks assigned to each rendering node computer is approximately equal. This yields an approximately balanced allocation of rendering tasks across the GPU parallel ray tracing cluster and makes the fullest use of each rendering node computer's computing capacity.
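A small simulation may help make the claimed effect concrete. It uses made-up per-block render times for a static scene, two rendering nodes, and a compact version of the greedy allocation described above; the function name `allocate` and all numbers are illustrative assumptions, not measurements from the patent.

```python
def allocate(costs, n):
    """Greedy allocation: largest cost first, each block to the cheapest queue."""
    order = sorted(range(len(costs)), key=lambda b: costs[b], reverse=True)
    queues = [[b] for b in order[:n]]
    for b in order[n:]:
        totals = [sum(costs[x] for x in q) for q in queues]
        queues[totals.index(min(totals))].append(b)
    return queues


true_cost = [9, 1, 1, 1, 8, 1, 1, 1]  # hypothetical render time of each block
n = 2

# Frame 1: every block is assumed to cost 1, so only block counts are balanced.
frame1 = allocate([1] * len(true_cost), n)
load1 = [sum(true_cost[b] for b in q) for q in frame1]

# Frame 2: the times measured during frame 1 drive the allocation.
frame2 = allocate(true_cost, n)
load2 = [sum(true_cost[b] for b in q) for q in frame2]

print(load1, load2)  # [19, 4] [12, 11]
```

In this toy case the slowest node finishes after 19 time units in the first frame and after 12 in the second, out of 23 units of total work. Real scenes change between frames, so the previous-frame costs are only an estimate and the balance is approximate.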
Brief description of the drawings
Fig. 1 is a schematic diagram of partitioning the pixel matrix of the three-dimensional scene picture.
Detailed description of the embodiments
To make the features and advantages of the method clearer, the method is further described below with reference to a specific embodiment. The embodiment considers the following virtual room scene: a desk and a chair are placed in the room, objects such as fruit, a metal teapot, and a porcelain cup are placed on the desk, and a point light source on the ceiling shines down on the scene. Each rendering node computer is fitted with one Nvidia Quadro K2000 graphics card.
The technical scheme of the present invention is realized as follows: a kind of drafting task dynamic of GPU parallel ray tracing cluster is equal Weigh distribution method, it is characterised in that: GPU parallel ray tracing cluster to be used is drawn by 1 control node computer and n Node computer is interconnected composition by the network switch, and wherein n is greater than 1 integer, each drafting node meter Calculation machine software and hardware configuration having the same is equipped with GPU parallel computation unit.This method is first on control node computer Piecemeal is carried out to three-dimensional scenic picture pixel matrix and each piecemeal is numbered.Fig. 1 is shown to three-dimensional scenic picture photo Prime matrix carries out the schematic diagram of piecemeal, and in addition to last column or last line, other each piecemeals include identical pixel Line number and identical pixel columns.This method according to previous frame pattern drafting time overhead come to draw next frame picture when it is each The drawing node computer of the task carries out dynamic approximate equalization distribution, and as each drafting node computer distributes several pixels Total drafting time overhead of piecemeal, the blocks of pixels for getting each drafting node computer is approximately equal.Needed for this method Data structure and the specific implementation steps are as follows:
A kind of data structure PIXBLOCK is provided, for storing blocks of pixels information, data structure PIXBLOCK includes picture The starting of the number NO of plain piecemeal, the starting line number RowS of blocks of pixels, the end line number RowE of blocks of pixels, blocks of pixels arrange Number ColS, the end row number ColE of blocks of pixels, blocks of pixels drafting time overhead COST totally six member variables.
1) first part of this method realizes three-dimensional scenic picture pixel partitioning of matrix, and division parameter is stored in On the control node computer of GPU parallel ray tracing cluster, the specific implementation steps are as follows:
Step101: operation cluster rendering controls program on control node computer, and it is defeated to control program by cluster rendering Enter the number of lines of pixels M and pixel columns N of the three-dimensional scenic picture to be drawn;Program, which is controlled, by cluster rendering inputs blocks of pixels Number of lines of pixels Bs
Step102: following operation is executed in cluster rendering control program:
Step102-1: it calculatesM>Bs, N > Bs,Expression is rounded downwards x, This means that the picture element matrix of three-dimensional scenic picture is divided into a blocks of pixels of M ' × N ', i.e. the 1st blocks of pixels, the 2nd picture Plain piecemeal, and so on until a blocks of pixels of M ' × N ';From the point of view of the picture element matrix of entire three-dimensional scenic picture, m The i-th of a blocks of pixels actually picture element matrix of corresponding three-dimensional scenic picturebIt goes to i-theRow, jthbArrange jtheThe picture of column Element, wherein
Step102-2: the one-dimension array A001 comprising a element of M ' × N ' is created in memory, array A001's is every A element stores the variable of a data structure PIXBLOCK type;By the incremental sequence of blocks of pixels number, i.e., opened from the 1st Begin a up to M ' × N ', be done as follows one by one for each blocks of pixels A002:
The variables A 003 of a data structure PIXBLOCK type is created in memory;According to the number of blocks of pixels A002 M calculates ib、ie、jb、jeValue;The number NO member variable of the blocks of pixels of variables A 003 is assigned a value of m, variables A 003 The drafting time overhead COST member variable of blocks of pixels is assigned a value of 1, the starting line number RowS of the blocks of pixels of variables A 003 Member variable is assigned a value of ib, the end line number RowE member variable of the blocks of pixels of variables A 003 is assigned a value of ie, variables A 003 The starting row number ColS member variable of blocks of pixels be assigned a value of jb, the end row number ColE of the blocks of pixels of variables A 003 at Member's variable assignments is je, m-th of element of array A001 is assigned a value of the value of variables A 003;M-th of element pair of array A001 Answer m-th of blocks of pixels;
2) second part of this method realizes the equalization task distribution of GPU parallel ray tracing cluster, and specific steps are such as Under:
Step201: start blocks of pixels drawing program on all drafting node computers;It controls on node computer Cluster rendering controls program and three-dimensional scene models is sent to the pixel run on each drafting node computer by network Piecemeal drawing program, the blocks of pixels drawing program run on each drafting node computer is the three-dimensional scenic mould received Type is stored in respective memory;Cluster rendering controls program and creates a list in the memory of control node computer B001, the variable of structure PIXBLOCK type, enables list B001 for sky for storing data;Program is controlled by cluster rendering, N queue QBlock, the element storing data structure of queue QBlock are created in the memory of control node computer The variable of PIXBLOCK type, i-thqA queue QBlock distributes to i-th for storingqA pixel for drawing node computer point Block message, iq=1,2 ..., n;Enable each queue QBlock for sky;
Step202: cluster rendering controls program and is inputted according to the current human-computer interaction of user virtual camera parameter is arranged CamParam, cluster rendering control program draw journey to the blocks of pixels run on each drafting node computer by network Sequence sends virtual camera parameter CamParam;The blocks of pixels drawing program that runs is according to connecing on each drafting node computer Virtual camera used in the virtual camera parameter CamParam setting drawing three-dimensional scenic picture received;
Step203: the data structure that program stores all elements of array A001 is controlled by cluster rendering The variable of PIXBLOCK type is added in list B001;The data structure PIXBLOCK type that the element of list B001 is stored Variable blocks of pixels drafting time overhead COST member variable value as keyword, by sequence from big to small to column The element of table B001 is ranked up;
Step204: the variable for the data structure PIXBLOCK type that the 1st element of list B001 stores is added to the In 1 queue QBlock, the variable for the data structure PIXBLOCK type that the 2nd element of list B001 stores is added to the In 2 queue QBlock, and so on, the change for the data structure PIXBLOCK type that the nth elements of list B001 are stored Amount is added in n-th of queue QBlock;Counter Counter is enabled to be equal to n+1;Program is controlled by cluster rendering to tie in control One one-dimension array ARRCOST comprising n element of creation in the memory of point computer;
Step205: the value for enabling all elements of array ARRCOST is all 0;It is directed to i respectivelyq=1,2 ..., n calculate i-thq The drafting time of the blocks of pixels of the variable of the data structure PIXBLOCK type of all elements storage in a queue QBlock The cumulative and SUMC of expense COST member variable, the i-th of array ARRCOSTqA element is assigned a value of cumulative and SUMC;Calculate number Number IDA of the smallest element of value of group ARRCOST in array ARRCOST, deposits the Counter element of list B001 The variable of the data structure PIXBLOCK type of storage is added in DA queue QBlock of I;Enable Counter=Counter+1;
Step206: if Counter > M ' × N ', Step207 is gone to step, Step205 is otherwise gone to step;
Step207: it is directed to i respectivelyq=1,2 ..., n, cluster rendering control program by network i-thqA queue QBlock is sent to i-thqA blocks of pixels drawing program drawn on node computer;
Step208: for iq=1,2 ..., n, i-thqA blocks of pixels drawing program drawn on node computer executes It operates below:
1. calculating i-th receivedqThe element number Num that a queue QBlock includes, creating one in memory includes Each element of array C002 is assigned a value of 0, the element of array C002 and i-th by the one-dimension array C002 of Num elementqIt is a The element of queue QBlock corresponds, i.e. the 1st of array C002 the element corresponding i-thqThe 1st member of a queue QBlock Element, the 2nd element corresponding i-th of array C002qThe 2nd element of a queue QBlock, and so on;
2. for i-th receivedqThe data structure PIXBLOCK type of each element storage of a queue QBlock Variable C001, is done as follows respectively: being drawn out with ray tracking technology by the starting line number of the blocks of pixels of variable C001 RowS member variable, the end line number RowE member variable of blocks of pixels, blocks of pixels starting row number ColS member variable, as The all pixels color value for the blocks of pixels C004 that the value of the end row number ColE member variable of plain piecemeal determines simultaneously records corresponding Blocks of pixels draw time overhead C003, drafting time overhead C003 is assigned to variable C001 corresponding i-thqA queue The element of the corresponding array C002 of the element of QBlock;
3. when to i-th receivedqThe data structure PIXBLOCK class for all elements storage that a queue QBlock includes After the variable C001 of type has executed 2. corresponding operating that walks, blocks of pixels drawing program is all pixels piecemeal drawn out The pixel color value and array C002 of C004 is sent to the cluster rendering control program on control node computer;
Step209: it is directed to i respectivelyq=1,2 ..., n, the cluster rendering control program controlled on node computer receive the iqThe pixel color value of all pixels piecemeal C004 that a blocks of pixels drawing program drawn on node computer is sent and Array C002;
Step210: the cluster rendering control program on control node computer is deposited according to the element in n queue QBlock The value of the variable of the data structure PIXBLOCK type of storage and each queue QBlock are closed with the corresponding of node computer is drawn System, the picture for all pixels piecemeal C004 that the blocks of pixels drawing program on all drafting node computers received is sent Plain color value is spliced into the complete three-dimensional scenic picture of a width, and is shown on display;
Step211: For each i_q = 1, 2, ..., n, perform the following on the control node computer:
For each element D001 of the array C002 that the cluster rendering control program received from the i_q-th rendering node computer, do the following:
The elements of the array C002 sent by the i_q-th rendering node computer correspond one-to-one with the elements of the i_q-th queue QBlock: the 1st element of that array corresponds to the 1st element of the queue, the 2nd element to the 2nd element of the queue, and so on. Let BNo be the value of the number NO member variable of the PIXBLOCK variable stored in the element of the i_q-th queue QBlock that corresponds to element D001; assign the value of element D001 to the rendering time cost COST member variable of the PIXBLOCK variable stored in the BNo-th element of array A001;
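A minimal sketch of the cost feedback in Step211 follows, assuming the queues hold copies of the PIXBLOCK variables and that array A001 is indexed from 1 in the text (hence the `bno - 1` in Python). The names `PixBlock` and `update_costs` are illustrative only.

```python
from dataclasses import dataclass, replace


@dataclass
class PixBlock:
    """Illustrative stand-in for the PIXBLOCK data structure."""
    NO: int
    RowS: int
    RowE: int
    ColS: int
    ColE: int
    COST: float = 1.0


def update_costs(a001, queues, cost_arrays):
    """Write each measured cost back into A001, matched by block number."""
    for qblock, c002 in zip(queues, cost_arrays):   # one pass per rendering node
        for blk, d001 in zip(qblock, c002):         # position k pairs with position k
            bno = blk.NO
            a001[bno - 1].COST = d001               # the text counts A001 from 1


a001 = [PixBlock(m, 1, 1, m, m) for m in range(1, 5)]
# Node 1 rendered blocks 3 and 1; node 2 rendered blocks 2 and 4.
queues = [[replace(a001[2]), replace(a001[0])],
          [replace(a001[1]), replace(a001[3])]]
cost_arrays = [[0.3, 0.1], [0.2, 0.4]]
update_costs(a001, queues, cost_arrays)
```

After this call the costs stored in A001 reflect the frame just rendered, which is what the next frame's sort in Step203 relies on.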
Step212: Empty the list B001 in the memory of the control node computer, and empty each queue QBlock in the memory of the control node computer. If a stop-rendering command has been received, go to Step213; otherwise go to Step202;
Step213: Stop rendering.
In the present embodiment, M=1920, N=1080, Bs=100, n=4.

Claims (1)

  1. A dynamic balanced allocation method of rendering tasks for a GPU parallel ray tracing cluster, characterized in that: the GPU parallel ray tracing cluster used consists of 1 control node computer and n rendering node computers interconnected through a network switch, where n is an integer greater than 1, and every rendering node computer has the same software and hardware configuration and is equipped with a GPU parallel computing unit; the method first divides the pixel matrix of the three-dimensional scene picture into blocks on the control node computer and numbers each block; when rendering the next frame, the method dynamically allocates rendering tasks to the rendering node computers in an approximately balanced way according to the rendering time costs of the previous frame, that is, it assigns several pixel blocks to each rendering node computer such that the total rendering time costs of the pixel blocks assigned to the individual rendering node computers are approximately equal; the data structure required by the method and the specific implementation steps are as follows:
    A data structure PIXBLOCK is provided for storing pixel block information; it contains six member variables: the pixel block number NO, the starting row number RowS, the ending row number RowE, the starting column number ColS, the ending column number ColE, and the rendering time cost COST of the pixel block;
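    As an illustration only, PIXBLOCK could be modeled as a Python dataclass. The field names follow the claim; the class name `PixBlock`, the field types and the helper `num_pixels` are assumptions, and rows and columns are taken to be 1-based and inclusive.

```python
from dataclasses import dataclass


@dataclass
class PixBlock:
    """Illustrative model of PIXBLOCK with its six member variables."""
    NO: int        # pixel block number
    RowS: int      # starting row number
    RowE: int      # ending row number
    ColS: int      # starting column number
    ColE: int      # ending column number
    COST: float    # rendering time cost of the block

    def num_pixels(self):
        # Helper not present in the claim: pixel count of the block,
        # assuming inclusive row and column bounds.
        return (self.RowE - self.RowS + 1) * (self.ColE - self.ColS + 1)


block = PixBlock(NO=1, RowS=1, RowE=100, ColS=1, ColE=100, COST=1.0)
```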
    1) The first part of the method partitions the pixel matrix of the three-dimensional scene picture into blocks and stores the partition parameters on the control node computer of the GPU parallel ray tracing cluster; the specific implementation steps are as follows:
    Step101: Run the cluster rendering control program on the control node computer; through it, input the number of pixel rows M and the number of pixel columns N of the three-dimensional scene picture to be rendered, and input the number of pixel rows Bs of a pixel block;
    Step102: Execute the following operations in the cluster rendering control program:
    Step102-1: Compute M′ and N′ from M, N and Bs, where M > Bs, N > Bs, and ⌊x⌋ denotes x rounded down. This means that the pixel matrix of the three-dimensional scene picture is divided into M′ × N′ pixel blocks: the 1st pixel block, the 2nd pixel block, and so on up to the (M′ × N′)-th pixel block. In terms of the pixel matrix of the whole three-dimensional scene picture, the m-th pixel block corresponds to the pixels in rows i_b to i_e and columns j_b to j_e of that matrix, where i_b, i_e, j_b and j_e are determined by m;
    Step102-2: Create in memory a one-dimensional array A001 of M′ × N′ elements, each of which stores one variable of type PIXBLOCK. In increasing order of pixel block number, from the 1st to the (M′ × N′)-th, do the following for each pixel block A002 in turn:
    Create a PIXBLOCK variable A003 in memory. From the number m of pixel block A002, compute the values of i_b, i_e, j_b and j_e. Assign m to the number NO member variable of A003, 1 to its rendering time cost COST member variable, i_b to its starting row number RowS member variable, i_e to its ending row number RowE member variable, j_b to its starting column number ColS member variable, and j_e to its ending column number ColE member variable; then assign the value of A003 to the m-th element of array A001. The m-th element of array A001 corresponds to the m-th pixel block;
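    The partitioning of Step102 can be sketched as below, under several assumptions that the text above does not spell out: M′ = ⌊M/Bs⌋ and N′ = ⌊N/Bs⌋, blocks are Bs pixels wide as well as Bs pixels tall, blocks are numbered row by row, and the last block in each direction absorbs the leftover rows or columns. `PixBlock` and `partition` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class PixBlock:
    """Illustrative stand-in for the PIXBLOCK data structure."""
    NO: int
    RowS: int
    RowE: int
    ColS: int
    ColE: int
    COST: float = 1.0


def partition(M, N, Bs):
    """Build array A001 for an M-row by N-column picture (assumed formulas)."""
    Mp, Np = M // Bs, N // Bs                 # assumed M' and N'
    a001 = []
    for m in range(1, Mp * Np + 1):
        bi, bj = divmod(m - 1, Np)            # assumed row-by-row numbering
        ib = bi * Bs + 1
        ie = M if bi == Mp - 1 else (bi + 1) * Bs   # last block row takes the remainder
        jb = bj * Bs + 1
        je = N if bj == Np - 1 else (bj + 1) * Bs   # last block column takes the remainder
        a001.append(PixBlock(NO=m, RowS=ib, RowE=ie, ColS=jb, ColE=je, COST=1.0))
    return a001


a001 = partition(1920, 1080, 100)
covered = sum((b.RowE - b.RowS + 1) * (b.ColE - b.ColS + 1) for b in a001)
```

    Every block starts with COST equal to 1, so the first frame is distributed as if all blocks were equally expensive.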
    2) The second part of the method implements the balanced task allocation for the GPU parallel ray tracing cluster; the specific steps are as follows:
    Step201: Start the pixel-block rendering program on all rendering node computers. The cluster rendering control program on the control node computer sends the three-dimensional scene model over the network to the pixel-block rendering program running on each rendering node computer, and each of these programs stores the received three-dimensional scene model in its own memory. The cluster rendering control program creates a list B001 in the memory of the control node computer for storing variables of type PIXBLOCK, and sets B001 to empty. The cluster rendering control program also creates n queues QBlock in the memory of the control node computer; the elements of a queue QBlock store variables of type PIXBLOCK, and the i_q-th queue QBlock holds the information of the pixel blocks assigned to the i_q-th rendering node computer, i_q = 1, 2, ..., n. Each queue QBlock is set to empty;
    Step202: The cluster rendering control program sets the virtual camera parameters CamParam according to the user's current human-computer interaction input and sends CamParam over the network to the pixel-block rendering program running on each rendering node computer; each pixel-block rendering program uses the received CamParam to set up the virtual camera used to render the three-dimensional scene picture;
    Step203: The cluster rendering control program adds the PIXBLOCK variables stored in all elements of array A001 to list B001, then sorts the elements of B001 in descending order, using as the key the value of the rendering time cost COST member variable of the PIXBLOCK variable stored in each element;
    Step204: Add the PIXBLOCK variable stored in the 1st element of list B001 to the 1st queue QBlock, the one stored in the 2nd element to the 2nd queue QBlock, and so on, until the one stored in the n-th element has been added to the n-th queue QBlock. Set the counter Counter to n+1. The cluster rendering control program creates in the memory of the control node computer a one-dimensional array ARRCOST of n elements;
    Step205: Set all elements of array ARRCOST to 0. For each i_q = 1, 2, ..., n, compute the sum SUMC of the rendering time cost COST member variables of the PIXBLOCK variables stored in all elements of the i_q-th queue QBlock, and assign SUMC to the i_q-th element of ARRCOST. Find the index IDA of the smallest element of array ARRCOST, and add the PIXBLOCK variable stored in the Counter-th element of list B001 to the IDA-th queue QBlock. Set Counter = Counter + 1;
    Step206: If Counter > M′ × N′, go to Step207; otherwise go to Step205;
    Step207: For each i_q = 1, 2, ..., n, the cluster rendering control program sends the i_q-th queue QBlock over the network to the pixel-block rendering program on the i_q-th rendering node computer;
    Step208: For each i_q = 1, 2, ..., n, the pixel-block rendering program on the i_q-th rendering node computer performs the following operations:
    1. Compute the number of elements Num in the received i_q-th queue QBlock, and create in memory a one-dimensional array C002 of Num elements, with every element initialized to 0. The elements of array C002 correspond one-to-one with the elements of the i_q-th queue QBlock: the 1st element of C002 corresponds to the 1st element of the queue, the 2nd element of C002 to the 2nd element of the queue, and so on;
    2. For the PIXBLOCK variable C001 stored in each element of the received i_q-th queue QBlock, do the following: use ray tracing to render the color values of all pixels of the pixel block C004 delimited by the values of the starting row number RowS, ending row number RowE, starting column number ColS and ending column number ColE member variables of C001, and record the corresponding pixel block rendering time cost C003; assign C003 to the element of array C002 that corresponds to the queue element holding C001;
    3. Once step 2 has been executed for the PIXBLOCK variables C001 stored in all elements of the received i_q-th queue QBlock, the pixel-block rendering program sends the pixel color values of all rendered pixel blocks C004, together with array C002, to the cluster rendering control program on the control node computer;
    Step209: For each i_q = 1, 2, ..., n, the cluster rendering control program on the control node computer receives the pixel color values of all pixel blocks C004 and the array C002 sent by the pixel-block rendering program on the i_q-th rendering node computer;
    Step210: Using the values of the PIXBLOCK variables stored in the elements of the n queues QBlock, and the correspondence between each queue QBlock and its rendering node computer, the cluster rendering control program on the control node computer stitches the received pixel color values of all pixel blocks C004 from all rendering node computers into one complete three-dimensional scene picture and shows it on the display;
    Step211: For each i_q = 1, 2, ..., n, perform the following on the control node computer:
    For each element D001 of the array C002 that the cluster rendering control program received from the i_q-th rendering node computer, do the following:
    The elements of the array C002 sent by the i_q-th rendering node computer correspond one-to-one with the elements of the i_q-th queue QBlock: the 1st element of that array corresponds to the 1st element of the queue, the 2nd element to the 2nd element of the queue, and so on. Let BNo be the value of the number NO member variable of the PIXBLOCK variable stored in the element of the i_q-th queue QBlock that corresponds to element D001; assign the value of element D001 to the rendering time cost COST member variable of the PIXBLOCK variable stored in the BNo-th element of array A001;
    Step212: Empty the list B001 in the memory of the control node computer, and empty each queue QBlock in the memory of the control node computer. If a stop-rendering command has been received, go to Step213; otherwise go to Step202;
    Step213: Stop rendering.
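    The allocation loop of Step203 to Step206 can be sketched as follows. The procedure resembles the well-known longest-processing-time-first greedy heuristic: blocks are sorted by descending cost and each one goes to the queue with the smallest running total. The names `PixBlock` and `allocate` are illustrative, ties are broken by taking the first smallest queue (an assumption), and the sketch assumes there are at least n blocks.

```python
from dataclasses import dataclass


@dataclass
class PixBlock:
    """Illustrative stand-in for the PIXBLOCK data structure."""
    NO: int
    RowS: int
    RowE: int
    ColS: int
    ColE: int
    COST: float = 1.0


def allocate(a001, n):
    """Distribute the blocks of A001 over n queues with near-equal total cost."""
    # Step203: copy into B001 and sort by COST, largest first.
    b001 = sorted(a001, key=lambda b: b.COST, reverse=True)
    # Step204: the first n blocks seed the n queues.
    queues = [[b001[i]] for i in range(n)]
    counter = n                        # 0-based index of the (n+1)-th element
    while counter < len(b001):         # Step206: stop after the last block
        # Step205: recompute each queue's total, pick the cheapest queue.
        arrcost = [sum(b.COST for b in q) for q in queues]
        ida = arrcost.index(min(arrcost))
        queues[ida].append(b001[counter])
        counter += 1
    return queues


costs = [7, 5, 4, 3, 3, 2, 1]
a001 = [PixBlock(NO=m, RowS=1, RowE=1, ColS=m, ColE=m, COST=c)
        for m, c in enumerate(costs, start=1)]
queues = allocate(a001, 3)
totals = [sum(b.COST for b in q) for q in queues]
```

    In this small example the three queues end up with total costs 9, 8 and 8, which illustrates the approximately equal totals the claim aims for.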
CN201811371313.2A 2018-11-20 2018-11-20 A Dynamic Balanced Allocation Method of Drawing Tasks for GPU Parallel Ray Tracing Clusters Active CN109472854B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811371313.2A CN109472854B (en) 2018-11-20 2018-11-20 A Dynamic Balanced Allocation Method of Drawing Tasks for GPU Parallel Ray Tracing Clusters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811371313.2A CN109472854B (en) 2018-11-20 2018-11-20 A Dynamic Balanced Allocation Method of Drawing Tasks for GPU Parallel Ray Tracing Clusters

Publications (2)

Publication Number Publication Date
CN109472854A true CN109472854A (en) 2019-03-15
CN109472854B CN109472854B (en) 2022-10-21

Family

ID=65673972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811371313.2A Active CN109472854B (en) 2018-11-20 2018-11-20 A Dynamic Balanced Allocation Method of Drawing Tasks for GPU Parallel Ray Tracing Clusters

Country Status (1)

Country Link
CN (1) CN109472854B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6111584A (en) * 1995-12-18 2000-08-29 3Dlabs Inc. Ltd. Rendering system with mini-patch retrieval from local texture storage
US20080117217A1 (en) * 2003-11-19 2008-05-22 Reuven Bakalash Multi-mode parallel graphics rendering system employing real-time automatic scene profiling and mode control
US20080129747A1 (en) * 2003-11-19 2008-06-05 Reuven Bakalash Multi-mode parallel graphics rendering system employing real-time automatic scene profiling and mode control
CN104835193A (en) * 2015-05-13 2015-08-12 Changchun University of Science and Technology Load balancing method of 3D scene GPU cluster rendering system based on ray tracing
CN105447905A (en) * 2015-11-17 2016-03-30 Changchun University of Science and Technology Three dimensional scene approximation soft shadow light tracking based on visible smooth filtering
CN106776020A (en) * 2016-12-07 2017-05-31 Changchun University of Science and Technology The computer cluster distribution route tracking method for drafting of large-scale three dimensional scene

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
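Li Hua et al.: "Design of a parallel ray tracing acceleration structure for dynamic 3D virtual scenes", Journal of Changchun University of Science and Technology (Natural Science Edition) *
Jiang Cong et al.: "Research on gesture-based interactive parallel ray tracing rendering of three-dimensional scenes", Journal of Changchun University of Science and Technology (Natural Science Edition) *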

Also Published As

Publication number Publication date
CN109472854B (en) 2022-10-21

Similar Documents

Publication Publication Date Title
US8089481B2 (en) Updating frame divisions based on ray tracing image processing system performance
US9043801B2 (en) Two-tiered dynamic load balancing using sets of distributed thread pools
CN106056529B (en) Method and equipment for training convolutional neural network for picture recognition
US7940266B2 (en) Dynamic reallocation of processing cores for balanced ray tracing graphics workload
US9378533B2 (en) Central processing unit, GPU simulation method thereof, and computing system including the same
CN109409513A (en) A kind of task processing method neural network based and relevant device
CN110009233B (en) Game theory-based task allocation method in crowd sensing
CN105988879A (en) Method and system for optimizing allocation of multi-tasking servers
CN108875956A (en) Primary tensor processor
US20200012929A1 (en) Instruction distribution in an array of neural network cores
US20230401789A1 (en) Methods and systems for unified rendering of light and sound content for a simulated 3d environment
CN106204713A (en) Static merging treatment method and apparatus
CN106776020B (en) Computer Cluster Distributed Path Tracing Rendering Method for Large 3D Scenes
CN114764841A (en) Use of built-in functions for shadow denoising in ray tracing applications
CN110942202A (en) Emergency drilling deduction method, computer storage medium and electronic equipment
CN108924534A (en) Methods of exhibiting, client, server and the storage medium of panoramic picture
US11614964B2 (en) Deep-learning-based image processing method and system
Zhang et al. Multi-gpu parallel pipeline rendering with splitting frame
CN109472854A (en) A Dynamic Balanced Allocation Method of Drawing Tasks for GPU Parallel Ray Tracing Clusters
do Nascimento et al. Gpu-based real-time procedural distribution of vegetation on large-scale virtual terrains
Marcus et al. A learning-based service for cost and performance management of cloud databases
CN110013669A (en) A kind of virtual reality is raced exchange method more
Ruetschle et al. Distributed ray tracing of large scenes using actors
Park et al. A fast hybrid time-synchronous/event approach to parallel discrete event simulation of queuing networks
CN104835193B (en) The load-balancing method of three-dimensional scenic GPU cluster drawing system based on ray trace

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant