Detailed Description
For a better understanding of the technical solutions of the present disclosure, exemplary embodiments of the present disclosure will be described below with reference to the accompanying drawings, in which various details of the embodiments of the present disclosure are included to facilitate understanding, and they should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Embodiments of the disclosure and features of embodiments may be combined with each other without conflict. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Before describing embodiments of the present disclosure in detail, some related concepts are first explained for clarity.
The rendering object (RENDER TARGET) may refer to an object to be rendered, which may be part of a virtual scene to be rendered, or may also be a picture or video frame to be rendered, etc.
Primitives (primitive) may refer to the basic geometric shapes that make up a graphic, such as points, lines, and triangles. For example, a graphic drawn by an application may be represented in a computer by a large number of basic geometric primitives.
A rendering pipeline (rendering pipeline) is a key concept in computer graphics. It is the sequence of stages in a graphics processing unit (Graphics Processing Unit, GPU) that is responsible for processing and converting graphics data for rendering. The main task of the rendering pipeline is to convert the input geometric primitives (e.g., points, lines, triangles) into pixels that are visible on the screen. The goal of the rendering pipeline is to process graphics and generate final images efficiently. With parallel processing and specialized hardware support, the GPU is able to perform these computations quickly and achieve real-time graphics rendering.
In the related art, there are various rendering processing modes that reduce pixel shader (Pixel Shader, PS) invocations in order to optimize the overall performance of a graphics processing device and reduce power consumption, so that the GPU can maintain high efficiency and low power consumption when rendering high-resolution, complex scenes.
In some embodiments, the related-art rendering methods include immediate mode rendering (Immediate Mode Rendering, IMR), in which each primitive is written directly into the global frame buffer after being processed, without dividing the rendering object into tiles. IMR may employ a standard depth test (Z-Test), also known as a visibility test, or an early depth test (Early-Z).
The Z-Test compares the depth values of fragments at the same position to ensure that only the frontmost (that is, visible) pixels are finally displayed. This guarantees the correct visibility order, filters out occluded fragments, and reduces unnecessary pixel shader execution. Early-Z moves the depth test from after the pixel shader to before the pixel shader, so that fragment depths are compared as early as possible after rasterization and pixels that are clearly invisible are eliminated quickly. Because the rejection is completed before the high-overhead PS stage is entered, the subsequent fragment shading burden is greatly reduced, and unnecessary consumption of computing resources and energy is effectively avoided.
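The effect of running the depth test before shading can be illustrated with a minimal sketch. The function name, the fragment tuple layout, and the "less than" comparison with a cleared depth of 1.0 are assumptions made for illustration; they are not taken from the disclosure.

```python
def render_with_early_z(fragments, width, height):
    """Resolve fragments with a less-than depth test that runs before shading.

    fragments: iterable of (x, y, depth, color); a smaller depth is nearer.
    Returns (color_buffer, shader_calls).
    """
    # Assumption: the depth buffer is cleared to the far value 1.0.
    depth_buffer = [[1.0] * width for _ in range(height)]
    color_buffer = [[None] * width for _ in range(height)]
    shader_calls = 0
    for x, y, depth, color in fragments:
        if depth >= depth_buffer[y][x]:
            continue  # rejected before the pixel shader would run
        shader_calls += 1  # stand-in for an expensive pixel shader invocation
        depth_buffer[y][x] = depth
        color_buffer[y][x] = color
    return color_buffer, shader_calls
```

In this sketch a fragment that lies behind an already stored depth never reaches the counted shading step, which is the saving that Early-Z aims for.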
In some embodiments, other related depth testing techniques include coarse-grained depth testing and hierarchical depth testing. Coarse-grained depth testing typically calculates a depth range (min/max) for a larger region (e.g., a block of multiple pixels) rather than processing pixel by pixel, and is used to quickly cull regions that are clearly occluded, reducing subsequent computation. Hierarchical depth testing is an efficient occlusion culling technique in which the depth buffer data is organized as a pyramid hierarchy (e.g., a multi-level depth map from full resolution down to low resolution); the visibility of large regions is determined quickly from the high-level depth ranges and then refined layer by layer down to the pixel level.
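A coarse-grained test of this kind can be sketched as follows. This is only an illustrative example under stated assumptions: the helper names are hypothetical, a "less than" depth test is assumed, and only a single coarse level (per-block maximum depth) is built rather than a full pyramid.

```python
def build_coarse_max(depth_buffer, block):
    """Reduce a full-resolution depth buffer to the maximum depth per block."""
    rows, cols = len(depth_buffer), len(depth_buffer[0])
    return [
        [
            max(
                depth_buffer[y][x]
                for y in range(by, min(by + block, rows))
                for x in range(bx, min(bx + block, cols))
            )
            for bx in range(0, cols, block)
        ]
        for by in range(0, rows, block)
    ]


def block_is_occluded(coarse_max, block_x, block_y, incoming_min_depth):
    """True if the nearest incoming depth is not in front of anything stored.

    Under a less-than test, such a block can be culled without per-pixel work.
    """
    return incoming_min_depth >= coarse_max[block_y][block_x]
```

A hierarchical variant would apply the same reduction repeatedly to obtain coarser levels and test from the coarsest level downward.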
FIG. 1 is a schematic diagram of a rendering architecture of the related art, which is a schematic diagram of an IMR rendering pipeline. As shown in FIG. 1, the rendering pipeline may include vertex processing (vertex Processing) 11, clipping, projection, and culling (clip, project/cull) 12, rasterization (raster) 13, early depth testing (also known as early visibility testing, early visibility test) 14, texture and shading (texture and shade) 15, alpha testing (ALPHA TEST) 16, late depth testing (also known as late visibility testing, late visibility test) 17, alpha blending (alpha blend) 18.
The graphics processing device may read geometry data from a system memory (system memory) and obtain primitives through vertex processing 11 and clipping, projection, and culling 12. The primitives are converted into corresponding fragments through rasterization 13, and occluded fragments are rejected through the depth comparison of the early depth test 14. Fragments that pass the early depth test 14 are textured and shaded in the texture and shading 15 stage based on texture data read from the system memory, and then undergo the alpha test 16. Fragments that pass the alpha test 16 undergo the late depth test 17 based on the depth buffer (depth buffer), and fragments that pass are alpha blended 18 with the data in the frame buffer (frame buffer) to obtain rendering results, such as pixel data to be output.
It should be understood that the above is merely illustrative of IMR rendering pipelines, and that one skilled in the art may set the specific structure of various rendering pipelines according to actual situations, and the present disclosure is not limited thereto.
In some embodiments, the related-art rendering process also includes tile-based rendering (Tile-Based Rendering, TBR). This processing mode divides the rendering object into a plurality of tiles, and the depth test and other preliminary processing are performed inside each tile using a local cache. During rendering, the direct rendering target is no longer the current frame buffer (frame buffer) and depth buffer (depth buffer), but a cache called the tile buffer (tile buffer), that is, an on-chip buffer (on-chip buffer). The read and write operations on the frame buffer and depth buffer in IMR therefore become read and write operations on high-speed memory inside the GPU. By exploiting locality, global memory accesses are reduced, bandwidth consumption and power consumption are lowered, and overdraw is effectively reduced.
In some embodiments, the related-art rendering process further includes tile-based deferred rendering (Tile-Based Deferred Rendering, TBDR). On the basis of TBR, TBDR further defers fragment shading and performs hidden surface removal (Hidden Surface Removal, HSR) through hardware-level features, thereby addressing the problem of overdraw (Overdraw). The basic idea is to first collect the geometry data and depth information in each tile, perform the deferred depth test and culling centrally, and execute the expensive PS computation only on primitives that are actually visible and valid. Pixels that pass the early depth test (Early Z-Test) are not drawn immediately; only the primitives of these pixels are marked, and only pixels that eventually pass HSR are drawn. In this way, the system memory bandwidth requirement can be significantly reduced, thereby minimizing inefficient computation and power consumption.
Fig. 2 is a schematic diagram of a rendering architecture of the related art, which is a schematic diagram of a TBDR rendering pipeline. As shown in FIG. 2, the rendering pipeline may include vertex processing (vertex Processing) 21, clipping, projection, and culling (clip, project/cull) 22, tiling (tiling) 23, rasterizing (raster) 24, early depth testing (early visibility test) 25, texture and shading (texture and shade) 26, alpha testing (ALPHA TEST) 27, late depth testing (late visibility test) 28, alpha blending (alpha blend) 29.
The graphics processing device may read geometry Data from a system memory (system memory), obtain primitives of the rendering object through Vertex processing 21 and clipping, projection and rejection 22, complete the segmentation of the rendering object in a stage of tiling 23, record graphics Data covering each tile, including primitive list (PRIMITIVE LIST), vertex Data (Vertex Data), and write into the system memory. For a tile, the rendering pipeline loads all primitives (such as triangle primitives) contained in the tile into a primitive stage for processing, all primitives covered on the tile can be directly read from a corresponding primitive list, and when all primitives of the tile are processed, the next tile can be processed continuously.
The primitives of each tile are rasterized 24 to obtain corresponding fragments, and the fragments are subjected to depth comparison by the early depth test 25 to reject occluded fragments. Fragments passing the early depth test 25 are textured and shaded based on texture data in the system memory at the texture and shading 26 stage, followed by the alpha test 27. Fragments passing the alpha test 27 are subjected to the late depth test 28 based on an on-chip depth buffer (on-chip depth buffer), and fragments that pass are alpha blended 29 with data in the on-chip color buffer (on-chip color buffer) to obtain rendering results, such as pixel data to be output. The on-chip depth buffer exchanges data with the depth buffer in the system memory, and the on-chip color buffer exchanges data with the frame buffer in the system memory.
It should be understood that the above is merely illustrative of TBDR rendering pipelines, and that one skilled in the art may set the specific structure of various rendering pipelines according to the actual situation, and the present disclosure is not limited thereto.
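The tiling stage described above, which records a primitive list per tile, can be sketched as follows. This is a simplified illustration only: the function name is hypothetical, and binning by screen-space bounding box is an assumption (real hardware may use exact coverage tests).

```python
def bin_primitives(bounding_boxes, tile_size, width, height):
    """Build per-tile primitive lists from screen-space bounding boxes.

    bounding_boxes: list of (xmin, ymin, xmax, ymax) in pixels, inclusive.
    Returns a dict mapping (tile_x, tile_y) to a list of primitive indices.
    """
    tiles_x = (width + tile_size - 1) // tile_size
    tiles_y = (height + tile_size - 1) // tile_size
    primitive_lists = {
        (tx, ty): [] for ty in range(tiles_y) for tx in range(tiles_x)
    }
    for index, (xmin, ymin, xmax, ymax) in enumerate(bounding_boxes):
        # Skip primitives that lie entirely outside the render target.
        if xmax < 0 or ymax < 0 or xmin >= width or ymin >= height:
            continue
        # Clamp the box to the render target, then convert to tile indices.
        x0 = max(xmin, 0) // tile_size
        y0 = max(ymin, 0) // tile_size
        x1 = min(xmax, width - 1) // tile_size
        y1 = min(ymax, height - 1) // tile_size
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                primitive_lists[(tx, ty)].append(index)
    return primitive_lists
```

Each tile can then be processed by reading only its own list, which is what allows the depth and color data of the tile to stay in on-chip buffers.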
In general, IMR, TBR, TBDR, and similar technologies are all optimized around reducing unnecessary fragment shading (that is, PS invocations). By filtering out unnecessary fragments in advance at different stages, they improve rendering efficiency and reduce overall energy consumption, and are suited to real-time rendering of high-resolution, complex scenes.
However, the related art, although having a certain effect in improving performance, has the following disadvantages:
Redundancy of full-range depth comparison: traditional depth testing only compares the depth value of the current fragment with the value at the corresponding position in the frame buffer, and cannot restrict the depth range according to scene requirements. For example, when objects need to be rendered only within a certain depth interval (e.g., near the water surface), conventional approaches may process all fragments even if they are outside the target range, resulting in unnecessary fragment shader computation and wasted bandwidth.
Multi-stage rendering overhead of complex scenes: some effects (e.g., partial shadows, refraction) require rendering different depth layers of the same region multiple times. The developer has to implement this through multiple draw calls or complex layered rendering logic, which increases the overhead of CPU-GPU communication and state switching.
Inefficient processing of transparency and blending effects: rendering of transparent objects typically relies on disabling depth writes and on sorting, but conventional depth testing cannot flexibly control the blending region, which may result in overdraw or sorting errors.
Limitations of hidden surface removal (HSR): HSR depends on the accuracy of the occlusion information. In dynamic or complex scenes, it may fail to remove all occluded fragments, causing invalid PS invocations and degrading performance.
According to the rendering processing method of the embodiments of the present disclosure, the depth range test (Depth Bounds Test, DBT) can be combined with the depth test (Z-Test). The user layer is allowed to configure the DBT in a draw call (draw call); the driver layer obtains the configuration information of the depth range test of the current draw call, and determines the DBT driving information of the current draw call according to that configuration information, the depth test configuration of the current draw call, and the DBT driving information of the preceding draw call. Corresponding instruction information is then generated and issued, so that the graphics processing device executes the depth range test. Invalid fragments are thereby further removed on top of the depth test, unnecessary pixel shader execution is reduced, rendering efficiency is improved, and overall power consumption is reduced.
In addition, according to the rendering processing method of the embodiments of the present disclosure, the depth range test can be divided into an early DBT (Early-DBT) and a late DBT (Late-DBT). The early DBT cooperates with the early depth test (Early-Z), and the late DBT cooperates with the late depth test (Late-Z), so that unnecessary computation in both the early depth test and the late depth test can be reduced, the performance of each can be optimized, and the accuracy of the DBT result can be guaranteed.
The rendering processing method according to the embodiments of the present disclosure may be performed by an electronic device such as a terminal device or a server. The terminal device may be a user equipment (User Equipment, UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like, and the method may be implemented by a processor invoking computer-readable program instructions stored in a memory. Alternatively, the method may be performed by a server.
Fig. 3 is a flowchart of a rendering processing method according to an embodiment of the present disclosure. The method may be performed by a processing device in the terminal device, such as a central processing unit (Central Processing Unit, CPU). Referring to fig. 3, the method includes:
In step S31, first configuration information of a depth range test DBT of a current first draw call of a rendering object is obtained, where the first configuration information includes a switch state of the depth range test and a first depth interval.
In step S32, when the switch state of the depth range test is on, determining the first driving information of the depth range test of the first draw call according to the first configuration information, the second configuration information of the depth test of the first draw call, and the second driving information of the depth range test of the second draw call.
The depth range test includes an early DBT and a late DBT, and the first driving information includes a first enable flag of the early DBT, a second enable flag of the late DBT, and a second depth interval of the depth range test.
In step S33, instruction information is generated from the first driving information and sent to a graphics processing apparatus, so that the graphics processing apparatus executes a depth range test in the first draw call based on the instruction information.
For example, the depth range test (DBT) allows only fragments whose depth values lie within a specified depth range to be processed during rendering, eliminating fragments that do not require computation. The DBT may be set to execute after the scissor test (Scissor-Test) and before the depth test (Z-Test). Unlike the depth test, the DBT does not rely on the interpolated depth value or the depth value output by the fragment shader; instead, it makes its determination based on the depth value stored in the current depth buffer. The DBT enables finer culling along the depth direction (Z direction). When the DBT is combined with the depth test, the generation of useless fragments and shader invocations can be minimized in scenarios such as stencil shadow volumes, volumetric lighting, or deferred rendering, improving the efficiency of the rendering pipeline and the performance of the application.
In some possible implementations, a corresponding interface may be configured at the user layer of the processing device, allowing the user to configure the parameters of the depth range test in each draw call (Draw Call). The interface may be a standardized interface provided by a graphics API (Application Programming Interface) such as Direct3D (a low-level graphics API optimized for the Windows platform), Vulkan (an open, cross-platform graphics API), or OpenGL (a cross-platform graphics API standard); the present disclosure does not limit the specific type of interface.
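The key property of the DBT, that it inspects the depth already stored at the destination rather than the depth of the incoming fragment, can be sketched as follows. The function names and the inclusive comparison at both bounds are assumptions for illustration; the disclosure does not specify whether the bounds are inclusive.

```python
def depth_bounds_test(stored_depth, zmin, zmax):
    """Pass if the depth already in the buffer lies in [zmin, zmax].

    Assumption: both bounds are inclusive.
    """
    return zmin <= stored_depth <= zmax


def cull_by_depth_bounds(fragments, depth_buffer, zmin, zmax):
    """Keep fragments whose destination depth passes the bounds test.

    fragments: list of (x, y, depth). The fragment's own depth is carried
    along but deliberately not consulted by the test.
    """
    return [
        fragment
        for fragment in fragments
        if depth_bounds_test(depth_buffer[fragment[1]][fragment[0]], zmin, zmax)
    ]
```

Because the decision needs only the stored value and two constants, it can be made before any interpolation or shading work is done for the fragment.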
In some possible implementations, a corresponding DBT module may be provided at the hardware layer of the graphics processing device (e.g., GPU) to implement the DBT function. The depth range test includes an early DBT (Early-DBT) and a late DBT (Late-DBT); the early DBT is placed before the early depth test (Early-Z), and the late DBT is placed before the late depth test (Late-Z), so that invalid fragments are removed in advance.
In some possible implementations, the parameters of the depth range test may include a switch state of the depth range test and a depth interval. The switch state is either on or off: the user sets the switch state to off to disable the DBT function and to on to enable it. The depth interval may include a minimum depth value and a maximum depth value (zmin, zmax) of floating point type (FLOAT), which define the depth range tested by the DBT. The maximum range of the depth interval may be [0.0f, 1.0f], and the interval may also be set narrower than the maximum range, for example [0.3f, 0.7f]. The parameters of the depth range test may also include other parameters, which is not limited by the present disclosure.
In some possible implementations, for a current first draw call of the rendering object, the driver layer of the processing device may obtain first configuration information of the first draw call in step S31, including a switch state of the depth range test and a first depth interval.
In some possible implementations, if the switch state is on, the first driving information of the depth range test of the first draw call may be determined in step S32 according to the first configuration information, the second configuration information of the depth test of the first draw call, and the second driving information of the depth range test of the second draw call. The first driving information includes a first enable flag of the early DBT, a second enable flag of the late DBT, and a second depth interval of the depth range test.
In some possible implementations, the depth test includes an early depth test and a late depth test, and the second configuration information of the depth test includes an enable flag of the early depth test and an enable flag of the late depth test, and so on. The second driving information of the depth range test of the second draw call may include enable flags of the early DBT and the late DBT, a third depth interval of the depth range test of the second draw call, and the like.
In some possible implementations, in step S32, data verification and conversion may be performed on the first depth interval. The data verification may include: checking whether the minimum depth value zmin or the maximum depth value zmax of the first depth interval is NaN (not a number) and, if so, converting the NaN to 0.0f; checking the range of zmin and zmax and clamping them to [0.0f, 1.0f], that is, limiting any value outside [0.0f, 1.0f] to that range; and further checking whether zmin <= zmax is satisfied. If it is satisfied, the verification passes; if not, warning information (warning) is sent to the runtime environment (runtime). In the latter case the subsequent steps may still continue: the hardware still evaluates the condition zmin <= zmax, and the depth range test is judged as failed when the condition does not hold.
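The verification steps described above (NaN replacement, clamping, and the ordering check) can be sketched as follows. The function name and the choice to return warnings as a list of strings are assumptions for illustration; how a driver actually reports the warning to the runtime is not specified here.

```python
import math


def validate_depth_interval(zmin, zmax):
    """Sanitize a user-supplied depth interval.

    Returns (zmin, zmax, warnings). The interval is returned even when
    zmin > zmax, because later stages may still consume it.
    """

    def sanitize(value):
        if math.isnan(value):
            return 0.0  # NaN is replaced by 0.0
        return min(max(value, 0.0), 1.0)  # clamp to [0.0, 1.0]

    zmin, zmax = sanitize(zmin), sanitize(zmax)
    warnings = []
    if not zmin <= zmax:
        # Hypothetical warning text; processing continues regardless.
        warnings.append("depth bounds: zmin > zmax after clamping")
    return zmin, zmax, warnings
```

For example, an interval of (NaN, 1.5) becomes (0.0, 1.0) with no warning, while (0.8, 0.2) is passed through unchanged together with one warning.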
In some possible implementations, the adjusted zmin and zmax may be mapped to corresponding integer or floating point values according to the data format of the depth buffer (z-buffer) to obtain the second depth interval. The data format of the depth buffer is, for example, D16_UNORM (16-bit integer), D24_UNORM (24-bit integer), or D32_FLOAT (32-bit floating point number); for instance, when the floating point interval [0.0f, 1.0f] is converted to D16_UNORM, it is expressed as [0, 65535]. The present disclosure does not limit the specific data format of the depth buffer.
In some possible implementations, a first enable flag (Early-DBT enabled flag) of the early DBT and a second enable flag (Late-DBT enabled flag) of the late DBT may be determined based on the first configuration information, the second configuration information of the depth test of the first draw call, and the second driving information of the depth range test of the second draw call. That is, whether to start the Early-DBT and/or the Late-DBT can be decided dynamically according to the configuration states of the current draw call and the preceding draw call, so that the DBT function satisfies data dependencies and restrictions and possible processing errors are avoided; the corresponding enable flags are then configured to generate the driving information.
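The format mapping can be sketched as follows, with the format names written as D16_UNORM, D24_UNORM, and D32_FLOAT. The function names are hypothetical, and rounding to the nearest integer (half up) is an assumption; a real driver might instead round zmin down and zmax up to keep the interval conservative.

```python
UNORM_MAX = {"D16_UNORM": (1 << 16) - 1, "D24_UNORM": (1 << 24) - 1}


def to_depth_format(value, depth_format):
    """Map a validated depth in [0.0, 1.0] to the depth buffer's encoding."""
    if depth_format in UNORM_MAX:
        # Assumption: round half up to the nearest representable integer.
        return int(value * UNORM_MAX[depth_format] + 0.5)
    if depth_format == "D32_FLOAT":
        return float(value)  # floating point buffers need no rescaling
    raise ValueError(f"unsupported depth format: {depth_format}")


def convert_interval(zmin, zmax, depth_format):
    """Convert both bounds of the interval to the depth buffer's format."""
    return to_depth_format(zmin, depth_format), to_depth_format(zmax, depth_format)
```

With these definitions, `convert_interval(0.0, 1.0, "D16_UNORM")` yields the pair (0, 65535), matching the range given in the text.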
In some possible implementations, the driving layer may perform data packing and transmission in step S33, and generate instruction information according to the first driving information, that is, encode the first enable flag, the second enable flag, and the second depth interval into a hardware instruction packet. The instruction information may also include an identification of the data format of the depth cache for the hardware module of the graphics processing device to adapt to the compare logic, etc., which is not limiting in this disclosure.
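As an illustration of the packing step, the following sketch encodes the two enable flags, a depth format identifier, and the converted interval into a fixed-size byte packet. The 12-byte layout, the format identifier values, and the function names are entirely hypothetical; the actual hardware instruction format is not described in this document.

```python
import struct

# Hypothetical layout: flags (1 byte), format id (1 byte), reserved (2 bytes),
# zmin (4 bytes), zmax (4 bytes), all little-endian without padding.
PACKET_LAYOUT = "<BBHII"
FORMAT_IDS = {"D16_UNORM": 0, "D24_UNORM": 1, "D32_FLOAT": 2}  # assumed ids


def encode_dbt_packet(early_enabled, late_enabled, depth_format, zmin_raw, zmax_raw):
    """Pack DBT driving information; bounds are raw 32-bit words."""
    flags = (1 if early_enabled else 0) | ((1 if late_enabled else 0) << 1)
    return struct.pack(
        PACKET_LAYOUT, flags, FORMAT_IDS[depth_format], 0, zmin_raw, zmax_raw
    )


def decode_dbt_packet(packet):
    """Inverse of encode_dbt_packet, returning the format as its numeric id."""
    flags, format_id, _reserved, zmin_raw, zmax_raw = struct.unpack(
        PACKET_LAYOUT, packet
    )
    return bool(flags & 1), bool(flags & 2), format_id, zmin_raw, zmax_raw
```

In this sketch, bounds for a floating point depth buffer would be passed as their 32-bit bit patterns; carrying the format identifier lets the receiver choose the matching comparison logic.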
In some possible implementations, the generated instruction information is sent to the graphics processing device, and after the graphics processing device receives the instruction information, the graphics processing device may perform a depth range test in the first draw call based on the instruction information, thereby implementing the DBT function.
According to the embodiments of the present disclosure, the depth range test and the depth test can be combined: the configuration information of the depth range test of the current draw call is obtained, the DBT driving information of the current draw call is determined according to that configuration information, the depth test configuration of the current draw call, and the DBT driving information of the preceding draw call, and the corresponding instruction information is then generated and issued so that the graphics processing device executes the depth range test. Invalid fragments are thereby further removed on top of the depth test, unnecessary pixel shader execution is reduced, rendering efficiency is improved, and overall power consumption is reduced.
A rendering processing method according to an embodiment of the present disclosure is explained below.
In some possible implementations, the rendering mode of the rendering object includes tile-based deferred rendering (TBDR), and the rendering object is divided into a plurality of tiles. That is, the rendering processing method according to the embodiments of the present disclosure may be performed under a TBDR architecture: a depth range test (DBT) module is added before the depth test (Z-Test) module of the graphics processing device, and the DBT is performed quickly by directly using the depth buffer (Z-buffer).
Fig. 4 is a schematic diagram of a rendering architecture according to an embodiment of the disclosure. As shown in FIG. 4, the disclosed embodiments improve upon the TBDR architecture in that the rendering pipeline 40 may include vertex processing (vertex Processing) 401, clipping, projection, and culling (clip, project/cull) 402, tiling (tiling) 403, rasterizing (raster) 404, early DBT (Early-DBT) 405, early depth test (Early visibility test) 406, texture and shading (texture and shade) 407, alpha test (ALPHA TEST) 408, late DBT (Late-DBT) 409, late depth test (Late visibility test) 410, alpha blend (alpha blend) 411.
The early DBT 405 and the late DBT 409 may share the DBT module of the graphics processing device, and the early depth test 406 and the late depth test 410 may share the Z-Test module of the graphics processing device.
In the processing process, the graphics processing apparatus may read geometry Data from a system memory (system memory), obtain primitives of the rendering object through Vertex processing 401 and clipping, projection and culling 402, complete the segmentation of the rendering object in a stage of tiling 403, record graphics Data covering each tile, including primitive list (PRIMITIVE LIST), vertex Data (Vertex Data), and write into the system memory. For a tile, the rendering pipeline loads all primitives (such as triangle primitives) contained in the tile into a primitive stage for processing, all primitives covered on the tile can be directly read from a corresponding primitive list, and when all primitives of the tile are processed, the next tile can be processed continuously.
The primitives of each tile are rasterized 404 to obtain corresponding fragments. The early DBT 405 removes fragments outside the depth interval based on the on-chip depth buffer (on-chip depth buffer), and the remaining fragments then enter the early depth test 406 for depth comparison, where occluded fragments are rejected. The early depth test 406 may be preceded by hidden surface removal (HSR).
Fragments that pass the early depth test 406 are textured and shaded based on texture data in the system memory at the texture and shading 407 stage, followed by the alpha test 408. Fragments passing the alpha test 408 sequentially undergo the late DBT 409 and the late depth test 410 based on the on-chip depth buffer (on-chip depth buffer), and fragments that pass are alpha blended 411 with data in the on-chip color buffer (on-chip color buffer) to obtain a rendering result, for example, pixel data to be output. The on-chip depth buffer exchanges data with the depth buffer in the system memory, and the on-chip color buffer exchanges data with the frame buffer in the system memory.
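The benefit of placing a depth range test in front of the early depth test can be illustrated with a small sketch that counts how many fragments would reach shading. The function name is hypothetical, and the sketch assumes a "less than" depth test, inclusive bounds, and a pass in which depth writes are disabled so that the stored depths do not change.

```python
def count_shaded_fragments(fragments, depth_buffer, zmin, zmax, use_dbt):
    """Count fragments that survive Early-DBT (optional) and Early-Z.

    fragments: list of (x, y, depth). Depth writes are assumed disabled.
    """
    shaded = 0
    for x, y, depth in fragments:
        stored = depth_buffer[y][x]
        if use_dbt and not (zmin <= stored <= zmax):
            continue  # Early-DBT: stored depth lies outside the interval
        if depth >= stored:
            continue  # Early-Z: fragment is not in front of the stored depth
        shaded += 1  # fragment would proceed to texturing and shading
    return shaded
```

When the stored depth at a pixel is outside the configured interval, the fragment is dropped even though it would have passed the depth test, so the DBT removes work that the depth test alone cannot.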
In some possible implementations, corresponding to a process in a graphics processing device in a rendering architecture, a user layer in the processing device (e.g., CPU) configures a corresponding interface, allowing the user to configure parameters of a depth range test, including a switch state of the depth range test and a depth interval, in each Draw Call (Draw Call).
The switch state is either on or off: the user sets the switch state to off to disable the DBT function and to on to enable it. The depth interval may include a minimum depth value and a maximum depth value (zmin, zmax) of floating point type (FLOAT), which define the depth range tested by the DBT.
In some possible implementations, for a current first draw call of the rendering object, the driver layer of the processing device may acquire the first configuration information of the first draw call in step S31, including the switch state of the depth range test and the first depth interval, and further determine the first driving information of the depth range test (DBT) of the first draw call. The first driving information includes a first enable flag of the early DBT, a second enable flag of the late DBT, and a second depth interval of the depth range test.
In some possible implementations, if the switch state is off, the first enable flag and the second enable flag may be set directly to be inactive, the second depth interval may be set to [0.0f, 0.0f ] or null, and instruction information generated according to the first driving information may be used to instruct the graphics processing apparatus not to perform DBT in the first draw call.
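The switch-off case can be sketched with two small record types. The class and function names are hypothetical, and representing the empty interval as `None` is one of the two options the text mentions (the other being [0.0f, 0.0f]).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DbtConfig:
    """User-layer DBT parameters for one draw call (hypothetical type)."""
    enabled: bool
    zmin: float
    zmax: float


@dataclass
class DbtDrivingInfo:
    """Driver-layer DBT state for one draw call (hypothetical type)."""
    early_dbt_enabled: bool
    late_dbt_enabled: bool
    depth_interval: Optional[Tuple[float, float]]


def resolve_switch_off(config: DbtConfig) -> Optional[DbtDrivingInfo]:
    """Return disabled driving info when the switch is off.

    Returns None when the switch is on, signalling that the full
    determination of step S32 is still required.
    """
    if config.enabled:
        return None
    return DbtDrivingInfo(
        early_dbt_enabled=False, late_dbt_enabled=False, depth_interval=None
    )
```

Handling the off state first keeps the more involved flag determination from running for draw calls that do not use the DBT at all.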
In some possible implementations, if the switch state is on, the first driving information of the depth range test of the first draw call may be determined in step S32 according to the first configuration information, the second configuration information of the depth test of the first draw call, and the second driving information of the depth range test of the second draw call. Wherein the second draw call is a preceding draw call of the first draw call, e.g., a preceding draw call of the same tile. The depth test includes an early depth test and a late depth test, and the second configuration information of the depth test includes an enable flag of the early depth test and an enable flag of the late depth test, etc. The second driving information of the depth range test of the second draw call may include a third enable flag of the early DBT and a fourth enable flag of the late DBT, a third depth interval of the depth range test of the second draw call, and the like. The third depth interval is a depth interval actually used in the DBT of the second draw call after data verification and format conversion.
That is, the driving information of the DBT in the current draw call can be determined together by the driving information of the DBT in the preamble draw call and the configuration information of the depth test in the current draw call, so that the DBT function satisfies the data dependency and restriction, and possible processing errors are avoided.
In some possible implementations, the step of determining the first driving information of the depth range test of the first draw call in the step S32 may include performing data verification and format conversion on the first depth interval to obtain the second depth interval, determining the first enabling flag and the second enabling flag according to the second configuration information when the third enabling flag and the fourth enabling flag are both invalid, and determining the first enabling flag and the second enabling flag according to the difference between the second depth interval and the third depth interval, the third enabling flag, the fourth enabling flag and the second configuration information when at least one of the third enabling flag and the fourth enabling flag is valid.
For example, the first depth interval set by the user may contain errors, and the data format of the depth buffer of the graphics processing device may differ from the data format of the first depth interval, so data verification and format conversion need to be performed on the first depth interval.
In some possible implementation manners, the step of performing data verification and format conversion on the first depth interval may include performing data verification on a minimum depth value and a maximum depth value of the first depth interval, determining the minimum depth value and the maximum depth value after verification, performing numerical conversion on the minimum depth value and the maximum depth value after verification according to a data format of a depth cache of the graphics processing device, and determining the second depth interval.
For example, the data verification may include: checking whether the minimum depth value zmin and the maximum depth value zmax of the first depth interval are NaN (not a number) and, if so, converting the NaN to 0.0f; checking the range of zmin and zmax and clamping them to [0.0f, 1.0f], i.e., limiting any value outside [0.0f, 1.0f] to that range; and further checking whether zmin <= zmax is satisfied. If it is satisfied, the check passes; if not, alert information (a warning) is sent to the runtime environment (runtime). The subsequent steps can still continue in that case: the hardware still evaluates zmin <= zmax, but the result is determined as failed.
In some possible implementations, the verified zmin and zmax may be mapped to corresponding integer or floating-point values according to the data format of the depth buffer (z-buffer) to obtain the second depth interval. The data format of the depth buffer is, for example, D16_UNORM (16-bit integer), D24_UNORM (24-bit integer), D32_FLOAT (32-bit floating-point number), etc. For example, when the floating-point interval [0.0f, 1.0f] is converted to D16_UNORM, it can be expressed as [0, 65535]. The present disclosure does not limit the specific data format of the depth buffer.
By the method, unreasonable depth values can be corrected, and the depth values of the depth interval are matched with the depth values stored in the depth cache of the graphics processing equipment so as to facilitate subsequent comparison, thereby improving the accuracy of the depth range test.
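The verification and conversion described above can be sketched as follows. This is a minimal illustration only: the function names, the string identifiers for the depth buffer formats, and the use of round-to-nearest for the integer mapping are assumptions made for this example and are not specified by the disclosure.

```python
import math


def sanitize_interval(zmin, zmax):
    """Data verification: NaN -> 0.0, clamp to [0.0, 1.0], check zmin <= zmax."""
    def fix(z):
        if math.isnan(z):
            return 0.0
        return min(max(z, 0.0), 1.0)

    zmin, zmax = fix(zmin), fix(zmax)
    # When ordered is False, the driver would send a warning to the runtime;
    # processing may still continue.
    ordered = zmin <= zmax
    return zmin, zmax, ordered


def to_unorm(z, bits):
    """Map a float in [0.0, 1.0] to an unsigned normalized integer (assumed rounding)."""
    return round(z * ((1 << bits) - 1))


def convert_interval(zmin, zmax, depth_format):
    """Return the second depth interval in the depth buffer's data format."""
    zmin, zmax, ordered = sanitize_interval(zmin, zmax)
    if depth_format == "D16_UNORM":
        return to_unorm(zmin, 16), to_unorm(zmax, 16), ordered
    if depth_format == "D24_UNORM":
        return to_unorm(zmin, 24), to_unorm(zmax, 24), ordered
    if depth_format == "D32_FLOAT":
        return zmin, zmax, ordered
    raise ValueError("unsupported depth format: " + depth_format)
```

For instance, an interval whose minimum is NaN and whose maximum is 1.5 would be corrected to [0.0, 1.0] and, for D16_UNORM, expressed as [0, 65535].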
In some possible implementations, if the third enable flag and the fourth enable flag are both invalid, that is, the DBT function in the preceding draw call is turned off, whether the early DBT and the late DBT of the current draw call are turned on is determined directly from the second configuration information of the depth test of the current draw call, and the first enable flag and the second enable flag are then set to valid or invalid accordingly.
If at least one of the third enable flag and the fourth enable flag is valid, that is, the DBT function in the preceding draw call is turned on, whether the early DBT and the late DBT of the current draw call are turned on can be determined jointly from the difference between the second depth interval and the third depth interval, the third enable flag and the fourth enable flag of the DBT in the preceding draw call, and the second configuration information of the depth test of the current draw call, and the first enable flag and the second enable flag are then set to valid or invalid accordingly.
By the method, DBT function configuration of current draw call can be realized on the basis of meeting data dependence and limitation, so that accuracy of a processing result is improved, and rendering efficiency is improved.
Wherein, for Early depth test Early-Z and Late depth test Late-Z, the switch enabling scenarios can be divided into three categories:
1) Early-Z is turned on, Late-Z is turned off, and the Z-buffer is updated at Early-Z. Usage scenarios include standard opaque rendering scenes, standard semitransparent rendering scenes (with Alpha Blend disabled), and the like.
2) Early-Z is turned off, Late-Z is turned on, and the Z-buffer is updated at Late-Z. Usage scenarios include scenes in which the depth value is modified: if the depth value is changed in the fragment shader, Early-Z cannot predict the final depth and therefore becomes ineffective.
3) Early-Z is turned on (discarding part of the occluded fragments in advance to improve performance), Late-Z is turned on (discarding all occluded fragments to ensure correct results), and the Z-buffer is updated at Late-Z. Usage scenarios include scenes with discard/clip operations, where explicitly discarding fragments in the fragment shader prevents Early-Z from deciding in advance whether a fragment should be processed, so Late-Z processing is needed; and scenes in which alpha test or alpha blending is enabled, where the visibility of a fragment can only be determined after the fragment shader has executed, so Late-Z processing is likewise needed.
It should be understood that the switch states of Early-Z and Late-Z can be set by those skilled in the art according to the actual situation, and the present disclosure is not limited thereto.
In some possible implementations, both Early-Z and Late-Z processing strictly follow the submission order or the order of primitives within the same draw call. Thus, when a plurality of fragments (such as fragments p0, p1 and p2) overlap in screen space, a subsequent fragment is always tested against the Z-buffer write result of the preceding fragment, so that the correct occlusion relationship can be obtained in both the Early-Z and Late-Z stages. The Early-Z and Late-Z switch settings of preceding and subsequent fragments are subject to certain dependencies and restrictions, which can be expressed as follows:
1. Early-Z and Late-Z of each individual fragment among p0, p1 and p2 are executed in order.
2. Early-Z of the three fragments p0, p1 and p2 is executed in order.
3. Late-Z of the three fragments p0, p1 and p2 is executed in order.
4. The order between the Early-Z read operation of a subsequent fragment and the Late-Z write operation of a preceding fragment is not determined. For example, if the Early-Z read operation of fragment p2 precedes the Late-Z write operation of fragment p1, the following cases may occur:
(1) If p2 turns on Early-Z and turns off Late-Z, there is no guarantee that the depth value z read by p2 is consistent with the one written by p1, so the Early-Z read operation of p2 must wait for the Late-Z write of p1 to complete.
(2) If p2 turns on both Early-Z and Late-Z, and p1 and p2 use different depth test comparison modes, the Early-Z read operation of p2 must wait for the Late-Z write of p1 to complete; otherwise p2 may be erroneously discarded at Early-Z.
(3) If p2 turns on both Early-Z and Late-Z, and the depth test comparison modes of p1 and p2 are the same, the following situations may occur: ① If Early-Z of p2 passes, the Late-Z read operation of p2 is necessarily performed after the Late-Z write of p1, and the final result is correct.
② If Early-Z of p2 fails, p2 is discarded at Early-Z, and the final result is still correct because: I) if Late-Z of p1 fails and p1 is discarded (the Z-buffer is not updated), the depth value read by p2 is valid and p2 must be discarded; and II) if Late-Z of p1 passes and p1 is kept (the Z-buffer is updated), p1 and p2 lie on opposite sides of p0, and p2 would have to be discarded even if it had read the depth value updated by p1.
Fig. 5 is a schematic diagram of the switch states of the depth test provided in an embodiment of the present disclosure, where "√" indicates on or synchronous waiting and "×" indicates off or no synchronous waiting.
As shown in FIG. 5, if Early-Z of the preceding fragment is on and Late-Z is off, no synchronous waiting is needed no matter how the on-off states of Early-Z and Late-Z of the following fragment are set, and the processing result of the following fragment is always correct.
As shown in FIG. 5, if Late-Z of the preceding fragment is turned on, whether its Early-Z is on or off, the processing result of the subsequent fragment may be wrong, and synchronization needs to be set according to the on/off states of Early-Z and Late-Z of the subsequent fragment and the depth test comparison modes. This includes: case 4 (1) above, in which the subsequent fragment turns on Early-Z and turns off Late-Z, and synchronous waiting is required to ensure that the processing result is correct; case 3 above, in which the subsequent fragment turns off Early-Z and turns on Late-Z, so that Late-Z of the preceding fragment and Late-Z of the subsequent fragment are executed in order, no synchronous waiting is required, and the processing result of the subsequent fragment is always correct; case 4 (2) above, in which the subsequent fragment turns on both Early-Z and Late-Z and the depth test comparison modes of the preceding and subsequent fragments differ, and synchronous waiting is required to ensure that the processing result is correct; and case 4 (3) above, in which the subsequent fragment turns on both Early-Z and Late-Z and the comparison modes are the same, no synchronous waiting is required, and the processing result of the subsequent fragment is always correct.
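The synchronization rules derived from cases 3 and 4 (1) to 4 (3) can be summarized in a small decision function. The sketch below is illustrative only; the function and parameter names are hypothetical, and treating a preceding fragment with Late-Z off as never requiring a wait is an assumption that extends the Early-Z-on, Late-Z-off case described for FIG. 5.

```python
def needs_sync_wait(prev_late_z, cur_early_z, cur_late_z, same_compare_mode):
    """Return True if the subsequent fragment's Early-Z read must wait for
    the preceding fragment's Late-Z write to complete."""
    if not prev_late_z:
        # Preceding fragment does not write the Z-buffer at Late-Z
        # (assumption: no wait is needed in this case).
        return False
    if not cur_early_z:
        # Only Late-Z is used; Late-Z stages execute in order (case 3).
        return False
    if not cur_late_z:
        # Early-Z on, Late-Z off: the read must see the preceding write (case 4 (1)).
        return True
    # Early-Z and Late-Z both on: wait only if comparison modes differ
    # (case 4 (2)); with the same mode the result is always correct (case 4 (3)).
    return not same_compare_mode
```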
Similar to the depth test, the depth range test DBT needs to read the depth value z written to the z-buffer by the preceding fragment and then compare it with the maximum depth value and the minimum depth value of the depth interval. If the read precedes the write operation of the preceding fragment, an erroneous depth value is read, which may cause the DBT to fail and the fragment to be discarded erroneously.
Depth range test DBT according to an embodiment of the present disclosure includes an Early DBT that is set before Early depth test Early-Z and a Late DBT that is set before Late depth test Late-Z to remove invalid primitives in advance.
The restrictions between the switch states of the depth test of preceding and subsequent fragments are described above. When the preceding and subsequent fragments both turn on Early-Z and Late-Z, Late-Z of the subsequent fragment can always correctly read the depth value written to the Z-buffer by the preceding fragment, but Early-Z may be affected. Early-DBT and Late-DBT behave similarly, so in this situation Early-DBT may need to be turned off and Late-DBT turned on; in addition, whether the depth intervals of the preceding and subsequent fragments are the same or different also affects the processing result of the DBT.
The restrictions between the depth test of preceding and subsequent fragments and the switch states of the DBT are described below.
1. The depth interval of the preceding fragment is the same as that of the following fragment.
(1) If the preceding fragment turns on Early-DBT and turns off Late-DBT, the following cases may occur for the subsequent fragment: ① If the subsequent fragment turns on Early-Z and turns off Late-Z, Early-DBT is turned on and Late-DBT is turned off, because the Early-DBT read operation of the subsequent fragment follows the Early-Z write operation of the preceding fragment.
② If the subsequent fragment turns off Early-Z and turns on Late-Z, Early-DBT is turned off and Late-DBT is turned on, because the Late-DBT read operation of the subsequent fragment follows the Early-Z write operation of the preceding fragment.
③ If the subsequent fragment turns on both Early-Z and Late-Z, both Early-DBT and Late-DBT are turned on, because the Early-DBT and Late-DBT read operations of the subsequent fragment follow the Early-Z write operation of the preceding fragment.
(2) If the preceding fragment turns on Early-DBT and turns on Late-DBT, the following cases may occur for the subsequent fragment: ① If the subsequent fragment turns on Early-Z and turns off Late-Z, Early-DBT is turned on and Late-DBT is turned off, because the Early-DBT read operation of the subsequent fragment follows the Early-Z write operation of the preceding fragment (this is guaranteed by the depth test, as described above).
② If the subsequent fragment turns off Early-Z and turns on Late-Z, Early-DBT is turned off and Late-DBT is turned on, because the Late-DBT read operation of the subsequent fragment follows the Late-Z write operation of the preceding fragment.
③ If the subsequent fragment turns on both Early-Z and Late-Z, both Early-DBT and Late-DBT are turned on. Consider the following scenario: for three overlapping fragments p0, p1 and p2, even if p2 reads the depth value of p0 in the Early-DBT stage before p1 has written (so that p1 and p2 are both tested against the depth value of p0), if p2 passes Early-DBT, p2 can still read the correct depth value updated by p1 in the Late-DBT stage, regardless of whether p1 eventually passes Late-DBT and writes back a new depth.
If p2 does not pass Early-DBT, p1 cannot pass Late-DBT either, because p1 and p2 use the same depth interval. That is, the DBT behavior of the subsequent fragment is always correct in this scenario.
(3) If the preceding fragment turns on Early-DBT and turns on Late-DBT, the following cases may occur for the subsequent fragment: ① If the subsequent fragment turns on Early-Z and turns off Late-Z, Early-DBT is turned on and Late-DBT is turned off, because the Early-DBT read operation of the subsequent fragment follows the Early-Z write operation of the preceding fragment (this is guaranteed by the depth test, as described above).
② If the subsequent fragment turns off Early-Z and turns on Late-Z, Early-DBT is turned off and Late-DBT is turned on, because the Late-DBT read operation of the subsequent fragment follows the Late-Z write operation of the preceding fragment.
③ If the subsequent fragment turns on both Early-Z and Late-Z, both Early-DBT and Late-DBT are turned on. Consider the following scenario: for three overlapping fragments p0, p1 and p2, even if p2 reads the depth value of p0 in the Early-DBT stage before p1 has written (so that p1 and p2 are both tested against the depth value of p0), if p2 passes Early-DBT, p2 can still read the correct depth value updated by p1 in the Late-DBT stage, regardless of whether p1 eventually passes Late-DBT and writes back a new depth. If p2 does not pass Early-DBT, p1 cannot pass Late-DBT either, because p1 and p2 use the same depth interval. That is, the DBT behavior of the subsequent fragment is always correct in this scenario.
Therefore, when the depth intervals of the preceding and subsequent fragments are the same, the DBT switch state of the subsequent fragment is kept consistent with the switch state of its own depth test, and is not affected by the switch states of the DBT and the depth test of the preceding fragment.
2. The depth interval of the preceding fragment is different from that of the following fragment.
(1) If the preceding fragment turns on Early-DBT and turns off Late-DBT, with Early-Z writing to the Z-buffer, the following cases may occur for the subsequent fragment: ① If the subsequent fragment turns on Early-Z and turns off Late-Z, Early-DBT is turned on and Late-DBT is turned off, because the Early-DBT read operation of the subsequent fragment follows the Early-Z write operation of the preceding fragment. ② If the subsequent fragment turns off Early-Z and turns on Late-Z, Early-DBT is turned off and Late-DBT is turned on, because the Late-DBT read operation of the subsequent fragment follows the Early-Z write operation of the preceding fragment.
③ If the subsequent fragment turns on both Early-Z and Late-Z, both Early-DBT and Late-DBT are turned on, because the Early-DBT and Late-DBT read operations of the subsequent fragment follow the Early-Z write operation of the preceding fragment.
(2) If the preceding fragment turns on Early-DBT and turns on Late-DBT, the following cases may occur for the subsequent fragment: ① If the subsequent fragment turns on Early-Z and turns off Late-Z, Early-DBT is turned on and Late-DBT is turned off, because the Early-DBT read operation of the subsequent fragment follows the Early-Z write operation of the preceding fragment (this is guaranteed by the depth test, as described above).
② If the subsequent fragment turns off Early-Z and turns on Late-Z, Early-DBT is turned off and Late-DBT is turned on, because the Late-DBT read operation of the subsequent fragment follows the Late-Z write operation of the preceding fragment.
③ If the subsequent fragment turns on both Early-Z and Late-Z, Early-DBT is turned off and Late-DBT is turned on. Consider the following scenario: for three overlapping fragments p0, p1 and p2, suppose the Early-DBT stage of p2 reads the depth value of p0 before p1 has written (so that p1 and p2 are both tested against the depth value of p0). If p2 does not pass Early-DBT, it cannot be determined whether the DBT of p1 passes, because p1 and p2 use different depth intervals. If p1 passes Early-DBT and updates the depth value, Early-DBT of p2 might have passed against the updated value, yet p2 has already been discarded erroneously, as shown in FIG. 6. Therefore, Early-DBT of p2 must be turned off in this scenario to ensure that the final result is correct, as shown in FIG. 7.
(3) If the preceding fragment turns on Early-DBT and turns on Late-DBT, the following cases may occur for the subsequent fragment: ① If the subsequent fragment turns on Early-Z and turns off Late-Z, Early-DBT is turned on and Late-DBT is turned off, because the Early-DBT read operation of the subsequent fragment follows the Early-Z write operation of the preceding fragment (this is guaranteed by the depth test, as described above).
② If the subsequent fragment turns off Early-Z and turns on Late-Z, Early-DBT is turned off and Late-DBT is turned on, because the Late-DBT read operation of the subsequent fragment follows the Late-Z write operation of the preceding fragment.
③ If the subsequent fragment turns on both Early-Z and Late-Z, Early-DBT is turned off and Late-DBT is turned on. Consider the following scenario: for three overlapping fragments p0, p1 and p2, suppose the Early-DBT stage of p2 reads the depth value of p0 before p1 has written (so that p1 and p2 are both tested against the depth value of p0). If p2 does not pass Early-DBT, it cannot be determined whether the DBT of p1 passes, because p1 and p2 use different depth intervals. If p1 passes the DBT and updates the depth value, Early-DBT of p2 might have passed against the updated value, yet p2 has already been discarded erroneously, as shown in FIG. 6. Therefore, Early-DBT of p2 must be turned off in this scenario to ensure that the final result is correct, as shown in FIG. 7.
Fig. 6 and Fig. 7 are schematic diagrams of the processing of the depth test and the depth range test provided by embodiments of the present disclosure. In Fig. 6 and Fig. 7, the depth interval of the preceding fragment is [0.1,0.5]; the depth interval of the subsequent fragment is [0.5,0.7], which differs from that of the preceding fragment; the depth value currently stored in the depth buffer (z-buffer) is z0=0.4; the depth value of the preceding fragment is z1=0.6; and the depth value of the subsequent fragment is z2=0.5.
As shown in FIG. 6, the preceding fragment turns on Early-Z and Late-Z and turns on Early-DBT and Late-DBT, and the subsequent fragment turns on Early-Z and Late-Z and turns on Early-DBT and Late-DBT.
For the preceding fragment, in Early-DBT processing z0=0.4 is within the depth interval [0.1,0.5], so Early-DBT passes; in Early-Z processing z1=0.6>0.4, so Early-Z passes and the fragment shader is invoked. Late-DBT processing is then executed: z0=0.4 is within the depth interval [0.1,0.5], so Late-DBT passes; in Late-Z processing z1=0.6>0.4, and the depth value in the depth buffer (Z-buffer) is updated to z0=0.6.
For the subsequent fragment, Early-DBT processing starts before Late-Z of the preceding fragment updates the depth buffer, so the not-yet-updated depth value z0=0.4 is read. This value lies outside the depth interval [0.5,0.7], and the subsequent fragment is discarded. After the depth value is updated to z0=0.6, however, it lies within the depth interval [0.5,0.7]. Therefore, the subsequent fragment is discarded erroneously.
As shown in FIG. 7, the preceding fragment turns on Early-Z and Late-Z and turns on Early-DBT and Late-DBT, and the subsequent fragment turns on Early-Z and Late-Z, turns off Early-DBT, and turns on Late-DBT.
For the preceding fragment, the processing procedure is identical to that of FIG. 6 and is not described again here.
For the subsequent fragment, Early-DBT is not executed and Early-Z is executed directly: z2=0.5>0.4, so Early-Z passes and the fragment shader is invoked. Late-DBT processing is then executed, the updated depth value z0=0.6 is read, and since it lies within the depth interval [0.5,0.7], the subsequent processing continues. Therefore, the result remains correct in this case.
Therefore, when the depth intervals of the preceding and subsequent fragments are different, the subsequent fragment needs to turn off Early-DBT and turn on Late-DBT if the preceding fragment turns on Late-Z and the subsequent fragment turns on both Early-Z and Late-Z. In all other cases, the DBT switch state of the subsequent fragment is kept consistent with the switch state of its own depth test and is not affected by the switch state of the depth test of the preceding fragment.
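The difference between FIG. 6 and FIG. 7 can be reproduced with a few lines of code using the example values given above. This is only a sketch: the function names are hypothetical, the "greater than" comparison mode is inferred from the statement that z1=0.6>0.4 passes, and the model stops at Late-DBT because the text does not describe the subsequent Late-Z outcome.

```python
def in_interval(z, interval):
    """Depth range test: pass if the stored depth lies within [zmin, zmax]."""
    zmin, zmax = interval
    return zmin <= z <= zmax


def subsequent_passes_late_dbt(early_dbt_on, stale_z, updated_z, frag_z, interval):
    """Model the subsequent fragment up to and including Late-DBT.

    stale_z   : z-buffer value read before the preceding Late-Z write
    updated_z : z-buffer value after the preceding Late-Z write
    """
    # Early-DBT (if enabled) races with the preceding write and sees stale_z.
    if early_dbt_on and not in_interval(stale_z, interval):
        return False
    # Early-Z with an assumed GREATER comparison mode, also against stale_z.
    if not frag_z > stale_z:
        return False
    # Late-DBT executes after the preceding Late-Z write and sees updated_z.
    return in_interval(updated_z, interval)


# Example values from FIG. 6 and FIG. 7.
fig6 = subsequent_passes_late_dbt(True, 0.4, 0.6, 0.5, (0.5, 0.7))
fig7 = subsequent_passes_late_dbt(False, 0.4, 0.6, 0.5, (0.5, 0.7))
```

With Early-DBT on (FIG. 6), the fragment is rejected on the stale value 0.4; with Early-DBT off (FIG. 7), it reaches Late-DBT and passes on the updated value 0.6.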
Fig. 8 is a schematic diagram of the switch states of the depth test and the depth range test provided by the embodiments of the present disclosure, where "√" indicates on and "×" indicates off.
As shown in fig. 8, when both Early-DBT and Late-DBT of the preceding fragment are turned off, the on/off states of Early-DBT and Late-DBT of the subsequent fragment are determined according to the on/off states of Early-Z and Late-Z of the subsequent fragment. If both Early-Z and Late-Z are turned off, Early-DBT can be turned on to realize the DBT function; if at least one of Early-Z and Late-Z is turned on, the on/off states of Early-DBT and Late-DBT can be kept consistent with the on/off states of Early-Z and Late-Z.
As shown in fig. 8, when at least one of Early-DBT and Late-DBT of the preceding fragment is turned on: if both Early-Z and Late-Z of the subsequent fragment are turned off, the on/off states of Early-DBT and Late-DBT of the subsequent fragment are determined jointly by the on/off states of Early-DBT and Late-DBT of the preceding fragment and by whether the depth intervals are identical; and if at least one of Early-Z and Late-Z of the subsequent fragment is turned on, the on/off states of Early-DBT and Late-DBT of the subsequent fragment are determined jointly by the on/off states of the DBT of the preceding fragment, the on/off states of Early-Z and Late-Z of the subsequent fragment, and whether the depth intervals are identical.
In some possible implementations, the depth test includes an early depth test and a late depth test, and the second configuration information of the depth test of the current draw call includes a fifth enable flag of the early depth test and a sixth enable flag of the late depth test.
The step of determining the first enable flag and the second enable flag according to the second configuration information, when the third enable flag of the Early-DBT and the fourth enable flag of the Late-DBT of the preceding draw call are both invalid, includes: determining that the first enable flag is valid and the second enable flag is invalid when the fifth enable flag and the sixth enable flag are both invalid; and determining the first enable flag and the second enable flag according to the fifth enable flag and the sixth enable flag when at least one of the fifth enable flag and the sixth enable flag is valid.
For example, on the processing device side, a preceding fragment corresponds to a preceding draw call and a subsequent fragment corresponds to the current draw call. If both Early-DBT and Late-DBT of the preceding draw call are turned off, the determination is made from the fifth enable flag of Early-Z and the sixth enable flag of Late-Z of the current draw call.
In some possible implementations, if neither Early-Z nor Late-Z of the current draw call is enabled, i.e., neither the fifth enable flag nor the sixth enable flag is enabled, then Early-DBT of the current draw call may be turned on to implement DBT functionality. In this case, early-DBT has removed fragments outside the second depth interval, and Late-DBT does not have to be turned on repeatedly, and can be turned off, thereby setting the first enable flag to be active, for example, 1, and the second enable flag to be inactive, for example, 0.
In some possible implementations, if at least one of Early-Z and Late-Z of the current draw call is enabled, i.e., at least one of the fifth enable flag and the sixth enable flag is valid, the first enable flag and the second enable flag are determined based on the fifth enable flag and the sixth enable flag. That is, the first enabling flag is made consistent with the fifth enabling flag, and the second enabling flag is made consistent with the sixth enabling flag, so that the opening and closing states of Early-DBT and Late-DBT and Early-Z and Late-Z are kept consistent, and the cooperation of the depth range test and the depth test is realized.
By the method, DBT function configuration of current draw call can be realized on the basis of meeting data dependence and limitation, so that accuracy of a processing result is improved, and rendering efficiency is improved.
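The rule for the case in which the DBT of the preceding draw call is off can be written as a short function. The sketch below is illustrative; the function name and the use of Python booleans for the enable flags are assumptions made for this example.

```python
def dbt_flags_preceding_dbt_off(early_z_enabled, late_z_enabled):
    """Return (first_enable_flag, second_enable_flag) for the current draw
    call when both Early-DBT and Late-DBT of the preceding draw call are off.

    early_z_enabled / late_z_enabled correspond to the fifth and sixth
    enable flags of the current draw call.
    """
    if not early_z_enabled and not late_z_enabled:
        # No depth test at all: Early-DBT alone realizes the DBT function,
        # so Late-DBT does not need to repeat the work.
        return True, False
    # Otherwise the DBT flags mirror the depth test flags.
    return early_z_enabled, late_z_enabled
```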
In some possible implementations, the step of determining the first enable flag and the second enable flag according to the difference between the second depth interval and the third depth interval, the third enable flag, the fourth enable flag, and the second configuration information, when at least one of the third enable flag and the fourth enable flag is valid, includes, when both the fifth enable flag and the sixth enable flag are invalid:
if the third enable flag is valid and the fourth enable flag is invalid, determining that the first enable flag is valid and the second enable flag is invalid;
if the fourth enable flag is valid and the second depth interval is consistent with the third depth interval, determining that the first enable flag is valid and the second enable flag is valid;
and if the fourth enable flag is valid and the second depth interval is inconsistent with the third depth interval, determining that the first enable flag is invalid and the second enable flag is valid.
For example, on the processing device side, if at least one of Early-DBT and Late-DBT of the preceding draw call is enabled, that is, at least one of the third enable flag and the fourth enable flag is valid, and neither Early-Z nor Late-Z of the current draw call is enabled, that is, both the fifth enable flag and the sixth enable flag are invalid, then the determination is made from the difference between the second depth interval and the third depth interval, the third enable flag, and the fourth enable flag.
In some possible implementations, if the third enable flag of the Early-DBT of the preceding draw call is valid and the fourth enable flag of the Late-DBT is invalid, then, since Late-DBT of the preceding draw call is off, the depth interval has no effect on the DBT result, and Early-DBT of the current draw call may be turned on to implement the DBT function. In this case, Early-DBT has already removed the fragments outside the second depth interval, so Late-DBT does not need to be turned on as well and can be turned off; the first enable flag is therefore set to valid and the second enable flag to invalid.
In some possible implementations, if the fourth enable flag of the Late-DBT of the preceding draw call is valid, that is, Late-DBT of the preceding fragment is enabled, and neither Early-Z nor Late-Z of the subsequent fragment is enabled, then Late-DBT of the subsequent fragment is enabled, and whether Early-DBT is enabled is determined by whether the depth intervals of the preceding and subsequent fragments are consistent. If the second depth interval is consistent with the third depth interval, the Early-DBT result cannot be wrong and Early-DBT can be enabled; if the second depth interval is inconsistent with the third depth interval, the Early-DBT result may be wrong and Early-DBT cannot be enabled.
That is, when the fourth enable flag is valid and the second depth interval is consistent with the third depth interval, the first enable flag is valid and the second enable flag is valid; and when the fourth enable flag is valid and the second depth interval is inconsistent with the third depth interval, the first enable flag is invalid and the second enable flag is valid. The second depth interval and the third depth interval being consistent means that they have the same maximum depth value and the same minimum depth value.
By the method, DBT function configuration of current draw call can be realized on the basis of meeting data dependence and limitation, correct depth values are read in processing, and processing results are kept correct, so that the correctness of the processing results is improved, unnecessary pixel shader execution is reduced, rendering efficiency is improved, and overall power consumption is reduced.
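The case in which the DBT of the preceding draw call is on while the depth test of the current draw call is fully off can be sketched as follows. The function name and the representation of a depth interval as a (zmin, zmax) tuple are assumptions for illustration.

```python
def dbt_flags_current_z_off(prev_early_dbt, prev_late_dbt,
                            cur_interval, prev_interval):
    """Return (first_enable_flag, second_enable_flag) when neither Early-Z nor
    Late-Z of the current draw call is enabled and at least one of the third
    (prev_early_dbt) and fourth (prev_late_dbt) enable flags is valid."""
    if not (prev_early_dbt or prev_late_dbt):
        raise ValueError("this branch requires the preceding DBT to be on")
    if not prev_late_dbt:
        # Only Early-DBT was on before: the interval does not matter.
        return True, False
    # Late-DBT was on before: Late-DBT stays on, and Early-DBT is safe only
    # if both intervals share the same minimum and maximum depth values.
    return cur_interval == prev_interval, True
```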
In some possible implementations, the step of determining the first enable flag and the second enable flag according to the difference between the second depth interval and the third depth interval, the third enable flag, the fourth enable flag, and the second configuration information, when at least one of the third enable flag and the fourth enable flag is valid, includes, when at least one of the fifth enable flag and the sixth enable flag is valid: determining that the first enable flag is valid and the second enable flag is valid if a preset condition of the enable flags is met and the second depth interval is consistent with the third depth interval; determining that the first enable flag is invalid and the second enable flag is valid if the preset condition is met and the second depth interval is inconsistent with the third depth interval; and determining the first enable flag and the second enable flag according to the fifth enable flag and the sixth enable flag if the preset condition is not met, wherein the preset condition includes that the fourth enable flag, the fifth enable flag, and the sixth enable flag are all valid.
For example, on the processing device side, if at least one of Early-DBT and Late-DBT of the preceding draw call is enabled, that is, at least one of the third enable flag and the fourth enable flag is valid, and at least one of Early-Z and Late-Z of the current draw call is enabled, that is, at least one of the fifth enable flag and the sixth enable flag is valid, the determination is made by jointly considering the difference between the second depth interval and the third depth interval, the third enable flag and the fourth enable flag of the preceding draw call, and the fifth enable flag and the sixth enable flag of the current draw call.
In some possible implementations, suppose Late-DBT of the preceding draw call is enabled, i.e., the fourth enable flag is valid, and both Early-Z and Late-Z of the current draw call are enabled, i.e., both the fifth enable flag and the sixth enable flag are valid. Based on the previous analysis and FIG. 8, if the second depth interval is consistent with the third depth interval, the Early-DBT result cannot be wrong and Early-DBT may be enabled; if the second depth interval is inconsistent with the third depth interval, the Early-DBT result may be wrong and Early-DBT cannot be enabled.
In this case, the preset condition of the enable flags includes that the fourth enable flag, the fifth enable flag, and the sixth enable flag are all valid. If the preset condition is met and the second depth interval is consistent with the third depth interval, the first enable flag is set to valid and the second enable flag is set to valid, i.e., Early-DBT and Late-DBT are both turned on. If the preset condition is met and the second depth interval is inconsistent with the third depth interval, the first enable flag is set to invalid and the second enable flag is set to valid, i.e., Early-DBT is turned off and Late-DBT is turned on.
In some possible implementations, if the preset condition is not met, i.e., the fourth enable flag, the fifth enable flag, and the sixth enable flag are not all valid at the same time, then, based on the previous analysis and fig. 8, whether the depth intervals are consistent does not affect the result of Early-DBT. In this case, the first enable flag and the second enable flag may be determined according to the fifth enable flag and the sixth enable flag, that is, the first enable flag is made consistent with the fifth enable flag and the second enable flag is made consistent with the sixth enable flag, so that the on/off states of Early-DBT and Late-DBT are kept consistent with those of Early-Z and Late-Z.
In this way, the accuracy of the DBT processing result can be ensured, the depth range test and the depth test can work together, invalid fragments are removed in advance before the depth test, unnecessary pixel shader execution is reduced, rendering efficiency is improved, and overall power consumption is reduced.
In some possible implementations, after determining the first enable flag, the second enable flag, and the second depth interval, the driver layer may perform data packing and transmission in step S33 and generate instruction information according to the first driving information, that is, encode the first enable flag, the second enable flag, and the second depth interval into a hardware instruction packet. The instruction information may also include an identifier of the data format of the depth cache, so that the hardware module of the graphics processing device can adapt its comparison logic, and so on, which is not limited in the present disclosure.
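As a non-limiting illustration, the packing of the first driving information into an instruction packet may be sketched as follows. The packet layout (one flag byte, one format-identifier byte, two reserved bytes, and two 32-bit integer depth bounds, little-endian) is purely an assumption made for this example; the disclosure does not specify a layout, and the function names are hypothetical.

```python
import struct

# Hypothetical layout: flags (u8), depth format id (u8), reserved (u16),
# zmin (u32), zmax (u32); "<" selects little-endian with no padding.
PACKET_FORMAT = "<BBHII"


def pack_dbt_instruction(early_en: bool, late_en: bool,
                         zmin: int, zmax: int,
                         depth_format_id: int = 0) -> bytes:
    """Encode the two enable flags and the converted depth interval."""
    flags = (1 if early_en else 0) | ((1 if late_en else 0) << 1)
    return struct.pack(PACKET_FORMAT, flags, depth_format_id, 0, zmin, zmax)


def unpack_dbt_instruction(packet: bytes):
    """Decode a packet produced by pack_dbt_instruction."""
    flags, fmt, _reserved, zmin, zmax = struct.unpack(PACKET_FORMAT, packet)
    return bool(flags & 1), bool(flags & 2), zmin, zmax, fmt
```

The depth bounds are carried as integers here on the assumption that the second depth interval has already been converted to the integer format of the depth cache.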
In some possible implementations, the generated instruction information is sent to the graphics processing device, and after the graphics processing device receives the instruction information, the graphics processing device may perform a depth range test in the first draw call based on the instruction information, thereby implementing the DBT function.
Fig. 9 is a flowchart of a rendering processing method according to an embodiment of the present disclosure. The method may be performed by a graphics processing device in the terminal device, such as a GPU. Referring to fig. 9, the method includes:
In step S91, first driving information of a depth range test (DBT) of a current first draw call of a rendering object is acquired, wherein the depth range test includes an early DBT and a late DBT, and the first driving information includes a first enable flag of the early DBT, a second enable flag of the late DBT, and a second depth interval of the depth range test.
In step S92, in response to the first draw call, rasterization is performed on a first primitive in a tile of the rendering object to obtain a corresponding first fragment.
In step S93, if the first enable flag is valid, a depth range test is performed on the first fragment according to the second depth interval to obtain an early DBT result of the first fragment.
In step S94, if the early DBT result of the first fragment is a pass, an early depth test and rendering processing are performed on the first fragment, and a rendered second fragment is obtained if the early depth test is passed.
In step S95, if the second enable flag is valid, a depth range test is performed on the second fragment according to the second depth interval to obtain a late DBT result of the second fragment.
In step S96, if the late DBT result of the second fragment is a pass, a late depth test and rendering processing are performed on the second fragment, and rendered first pixel data is obtained if the late depth test is passed.
For example, after receiving the instruction information sent by the processing device, the graphics processing device may determine, in step S91, the first driving information of the DBT of the current first draw call of the rendering object, including the first enable flag of the early DBT, the second enable flag of the late DBT, and the second depth interval of the DBT.
In some possible implementations, the rendering of the rendering object includes tile-based deferred rendering (TBDR), wherein the rendering object is divided into a plurality of tiles and includes a plurality of primitives, each of the plurality of tiles is covered by at least one primitive of the rendering object, and the first primitive is any one of the plurality of primitives.
In some possible implementations, in response to the current first draw call, rendering may be performed in conjunction with the rendering architecture shown in FIG. 4. As shown in FIG. 4, the graphics processing device reads geometric data from the system memory, obtains the primitives of the rendering object through vertex processing 401 and clipping, projection and culling 402, completes the division of the rendering object into tiles in the tiling 403 stage, records the graphics data covering each tile, including the primitive list, vertex data and the like, and writes the graphics data into the system memory.
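As a non-limiting illustration, the per-tile primitive list recorded in the tiling stage may be sketched as follows. The sketch assumes conservative bounding-box binning with non-negative screen coordinates and square tiles; the disclosure does not state which binning technique is used, and the function name `bin_primitives` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vertex = Tuple[float, float]  # screen-space (x, y), assumed non-negative


def bin_primitives(primitives: Sequence[Sequence[Vertex]],
                   tile_size: int) -> Dict[Tuple[int, int], List[int]]:
    """Return a mapping from tile coordinate to covering primitive indices.

    A primitive is added to every tile touched by its bounding box, which
    may over-include tiles but never misses a covered tile.
    """
    tiles: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for index, vertices in enumerate(primitives):
        xs = [x for x, _ in vertices]
        ys = [y for _, y in vertices]
        tx0, tx1 = int(min(xs)) // tile_size, int(max(xs)) // tile_size
        ty0, ty1 = int(min(ys)) // tile_size, int(max(ys)) // tile_size
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                tiles[(tx, ty)].append(index)
    return dict(tiles)
```

Each tile can then be rendered independently by iterating over its own primitive list.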
In some possible implementations, for any primitive in any tile of the rendering object, referred to as the first primitive, the first primitive may be rasterized in step S92 to obtain a corresponding first fragment.
If the first enable flag is valid, Early-DBT is executed on the first fragment according to the second depth interval in step S93, that is, the depth range test is performed by the DBT module.
In some possible implementations, step S93 may include reading a depth value corresponding to the location information from an on-chip depth cache of the graphics processing device based on the location information of the first fragment, and determining that the early DBT result of the first fragment is a pass if the depth value is within the second depth interval. The location information may include the screen coordinates of the pixel corresponding to the first fragment.
For example, as shown in FIG. 4, the DBT module may read, according to the location information, the depth value Z-buffer corresponding to the location information from the on-chip depth cache of the graphics processing device, and compare the depth value with the second depth interval. If the depth value is within the second depth interval, i.e., zmin ≤ Z-buffer ≤ zmax, the early DBT result of the first fragment is a pass, and the subsequent Early-Z processing continues.
Otherwise, if the depth value is outside the second depth interval, i.e., Z-buffer < zmin or Z-buffer > zmax, the early DBT result of the first fragment is a fail, the first fragment is directly discarded, and the subsequent Early-Z processing and other processing steps are skipped.
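As a non-limiting illustration, the comparison performed by the DBT module may be sketched as follows. The on-chip depth cache is modeled as a plain row-major two-dimensional list, which is an assumption made only for this example; the function name `depth_range_test` is hypothetical.

```python
from typing import Sequence


def depth_range_test(depth_cache: Sequence[Sequence[float]],
                     x: int, y: int,
                     zmin: float, zmax: float) -> bool:
    """Return True (pass) if the stored depth at (x, y) lies in [zmin, zmax].

    The test reads the value already stored in the depth cache at the
    fragment's pixel position; both interval bounds are inclusive.
    """
    z_buffer = depth_cache[y][x]  # stored depth at the fragment's pixel
    return zmin <= z_buffer <= zmax
```

A fragment whose test returns False would be discarded before the subsequent depth test is reached.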
By executing Early-DBT before Early-Z, fragments outside the preset depth interval can be removed at the earliest stage, which significantly reduces subsequent depth tests and wasted pixel shader executions, thereby improving overall rendering performance.
In some possible implementations, the rendering processing method according to an embodiment of the present disclosure further includes determining that the early DBT result of the first fragment is a pass if the first enable flag is invalid. That is, if the first enable flag is invalid, the early DBT processing may be skipped, the early DBT result of the first fragment is regarded as a pass, and the subsequent Early-Z processing and other processing are performed directly. In this way, whether to execute the DBT can be determined according to the enable flag, which improves processing efficiency and flexibility.
In some possible implementations, where the early DBT result of the first fragment is a pass, the early depth test (Early-Z) and rendering processing may be performed on the first fragment in step S94. If the enable flag of Early-Z is valid, Early-Z is executed, and the fragment is rendered after Early-Z passes. As shown in FIG. 4, the rendering processing may include texturing and shading 407 and alpha test 408, after which the rendered second fragment is obtained.
In some possible implementations, at the Late-Z execution stage, it may be determined whether the second enable flag of the Late-DBT is valid. If the second enable flag is valid, Late-DBT is executed on the second fragment according to the second depth interval in step S95, that is, the depth range test is performed by the DBT module.
Similarly, based on the location information of the second fragment, the DBT module may read the depth value corresponding to the location information from the on-chip depth cache of the graphics processing device and compare the depth value with the second depth interval. If the depth value is within the second depth interval, the late DBT result of the second fragment is a pass, and the subsequent Late-Z processing continues.
Otherwise, if the depth value is outside the second depth interval, the late DBT result of the second fragment is a fail, the second fragment is directly discarded, and the subsequent Late-Z processing and other processing steps are skipped.
By executing the Late-DBT before the Late-Z, the residual fragments with depth exceeding the depth interval can be removed for the second time, so that the memory bandwidth pressure and the power consumption cost caused by fragment processing are further reduced, and the hardware resource utilization rate is optimized.
In some possible implementations, the rendering processing method according to an embodiment of the present disclosure further includes determining that the late DBT result of the second fragment is a pass if the second enable flag is invalid. That is, if the second enable flag is invalid, the late DBT processing may be skipped, the late DBT result of the second fragment is regarded as a pass, and the subsequent Late-Z processing and other processing are performed directly. In this way, whether to execute the DBT can be determined according to the enable flag, which improves processing efficiency and flexibility.
In some possible implementations, where the late DBT result of the second fragment is a pass, the late depth test (Late-Z) and rendering processing may be performed on the second fragment in step S96. If the enable flag of Late-Z is valid, Late-Z is executed, and the fragment is rendered after Late-Z passes.
As shown in FIG. 4, the rendering processing may include alpha blending 411 and other processing, after which the rendered first pixel data is obtained. In this way, the entire rendering process of the first primitive is completed.
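As a non-limiting illustration, the ordering of steps S93 to S96 for a single fragment may be sketched as follows. The sketch makes several simplifying assumptions that are not stated in the disclosure: the depth test uses a "less than" comparison, the stored depth value does not change between the early and late stages, and shading does not modify the fragment depth. The names `DrawState` and `process_fragment` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DrawState:
    early_dbt: bool  # first enable flag
    late_dbt: bool   # second enable flag
    early_z: bool    # Early-Z enable flag
    late_z: bool     # Late-Z enable flag
    zmin: float      # lower bound of the second depth interval
    zmax: float      # upper bound of the second depth interval


def in_range(stored_z: float, state: DrawState) -> bool:
    """Depth range test on the value stored in the depth cache."""
    return state.zmin <= stored_z <= state.zmax


def process_fragment(frag_z: float, stored_z: float, state: DrawState) -> str:
    """Run one fragment through Early-DBT, Early-Z, Late-DBT and Late-Z."""
    # S93: a disabled Early-DBT counts as a pass.
    if state.early_dbt and not in_range(stored_z, state):
        return "discarded by Early-DBT"
    # S94: early depth test (assumed comparison: fragment must be nearer).
    if state.early_z and frag_z >= stored_z:
        return "discarded by Early-Z"
    # Texturing, shading and alpha test would run here.
    # S95: a disabled Late-DBT counts as a pass.
    if state.late_dbt and not in_range(stored_z, state):
        return "discarded by Late-DBT"
    # S96: late depth test, followed by blending and write-out.
    if state.late_z and frag_z >= stored_z:
        return "discarded by Late-Z"
    return "written"
```

The point of the sketch is the stage ordering: each range test sits immediately before the depth test of the same stage, so a fragment rejected by Early-DBT never reaches Early-Z or shading.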
According to the rendering processing method of the embodiments of the present disclosure, the depth range test (DBT) is combined with the depth test (Z-test), and the DBT is allowed to be configured per draw call. The processing device obtains the configuration information of the depth range test of the current draw call, determines the DBT driving information of the current draw call according to the DBT configuration of the current draw call, the DBT driving information of the previous draw call, and the depth test configuration of the current draw call, and then generates and issues the corresponding instruction information, so that the graphics processing device executes the depth range test. Invalid fragments are thereby further removed on top of the depth test, which reduces unnecessary pixel shader execution, improves rendering efficiency, and reduces overall power consumption.
According to the rendering processing method of the embodiments of the present disclosure, the depth range test is divided into an early DBT (Early-DBT) and a late DBT (Late-DBT). The early DBT cooperates with the early depth test (Early-Z), and the late DBT cooperates with the late depth test (Late-Z), so that unnecessary computation in the early and late depth tests can be reduced, performance can be optimized for each stage, and the accuracy of the DBT results can be ensured.
According to the rendering processing method of the embodiments of the present disclosure, a DBT hardware module is provided in the graphics processing device, so that the depth range test can be performed before Early-Z and before Late-Z respectively. By executing Early-DBT before Early-Z, fragments outside the preset depth interval can be removed at the earliest stage, which significantly reduces subsequent depth tests and wasted pixel shader executions and thereby improves overall rendering performance. By executing Late-DBT before Late-Z, remaining fragments whose depth exceeds the depth interval can be removed a second time, which further reduces the memory bandwidth pressure and power consumption caused by fragment processing and improves hardware resource utilization.
In addition, by rendering in segments for different depth intervals, depth-based special visual effects, such as depth-based outline drawing, layered volumetric fog, depth blur, or perspective distortion, can be achieved, which gives the rendering system greater flexibility and extensibility.
It will be appreciated that the above-mentioned method embodiments of the present disclosure may be combined with each other to form combined embodiments without departing from their principles and logic; for brevity, the combinations are not described in detail in the present disclosure. It will be appreciated by those skilled in the art that, in the above-described methods of the embodiments, the particular order of execution of the steps should be determined by their functions and possible inherent logic.
In addition, the present disclosure further provides a rendering processing apparatus, an electronic device, and a computer-readable storage medium, each of which may be used to implement any one of the rendering processing methods provided in the present disclosure. For the corresponding technical solutions and descriptions, refer to the corresponding descriptions in the method section, which are not repeated here.
Fig. 10 is a block diagram of a rendering processing apparatus according to an embodiment of the present disclosure.
Referring to fig. 10, an embodiment of the present disclosure provides a rendering processing apparatus, which includes a configuration information obtaining module 81 configured to obtain first configuration information of a depth range test DBT of a current first draw call of a rendering object, where the first configuration information includes a switch state of the depth range test and a first depth interval.
The driving information determining module 82 is configured to determine, when the switch state of the depth range test is on, first driving information of the depth range test of the first draw call according to the first configuration information, second configuration information of the depth test of the first draw call, and second driving information of the depth range test of the second draw call. The depth range test includes an early DBT and a late DBT, and the first driving information includes a first enable flag of the early DBT, a second enable flag of the late DBT, and a second depth interval of the depth range test.
The instruction sending module 83 is configured to generate instruction information according to the first driving information and send the instruction information to a graphics processing device, so that the graphics processing device performs a depth range test in the first draw call based on the instruction information.
In some possible implementations, the second driving information includes a third enable flag of the early DBT, a fourth enable flag of the late DBT, and a third depth interval of the depth range test, where the driving information determining module 82 is configured to perform data verification and format conversion on the first depth interval to obtain the second depth interval, determine the first enable flag and the second enable flag according to the second configuration information when the third enable flag and the fourth enable flag are both invalid, and determine the first enable flag and the second enable flag according to a difference between the second depth interval and the third depth interval, the third enable flag, the fourth enable flag, and the second configuration information when at least one of the third enable flag and the fourth enable flag is valid.
In some possible implementations, the depth test includes an early depth test and a late depth test, the second configuration information includes a fifth enable flag of the early depth test and a sixth enable flag of the late depth test, wherein the driving information determining module 82 is configured to determine that the first enable flag is valid when the fifth enable flag and the sixth enable flag are both invalid, and determine that the first enable flag and the second enable flag are both valid when at least one of the fifth enable flag and the sixth enable flag is valid.
In some possible implementations, the driving information determining module 82 is configured to, when the fifth enable flag and the sixth enable flag are both invalid, determine the first enable flag and the second enable flag as follows: if the third enable flag is valid and the fourth enable flag is invalid, determine that the first enable flag is valid and the second enable flag is invalid; if the fourth enable flag is valid and the second depth interval is consistent with the third depth interval, determine that the first enable flag is valid and the second enable flag is valid; and if the fourth enable flag is valid and the second depth interval is inconsistent with the third depth interval, determine that the first enable flag is invalid and the second enable flag is valid.
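As a non-limiting illustration, the three cases handled by the driving information determining module when both depth test stages are disabled may be sketched as follows. The sketch assumes that at least one of the third and fourth enable flags is valid (the case where both are invalid is handled separately) and that "consistent" intervals means exactly equal bounds. The function name is hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]


def decide_flags_without_z(prev_early_dbt: bool,
                           prev_late_dbt: bool,
                           cur_interval: Interval,
                           prev_interval: Interval) -> Tuple[bool, bool]:
    """Return (first enable flag, second enable flag).

    prev_early_dbt / prev_late_dbt : third / fourth enable flags
    cur_interval / prev_interval   : second / third depth intervals
    Applies only when the fifth and sixth enable flags are both invalid.
    """
    if prev_late_dbt:
        # Late-DBT ran previously: Early-DBT is kept only if the
        # interval is unchanged; Late-DBT stays enabled either way.
        return (cur_interval == prev_interval, True)
    if prev_early_dbt:
        # Only Early-DBT ran previously: keep Early-DBT, no Late-DBT.
        return (True, False)
    raise ValueError("expects at least one previous DBT stage enabled")
```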
In some possible implementations, the driving information determining module 82 is configured to, when at least one of the fifth enable flag and the sixth enable flag is valid, determine the first enable flag and the second enable flag as follows: if a preset condition on the enable flags is satisfied and the second depth interval is consistent with the third depth interval, determine that the first enable flag is valid and the second enable flag is valid; if the preset condition is satisfied and the second depth interval is inconsistent with the third depth interval, determine that the first enable flag is invalid and the second enable flag is valid; and if the preset condition is not satisfied, determine the first enable flag and the second enable flag according to the fifth enable flag and the sixth enable flag, where the preset condition includes that the fourth enable flag, the fifth enable flag, and the sixth enable flag are all valid.
In some possible implementations, the driving information determining module 82 is configured to perform data verification on the minimum depth value and the maximum depth value of the first depth interval, determine the verified minimum depth value and the maximum depth value, and perform numerical conversion on the verified minimum depth value and the maximum depth value according to a data format of a depth cache of the graphics processing device, and determine the second depth interval.
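As a non-limiting illustration, the data verification and numerical conversion of the depth interval may be sketched as follows. The specific checks are assumptions made for this example only: bounds are clamped to [0, 1], swapped if given in the wrong order, and converted to an unsigned normalized integer of a given bit width. The disclosure does not specify these rules here, and the function names are hypothetical.

```python
from typing import Tuple


def validate_interval(zmin: float, zmax: float) -> Tuple[float, float]:
    """Clamp both bounds to [0, 1] and ensure zmin <= zmax (assumed rules)."""
    zmin = min(max(zmin, 0.0), 1.0)
    zmax = min(max(zmax, 0.0), 1.0)
    if zmin > zmax:
        zmin, zmax = zmax, zmin
    return zmin, zmax


def to_unorm(z: float, bits: int = 24) -> int:
    """Convert a depth in [0, 1] to an unsigned normalized integer."""
    return round(z * ((1 << bits) - 1))


def convert_interval(zmin: float, zmax: float,
                     bits: int = 24) -> Tuple[int, int]:
    """Map the first depth interval to a depth-cache-format interval."""
    lo, hi = validate_interval(zmin, zmax)
    return to_unorm(lo, bits), to_unorm(hi, bits)
```

With a 24-bit depth cache, for example, the full range [0, 1] maps to the integer interval [0, 16777215].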
According to an embodiment of the present disclosure, there is also provided a rendering processing apparatus including: a driving information obtaining module configured to obtain first driving information of a depth range test (DBT) of a current first draw call of a rendering object, the depth range test including an early DBT and a late DBT, and the first driving information including a first enable flag of the early DBT, a second enable flag of the late DBT, and a second depth interval of the depth range test; a rasterization processing module configured to, in response to the first draw call, rasterize a first primitive in a tile of the rendering object to obtain a corresponding first fragment; a first DBT module configured to, when the first enable flag is valid, perform a depth range test on the first fragment according to the second depth interval to obtain an early DBT result of the first fragment; a first depth test and rendering module configured to, when the early DBT result of the first fragment is a pass, perform an early depth test and rendering processing on the first fragment, and obtain a rendered second fragment when the early depth test is passed; a second DBT module configured to, when the second enable flag is valid, perform a depth range test on the second fragment according to the second depth interval to obtain a late DBT result of the second fragment; and a second depth test and rendering module configured to, when the late DBT result of the second fragment is a pass, perform a late depth test and rendering processing on the second fragment, and obtain rendered first pixel data when the late depth test is passed.
In some possible implementations, the first DBT module is further configured to determine that the early DBT result of the first fragment is a pass if the first enable flag is invalid, and/or the second DBT module is further configured to determine that the late DBT result of the second fragment is a pass if the second enable flag is invalid.
In some possible implementations, the first DBT module is configured to read a depth value corresponding to the location information from an on-chip depth cache of the graphics processing device according to the location information of the first fragment, and determine that the early DBT result of the first fragment is a pass if the depth value is within the second depth interval.
In some possible implementations, the rendering of the rendering object includes tile-based deferred rendering (TBDR), wherein the rendering object is divided into a plurality of tiles and includes a plurality of primitives, each tile of the plurality of tiles is covered by at least one primitive of the rendering object, and the first primitive is any one of the plurality of primitives.
Fig. 11 is a block diagram of an electronic device according to an embodiment of the present disclosure.
Referring to fig. 11, an embodiment of the present disclosure provides an electronic device including at least one processor 701, at least one memory 702, and one or more I/O interfaces 703 connected between the processor 701 and the memory 702, wherein the memory 702 stores one or more computer programs executable by the at least one processor 701, and the one or more computer programs are executed by the at least one processor 701 to enable the at least one processor 701 to perform the above-described rendering processing method.
The disclosed embodiments also provide a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the rendering processing method described above. The computer readable storage medium may be a volatile or nonvolatile computer readable storage medium.
Embodiments of the present disclosure also provide a computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when executed in a processor of an electronic device, performs the above-described rendering processing method.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, functional modules/units in the apparatus, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components, for example, one physical component may have a plurality of functions, or one function or step may be cooperatively performed by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer-readable storage media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable program instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), static random access memory (SRAM), flash memory or other memory technology, portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable program instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and may include any information delivery media.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
The computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), with state information of the computer readable program instructions, so that the electronic circuitry can execute the computer readable program instructions.
The computer program product described herein may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK), or the like.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purpose of limitation. In some instances, it will be apparent to one skilled in the art that features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with other embodiments unless explicitly stated otherwise. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the disclosure as set forth in the appended claims.