
CN116152039B - Image rendering method

Image rendering method

Info

Publication number: CN116152039B
Application number: CN202310409599.3A
Authority: CN (China)
Prior art keywords: cluster, rendered, data, model, target
Legal status: Active (granted)
Other versions: CN116152039A (earlier publication)
Other languages: Chinese (zh)
Inventors: 穆子杰, 崔铭, 冯星, 杨峰峰, 韦建伟, 方顺
Assignee: Beijing Xuanguang Technology Co ltd
Application filed by Beijing Xuanguang Technology Co ltd; priority to CN202310409599.3A

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00: General purpose image data processing
    • G06T 1/20: Processor architectures; processor configuration, e.g. pipelining
    • G06T 15/00: 3D [Three Dimensional] image rendering
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Image Generation (AREA)

Abstract

The embodiments of the present application provide an image rendering method comprising: acquiring an initial cluster set for each model to be rendered, the initial cluster set being obtained by splitting the mesh data of the corresponding model; performing back-face culling on the initial cluster set of each model to be rendered, with the cluster as the culling unit, to obtain a target cluster set for each model; and providing the cluster data to be rendered corresponding to the target cluster sets to a graphics processor so that the graphics processor performs drawing. With this method, finer-grained culling is achieved on the CPU side, the amount of data provided to the GPU is reduced, and the CPU can start the GPU rendering process with as many rendering commands as there are material types, avoiding the excessive interface calls incurred when each model requires its own rendering command.

Description

Image rendering method
Technical Field
The present application relates to the field of image rendering, and in particular to an image rendering method.
Background
With the continuous development of computer hardware, computing power keeps improving, and the rendering capability of platforms such as PCs and mobile phones has been greatly enhanced. In fields such as games, film, and VR, the demands on rendering quality keep growing. On the user side, players have begun to expect film-quality visuals in game scenes as well as open-world gameplay. On the art-production side, the textures, models, and scenes produced by artists tend toward ever higher precision, and once the core gameplay of the major game studios is settled, their products begin to compete on more detailed and realistic visual presentation.
Taking a game scene as an example: in existing game engines the CPU is responsible for updating every logic system of the game (such as the physics system, effects system, AI system, resource management, and data communication) and bears a heavy computational load. The CPU is also responsible for driving the entire rendering process, so when next-generation games bring large numbers of high-precision models, the CPU often becomes the performance bottleneck and causes stuttering in open-world gameplay.
Disclosure of Invention
The purpose of the embodiments of the present application is to provide an image rendering method by which finer-grained culling can be performed on the CPU side to reduce the amount of data provided to the GPU, the CPU can start the GPU rendering process with as many rendering commands as there are material types, reducing the excessive interface calls incurred when each model is a separate rendering command, and pixel shading and block shading are performed on the GPU side keyed on material identifiers, significantly improving pixel-shading speed.
In a first aspect, some embodiments of the present application provide an image rendering method applied to a CPU, the method including: acquiring an initial cluster set for each model to be rendered, wherein the initial cluster set is obtained by splitting the mesh data of the corresponding model, each cluster of the initial cluster set contains a set number of triangles, each model to be rendered belongs to the models to be updated or loaded in the i-th frame (i being an integer greater than or equal to 1), the models to be rendered have already passed scene-tree culling, and the set number is related to the size of the cache unit that the graphics processor assigns to a streaming multiprocessor (SM); performing back-face culling on the initial cluster set of each model to be rendered, with the cluster as the culling unit, to obtain a target cluster set for each model; and providing the cluster data to be rendered corresponding to the target cluster sets to a graphics processor so that the graphics processor performs drawing.
By culling back-facing geometry in units of clusters, the embodiments of the present application reduce the amount of data transmitted to the GPU and thereby increase the data transmission speed.
In some embodiments, each cluster of the initial cluster set contains a set number of triangles, the models to be rendered belong to the models to be updated or loaded in the i-th frame, and the models to be rendered have passed scene-tree culling.
In some embodiments, a first model to be rendered is a model newly loaded in the i-th frame, and acquiring the initial cluster set for each model to be rendered in the i-th frame includes: starting a splitting thread, and splitting the mesh data of the first model into multiple clusters through the splitting thread to obtain the first model's initial cluster set. Alternatively, a second model to be rendered was loaded in a historical frame, and acquiring the initial cluster set for each model to be rendered in the i-th frame includes: reading the initial cluster set of the second model from a storage unit.
By splitting each model's mesh data on a dedicated thread to obtain its clusters, some embodiments of the present application can perform back-face culling on the CPU side instead of relying on scene-tree model culling alone.
In some embodiments, before the initial cluster sets of the i-th frame are acquired, the method includes: performing scene-tree model culling on the original data of the i-th frame to obtain the models to be rendered.
That is, models are first culled through the scene tree, and cluster-level back-face culling is then performed on the models that survive.
In some embodiments, the set number is related to the size of the cache unit that the graphics processor assigns to a streaming multiprocessor (SM).
Some embodiments of the present application determine the size of each cluster according to the hardware characteristics of the GPU, improving the processing speed of the clusters on the GPU.
In some embodiments, the cache unit size is a, each cluster contains k×a triangles, and k is a positive integer.
In some embodiments, a third model to be rendered is any one of the models to be rendered in the i-th frame, and performing back-face culling on the initial cluster sets with the cluster as the culling unit includes: acquiring a test cone and a cone opening angle for each cluster in the initial cluster set of the third model; and testing each cluster against its test cone and cone opening angle, a cluster that passes the test being placed in the target cluster set.
Some embodiments of the present application thus perform back-face culling by acquiring a test cone and cone opening angle for each cluster and then testing against that information.
In some embodiments, a first cluster is a cluster that passes the test, and providing the cluster data to be rendered corresponding to the target cluster sets to the graphics processor includes: if the graphics processor has no storage unit yet for the cluster data of the first cluster, allocating a storage address on the graphics processor for that data and treating it as block-update data; repeating this process to obtain the block-update data of all clusters that pass the test; and sending all the block-update data to the graphics processor.
Some embodiments of the present application reduce the bandwidth required for data transmission by identifying the clusters that need updating and providing only their data to the GPU.
In some embodiments, the cluster data to be rendered includes at least one of the following: cluster bounding-box information, cluster vertex information, cluster index information, cluster material information, and cluster rotation-scale-offset matrix information. The bounding-box information is used by the graphics processor for view-frustum culling, performed with the cluster as the culling unit; the rotation-scale-offset matrix records the spatial position of the corresponding cluster, and from it the graphics processor derives the screen coordinates of each vertex in the cluster.
Because the cluster data sent to the GPU carries both the per-cluster metadata and the cluster vertex data, the GPU can render objects cluster by cluster, achieving high-speed rendering.
In some embodiments, the cluster bounding-box information characterizes the bounding-box center position and bounding-box radius of the corresponding cluster.
Some embodiments of the present application use the bounding-box center position and radius of each cluster as its bounding-box information.
In some embodiments, the cluster material information is taken from a material-information candidate set comprising a plurality of material identification numbers obtained by the following strategy: traverse the material attributes of all objects in each frame, merge identical materials, and assign each merged material a unique material identification number. The graphics processor compares the cluster material information with the material information to be rendered carried by a rendering command to complete pixel shading, the carried material information corresponding to one material of the i-th frame.
Some embodiments of the present application thus assign each distinct material a unique identifier (a material ID); drawing commands are then distinguished by material ID, and each GPU rendering pass is driven by a different material ID.
In some embodiments, the graphics processor treats the material information carried by the i-th rendering command as a depth value and compares it with the material information recorded for the pixel to be rendered on the screen to complete pixel shading.
In some embodiments, after the cluster data to be rendered is provided to the graphics processor, the method further comprises: initiating a rendering command to the graphics processor with one of the material identifiers of the i-th frame as a parameter, that identifier being one of the identification numbers of all the distinct materials in the target cluster sets; the graphics processor determines whether to shade a pixel by comparing the material identification number recorded for that pixel in the i-th frame with the identifier carried by the command.
The CPU thus sends the GPU a different drawing command for each material identifier.
In a second aspect, some embodiments of the present application provide an image rendering method applied to a GPU, the method comprising: receiving the cluster data to be rendered of the i-th frame obtained by any embodiment of the first aspect; receiving a rendering command carrying a j-th material identification number, the j-th material identification number denoting one material of the i-th frame, different materials of the i-th frame being denoted by different identifiers; and completing drawing according to the rendering command and the cluster data to be rendered.
The GPU thus draws from the per-cluster data to be rendered and the drawing commands received from the CPU.
In some embodiments, completing drawing according to the rendering command and the cluster data to be rendered includes: performing view-frustum culling on each model to be rendered according to the cluster data, with the cluster as the culling unit, to remove clusters from each model's target cluster set and obtain a target cluster set to be rendered for each model, the target cluster set having been obtained by cluster-level back-face culling of the initial cluster set, each cluster of which groups the triangles of the model's mesh data; and drawing according to the target cluster sets to be rendered and the rendering command.
View-frustum culling is thus also performed in units of clusters.
In some embodiments, the cluster data to be rendered includes the bounding-box information of each cluster of a model, and the view-frustum culling is performed against that bounding-box information to obtain the target cluster set to be rendered.
Frustum culling is thus driven by the cluster bounding-box information received from the CPU.
In some embodiments, drawing according to the target cluster set to be rendered and the drawing command includes: deriving the material identification number of every pixel on the screen from the cluster data of the target clusters to be rendered; and comparing the j-th material identification number with the material identification numbers of at least some of the pixels to complete pixel shading.
Whether a pixel is shaded is decided by whether its recorded material identification number equals the j-th identifier carried by the corresponding drawing command. Pixel shading keyed on material identifiers in this way reduces the number of GPU interface calls made by the CPU compared with the per-model pixel-shading scheme of the related art.
In some embodiments, drawing according to the target cluster set to be rendered and the rendering command includes: deriving the material identification number of every pixel on the screen from the cluster data of the target clusters to be rendered; dividing the screen into a plurality of blocks and determining the range of material identification numbers covered by each block; selecting, according to the j-th material identification number (j being an integer greater than or equal to 1) and those ranges, at least one block whose range contains the j-th identifier as a target block to be rendered; and comparing the j-th identifier with the material identification number of each pixel of the target block to complete shading.
Dividing the screen into blocks and identifying the blocks that the j-th material identification number can shade improves pixel-shading speed.
In some embodiments, drawing according to the target cluster set to be rendered and the drawing command includes: during vertex drawing, obtaining the screen coordinates of every vertex of all target clusters to be rendered and drawing the corresponding vertices on the screen according to those coordinates.
Drawing all remaining vertices in a single pass after frustum culling improves vertex-drawing efficiency compared with the per-model vertex drawing of the related art.
In some embodiments, drawing according to the target cluster data to be rendered further includes: during pixel shading, producing a visibility image and a material image from the target clusters to be rendered, both images matching the screen resolution; each pixel of the visibility image stores the texture-sampling coordinates and normal of the triangle covering that pixel, and each pixel of the material image stores the material identification number of that triangle; generating a material-range image from the material image by the following strategy: dividing the screen into a plurality of blocks and storing, for each block, a first material identification value and a second material identification value that bound the range of material identification numbers covered by that block; comparing the j-th material identification number with each block of the material-range image and determining the matching blocks as the target blocks to be rendered; extracting from the material image the block images corresponding to those blocks to obtain target block images; and completing pixel shading by comparing the j-th identifier with the value of each pixel of the target block images.
By constructing these images (that is, storing per-pixel data in several render targets), the embodiments of the present application shade pixels block by block and use the material identifier to decide whether each pixel is shaded, improving pixel-shading speed.
In some embodiments, producing the visibility image and material image from the target clusters to be rendered includes: deleting the cluster vertex data of culled clusters (clusters removed by the view-frustum culling operation) from all cluster vertex data in the cluster data to be rendered, to obtain target cluster vertex data; deleting the index information of the culled clusters from the cluster index information of all clusters to obtain target cluster index information; and building the visibility image from the target cluster vertex data and target cluster index information.
Some embodiments of the present application thus obtain the target cluster data by deleting, from the CPU-supplied cluster data, the entries of clusters removed by frustum culling, and build the visibility image from what remains.
In some embodiments, producing the visibility image and material image from the target clusters to be rendered includes: deleting the material information of the culled clusters from the material identification information of all clusters in the cluster data to be rendered, and deleting the rotation-scale-offset matrices of the culled clusters from the matrix information of all clusters, to obtain the target cluster matrix and material information; and building the material image from the target cluster matrix and material information together with the target cluster index information.
The material image is likewise built after deleting the data of frustum-culled clusters from the CPU-supplied cluster data.
In some embodiments, the cluster bounding-box information of each cluster in the cluster data to be rendered is stored in a cluster-bounding-box storage unit; all cluster vertex data is stored in a vertex-data storage unit; the cluster index information of all clusters is stored in an index-information storage unit; and the material identification information and matrix information of all clusters are stored in a matrix-and-material storage unit.
Some embodiments of the present application thus store the per-cluster data received from the CPU in several storage units provided on the GPU.
In a third aspect, some embodiments of the present application provide a central processing unit comprising: an initial-cluster acquisition module configured to acquire the initial cluster set of each model to be rendered in the i-th frame, the initial cluster set being obtained by splitting the mesh data of the corresponding model, each cluster containing a set number of triangles, and the models to be rendered having passed scene-tree culling; a back-face culling module configured to perform back-face culling on the initial cluster sets, with the cluster as the culling unit, to obtain the target cluster set of each model; and a providing module configured to provide the cluster data to be rendered corresponding to the target cluster sets to a graphics processor so that it performs drawing.
In a fourth aspect, some embodiments of the present application provide a graphics processor comprising: a cluster-data receiving module configured to receive the cluster data to be rendered of the i-th frame obtained by a central processing unit; a rendering-command receiving module configured to receive a rendering command carrying a j-th material identification number, the j-th material identification number denoting one material of the i-th frame, different materials being denoted by different identifiers; and a rendering execution module configured to complete drawing according to the rendering command and the cluster data to be rendered.
In a fifth aspect, some embodiments of the present application provide a computer-readable storage medium storing a computer program which, when executed by a processor, performs the method of any embodiment of the first or second aspect.
In a sixth aspect, some embodiments of the present application provide an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the method of any embodiment of the first or second aspect when executing the program.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings show only some embodiments of the present application and should not be considered limiting of its scope; a person skilled in the art may obtain other related drawings from them without inventive effort.
Fig. 1 is a component architecture diagram of an image rendering system, provided in an embodiment of the present application for discussing the related art;
Fig. 2 is a flowchart of the related-art image rendering process performed by a CPU;
Fig. 3 is a flowchart of the image rendering process performed by a CPU in some embodiments of the present application;
Fig. 4 is a flowchart of an image rendering method performed by a CPU according to an embodiment of the present application;
Fig. 5 is a flowchart of an image rendering method performed by a GPU according to an embodiment of the present application;
Fig. 6 shows the storage units provided on a GPU according to an embodiment of the present application;
Fig. 7 is a block diagram of a CPU provided in an embodiment of the present application;
Fig. 8 is a block diagram of a GPU provided in an embodiment of the present application;
Fig. 9 is a schematic diagram of the composition of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that like reference numerals and letters denote like items in the following figures; once an item is defined in one figure, it requires no further definition or explanation in subsequent figures. In the description of the present application, the terms "first", "second", and the like are used only to distinguish the description and are not to be construed as indicating or implying relative importance.
The technical problems of the existing rendering system are illustrated below, taking a game scene as an example.
In a mainstream game engine, the CPU carries the task of driving the whole engine. It updates several times per second, and each update covers the rendering system, physics system, collision detection, sound, script engine, animation, artificial intelligence, networking, and scene management. Among these, the rendering system is the most important, because the picture presented on the display is the most direct impression a product gives the user.
A common way to display an object in a game is to first construct the game objects on the CPU side, load their model data, and add them to the scene tree for management. Each model has a bounding box, the smallest cube that encloses it (a bounding box is a cube, or in 2D a rectangle, that can contain an object; it is one type of bounding volume). In each frame update, all models are traversed through the scene tree, coarse-grained visibility culling is performed against the camera parameters using the bounding boxes (that is, the model is the culling unit), and a model LOD is then selected according to distance from the camera (one model can be represented by data at several precisions, each precision being one LOD level). The vertex data, index data, and material data of one model are then submitted to the GPU through a GPU interface call, initiating one render, and the GPU performs the actual rendering. If there are very many models, this flow, including the per-model upload of data to the GPU, is executed that many times; and if a single model's data is very large, all of it is submitted to the GPU regardless of how much of it is actually visible within the camera.
Referring to fig. 1, fig. 1 shows an image rendering system provided in an embodiment of the present application. The system includes a central processing unit 100 and a graphics processor 200. In image rendering, the central processing unit (CPU) computes the objects to draw and sends rendering commands to the graphics processing unit (GPU), and the GPU draws the objects according to the CPU's rendering commands.
The image rendering process implemented jointly by the CPU and GPU in the related art is described below with reference to fig. 2.
As shown in fig. 2, when performing image rendering, the central processor performs the following operations:
First, model data is obtained, i.e., the mesh data of each model.
The mesh data of each model includes vertex coordinates, normals, texture coordinates, the triangle drawing order, and other useful attribute information. A 3D model is assembled from polygons, and those polygons are in practice assembled from many triangles; that is, the surface of a 3D model consists of many connected triangular faces, and the set of points and edges forming these triangles in three-dimensional space is the Mesh data of the model.
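For concreteness, the mesh data described above can be pictured as a plain vertex/index record. The following C++ sketch is illustrative only; the field names and layout are assumptions, not the patent's own data format.

    #include <cstdint>
    #include <vector>

    // Minimal sketch of per-model mesh data as described above (illustrative names).
    struct Vec3 { float x, y, z; };
    struct Vec2 { float u, v; };

    struct Mesh {
        std::vector<Vec3>     positions;  // vertex coordinates
        std::vector<Vec3>     normals;    // one normal per vertex
        std::vector<Vec2>     uvs;        // texture coordinates
        std::vector<uint32_t> indices;    // triangle drawing order: 3 indices per triangle
    };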
Second, scene-tree culling is performed on the model data to remove whole models that the graphics processor need not draw.
Scene-tree visibility culling tests each object's bounding box against the scene octree, and the culling process takes the whole model as the culling unit. For example, a first model and a second model may be removed by scene-tree culling. A sketch of this coarse, model-level test follows.
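The patent does not detail the box test itself, but the coarse model-level check it describes is conventionally a bounding box tested against the camera's frustum planes. A minimal C++ sketch under that assumption (the plane convention and names are illustrative):

    #include <array>

    struct Vec3  { float x, y, z; };
    struct Plane { Vec3 n; float d; };  // points p with dot(n,p)+d >= 0 are inside
    struct AABB  { Vec3 mn, mx; };      // model bounding box

    // Whole-model visibility test: the model is culled only if its box is
    // entirely outside one frustum plane (conservative, coarse-grained).
    bool modelVisible(const AABB& box, const std::array<Plane, 6>& frustum) {
        for (const Plane& pl : frustum) {
            // Pick the box corner farthest along the plane normal.
            Vec3 p{ pl.n.x >= 0 ? box.mx.x : box.mn.x,
                    pl.n.y >= 0 ? box.mx.y : box.mn.y,
                    pl.n.z >= 0 ? box.mx.z : box.mn.z };
            if (pl.n.x * p.x + pl.n.y * p.y + pl.n.z * p.z + pl.d < 0)
                return false;  // fully outside this plane: cull the whole model
        }
        return true;
    }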
Third, the n models that survive culling are obtained.
That is, after the scene-tree culling of the second step, n models remain and require further rendering.
Fourth, the renderer state is set.
Fifth, rendering data is uploaded, i.e., the rendering data of one of the n models, that data being the mesh data of the corresponding model.
Sixth, the GPU starts rendering the model whose rendering data was uploaded in the fifth step.
It is then judged whether rendering of the n models is finished; if not, the fourth through sixth steps are repeated until all n models are rendered. The inventors of the present application found in their research that, in the related art, uploading the rendering data of each model to the GPU requires n GPU interface calls, and because the scene tree culls whole models, the culling granularity is coarse and the amount of data to be transmitted to the GPU is huge.
To solve at least the above technical problem, some embodiments of the present application additionally perform back-face culling, in units of clusters, on the models that survive scene-tree culling on the CPU side, which markedly reduces the amount of data to transmit. For example, if only part of a model is visible, these embodiments transmit only the clusters of the visible part, whereas the related art, working per model, must transmit far more data. Other embodiments also apply a block-update policy, with the cluster as the update unit, to the rendering data that survives back-face culling, further reducing the rendering data sent to the GPU. In some embodiments, the rendering data required by all models is assembled into a single batch, and the pending rendering data of many models is transmitted through one GPU interface call; compared with the per-model interface calls of the prior art, the number of interface calls is markedly reduced and the processing speed improved.
For example, in some embodiments the image rendering process executed by the CPU is as shown in fig. 3. Unlike fig. 2, after the model data (i.e., the mesh data) of each model is read, several threads are started to divide each model's data into clusters (i.e., into a plurality of clusters, each containing a set number of triangles); if the cluster data of a model already exists, it is read directly without starting a splitting thread. In fig. 3, after scene-tree culling, several threads are started to perform cluster back-face culling on the remaining n models; compared with fig. 2, the rendering data that must be transmitted after back-face culling is markedly reduced. The cluster rendering data that survives back-face culling then goes through a block-update strategy (one cluster corresponding to one block), and fig. 3 provides the graphics processor with one batch of all model data, namely the block-update data of every cluster, across all models, that needs updating.
The image rendering method performed by the central processor in some embodiments of the present application is described below with reference to fig. 4.
As shown in fig. 4, some embodiments of the present application provide an image rendering method applied to a CPU, the method including:
S110: acquiring an initial cluster set for each model to be rendered, the initial cluster set being obtained by splitting the mesh data of the corresponding model.
It should be noted that each model to be rendered in S110 belongs to the models to be updated or loaded in the i-th frame and has already passed scene-tree culling; that is, S110 operates on the models to be rendered that remain after the scene-tree culling of fig. 2 or fig. 3. Accordingly, in some embodiments, before the initial cluster sets of the i-th frame are acquired in S110, the method includes: performing scene-tree model culling on the original data of the i-th frame to obtain the models to be rendered. Models are first culled through the scene tree, and cluster-level back-face culling is then performed on the survivors.
Clusters are split from the mesh data of a model, and since the model is unchanged, the geometry contained in the resulting clusters does not change. For example, in some embodiments the mesh data of a model comprises a plurality of triangles, each cluster of the initial cluster set contains a set number of them, and in general a cluster contains fewer triangles than the whole model.
S120: performing back-face culling on the initial cluster set of each model to be rendered, with the cluster as the culling unit, to obtain a target cluster set for each model.
S130: providing the cluster data to be rendered corresponding to the target cluster sets to a graphics processor so that the graphics processor performs drawing.
It will be appreciated that, because the embodiments of the present application additionally perform cluster-level back-face culling on the models that survive scene-tree culling, the amount of data transmitted to the GPU drops markedly, reducing the transmission volume while increasing the transmission speed. The clusters are obtained by grouping the mesh data of the model, the grouping principle being that each cluster contains the set number of triangles; for example, each cluster may contain 128 triangles, and a cluster of fewer than 128 triangles can be padded with extra data to reach 128, as in the sketch below.
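A minimal C++ sketch of this grouping rule, assuming consecutive triangles are packed into clusters of 128 and the last cluster is padded with degenerate triangles; a production splitter would also optimize triangle locality:

    #include <cstdint>
    #include <vector>

    static constexpr uint32_t kTrianglesPerCluster = 128;  // the "set number"

    struct Cluster {
        std::vector<uint32_t> indices;  // 3 * kTrianglesPerCluster entries
    };

    // Split a model's triangle index list into fixed-size clusters.
    std::vector<Cluster> splitIntoClusters(const std::vector<uint32_t>& meshIndices) {
        std::vector<Cluster> clusters;
        const size_t triCount = meshIndices.size() / 3;
        for (size_t t = 0; t < triCount; t += kTrianglesPerCluster) {
            Cluster c;
            for (size_t i = t; i < t + kTrianglesPerCluster; ++i) {
                if (i < triCount) {
                    c.indices.insert(c.indices.end(),
                                     { meshIndices[3*i], meshIndices[3*i+1], meshIndices[3*i+2] });
                } else {
                    // Pad short clusters with a degenerate triangle so each
                    // cluster maps onto one full GPU thread group.
                    uint32_t v = meshIndices[0];
                    c.indices.insert(c.indices.end(), { v, v, v });
                }
            }
            clusters.push_back(std::move(c));
        }
        return clusters;
    }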
The implementation of the steps in fig. 4 is set out below by way of example.
As shown in fig. 3, in some embodiments of the present application a first model to be rendered is a model newly loaded in the i-th frame, and acquiring its initial cluster set in S110 includes: starting a splitting thread, and splitting the mesh data of the first model into multiple clusters through the splitting thread to obtain the first model's initial cluster set. That is, for a model newly loaded in a frame to be rendered, a new thread is started to split the model's mesh data into an initial cluster set composed of a plurality of clusters.
It should be noted that, by splitting each model's mesh data on a dedicated thread to obtain its clusters, some embodiments of the present application can carry out back-face culling on the CPU side instead of relying only on scene-tree model culling.
Because scene-tree culling must be performed for every frame of the image, and to avoid re-clustering a given model repeatedly, some embodiments of the present application store the clustering result of a model's mesh data in memory; when back-face culling is required for that model, the corresponding cluster data (i.e., the initial cluster set) is read directly.
That is, in some embodiments a second model to be rendered was loaded in a historical frame (so it is known that the model has already been clustered into an initial cluster set and stored), and acquiring its initial cluster set in S110 includes: reading the initial cluster set of the second model from a storage unit.
It should be noted that the set number in S110 is related to the size of the cache unit the graphics processor assigns to a streaming multiprocessor (SM). For example, if the cache unit size is a, each cluster contains k×a triangles, k being a positive integer. Some embodiments determine the size of each cluster from the hardware characteristics of the GPU, improving the processing speed of the clusters on the GPU: each cluster is handled by one GPU thread group (one SM) and each triangle by one thread, so a fixed number of triangle faces per cluster is required to keep the computing power fully utilized. For example, in some embodiments the set number is 128, i.e., a cluster contains 128 triangles, a figure determined from actual performance measurements on different graphics cards and projects.
The implementation of S120 is set out below by way of example.
In some embodiments of the present application, a third model to be rendered is any model to be rendered in the i-th frame, and performing back-face culling on the initial cluster sets in S120, with the cluster as the culling unit, includes: acquiring the test cone and cone opening angle of each cluster in the initial cluster set of the third model; and testing each cluster against its test cone and cone opening angle, a cluster that passes the test being placed in the target cluster set. Back-face culling thus proceeds by acquiring a test cone and opening angle per cluster and testing against that information, as sketched below.
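The patent does not spell out the cone test, but a common formulation in cluster pipelines treats the test cone as an aggregate of the cluster's triangle normals: when the whole cone faces away from the camera, every triangle is back-facing and the cluster can be dropped. A hedged C++ sketch under that assumption:

    #include <cmath>

    struct Vec3 { float x, y, z; };
    static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
    static Vec3  sub(Vec3 a, Vec3 b) { return { a.x-b.x, a.y-b.y, a.z-b.z }; }
    static Vec3  normalize(Vec3 v) {
        float l = std::sqrt(dot(v, v));
        return { v.x/l, v.y/l, v.z/l };
    }

    struct ClusterCone {
        Vec3  apex;       // representative point of the cluster (e.g. bounding center)
        Vec3  axis;       // average triangle normal of the cluster
        float halfAngle;  // cone opening half-angle enclosing all triangle normals
    };

    // Returns true if the cluster passes the test (is potentially front-facing)
    // and should therefore be kept in the target cluster set.
    bool clusterPassesBackfaceTest(const ClusterCone& cone, Vec3 cameraPos) {
        Vec3 view = normalize(sub(cone.apex, cameraPos));
        // Every triangle is back-facing when the angle between the view direction
        // and the cone axis is smaller than (90 degrees - halfAngle).
        float cutoff = std::cos(1.5707963f - cone.halfAngle);
        return dot(view, cone.axis) < cutoff;  // failing this culls the whole cluster
    }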
It should be noted that if S130 uploaded all cluster data to the GPU every time, the bandwidth cost would cause stalls, so some embodiments of the present application use a block-update policy with the updated cluster as the unit. For example, in some embodiments the first cluster is a cluster that passes the test, and S130 illustratively includes: if the graphics processor has no storage unit yet holding the cluster data of the first cluster, allocating a storage address on the graphics processor for that data and treating it as block-update data; repeating this process to obtain the block-update data of all clusters that pass the test; and sending all the block-update data to the graphics processor. Identifying the clusters that need updating and providing only their data to the GPU reduces the bandwidth required for data transmission.
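A minimal sketch of this block-update bookkeeping: the CPU remembers which clusters already have a storage address on the GPU, allocates addresses for new ones, and batches only the new blocks into one upload. gpuAllocate and gpuUploadBlocks are hypothetical stand-ins for real graphics-API calls, stubbed here so the sketch is self-contained:

    #include <cstdint>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    struct ClusterRenderData { /* bounding box, vertices, indices, material, matrix */ };
    struct BlockUpdate { uint64_t gpuAddress; const ClusterRenderData* data; };

    // Hypothetical stand-ins for real graphics-API calls (stubbed).
    uint64_t gpuAllocate(uint64_t bytes) { static uint64_t next = 0; uint64_t a = next; next += bytes; return a; }
    void gpuUploadBlocks(const std::vector<BlockUpdate>&) { /* one batched transfer */ }

    class ClusterUploader {
        std::unordered_map<uint32_t, uint64_t> addressOf_;  // clusterId -> GPU storage address
    public:
        // Called once per frame with every cluster that passed back-face culling.
        void submit(const std::vector<std::pair<uint32_t, const ClusterRenderData*>>& passed) {
            std::vector<BlockUpdate> updates;
            for (const auto& [id, data] : passed) {
                if (addressOf_.find(id) == addressOf_.end()) {
                    // No storage unit on the GPU holds this cluster yet: allocate
                    // an address and mark the cluster as block-update data.
                    uint64_t addr = gpuAllocate(sizeof(ClusterRenderData));
                    addressOf_.emplace(id, addr);
                    updates.push_back({ addr, data });
                }
                // Clusters already resident on the GPU are not re-uploaded.
            }
            gpuUploadBlocks(updates);  // single batched transfer, not one call per cluster
        }
    };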
The specific kinds of rendering data (i.e., cluster data to be rendered) that the CPU transmits to the GPU are set out below by way of example.
In some embodiments of the present application, the cluster data to be rendered includes at least one of the following: cluster bounding-box information (for example, a bounding-box center position and a bounding-box radius), cluster vertex information, cluster index information, cluster material information, and cluster rotation-scale-offset matrix information. The bounding-box information is used by the graphics processor for view-frustum culling, performed with the cluster as the culling unit; the rotation-scale-offset matrix records the spatial position of the corresponding cluster, and from this matrix information the graphics processor derives the screen coordinates of each vertex in the cluster. Because the cluster data sent to the GPU includes both the per-cluster metadata and the cluster vertex data, the GPU can render objects cluster by cluster, achieving high-speed rendering.
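The five kinds of per-cluster data enumerated above can be pictured as one record per cluster. The field names and types below are illustrative assumptions:

    #include <cstdint>
    #include <vector>

    struct Vec3 { float x, y, z; };
    struct Mat4 { float m[16]; };  // rotation-scale-offset matrix (column-major assumed)

    // One cluster's payload as enumerated above (illustrative layout).
    struct ClusterRenderData {
        Vec3                  boundsCenter;  // cluster bounding-box center
        float                 boundsRadius;  // cluster bounding-box radius
        std::vector<Vec3>     vertices;      // cluster vertex information
        std::vector<uint32_t> indices;       // cluster index information
        uint32_t              materialId;    // cluster material information
        Mat4                  worldMatrix;   // spatial position of the cluster; the GPU
                                             // uses it to project each vertex to the screen
    };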
The manner in which the cluster material information is obtained is described below by way of example.
In some embodiments of the present application, the cluster material information is taken from a material-information candidate set comprising a plurality of material identification numbers obtained as follows: traverse the material attributes of all objects in each frame, merge identical materials, and assign each merged material a unique material identification number. The graphics processor compares this material information with the material information to be rendered carried by the i-th rendering command to complete pixel shading (for example, exploiting the per-pixel depth-comparison capability of GPU hardware, the material information is compared as a depth value to reject pixels material by material, and the pixels that pass are shaded), the carried material information corresponding to one material of the i-th frame. For example, in some embodiments the graphics processor compares the rendering command's material information, used as a depth value, with the material information recorded for the pixel to be rendered on the screen. Each distinct material is thus assigned a unique identifier (a material ID); drawing commands are distinguished by material ID, and each GPU rendering pass is driven by a different material ID. A sketch of the merging step follows.
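A minimal C++ sketch of the merging strategy: identical materials are deduplicated through a map, each unique material receives the next identification number, and clusters then store only that number. The material key is deliberately simplified; a real engine would key on shader, textures, and parameters:

    #include <cstdint>
    #include <map>
    #include <string>

    // Simplified material key; assumed to uniquely describe a material's attributes.
    using MaterialKey = std::string;

    class MaterialTable {
        std::map<MaterialKey, uint32_t> idOf_;
        uint32_t next_ = 1;  // 0 reserved for "no material"
    public:
        // Traverse all objects each frame and call this per material attribute;
        // identical materials merge onto one identification number.
        uint32_t idFor(const MaterialKey& key) {
            auto [it, inserted] = idOf_.try_emplace(key, next_);
            if (inserted) ++next_;
            return it->second;
        }
        uint32_t count() const { return next_ - 1; }  // m distinct materials
    };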
Whereas the related art sends each rendering command with a model as its object, some embodiments of the present application send each rendering command with a distinct material as its object. For example, if the target cluster sets are determined to correspond to k models, and those k models use m materials, then m rendering commands are issued, each carrying one material identifier.
For example, in some embodiments of the present application, after S130 the method further comprises: sending a rendering command to the graphics processor with one of the material identifiers of the i-th frame as a parameter, that identifier being one of the identification numbers of all the distinct materials in the target cluster sets; the graphics processor determines whether to shade a pixel by comparing the material identification number recorded for that pixel in the i-th frame with the identifier carried by the command. The CPU thus sends the GPU a different drawing command for each material identifier.
The following describes how the GPU performs image rendering according to a rendering command carrying a material identifier and the received cluster data to be rendered.
As shown in fig. 5, some embodiments of the present application provide an image rendering method applied to a GPU, including: S210, receiving the cluster data to be rendered of the i-th frame obtained by any embodiment of fig. 4; S220, receiving a rendering command carrying a j-th material identification number, the j-th material identification number denoting one material of the i-th frame, different materials of the i-th frame being denoted by different identifiers; and S230, completing drawing according to the rendering command and the cluster data to be rendered. The GPU thus draws from the per-cluster data and drawing commands received from the CPU.
The storage units on the GPU for the received cluster data to be rendered are set out below by way of example.
As shown in fig. 6, the graphics processor is provided with a cluster-bounding-box storage unit (ClusterBoxBuffer), a vertex-data storage unit (VerticesBuffer), an index-information storage unit (IndexBuffer), and a matrix-and-material storage unit (MatWBBuffer). As described above, the cluster data to be rendered transmitted by the central processor is stored as follows: the bounding-box information of each cluster in the cluster-bounding-box storage unit; all cluster vertex data in the vertex-data storage unit; the index information of all clusters in the index-information storage unit; and the material identification information and matrix information of all clusters in the matrix-and-material storage unit. Some embodiments of the present application thus store the per-cluster data received from the CPU in several storage units provided on the GPU.
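A CPU-side mock of the four storage units named in fig. 6, as parallel arrays indexed per cluster; the layouts are assumptions:

    #include <cstdint>
    #include <vector>

    struct Vec3 { float x, y, z; };
    struct Vec4 { float x, y, z, w; };
    struct Mat4 { float m[16]; };

    // CPU-side mirror of the GPU storage units named in fig. 6 (layouts assumed).
    struct GpuClusterBuffers {
        std::vector<Vec4>     clusterBoxBuffer;  // xyz = bounding center, w = radius
        std::vector<Vec3>     verticesBuffer;    // all cluster vertex data, back to back
        std::vector<uint32_t> indexBuffer;       // all cluster index data, back to back
        struct MatWB { Mat4 worldMatrix; uint32_t materialId; };
        std::vector<MatWB>    matWBBuffer;       // matrix + material per cluster
    };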
The implementation of S230 is set out below by way of example.
In some embodiments of the present application, completing drawing according to the rendering command and the cluster data to be rendered includes: performing view-frustum culling on each model to be rendered according to the cluster data, with the cluster as the culling unit, to remove clusters from each model's target cluster set and obtain a target cluster set to be rendered for each model, the target cluster set having been obtained by cluster-level back-face culling of the initial cluster set, each cluster of which groups the triangles of the model's mesh data; and drawing according to the target cluster sets to be rendered and the rendering command.
For example, in some embodiments the cluster data to be rendered includes the bounding-box information of each cluster of a model to be rendered, and the view-frustum culling of each model is performed against that bounding-box information to obtain the target cluster set to be rendered. Frustum culling is thus driven by the cluster bounding-box information received from the CPU, as in the sketch below.
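Since the bounding-box information is a center position and radius, the per-cluster frustum test reduces to a sphere/plane check. A hedged CPU-style sketch:

    #include <array>

    struct Vec3  { float x, y, z; };
    struct Plane { Vec3 n; float d; };  // normalized; inside when dot(n,p)+d >= 0

    // Per-cluster frustum test using the cluster's bounding center and radius.
    bool clusterInFrustum(Vec3 center, float radius, const std::array<Plane, 6>& frustum) {
        for (const Plane& pl : frustum) {
            float dist = pl.n.x*center.x + pl.n.y*center.y + pl.n.z*center.z + pl.d;
            if (dist < -radius) return false;  // fully outside one plane: cull the cluster
        }
        return true;  // kept in the target cluster set to be rendered
    }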
Some embodiments of the present application determine whether the material identifier of each pixel equals the identifier carried by the current rendering command to complete pixel shading. For example, in some embodiments, drawing according to the target cluster set to be rendered and the drawing command includes: deriving the material identification number of every pixel on the screen from the cluster data of the target clusters to be rendered; and comparing the j-th material identification number with the material identification numbers of at least some of the pixels to complete shading. Deciding whether a pixel is shaded by whether its recorded material identification number equals the j-th identifier carried by the corresponding drawing command realizes pixel shading parameterized by material identifier and, compared with the per-model pixel-shading scheme of the related art, reduces the number of GPU interface calls made by the CPU. A CPU-style sketch of the comparison follows.
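On the GPU this comparison runs per fragment; the sketch below mimics it on the CPU over a screen-sized buffer of material identification numbers. shadePixel is a hypothetical stand-in for running material j's pixel shader at the given coordinates:

    #include <cstdint>
    #include <vector>

    // materialImage holds one material identification number per screen pixel.
    void shadeMaterialJ(const std::vector<uint32_t>& materialImage,
                        int width, int height, uint32_t materialJ,
                        void (*shadePixel)(int x, int y)) {
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < width; ++x)
                if (materialImage[y * width + x] == materialJ)
                    shadePixel(x, y);  // shade only pixels recorded with material j
    }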
To further reduce the number of pixels that must be compared against the material identifier carried by the rendering command, some embodiments of the present application also partition the screen pixels into blocks, determine the range of material identifiers covered by each block, and compare only the pixels of the blocks whose range contains the command's identifier, increasing the pixel-shading speed.
For example, in some embodiments, comparing the j-th material identification number with the material identification numbers of at least some pixels includes: dividing the screen into a plurality of blocks and determining the range of material identification numbers covered by each block; selecting, according to the j-th material identification number and those ranges, at least one block whose range contains the j-th identifier as a target block to be rendered; and comparing the j-th identifier with the material identification number of each pixel of the target block to complete shading. Dividing the screen into blocks and identifying the blocks that the j-th material identification number can shade improves the pixel-shading speed.
The vertex drawing and pixel shading processes are described below by way of example.
In some embodiments of the present application, the drawing according to the target cluster set to be rendered and the rendering command includes: in the vertex drawing process, obtaining the screen coordinates of every vertex of all target clusters to be rendered and drawing the corresponding vertices on the screen according to those coordinates. Drawing all remaining vertices in one pass after view frustum culling improves vertex drawing efficiency over the model-by-model vertex drawing of the related art.
In some embodiments of the present application, the drawing according to the target cluster set to be rendered further includes, in the pixel shading process: obtaining a visibility map and a material map according to the target clusters to be rendered, where the visibility map has the screen resolution and each of its pixels stores the map sampling coordinates and normal value of the triangle covering that pixel, and the material map likewise has the screen resolution and each of its pixels stores the material identification number of the triangle covering that pixel; generating a material range map from the material map (the corresponding data is stored in the MaterialRangeTexture of fig. 6) through the following strategy: dividing the screen into a plurality of blocks and storing on each block a first material identification value and a second material identification value that together represent the range covered by all material identification numbers on that block; comparing the j-th material identification number with each block of the material range map, and determining the blocks on the material range map that match the j-th material identification number as the current target blocks to be rendered; obtaining from the material map the block image corresponding to each target block to be rendered, yielding a target block image; and completing the pixel shading by comparing the j-th material identification number with the value of each pixel of the target block image. By constructing these images (that is, storing per-pixel data in several textures), block-wise pixel shading gated by material identification is realized, which increases pixel shading speed.
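As a sketch of the per-pixel payloads just described (field names and layout are assumptions; the patent only states what each texel logically stores):

```cpp
#include <cstdint>

// One texel of the visibility map: the map sampling (texture) coordinates
// and the packed normal of the triangle covering this pixel.
struct VisibilityTexel {
    float u, v;             // map sampling coordinates of the covering triangle
    uint32_t packedNormal;  // normal compressed into one 32-bit value
};

// One texel of the material map: the material identification number of
// the triangle covering this pixel.
struct MaterialTexel {
    uint32_t materialId;
};
```

Both images have exactly screen resolution, so a frame's worth of either is simply a width by height array of these texels.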
The process of obtaining the visibility map (its data is stored in the VisibilityTexture of fig. 6) is described below by way of example.
In some embodiments, obtaining the visibility map according to the target clusters to be rendered includes: deleting, from all the cluster vertex data included in the cluster data to be rendered, the cluster vertex data corresponding to the culled clusters (the clusters removed by the view frustum culling operation) to obtain target cluster vertex data; deleting, from the cluster index information of all clusters included in the cluster data to be rendered, the cluster index information corresponding to the culled clusters to obtain target cluster index information; and obtaining the visibility map according to the target cluster vertex data and the target cluster index information. Some embodiments of the present application thus obtain the target cluster data by deleting part of the data supplied by the CPU (namely, the data of vertices belonging to frustum-culled clusters) and construct the visibility map from what remains.
The process of obtaining the material map (its data is stored in the MaterialIDTexture of fig. 6) is described below by way of example.
In some embodiments of the present application, obtaining the material map according to the target clusters to be rendered includes: deleting, from the material identification information of all clusters included in the cluster data to be rendered, the material identification information corresponding to the culled clusters, and deleting, from the matrix information of all clusters included in the cluster data to be rendered, the matrix information corresponding to the culled clusters, to obtain a target cluster matrix and material information; and obtaining the material map according to the target cluster matrix, the material information and the target cluster index information. Some embodiments of the present application thus obtain the target cluster data by deleting part of the data supplied by the CPU (namely, the data of frustum-culled clusters) and construct the material map from what remains.
The data stored in each storage unit of fig. 6 is described below by way of example.
The cluster bounding box storage unit ClusterBoxBuffer of fig. 6 stores {bounding box center position, bounding box radius} for each cluster.
The vertex data storage unit VerticesBuffer of fig. 6 stores the vertex data {position, normal, map sampling coordinates} of all clusters. For example, if each cluster contains 128 triangles (that is, the mesh data of the model is split in units of 128 triangles), each cluster has at most 384 vertices (fewer when triangles share vertices), and each cluster's data is offset in steps of 384.
The index information storage unit IndexBuffer of fig. 6 stores the cluster index data of all clusters. For example, if each cluster contains 128 triangles, each cluster has 384 indexes, and each cluster's data is offset in steps of 384.
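Because both buffers use a fixed 384-entry slot per cluster, the offsets can be computed directly; a small sketch (constant and function names are illustrative):

```cpp
#include <cstddef>

// With 128 triangles per cluster, each cluster occupies a fixed slot of
// 128 * 3 = 384 entries in VerticesBuffer and IndexBuffer alike.
constexpr std::size_t kTrianglesPerCluster = 128;
constexpr std::size_t kEntriesPerCluster = kTrianglesPerCluster * 3;  // 384

// First vertex/index slot of the k-th cluster within its buffer.
constexpr std::size_t clusterSlotBase(std::size_t clusterIndex) {
    return clusterIndex * kEntriesPerCluster;
}
```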
The matrix and material storage unit MatWBuffer of fig. 6 stores {matrix, material id} for each cluster.
The culled-cluster vertex count storage unit ClusterMeshInstanceCountBuffer of fig. 6 stores the vertex count of each cluster (for example, 128 × 3 = 384 when each cluster contains 128 triangles) and is filled with the number of clusters that pass GPU view frustum culling.
The culled-cluster identification storage unit ClusterMeshResultBuffer of fig. 6 stores the ids of the clusters that pass GPU view frustum culling, in one-to-one correspondence with the cluster positions.
The map sampling coordinate and normal value storage unit of fig. 6 is the VisibilityTexture: its size equals the screen resolution, each pixel stores one value, namely the map sampling coordinates and normal value of the triangle covering that pixel.
The first material identification number storage unit MaterialIDTexture of fig. 6 stores, for each pixel, the material id value (material identification number) of the triangle covering that pixel.
The block material identification range storage unit MaterialRangeTexture of fig. 6 divides the screen into, for example, 16 × 16 blocks, and stores the maximum and minimum material identification number id within each block.
The second material identification number storage unit VDepthTexture of fig. 6 stores the same values as the MaterialIDTexture but in a different texture format: VDepthTexture is a texture dedicated to depth comparison, while MaterialIDTexture is a render target texture.
The drawing process is described below in connection with the example of fig. 6.
In the first step, ClusterMeshInstanceCountBuffer is cleared, that is, the previous frame's data stored in ClusterMeshInstanceCountBuffer is emptied.
In the second step, the data in ClusterBoxBuffer is read for view frustum culling, and the data remaining after culling is stored into the cleared ClusterMeshInstanceCountBuffer and ClusterMeshResultBuffer respectively.
In the third step, drawIndirect takes the content of ClusterMeshResultBuffer as its input parameter and reads the data in VerticesBuffer, IndexBuffer and MatWBuffer to generate the data in VisibilityTexture and MaterialIDTexture.
In the fourth step, the data in MaterialRangeTexture is generated from MaterialIDTexture.
In the fifth step, the data in MaterialIDTexture is copied into VDepthTexture.
In the sixth step, all material instances are drawn full screen: in the VS (vertex shader) stage, the blocks that do not qualify (the blocks obtained by partitioning the screen) are culled via MaterialRangeTexture; in the PS (pixel shader) stage, depth is compared via the data in VDepthTexture; and shading is completed according to the data in VisibilityTexture. A command-recording sketch of this pass sequence is given below.
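Under the assumption that the GPU side runs on Vulkan (as the detailed flow below states), the six passes could be recorded roughly as follows. This is only a sketch: pipeline binds, descriptor sets and the barriers between passes are omitted, and all handles are assumed to be created elsewhere.

```cpp
#include <vulkan/vulkan.h>

// Records the six passes of fig. 6 into one command buffer (sketch only).
void recordGpuDrivenPasses(VkCommandBuffer cmd,
                           VkBuffer clusterMeshInstanceCountBuffer,
                           uint32_t clusterCount,
                           uint32_t materialCount) {
    // Step 1: clear last frame's indirect-draw counters.
    vkCmdFillBuffer(cmd, clusterMeshInstanceCountBuffer, 0, VK_WHOLE_SIZE, 0);

    // Step 2: view frustum culling compute pass, one invocation per cluster;
    // survivors bump the instance count atomically and append their ids.
    vkCmdDispatch(cmd, (clusterCount + 63) / 64, 1, 1);

    // Step 3: indirect draw driven by the counters the culling pass wrote,
    // producing VisibilityTexture and MaterialIDTexture with no CPU round trip.
    vkCmdDrawIndirect(cmd, clusterMeshInstanceCountBuffer, 0, 1,
                      sizeof(VkDrawIndirectCommand));

    // Step 4: per-block min/max material id reduction over a 16 x 16 tile grid.
    vkCmdDispatch(cmd, 16, 16, 1);

    // Step 5: copy MaterialIDTexture into VDepthTexture (fullscreen pass or
    // image copy, depending on the formats involved).

    // Step 6: one draw per material; the 16 x 16 block grid is 256 quads,
    // i.e. 256 * 6 = 1536 vertices, with the material id in each vertex's z.
    for (uint32_t m = 0; m < materialCount; ++m) {
        vkCmdDraw(cmd, 1536, 1, 0, 0);
    }
}
```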
An image rendering method performed cooperatively by the CPU's main thread and worker thread together with the GPU is described below by way of example with reference to fig. 7, in which the set number of triangles is 128.
1. The basic unit of a logical object in the image rendering engine is the MeshRendererComponent, which stores the model's vertex information and material information. A flag bit is added to this component to indicate whether the model is rendered by GPU driven rendering. When the flag is true:
1.1 The whole mesh model is split into clusters of triangles, each cluster containing 128 triangles; 128 lets the granularity of each cluster fit the cache size of the Vertex Processing stage well. The reason for splitting is that a model may be very large, and conventional culling can only cull with the whole model as the minimum granularity; after the model is split into smaller units, only the visible part of it needs to be displayed, which reduces rendering pressure. Each mesh carries a GUID (globally unique id), and the mesh together with the cluster result of splitting it is stored through a ClusterDataManager (because different game objects may share the same model, storing this correspondence avoids holding redundant copies of the data). If already-split cluster data can be found, the flow proceeds to step 1.2. The cluster data of each model is also managed by reference counting, and the data is deleted only when the reference count reaches zero, which ensures data safety.
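A minimal splitting sketch in C++ is given below. It only cuts the index list into consecutive groups of 128 triangles; a production splitter would also group triangles by spatial locality so that each cluster gets a tight bounding box and a narrow test cone. All names are illustrative.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Cluster {
    std::vector<uint32_t> indices;  // 3 indices per triangle, at most 128 triangles
};

std::vector<Cluster> splitMeshIntoClusters(const std::vector<uint32_t>& meshIndices) {
    constexpr std::size_t kIndicesPerCluster = 128 * 3;  // 384
    std::vector<Cluster> clusters;
    for (std::size_t base = 0; base < meshIndices.size(); base += kIndicesPerCluster) {
        const std::size_t end = std::min(base + kIndicesPerCluster, meshIndices.size());
        Cluster c;
        c.indices.assign(meshIndices.begin() + base, meshIndices.begin() + end);
        clusters.push_back(std::move(c));
    }
    return clusters;
}
```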
1.2 If the corresponding data cannot be found, a separate thread is started to split the model vertex data (mesh → cluster); because a model's vertex data can be very large, splitting it in the main thread would cause the system to stall.
1.3 After the splitting completes, the MeshRendererComponent is notified by callback. Because each cluster corresponds to a mesh, attribute information ClusterExtraData belonging to the logical object is also attached to the cluster. It contains each cluster's bounding box information, material id, offset within the GPUBuffer (GPUOffset), rotation scaling offset matrix information, and the test cone and cone opening angle (for CPU back face culling).
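The attribute block could look roughly like this; field names and layout are assumptions based on the list above, not the patent's actual definition:

```cpp
#include <cstdint>

struct ClusterExtraData {
    float bboxCenter[3];    // bounding box center
    float bboxRadius;       // bounding box radius
    uint32_t materialId;    // material identification number
    uint32_t gpuOffset;     // GPUOffset: the cluster's slot in the GPUBuffer
    float worldMatrix[16];  // rotation/scaling/offset matrix
    float coneAxis[3];      // test cone axis (for CPU back face culling)
    float coneCutoff;       // encodes the cone opening angle
};
```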
For example, the stored clusters and their logical objects are shown in Table 1 below. Each model (or logical object) corresponds to one mesh identifier. Table 1 contains three columns: the first column holds the mesh identifier of each model, that is, the first mesh identifier corresponding to the first model and the second mesh identifier corresponding to the second model; the second column holds the clusters obtained by splitting the corresponding model, for example, the first mesh identifier corresponds to k clusters, meaning that the first model is split into k clusters, with the cluster rendering data of the first cluster, of the second cluster, and so on up to the k-th cluster; the third column lists each cluster's rendering data, including cluster bounding box information, cluster material identification number, the cluster's offset in the GPUBuffer, cluster rotation scaling offset matrix information, and the cluster's test cone and cone opening angle.
TABLE 1
2. After cluster data preparation completes, the MeshRendererComponent is added to the asynchronous processing queue of the GPUDrivenManager. Back face culling and data arrangement must then be performed for all logical objects that need GPU driven rendering. Inside the GPUDrivenManager, several double buffers are used to store each frame's vertex data, newly added data, material data and discarded-data indexes. Double buffers are adopted because the data arrangement process is very time-consuming, so a new thread is started to sort the data. For example, if the main thread uses the data with id 0 in a double buffer, the thread processing the data modifies the data with id 1, and the ids are exchanged after processing completes. This guarantees the data safety of the main thread.
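A minimal double-buffer sketch in C++ (class and member names are illustrative): the main thread reads one side while the sorting thread fills the other, and the sides are swapped once processing finishes.

```cpp
#include <array>
#include <atomic>
#include <vector>

template <typename T>
class DoubleBuffer {
public:
    // The sorting thread writes here while the main thread reads the other side.
    std::vector<T>& writeSide() { return buffers_[1 - front_.load()]; }
    const std::vector<T>& readSide() const { return buffers_[front_.load()]; }

    // Called by the sorting thread once a frame's data is fully prepared.
    void publish() { front_.fetch_xor(1); }

private:
    std::array<std::vector<T>, 2> buffers_;
    std::atomic<int> front_{0};
};
```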
2.1 In the newly started thread, the list of rendering objects is updated every frame, objects that cancel GPU driving are deleted, and the offset positions of the objects in the GPUBuffer are recorded.
2.2 The material properties of all objects are traversed, identical material information is merged, and a unique material id is assigned to each material. The material id is a float, but it is stored and transmitted in int format to prevent precision loss during transfer.
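One way to keep the value bit-exact through a float channel is a bitwise reinterpretation, sketched below; std::memcpy preserves all 32 bits, so large ids survive the round trip. Whether the patent uses exactly this encoding is an assumption.

```cpp
#include <cstdint>
#include <cstring>

float packMaterialId(uint32_t id) {
    float f;
    std::memcpy(&f, &id, sizeof f);  // reinterpret the int bits as float bits
    return f;
}

uint32_t unpackMaterialId(float f) {
    uint32_t id;
    std::memcpy(&id, &f, sizeof id);  // recover the original bits exactly
    return id;
}
```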
2.3 All objects are traversed and the reference count of the resource is increased (this step is executed by multiple threads; the reference count ensures that while one thread is using the resource, other threads will not release it, and the resource is deleted only when the count reaches 0). All clusters are back face culled using the test cone and cone opening angle generated in 1.3. For a cluster that passes the test: if its GPUOffset is empty, it represents new data, and an address is taken from the discarded-data list and assigned to it; if its GPUOffset is not empty, no processing is required. For data that fails the test: if its GPUOffset is not empty, the offset is emptied and its value is put into the discarded-data list to await the next allocation. Meanwhile, all vertex data is merged into one large buffer, and the offsets of the newly added and deleted data within that buffer are recorded, because the main thread uploads data from the CPU to the GPU using block updates: uploading everything each time would consume too much bandwidth and cause stutter. If the previously allocated GPUBuffer is not large enough, a full re-upload is required.
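The back face test itself is not spelled out in the text; a sketch following the common cluster-cone formulation (the cluster is rejected when the camera lies in the region where all of its triangles face away) might read as follows. All names come from the ClusterExtraData sketch above and are assumptions.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// coneApex/coneAxis/coneCutoff would come from the ClusterExtraData of
// step 1.3; coneCutoff encodes the cone opening angle. Returns true when
// the whole cluster faces away from the camera and can be culled.
bool clusterIsBackfacing(const Vec3& cameraPos, const Vec3& coneApex,
                         const Vec3& coneAxis, float coneCutoff) {
    const Vec3 toCluster{coneApex.x - cameraPos.x,
                         coneApex.y - cameraPos.y,
                         coneApex.z - cameraPos.z};
    const float dist = std::sqrt(dot(toCluster, toCluster));
    return dot(toCluster, coneAxis) >= coneCutoff * dist;
}
```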
2.4 The reference count of the resource is decreased and the data is marked available (m_needUpdate = true).
3. In each frame update of the main thread, if m_needUpdate is true, the current frame's data is fetched from the double buffer and the GPU data is block-updated. Then the various drawing commands of the compute shader, vertex shader and fragment shader are prepared to initiate rendering. At the end of the frame, the data processing thread is notified to start the next round of step 2.
4. The rendering flow: the GPU renders on the Vulkan platform, divided into the following passes.
4.1 Data clearing module: the data in ClusterMeshInstanceCountBuffer is cleared. ClusterMeshInstanceCountBuffer is a VkDrawIndirectCommand structure that stores the vertex count, instance count, vertex offset and instance offset.
4.2 View frustum culling module: in step 3, the CPU uploaded the bounding box information and position information of all clusters into the GPU's ClusterBoxBuffer. The shader traverses all cluster information and performs view frustum culling; the number of clusters passing the test is recorded into ClusterMeshInstanceCountBuffer through an atomic operation, and the id in ClusterBuffer of each cluster that passes the test is recorded into ClusterMeshResultBuffer.
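A CPU-side sketch of what the culling shader does per cluster: a sphere-versus-six-planes test, an atomic survivor count, and the survivor id appended at the slot the atomic returned. Structure and function names are illustrative.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

struct BoundingSphere { float cx, cy, cz, radius; };
struct Plane { float nx, ny, nz, d; };  // signed distance = n . p + d

void frustumCullClusters(const std::vector<BoundingSphere>& clusters,
                         const Plane frustum[6],
                         std::atomic<uint32_t>& survivorCount,   // -> ClusterMeshInstanceCountBuffer
                         std::vector<uint32_t>& survivorIds) {   // -> ClusterMeshResultBuffer (presized)
    const uint32_t n = static_cast<uint32_t>(clusters.size());
    for (uint32_t id = 0; id < n; ++id) {
        const BoundingSphere& s = clusters[id];
        bool visible = true;
        for (int p = 0; p < 6 && visible; ++p) {
            const float dist = frustum[p].nx * s.cx + frustum[p].ny * s.cy +
                               frustum[p].nz * s.cz + frustum[p].d;
            visible = dist >= -s.radius;  // sphere not entirely outside this plane
        }
        if (visible) {
            survivorIds[survivorCount.fetch_add(1)] = id;
        }
    }
}
```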
4.3 Vertex drawing module: for this pass, the CPU uploaded VerticesBuffer, IndexBuffer and MatWBuffer in step 3. In the vertex shader stage, the ClusterMeshInstanceCountBuffer computed in 4.2 is submitted to the GPU to set the number of draws. In the shader, an index id is fetched from IndexBuffer according to the id recorded in ClusterMeshResultBuffer, the vertex attributes are fetched from VerticesBuffer by that index id, and the real coordinates of the vertex are obtained by multiplying with the matrix information recorded in MatWBuffer. The coordinates, normals and uv data are passed to the fragment shader, where the interpolated data is output to several images: VisibilityTexture, MaterialIDTex and the depth map. VisibilityTexture is a texture in rgb32 format whose rg channels store the pixel's uv value and whose b channel stores the compressed normal value (the normal is compressed from rgb32 to r32). MaterialIDTex stores, for the pixel, the material id computed in step 2.2. The depth map stores the vertex depth corresponding to the pixel and is used for depth comparison when other objects are drawn later.
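The exact normal compression is not specified beyond "rgb32 to r32"; one common way to pack a unit normal into a single 32-bit value is a two-component octahedral encoding, sketched here as an assumption:

```cpp
#include <cmath>
#include <cstdint>

// Packs a unit normal into 32 bits: octahedral projection, then two
// 16-bit snorm components.
uint32_t packNormal(float x, float y, float z) {
    const float a = std::fabs(x) + std::fabs(y) + std::fabs(z);
    float u = x / a, v = y / a;
    if (z < 0.0f) {  // fold the lower hemisphere onto the upper one
        const float uo = u, vo = v;
        u = (1.0f - std::fabs(vo)) * (uo >= 0.0f ? 1.0f : -1.0f);
        v = (1.0f - std::fabs(uo)) * (vo >= 0.0f ? 1.0f : -1.0f);
    }
    const auto snorm16 = [](float f) -> uint32_t {
        return static_cast<uint16_t>(static_cast<int16_t>(std::lround(f * 32767.0f)));
    };
    return (snorm16(v) << 16) | snorm16(u);
}
```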
4.4 Screen material blocking module: shading all materials full screen in step 4.6 would hurt performance greatly, so MaterialIDTex is partitioned into blocks in this step, allowing step 4.6 to shade the screen per pixel block and reducing the cost. The current strategy divides the screen into 16 × 16 blocks and launches 16 × 16 threads in the shader. Each thread traverses all pixels in its block, samples MaterialIDTex at each pixel's address, records the maximum and minimum materialID of the block, and stores them in MaterialRangeTex.
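The per-block reduction can be sketched on the CPU as follows; the shader version runs one thread per block, and parameter names here are illustrative.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// materialIds: row-major width x height material id image (MaterialIDTex).
// Writes one {min, max} pair per block into minOut/maxOut (tilesX * tilesY
// entries each, presized), mirroring what MaterialRangeTex stores.
void buildMaterialRange(const std::vector<uint32_t>& materialIds,
                        uint32_t width, uint32_t height,
                        uint32_t tilesX, uint32_t tilesY,
                        std::vector<uint32_t>& minOut,
                        std::vector<uint32_t>& maxOut) {
    const uint32_t tileW = (width + tilesX - 1) / tilesX;
    const uint32_t tileH = (height + tilesY - 1) / tilesY;
    for (uint32_t ty = 0; ty < tilesY; ++ty) {
        for (uint32_t tx = 0; tx < tilesX; ++tx) {
            uint32_t lo = UINT32_MAX, hi = 0;
            const uint32_t yEnd = std::min((ty + 1) * tileH, height);
            const uint32_t xEnd = std::min((tx + 1) * tileW, width);
            for (uint32_t y = ty * tileH; y < yEnd; ++y) {
                for (uint32_t x = tx * tileW; x < xEnd; ++x) {
                    const uint32_t id = materialIds[y * width + x];
                    lo = std::min(lo, id);
                    hi = std::max(hi, id);
                }
            }
            minOut[ty * tilesX + tx] = lo;
            maxOut[ty * tilesX + tx] = hi;
        }
    }
}
```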
4.5 Copy-material-id-to-depth-map module: because an ordinary render target and a depth map differ in format and cannot be converted into each other, this pass samples every pixel of MaterialIDTex and writes the material id value into a new depth map (VDepthTex). In step 4.6, this depth map is used to compare material ids as depth values, and shading proceeds only where the depth values are equal.
4.6 Shading module: only one command is shown in the figure, but in practice there may be several, one command generated per material, so their number is determined by the number of materials. In the vertexShader stage, the input vertex data is a grid dividing the full screen into 16 × 16 blocks, where the z value of each grid vertex position is the material id. During computation, each vertex samples MaterialRangeTex; if the material id recorded in its z value lies within the sampled material id range, the vertex continues to the rasterization stage, otherwise it is set to a NaN value and is culled during rasterization. In the fragmentShader stage, the depth map is set to the VDepthTexture from 4.5; since the z value of the vertex position also stores the material id, with the depth comparison set to equal, only the material belonging to this pixel can pass the depth test. The passing pixels then sample VisibilityTexture according to screen space information, take out the pixel's uv value and normal value, subsequently perform normal PBR rendering, output attributes such as metallicness and roughness to the GBuffer, and connect to the traditional engine rendering pipeline.
In summary, when facing very complex scenes and models, some embodiments of the present application reduce the computational pressure on the CPU, bringing the product's real-time rendering to a new level. The computing power of the GPU is fully exploited, and the system load is balanced. Some embodiments of the present application cull and render the model in blocks (that is, in the split clusters), which amounts to automatic batching and allows very fine culling of oversized models, improving rendering efficiency. Some embodiments of the present application handle variations of materials and are compatible with rendering all materials, reducing the overhead incurred by rendering through the block shading technique. Some embodiments of the present application process incoming model data in real time with multiple threads; this parallelism gives the system great flexibility to accept a variety of resources without stuttering.
Referring to fig. 7, fig. 7 illustrates a central processing unit provided by an embodiment of the present application, which includes: an initial cluster acquisition module 701, a back face culling module 702, and a providing module 703.
The initial cluster acquisition module is configured to acquire an initial cluster set corresponding to each model to be rendered in an i-th frame, wherein the initial cluster set is obtained by splitting the mesh data of the corresponding model, each cluster of the initial cluster set comprises a set number of triangles, and the model to be rendered is a model that has undergone scene tree culling processing.
The back face culling module is configured to perform back face culling on the initial cluster set corresponding to each model to be rendered, with the cluster as the culling unit, to obtain a target cluster set corresponding to each model to be rendered.
The providing module is configured to provide the cluster data to be rendered corresponding to the target cluster set to an image processor so that the image processor performs drawing processing.
It will be clear to those skilled in the art that, for convenience and brevity of description, reference may be made to the corresponding procedure in the foregoing method for the specific working procedure of the apparatus described above, and this will not be repeated here.
As shown in fig. 8, some embodiments of the present application provide an image processor, the processor comprising: a cluster-data-to-be-rendered receiving module 801, a rendering command receiving module 802, and a rendering execution module 803.
The cluster-data-to-be-rendered receiving module is configured to receive the i-th frame's cluster data to be rendered obtained by the central processing unit described above.
The rendering command receiving module is configured to receive a rendering command carrying a j-th material identification number, wherein the j-th material identification number represents one material in the i-th frame, and different materials of the i-th frame are represented by different material identification numbers.
The rendering execution module is configured to complete the drawing according to the rendering command and the cluster data to be rendered.
It will be clear to those skilled in the art that, for convenience and brevity of description, reference may be made to the corresponding procedure in the foregoing method for the specific working procedure of the apparatus described above, and this will not be repeated here.
Some embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs a method according to any of the embodiments of the method of image rendering described above.
As shown in fig. 9, some embodiments of the present application provide an electronic device 900, including a memory 910, a processor 920, and a computer program stored on the memory 910 and executable on the processor 920, where the processor 920 reads the program through a bus 930 and executes the program to implement the method as described in any of the embodiments of the method for image rendering described above.
The processor 920 may process digital signals and may include various computing structures, such as a complex instruction set computer architecture, a reduced instruction set computer architecture, or an architecture implementing a combination of instruction sets. In some examples, the processor 920 may be a microprocessor.
Memory 910 may be used for storing instructions to be executed by processor 920 or data related to execution of instructions. Such instructions and/or data may include code to implement some or all of the functions of one or more modules described in embodiments of the present application. The processor 920 of embodiments of the present disclosure may be used to execute instructions in the memory 910 to implement the methods shown in fig. 4 and 5. Memory 910 includes dynamic random access memory, static random access memory, flash memory, optical memory, or other memory known to those skilled in the art.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners as well. The apparatus embodiments described above are merely illustrative, for example, flow diagrams and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely exemplary embodiments of the present application and is not intended to limit the scope of the present application, and various modifications and variations may be suggested to one skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A method for rendering an image, applied to a CPU, the method comprising:

acquiring an initial cluster set corresponding to each model to be rendered, wherein the initial cluster set is obtained by splitting the mesh data of the corresponding model, each cluster of the initial cluster set comprises a set number of triangles, each model to be rendered belongs to the models to be updated or loaded in an i-th frame, the model to be rendered is a model that has undergone scene tree culling processing, and the set number is related to the size of the cache unit that an image processor provides for a streaming multiprocessor SM;

performing back face culling on the initial cluster set corresponding to each model to be rendered, with the cluster as the culling unit, to obtain a target cluster set corresponding to each model to be rendered;

providing the cluster data to be rendered corresponding to the target cluster set to the image processor, and sending a rendering command to the image processor with any material identification number included in the i-th frame as a parameter, so that the image processor determines whether to shade each pixel by comparing the recorded material identification number of the pixel in the i-th frame with said any material identification number;

wherein the cluster data to be rendered comprises cluster material information, the cluster material information is obtained from a material information candidate set, the candidate set comprises a plurality of material identification numbers, and the plurality of material identification numbers are obtained through the following strategy: traversing the material properties of all objects in each frame, merging identical material information, and assigning a unique material identification number to each merged material, wherein said any material identification number is one of the material identification numbers corresponding to all the different materials included in the target cluster set.
2. The method of claim 1, wherein

a first model to be rendered belongs to the models to be rendered loaded in the i-th frame, and acquiring the initial cluster set corresponding to each model to be rendered comprises:

starting a splitting thread;

splitting the mesh data corresponding to the first model to be rendered into a plurality of clusters through the splitting thread to obtain the initial cluster set corresponding to the first model to be rendered;

or,

a second model to be rendered belongs to the models to be rendered loaded in a historical frame, and acquiring the initial cluster set corresponding to each model to be rendered comprises: reading the initial cluster set corresponding to the second model to be rendered from a storage unit.
3. The method of claim 2, wherein a third model to be rendered is any model to be rendered in the i-th frame, and

performing back face culling on the initial cluster set corresponding to each model to be rendered, with the cluster as the culling unit, to obtain the target cluster set corresponding to each model to be rendered comprises:

acquiring the test cone and cone opening angle of each cluster in the initial cluster set corresponding to the third model to be rendered;

testing the corresponding cluster according to the test cone and the cone opening angle, and if the test passes, taking the corresponding cluster as a cluster in the target cluster set;

wherein the cluster data to be rendered further comprises at least one of the following information: cluster bounding box information, cluster vertex information, cluster index information and cluster rotation scaling offset matrix information, wherein the cluster bounding box information is used by the image processor for view frustum culling, the view frustum culling takes the cluster as the culling unit, the cluster rotation scaling offset matrix information records the spatial position of the corresponding cluster, and the image processor obtains the screen coordinates of each vertex in the corresponding cluster according to the cluster rotation scaling offset matrix information; and the image processor completes the pixel shading processing by comparing the cluster material information with the material information to be rendered carried by the rendering command.
4. The method of claim 3, wherein the image processor determines whether to shade each pixel by comparing the recorded material identification number of the pixel in the i-th frame with said any material identification number.
5. A method for rendering an image, applied to a GPU, the method comprising:

receiving the i-th frame's cluster data to be rendered, obtained according to any one of claims 1 to 4;

receiving a rendering command carrying a j-th material identification number, wherein the j-th material identification number represents one material in the i-th frame, and different materials of the i-th frame are represented by different material identification numbers;

performing, with the cluster as the culling unit, view frustum culling on each model to be rendered according to the cluster data to be rendered, so as to cull clusters from the target cluster set corresponding to each model to be rendered and obtain a target cluster set to be rendered corresponding to each model to be rendered;

drawing according to the target cluster set to be rendered and the rendering command;

wherein the drawing according to the target cluster set to be rendered and the rendering command comprises:

obtaining the material identification number of each pixel on the screen according to the cluster data to be rendered corresponding to the target clusters to be rendered;

dividing the screen into a plurality of blocks and determining the range of material identification numbers covered by each block;

selecting, according to the j-th material identification number and the coverage ranges, at least one of the blocks as a target block to be rendered, wherein the j-th material identification number lies within the coverage range corresponding to the target block to be rendered;

and completing the shading by comparing the j-th material identification number with the material identification number of each pixel in the target block to be rendered.
6. The method of claim 5, wherein the target cluster set is obtained by performing back face culling on an initial cluster set with the cluster as the culling unit, and each cluster in the initial cluster set is obtained by clustering all the triangles included in the mesh data of a model to be rendered.
7. The method of claim 6, wherein the cluster data to be rendered comprises the cluster bounding box information of each cluster corresponding to any model to be rendered, and performing view frustum culling on each model to be rendered according to the cluster data to be rendered comprises:

performing view frustum culling according to the cluster bounding box information of each cluster to obtain the target cluster set to be rendered.
8. The method of claim 6, wherein

the drawing according to the target cluster set to be rendered and the rendering command further comprises:

in the pixel shading process,

obtaining a visibility map and a material map according to the target clusters to be rendered, wherein the visibility map's size equals the screen resolution, each pixel of the visibility map stores the map sampling coordinates and normal value of the triangle covering the corresponding pixel, the material map's size equals the screen resolution, and each pixel of the material map stores the material identification number of the triangle covering the corresponding pixel;

generating a material range map from the material map, the material range map being obtained through the following strategy: dividing the screen into a plurality of blocks and storing on each block a first material identification value and a second material identification value of the corresponding block, wherein the first material identification value and the second material identification value represent the range of material identification numbers covered by all the materials on the corresponding block;

comparing the j-th material identification number with each block of the material range map, and determining the block on the material range map that matches the j-th material identification number as the target block to be rendered;

obtaining, from the material map, the block image of the block corresponding to the target block to be rendered to obtain a target block image;

and completing the pixel shading by comparing the j-th material identification number with the value of each pixel on the target block image.
9. The method of claim 8, wherein obtaining the visibility map and the material map according to the target clusters to be rendered comprises:

deleting, from all the cluster vertex data included in the cluster data to be rendered, the cluster vertex data corresponding to the culled clusters to obtain target cluster vertex data, wherein the culled clusters are obtained through the view frustum culling operation;

deleting, from the cluster index information of all clusters included in the cluster data to be rendered, the cluster index information corresponding to the culled clusters to obtain target cluster index information;

and obtaining the visibility map according to the target cluster vertex data and the target cluster index information.
10. The method of claim 8, wherein obtaining the visibility map and the material map according to the target clusters to be rendered comprises:

deleting, from the material identification information of all clusters included in the cluster data to be rendered, the material information corresponding to the culled clusters, and deleting, from the cluster rotation scaling offset matrix information of all clusters included in the cluster data to be rendered, the cluster rotation scaling offset matrix information corresponding to the culled clusters, to obtain a target cluster matrix and material information;

and obtaining the material map according to the target cluster matrix and the material information.
CN202310409599.3A 2023-04-18 2023-04-18 Image rendering method Active CN116152039B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310409599.3A CN116152039B (en) 2023-04-18 2023-04-18 Image rendering method


Publications (2)

Publication Number Publication Date
CN116152039A (en) 2023-05-23
CN116152039B (en) 2023-07-21

Family ID: 86360351

Family Applications (1)

Application Number Title Status
CN202310409599.3A Image rendering method (CN116152039B) Active

Country Status (1)

CN: CN116152039B (en)





Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant