
WO2025001846A1 - Method and system for enhancing sense of reality of virtual model on basis of generative network - Google Patents

Info

Publication number
WO2025001846A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
clothing
information
area
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2024/098707
Other languages
French (fr)
Chinese (zh)
Inventor
刘郴
赫高峰
王华民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lingdi Digital Technology Co Ltd
Original Assignee
Zhejiang Lingdi Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202410746603.XA external-priority patent/CN119228986A/en
Application filed by Zhejiang Lingdi Digital Technology Co Ltd filed Critical Zhejiang Lingdi Digital Technology Co Ltd
Publication of WO2025001846A1 publication Critical patent/WO2025001846A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Definitions

  • the present disclosure relates to the field of model enhancement, and in particular to a method and system for enhancing the sense of reality of a virtual model based on a generative network.
  • a virtual model is a specific application of a 3D model: a model generated with computer technology that can change clothes and perform in a digital environment. This technology can be used for clothing design and sales, as well as for entertainment and artistic performances. Virtual models are usually created through 3D modeling and animation technology and can be produced and edited with a variety of software tools. Compared with real models, virtual models are not restricted by time and place: they can change outfits and perform at any time, and they also enable more precise data analysis and testing.
  • Digital clothing is clothing created, designed, and displayed using digital technology. It can be used for virtual fitting, online shopping, fashion design, games, animation, and more. Because digital clothing is designed, simulated, and displayed through computer technology, it avoids the physical constraints and manufacturing processes that traditional clothing production must account for. It is also easy to modify and customize, giving users a more convenient, personalized, and low-cost purchasing and customization experience, while offering designers a more flexible and efficient way to create and present their work.
  • the creation of digital clothing usually requires the use of tools such as 3D modeling software, graphics processing software, and virtual reality technology.
  • Realistic digital clothing and model modeling can simulate the real world fairly faithfully, but a large gap remains compared with real models; virtual models in particular commonly suffer from appearances and movements that are not realistic or vivid enough. Therefore, how to improve the realism of virtual models, while keeping digital clothing as close as possible to its modeled form so that virtual models come closer to real models, has become a technical problem that the fashion industry urgently needs to solve.
  • the present disclosure proposes a method and system for enhancing the sense of reality of a virtual model based on a generative network, and the specific scheme is as follows:
  • the present disclosure proposes a method for enhancing the realism of a virtual model based on a generative network, comprising the following steps:
  • the analytical information including a clothing area related only to the digital clothing and a model area related only to the human body structure;
  • the clothing area and model area are extracted based on a preset algorithm
  • the parsed information is converted from the image pixel space to the latent space and then input into the generation network;
  • under the control of generation control conditions including model information, the generation network generates a model image with a stronger sense of reality in the model area based on the model information, wherein the model image maintains the form of the virtual model and wears the digital clothing;
  • the generation control conditions also include clothing information, which includes the edges, contours, normals and colors of digital clothing; through the clothing information, the generation network strengthens its understanding and cognition of local details and global information of digital clothing, further ensuring that the model image does not exceed the model area.
  • the model information includes posture, age, gender, skin color, face shape, facial features and hairstyle.
  • the generative network includes a generative adversarial network and a diffusion model.
  • encoding is performed by an image encoder to convert the parsed information from an image pixel space to a latent space; decoding is performed by an image decoder matched with the image encoder to convert the model image from the latent space to the image pixel space.
  • the clothing area is segmented based on the image segmentation technology
  • the virtual model area is extracted based on the human body analysis technology
  • the model area is processed at different granularities, adapted to different scenarios, based on image matting technology.
  • the present disclosure proposes a virtual model reality enhancement system based on a generative network, including the following modules:
  • An input unit for obtaining a rendering of a virtual model wearing digital clothing
  • a parsing information acquisition unit configured to determine whether parsing information about the digital clothing in the rendering is known, the parsing information including a clothing area related only to the digital clothing and a model area related only to the human body structure; if unknown, extracting the clothing area and the model area based on a preset algorithm;
  • a first conversion unit used to convert the parsing information from the image pixel space to the latent space and then input it into the generation network
  • a generating unit configured to enable the generating network to generate a model image with a stronger sense of reality in the model area based on the model information under the control of generating control conditions including the model information, wherein the model image maintains the form of the virtual model and wears the digital clothing;
  • the second conversion unit is used to convert the model image from the latent space to the image pixel space to obtain a generated image of the model wearing the digital clothing.
  • the generation control condition further includes clothing information, and the clothing information includes the edge, contour, normal and color of the digital clothing;
  • the clothing information enables the generation network to strengthen its understanding and cognition of local details and global information of the digital clothing, further ensuring that the model image does not exceed the model area.
  • the model information includes posture, age, gender, skin color, face shape, facial features and hairstyle; and/or, the generative network includes a generative adversarial network and a diffusion model;
  • the clothing area is segmented based on the image segmentation technology
  • the virtual model area is extracted based on the human body analysis technology
  • the model area is processed with different granularities adapted to different scenes based on the image cutout technology.
  • encoding is performed by an image encoder to convert the parsed information from an image pixel space to a latent space; decoding is performed by an image decoder matched with the image encoder to convert the model image from the latent space to the image pixel space.
  • this disclosure proposes a virtual model realism enhancement method based on a generative network.
  • the method comprises the following steps:
  • the analysis information and the model information are input into a generation network to generate a model image wearing the digital clothing and corresponding to the model information.
  • the clothing area is the area where the digital clothing is located
  • the model area is the human body structure of the virtual model exposed outside the digital clothing
  • the model area does not include the human body structure covered by the digital clothing.
  • before obtaining the parsed information of the rendering, the method further includes:
  • the clothing area and the model area are extracted based on a preset algorithm, and the edges of the clothing area and the model area are accurately expressed.
  • the clothing area is extracted based on an image segmentation algorithm
  • the model area is extracted based on a human body analysis algorithm
  • the edges of the clothing area and the model area are processed based on an image cutout algorithm to achieve accurate expression of the edges of the clothing area and the model area.
  • the method of obtaining the model information includes:
  • the model configuration item text is converted into feature representation through a text encoder, and the feature representation of the model is used as the model information.
  • the model region includes region contour information and human body part information.
  • when the parsed information and preset model information are input into the generation network, the method also includes: inputting clothing information into the generation network, wherein the clothing information includes one or more features of the clothing's edge, contour, normal, color, and texture.
  • the present disclosure proposes an electronic device, comprising: a processor; a memory for storing instructions executable by the processor;
  • the processor implements the method as described in any one of the third parts by running the executable instructions.
  • the present disclosure proposes a computer-readable storage medium having computer instructions stored thereon, which, when executed by a processor, implement the steps of any method described in the third part.
  • the present disclosure provides a method and system for enhancing the realism of a virtual model based on a generative network. Based on highly controllable generative AI technology, it makes full use of the prior information of digital clothing and virtual models, can significantly improve the realism of clothing models, while maintaining high fidelity of modeled clothing, and is highly adaptable to the alignment of virtual models in the digital world to real-life models in the real world.
  • FIG1 is a flow chart of a method for enhancing the sense of reality according to an embodiment of the present disclosure
  • FIG2 is a schematic diagram showing the principle of a method for enhancing the sense of reality according to an embodiment of the present disclosure
  • FIG3 is a flow chart of another method for enhancing the sense of reality according to an embodiment of the present disclosure.
  • FIG4 is a schematic diagram of a module of a reality enhancement system according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of the structure of an electronic device according to an embodiment of the present disclosure.
  • Reference numerals: 1 - input unit; 2 - parsing information acquisition unit; 3 - first conversion unit; 4 - generation unit; 5 - second conversion unit; 702 - processor; 704 - internal bus; 706 - network interface; 708 - memory; 710 - non-volatile memory.
  • This embodiment discloses a method for enhancing the sense of reality of a virtual model based on a generative network, which can significantly improve the sense of reality of a clothing model while maintaining high fidelity of the modeled clothing.
  • the specific process of the method for enhancing the sense of reality is shown in Figure 1 of the specification, and the complete process is shown in Figure 2.
  • a method for enhancing the sense of reality of a virtual model based on a generative network comprises the following steps:
  • the analytical information including a clothing area related only to the digital clothing and a model area related only to the human body structure; if unknown, extract the clothing area and the model area based on a preset algorithm;
  • under the control of generation control conditions including model information, the generation network generates a model image with a stronger sense of reality in the model area based on the model information, wherein the model image maintains the form of the virtual model and wears the digital clothing;
  • a rendering is an image of a virtual scene or object created by computer graphics technology.
  • a rendering can generate different visual effects by adjusting parameters such as models, materials, and lighting.
  • the rendering of this embodiment is an image of a virtual model wearing digital clothing.
  • the solution of this embodiment requires a refined analysis of the rendering to obtain the analysis information of the virtual model wearing clothes.
  • the analysis information includes clothing area and model area.
  • the clothing area is the area where the digital clothing is located.
  • the model area is the human body structure of the virtual model exposed outside the digital clothing. For example, if the model in the rendering is wearing a dress, the area where the dress is located is the clothing area, and the exposed parts of the model's body, such as the head, feet, and arms, are the model area.
  • the model area is not a complete model. From another perspective, the model area and the clothing area combine to form a complete, clothed model.
  • the clothing area and the model area are separated to maintain the high fidelity of the digital clothing: only the model in the non-clothing area has its realism enhanced, while the clothing area keeps its original form.
  • the model area and the clothing area are first distinguished and input into the generation network separately, so that the generation network can process the two areas differently. This makes the way the entire system processes images more flexible: when computing power is tight, the system can more easily invest computing resources in the areas users care about most, and the generation network can adopt different enhancement strategies for different areas, which yields better enhancement effects, as sketched below.
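  • As a concrete illustration of this region-selective processing, the hedged sketch below blends a generated image back over the original rendering so that only the model area changes; the function name and array conventions are illustrative, not part of the disclosure.

```python
import numpy as np

def composite_enhanced(original: np.ndarray,
                       generated: np.ndarray,
                       model_mask: np.ndarray) -> np.ndarray:
    """Blend a generated image over the original so that only the model
    area changes while the clothing area keeps its rendered pixels.

    original, generated: HxWx3 float arrays in [0, 1]
    model_mask: HxW float alpha in [0, 1]; 1 = model area, 0 = clothing/background
    """
    alpha = model_mask[..., None]  # broadcast the mask over the channel axis
    return generated * alpha + original * (1.0 - alpha)
```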
  • further requirements are made for the separation of the model area and the clothing area.
  • the model area only needs to obtain the structures of the human body parts exposed outside the clothing, so that the generation network can selectively process the model and clothing separately.
  • this embodiment can support users to choose to enhance the sense of reality of the model area or the clothing area, or enhance both.
  • users can also select the object of sense of reality enhancement according to the computing resources that the current system can mobilize. If the computing resources are not enough to enhance the sense of reality of the model area and the clothing area at the same time, the system will give priority to enhancing the sense of reality of the model area, and then the user can choose whether to continue to enhance the sense of reality of the clothing.
  • this solution can also provide more detailed configuration options, support users to input different detail information for the model area and the clothing area, and control the generation network to enhance the model area and the clothing area respectively.
  • the user can input more requirements for the sense of reality enhancement of the model area, such as tattoos, skin color, scars and other details, and only input texture details for the clothing area.
  • in this case, the solution generates details such as tattoos, skin color, and scars only in the model area, while the clothing area receives only texture enhancement; under the system of this embodiment, details that belong to the model area will not spill over into the clothing area.
  • This embodiment proposes adaptive and different-granularity clothing and model separation technology solutions for different virtual model sources to obtain high-quality clothing and model separation information, making the adaptability of digital clothing and virtual models more complete and reliable. At the same time, it provides data support for enhancing the realism of virtual models.
  • Image segmentation refers to dividing an image into several areas with similar properties for better analysis and processing.
  • Image segmentation technology can be applied to image recognition, target detection and other fields.
  • the image segmentation technology of this embodiment includes threshold-based segmentation, region-based segmentation, edge-based segmentation, deep-learning-based image segmentation, GAN-based salient segmentation, diffusion-model-based segmentation, etc.
  • the main function of image segmentation is to separate the clothing area, model area, and other non-clothing and non-model areas in the image, so as to facilitate the processing of the model area in the image, maintain the information of the clothing area, and make the model area more consistent with the characteristics of the real model.
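  • As a minimal sketch of this separation step, the code below uses an off-the-shelf torchvision segmentation model to pull a coarse person mask out of a rendering; the model choice, file name, and threshold are assumptions, and the clothing/model split described above would require a finer-grained segmenter.

```python
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights="DEFAULT").eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("rendering.png").convert("RGB")  # hypothetical input
with torch.no_grad():
    logits = model(preprocess(image).unsqueeze(0))["out"][0]
person_mask = (logits.argmax(0) == 15).float()  # class 15 = person (VOC labels)
```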
  • the virtual model area is extracted for the next step of processing.
  • Human parsing is a more fine-grained image segmentation technology.
  • the model area will be further segmented with higher precision, such as hair, face, neck, arms, hands, feet, etc., to achieve fine-grained model understanding and analysis.
  • Human parsing generally uses traditional machine learning algorithms and deep learning algorithms, such as instance segmentation algorithms.
  • the Mask R-CNN algorithm is used to perform pixel-level, fine-grained separation of the image to extract fine-grained semantic information of the model area.
  • a more accurately annotated dataset has a correspondingly larger effect on the results.
  • This embodiment only lists some common algorithms; those skilled in the art can choose an appropriate human parsing algorithm according to the desired effect. A sketch of Mask R-CNN-based separation follows.
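  • A hedged sketch of instance-level person separation with torchvision's pretrained Mask R-CNN; the confidence threshold and input file name are illustrative, and a dedicated human-parsing model would further split each person into hair, face, arms, and so on.

```python
import torch
from PIL import Image
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

model = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()
img = to_tensor(Image.open("rendering.png").convert("RGB"))

with torch.no_grad():
    pred = model([img])[0]

# Keep confident person instances (COCO label 1 = person).
keep = (pred["labels"] == 1) & (pred["scores"] > 0.8)
person_masks = pred["masks"][keep, 0] > 0.5  # (N, H, W) boolean masks
```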
  • Image matting refers to the use of computer vision technology to separate the part to be intercepted from the background by identifying and extracting important features of the image, such as color, edges, texture, etc.
  • the image matting algorithm also refines the edges of the clothing area, the model area, and the other areas. Compared with a segmentation algorithm, the edges obtained by matting are softer and can express a region at varying transparencies, which is especially important for hair.
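  • A crude stand-in for this matting step, assuming a binary mask from the previous stage: feather the hard segmentation edge into a graded alpha. Real matting algorithms estimate the alpha matte from image evidence rather than simply blurring, so this is only a sketch.

```python
import cv2
import numpy as np

def feather_mask(binary_mask: np.ndarray, radius: int = 7) -> np.ndarray:
    """Soften a hard segmentation edge into a graded alpha so hair and
    fabric borders blend naturally when composited.

    binary_mask: HxW uint8 array with values 0 or 255
    returns: HxW float alpha matte in [0, 1]
    """
    alpha = binary_mask.astype(np.float32) / 255.0
    k = 2 * radius + 1  # Gaussian kernel sizes must be odd
    return cv2.GaussianBlur(alpha, (k, k), 0)
```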
  • This embodiment selects three different algorithms to handle the distinction between the clothing and model areas and the division of the human body structure, which improves the generation effect; because global details are not processed, the computing resources consumed are also small.
  • This embodiment can enhance the sense of reality of each part of the model in a targeted manner, so a human parsing algorithm is needed to accurately identify the various parts of the human body and thereby support users in customizing the model's details. For example, in reality an impressive model often has some personalized features on the body or face, and these features can echo the clothing; an easy-to-understand example is how a personalized tattoo and trendy streetwear complement each other, improving the overall visual effect.
  • this solution also needs to support users to customize the design of these models and clothing elements.
  • the user may need to design a tattoo pattern on the model's left arm.
  • the human body analysis algorithm can help the system realize the recognition of the human left arm, so that the generation network can be controlled to generate tattoos only on the model's left arm, rather than on the right arm.
  • This embodiment accurately disassembles the image and strengthens the details to produce high-fidelity try-on renderings; precisely because high fidelity is required, the system needs to invest computing resources in the areas whose sense of reality must be improved.
  • the image cutout algorithm ensures that the image always maintains its integrity, making the visual effect of the entire picture better.
  • after the clothing area and the model area are obtained, they can be sent to the generation network for realistic generation of the virtual model.
  • before the parsing information is input into the generation network, it must be converted from the image pixel space to the latent space, which reduces the data's dimensionality, compresses it, and cuts the amount of computation.
  • a typical approach is to use a VAE encoder for this conversion.
  • after the generation network generates the model image, the result must likewise be converted from the latent space back to the image pixel space. Together, the two conversion steps significantly reduce the amount of computation.
  • encoding is performed by an image encoder to convert the parsing information from the image pixel space to the latent space; decoding is performed by an image decoder matching the image encoder to convert the model image from the latent space to the image pixel space. As shown in Figure 2.
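  • A minimal sketch of this encode/decode round trip, using Stable Diffusion's publicly available VAE as a stand-in (the disclosure does not name a specific encoder or decoder):

```python
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

def to_latent(image: torch.Tensor) -> torch.Tensor:
    """image: (B, 3, H, W) in [-1, 1] -> latent of shape (B, 4, H/8, W/8)."""
    with torch.no_grad():
        return vae.encode(image).latent_dist.sample() * vae.config.scaling_factor

def to_pixels(latent: torch.Tensor) -> torch.Tensor:
    """Inverse of to_latent: decode latents back to pixel space."""
    with torch.no_grad():
        return vae.decode(latent / vae.config.scaling_factor).sample
```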
  • the scheme of this embodiment mainly adopts a generation method based on a diffusion model.
  • the diffusion model is a relatively advanced machine learning algorithm that generates high-quality data samples from the noise by gradually adding random noise to the data and then learning the reverse diffusion process.
  • the diffusion model is a diffusion-based generative model that produces an image in multiple stages: starting from pure noise, it progressively refines the image content at each stage, and after many iterations a high-quality image is eventually generated.
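  • The toy loop below illustrates this iterative refinement with a deterministic DDIM-style update; `denoiser` is assumed to be a trained noise-prediction network, and none of the names come from the disclosure.

```python
import torch

def ddim_sample(denoiser, shape, alphas_cumprod, device="cpu"):
    """Minimal reverse loop: start from pure noise and let a trained
    denoiser predict, at every step, the noise to strip away."""
    x = torch.randn(shape, device=device)
    T = len(alphas_cumprod)
    for t in reversed(range(T)):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        eps = denoiser(x, t)                                # predicted noise
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()      # estimated clean image
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps  # deterministic DDIM step
    return x
```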
  • a large-scale training dataset is constructed so that the network can output a realistic model effect more effectively.
  • GAN (Generative Adversarial Network) is an unsupervised learning model that generates realistic images through adversarial training of a generator network and a discriminator network.
  • the generator network tries to generate images similar to the training data.
  • the discriminator network tries to distinguish between the images generated by the generator and the real images in the training data.
  • GAN can generate high-quality images.
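  • For reference, one adversarial training step of the generator/discriminator pair described above could be sketched as follows; the non-saturating BCE loss is a common choice, not one mandated by the disclosure.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_g, opt_d, real, z):
    """One adversarial update: D learns to tell real images from generated
    ones; G learns to fool D."""
    fake = G(z)

    # Discriminator update
    opt_d.zero_grad()
    d_real, d_fake = D(real), D(fake.detach())
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    loss_d.backward()
    opt_d.step()

    # Generator update (non-saturating loss)
    opt_g.zero_grad()
    d_fake = D(fake)
    loss_g = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```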
  • the generative network requires additional generative control conditions to control the generation of the model image and ensure the controllability of the generation process.
  • the generative control conditions involve model information and clothing information.
  • the overall effect of the clothing is maintained through clothing information, so that the generative network strengthens the understanding and cognition of the local details and global information of digital clothing and digital models, effectively processes the digital model in a targeted manner, and maintains the high fidelity of the design details and elements of digital clothing.
  • Local details are extracted from local area features of the image, such as the clothing area and the model area, to understand local characteristics; global information is a holistic understanding of the entire image, such as how well the model and clothing match and how natural the presentation looks.
  • Model information is used to ensure that the newly generated model is consistent with the model in the input image in posture and position.
  • Clothing information specifically refers to detailed information on clothing edges, contours, normals, colors, and textures, while model information involves posture, age, gender, skin color, face shape, facial features, and hairstyle. Any features that can affect the model's image can be included in the model information, effectively enhancing the realism of the virtual model's image.
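  • As an illustration of how such clothing control signals could be derived from a clothing crop, the hedged sketch below computes edges, contours, a screen-space pseudo-normal map, and a mean color; in a rendering pipeline the true normals would come from the 3D garment model, so the Sobel-based map here is purely an assumption.

```python
import cv2
import numpy as np

def clothing_control_maps(clothing_rgb: np.ndarray) -> dict:
    """Derive edge, contour, pseudo-normal, and color signals from an
    HxWx3 uint8 clothing crop."""
    gray = cv2.cvtColor(clothing_rgb, cv2.COLOR_RGB2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Screen-space pseudo-normals from intensity gradients.
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    n = np.stack([-gx, -gy, np.ones_like(gx)], axis=-1)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8
    return {"edges": edges, "contours": contours,
            "normals": (n + 1.0) / 2.0,  # mapped to [0, 1] for display
            "mean_color": clothing_rgb.reshape(-1, 3).mean(axis=0)}
```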
  • this embodiment also supports more pre-analysis processing information as generation control conditions, injecting more prior knowledge into generative AI to further improve the controllability and authenticity of generative AI.
  • two main methods are used to edit the features of posture, age, gender, skin color, face shape, facial features, and hairstyle. One method uses the capabilities of digital modeling software to predefine, during model construction, digital models that meet the requirements. The other method uses configurable custom model input, such as entered text fields: a text encoder converts the model configuration text into a feature representation, and feature fusion methods such as feature concatenation and cross-attention then control and adjust the model generation effect. The two methods can be used singly or together.
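  • A hedged sketch of the second method, using the public CLIP text encoder and a generic cross-attention layer as stand-ins for the feature fusion step (the disclosure names neither a specific encoder nor a fusion module):

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32").eval()

prompt = "female model, about 25, warm skin tone, short curly hair"  # illustrative
tokens = tokenizer(prompt, padding="max_length", truncation=True,
                   return_tensors="pt")
with torch.no_grad():
    text_features = text_encoder(**tokens).last_hidden_state  # (1, 77, 512)

# Cross-attention fusion: image latents attend to the text features.
attn = torch.nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)
image_tokens = torch.randn(1, 64, 512)  # stand-in for flattened latent patches
fused, _ = attn(query=image_tokens, key=text_features, value=text_features)
```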
  • clothing information can likewise be obtained by generating a reference clothing model and using its information parameters as clothing information, and/or through configurable custom clothing input, converting the clothing configuration text into a feature representation with a text encoder. Clothing information is optional; whether to supply it depends on whether the user needs to further improve the realism of the clothing. For example, when computing power is insufficient, the user can choose not to enter clothing information and rely only on model information to enhance the model's realism; if the user feels the clothing also needs enhanced realism, the clothing information can be input into the generative network as well. This makes the system more interactive and lets users freely choose how the control conditions are supplied.
  • the system's support for using model parameters as control conditions gives the user greater control over the details of the generative network; if the user instead opts for custom model input, there is no need to edit a model in advance, which is easier to operate and reaches the desired effect more quickly;
  • the above two methods can be used at the same time to control the generative network.
  • the user has a special personalized design for the tattoo on the model's left arm, and hopes that the image output by the generative network will be retained as it is, while the tattoo on the model's right arm does not have special requirements, and only needs a tattoo to provide a visual effect.
  • the user can input the model's left arm tattoo into the generative network through model parameters, and input the model's right arm tattoo in the form of text.
  • the tattoo on the model's left arm will be retained as it is, while the right arm will generate a tattoo that roughly meets the user's expectations; the same is true for the enhanced sense of reality of clothing.
  • This embodiment also discloses another method for enhancing the sense of reality of a virtual model based on a generative network, the process of which is shown in FIG3 and specifically includes the following steps:
  • the model area and the clothing area are first distinguished and then input into the generative network, so that the generative network can have different processing methods for these two different areas, making the way the entire system processes images more flexible, especially when computing power is limited, the system can more easily invest computing resources in specific areas; at the same time, the generative network can adopt different enhancement strategies for different areas, which will make the enhancement effect better.
  • the clothing area is the area where the digital clothing is located, and the model area is the human body structure of the virtual model exposed outside the digital clothing.
  • the model area does not include the human body structure covered by the digital clothing. Further requirements are made for the separation of the model area and the clothing area.
  • the model area only needs to obtain the structure of each human body part exposed outside the clothing, so that the generation network can selectively process the model area and the clothing area separately.
  • the present embodiment can support the user to choose to enhance the sense of reality of the model area or the clothing area, or to enhance both, and under the default preset environment, the system can also select the object of sense of reality enhancement according to the computing resources that the current system can mobilize.
  • the system will give priority to enhancing the sense of reality of the model area, and then the user can choose whether to continue to enhance the sense of reality of the clothing; in some cases, the present system can also provide more detailed configuration options, support users to input different detail information for the model area and the clothing area, and control the generation network to enhance the model area and the clothing area respectively.
  • the user can input more requirements for the sense of reality enhancement of the model area, such as details such as tattoos, skin color, scars, etc., and only input texture details for the clothing area.
  • in this case, the present system generates details such as tattoos, skin color, and scars only in the model area, while the clothing area receives only texture enhancement; under the system of the present embodiment, details that belong to the model area will not spill over into the clothing area.
  • before the analytical information of the rendering is obtained, the method also includes: determining whether the rendering contains analytical information; if it does not, extracting the clothing area and the model area based on a preset algorithm and accurately expressing the edges of the clothing area and the model area.
  • the clothing area is extracted based on the image segmentation algorithm
  • the model area is extracted based on the human body analysis algorithm
  • the edges of the clothing area and the model area are processed based on the image cutout algorithm to achieve accurate expression of the edges of the clothing area and the model area.
  • Three different schemes are selected for the details involved in clothing, models, and human body structure, so that the details are expressed better. This scheme accurately disassembles the image and strengthens the details to complete a high-fidelity try-on rendering; inaccurate recognition of the human body structure cannot meet market business needs. From another perspective, it is precisely the need for high fidelity that forces the system to invest its limited computing resources in the most important places, a problem that rendering fields such as the game industry do not face.
  • the image cutout algorithm can ensure the integrity of the image and make the visual effect of the whole picture better, while this step is not required for optimizing only the face, because there is no need to consider the impact of clothing on the face, nor is there any need to consider how to enhance the texture and personality of other parts.
  • the method of obtaining model information includes: generating a reference model and using the model's information parameters as model information; and/or accepting configurable custom model input, using a text encoder to convert the model configuration text into a feature representation, and using that feature representation as the model information.
  • Model information is the core generation element of the model, supporting user modeling customization or semantic input. Clothing information is used as a reference to provide global information, so only edge information is considered.
  • when the parsing information and the preset model information are input into the generation network, the method also includes: inputting the clothing information into the generation network, where the clothing information includes one or more features of the clothing's edge, contour, normal, color, and texture.
  • This embodiment provides a method for enhancing the realism of a virtual model based on a generative network. Based on highly controllable generative AI technology, it makes full use of the prior information of digital clothing and virtual models, can significantly improve the realism of clothing models, while maintaining high fidelity of modeled clothing, and is highly adaptable to the alignment of virtual models in the digital world to real-life models in the real world.
  • the disclosed embodiment also discloses a virtual model reality enhancement system based on a generative network, which systematizes the virtual model reality enhancement method based on a generative network in the above embodiment to make it more practical.
  • the overall structure diagram of the virtual model reality enhancement system based on a generative network is shown in Figure 4 of the specification, and the specific scheme is as follows:
  • a virtual model reality enhancement system based on a generative network includes the following modules:
  • An input unit 1 is used to obtain a rendering of a virtual model wearing digital clothing
  • the analytical information acquisition unit 2 is used to determine whether the analytical information about the digital clothing in the rendering image is known, and the analytical information includes a clothing area related only to the digital clothing and a model area related only to the human body structure;
  • if unknown, the clothing area and model area are extracted based on a preset algorithm;
  • a first conversion unit 3 used to convert the parsing information from the image pixel space to the latent space and then input it into the generation network;
  • a generating unit 4 is used to enable the generating network to generate a more realistic model image in the model area based on the model information under the control of the generating control conditions including the model information, wherein the model image maintains the form of the virtual model and wears the digital clothing;
  • the second conversion unit 5 is used to convert the model image from the latent space to the image pixel space to obtain a generated image of the model wearing the digital clothing.
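  • Wiring the five units together, the overall flow could be sketched as below; every callable is a placeholder for the corresponding module rather than a concrete implementation.

```python
def enhance_realism(rendering, model_info, parser, encoder, generator, decoder):
    """End-to-end flow of the five units: parse -> encode -> generate -> decode.

    parser, encoder, generator, decoder stand in for the parsing information
    acquisition unit, the first conversion unit, the generating unit, and the
    second conversion unit, respectively."""
    clothing_area, model_area = parser(rendering)             # unit 2
    latents = encoder(rendering, clothing_area, model_area)   # unit 3
    out_latents = generator(latents, model_info)              # unit 4
    return decoder(out_latents)                               # unit 5
```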
  • the generation control conditions also include clothing information related to the edges, contours, normals and colors of digital clothing; through clothing information, the generation network strengthens its understanding and cognition of the local details and global information of digital clothing, further ensuring that the model image does not exceed the model area.
  • the model information includes age, gender, skin color, face shape, facial features, and hairstyle; and/or the generative network includes a generative adversarial network and a diffusion model; and/or the clothing area is segmented based on image segmentation technology, the virtual model area is extracted based on human parsing technology, and the model area is processed at different granularities, adapted to different scenes, based on image matting technology.
  • encoding is performed by an image encoder to convert the parsing information from the image pixel space to the latent space; decoding is performed by an image decoder matched with the image encoder to convert the model image from the latent space to the image pixel space.
  • An embodiment of the present specification also provides an electronic device, comprising: a processor; a memory for storing processor-executable instructions; wherein the processor implements any of the above method embodiments by running the executable instructions.
  • FIG5 is a schematic diagram of the structure of a device provided by an exemplary embodiment. Please refer to FIG5.
  • the device includes a processor 702, an internal bus 704, a network interface 706, a memory 708, and a non-volatile memory 710, and may also include hardware required for other services.
  • One or more embodiments of this specification may be implemented based on software, such as the processor 702 reading the corresponding computer program from the non-volatile memory 710 into the memory 708 and then running it.
  • one or more embodiments of this specification do not exclude other implementations, such as logic devices or a combination of software and hardware, etc., that is, the execution subject of the following processing flow is not limited to each logic unit, but may also be hardware or logic devices.
  • a typical implementation device is a computer, which may be a personal computer, a laptop computer, a cellular phone, or a tablet computer.
  • a computer includes one or more processors (CPU), input/output interfaces, network interfaces, and memory.
  • Memory may include non-permanent storage in a computer-readable medium, in the form of random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
  • the embodiments of the present specification also provide a computer-readable storage medium on which computer instructions are stored. When the instructions are executed by a processor, the steps of any of the above method embodiments are implemented.
  • Computer readable media include permanent and non-permanent, removable and non-removable media that can be implemented by any method or technology to store information.
  • Information can be computer readable instructions, data structures, program modules or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage media or other magnetic storage devices or any other non-transmission media that can be used to store information that can be accessed by a computing device.
  • computer readable media does not include temporary computer readable media (transitory media), such as modulated data signals and carrier waves.
  • the present disclosure provides a method and system for enhancing the sense of reality of a virtual model based on a generative network. Based on a highly controllable generative AI technology, the method makes full use of the prior information of digital clothing and virtual models, can significantly improve the sense of reality of clothing models, while maintaining the high fidelity of modeled clothing, and is highly adaptable to the alignment of virtual models in the digital world to real-life models in the real world.
  • although the terms first, second, third, etc. may be used to describe various information in one or more embodiments of this specification, the information should not be limited by these terms; the terms are only used to distinguish information of the same type from one another.
  • the first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information.
  • the word "if” as used herein may be interpreted as "at the time of” or "when” or "in response to determining”.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Provided in the present disclosure are a method and system for enhancing the sense of reality of a virtual model on the basis of a generative network. The method comprises: acquiring a rendered image of a virtual model wearing digital clothing; determining whether parsing information about the digital clothing in the rendered image is known; if the parsing information is unknown, extracting it on the basis of a preset algorithm; converting the parsing information and inputting the converted parsing information into a generative network; under the control of a generative control condition, making the generative network generate, in a model area and on the basis of model information, a model image with a stronger sense of reality; and converting the model image to obtain a generated image of a model wearing the digital clothing. On the basis of highly controllable generative AI technology, the solution fully utilizes the prior information of the digital clothing and the virtual model to significantly improve the sense of reality of clothing models while keeping the modeled clothing at high fidelity, so it can be applied with high adaptability to aligning a virtual model in the digital world with a real-person model in the real world.

Description

一种基于生成网络的虚拟模特真实感增强方法及其系统A method and system for enhancing the sense of reality of virtual models based on generative network 技术领域Technical Field

本公开涉及模型增强领域,特别涉及一种基于生成网络的虚拟模特真实感增强方法及其系统。The present disclosure relates to the field of model enhancement, and in particular to a method and system for enhancing the sense of reality of a virtual model based on a generative network.

背景技术Background Art

虚拟模特是三维模型的一种具体应用,其是指使用计算机技术生成的模特,可以在数字环境中进行换装和表演。这种技术可以用于服装设计和销售,也可以用于娱乐和艺术表演。虚拟模特通常是通过三维建模和动画技术创建的,可以使用各种软件工具进行制作和编辑。和真人模特相比,虚拟模特不受时间和地点限制,可以随时进行换装和表演,同时还可以进行更精确的数据分析和测试。A virtual model is a specific application of a 3D model. It refers to a model generated using computer technology that can change clothes and perform in a digital environment. This technology can be used for clothing design and sales, as well as entertainment and artistic performances. Virtual models are usually created through 3D modeling and animation technology, and can be produced and edited using a variety of software tools. Compared with real models, virtual models are not restricted by time and place, and can be changed and performed at any time. At the same time, more accurate data analysis and testing can also be performed.

数字服装是一种使用数字技术来创建、设计和展示的服装,可用于虚拟试衣、在线购物、时尚设计、游戏和动画等。数字服装通过计算机技术进行设计、模拟和展示,可以避免传统服装制作过程中需要考虑的物理性质和制造工艺等问题,同时也方便进行修改和定制,为用户提供更加便捷、个性化、低成本的服装购买和定制体验,同时也可以为服装设计师提供更加灵活、高效的创作和展示方式。数字服装的创建通常需要使用3D建模软件、图形处理软件、虚拟现实技术等工具。Digital clothing is a type of clothing created, designed and displayed using digital technology. It can be used for virtual fitting, online shopping, fashion design, games and animation, etc. Digital clothing is designed, simulated and displayed through computer technology, which can avoid the physical properties and manufacturing processes that need to be considered in the traditional clothing production process. It is also convenient for modification and customization, providing users with a more convenient, personalized and low-cost clothing purchase and customization experience, and can also provide clothing designers with a more flexible and efficient way of creation and display. The creation of digital clothing usually requires the use of tools such as 3D modeling software, graphics processing software, and virtual reality technology.

随着元宇宙和虚拟时尚的高速发展,数字服装和虚拟模特已经成为了时尚产业的重要趋势。采用数字化建模和设计技术,可以将服装设计师的创意转化为数字模型,并在虚拟模特上实现快速的效果查看与展示。越来越多的品牌和设计师开始采用数字技术进行服装设计和制造,同时,数字服装也带来了新的商业机会和商业模式,如个性化定制、数字化营销等.With the rapid development of the metaverse and virtual fashion, digital clothing and virtual models have become an important trend in the fashion industry. The use of digital modeling and design technology can transform the creativity of fashion designers into digital models, and quickly view and display the effects on virtual models. More and more brands and designers are beginning to use digital technology for clothing design and manufacturing. At the same time, digital clothing has also brought new business opportunities and business models, such as personalized customization, digital marketing, etc.

逼真的数字化服装和模特建模,基于渲染技术虽然能够比较真实的模拟现实世界,但与真人模特相比仍然存在较大差距,尤其是虚拟模特,普遍存在着外观和动作不够真实、生动的问题。因此,如何在保证数字化服装尽可能的逼近建模效果的基础上,提升虚拟模特的真实感,使得 虚拟模特能够更加接近真人模特,成为了时尚产业亟需解决的技术问题。Realistic digital clothing and model modeling, based on rendering technology, can simulate the real world more realistically, but there is still a big gap compared with real models, especially virtual models, which generally have the problem of not being realistic and vivid in appearance and movements. Therefore, how to improve the realism of virtual models while ensuring that digital clothing is as close to the modeling effect as possible? Making virtual models closer to real models has become a technical problem that the fashion industry urgently needs to solve.

发明内容Summary of the invention

有鉴于此,本公开提出了一种基于生成网络的虚拟模特真实感增强方法及其系统,具体方案如下:In view of this, the present disclosure proposes a method and system for enhancing the sense of reality of a virtual model based on a generative network, and the specific scheme is as follows:

第一部分,本公开提出了一种基于生成网络的虚拟模特真实感增强方法,包括如下步骤:In the first part, the present disclosure proposes a method for enhancing the realism of a virtual model based on a generative network, comprising the following steps:

获取关于穿着数字服装的虚拟模特的渲染图;Get renderings of virtual models wearing digital clothing;

判断所述渲染图中关于所述数字服装的解析信息是否已知,所述解析信息包括只涉及数字服装的服装区域和只涉及人体结构的模特区域;Determining whether analytical information about the digital clothing in the rendering is known, the analytical information including a clothing area related only to the digital clothing and a model area related only to the human body structure;

若未知,则基于预设算法提取出服装区域和模特区域;If unknown, the clothing area and model area are extracted based on a preset algorithm;

将所述解析信息由图像像素空间转换到隐空间后输入到生成网络中;The parsed information is converted from the image pixel space to the latent space and then input into the generation network;

在包括模特信息在内的生成控制条件的控制下,使所述生成网络基于所述模特信息在所述模特区域生成一个真实感更强的模特形象,所述模特形象维持所述虚拟模特的形态并且穿着所述数字服装;Under the control of generation control conditions including model information, the generation network generates a model image with a stronger sense of reality in the model area based on the model information, wherein the model image maintains the form of the virtual model and wears the digital clothing;

将所述模特形象由隐空间转换到图像像素空间后,得到穿着所述数字服装的模特生成图。After converting the model image from latent space to image pixel space, a generated image of the model wearing the digital clothing is obtained.

在一个具体实施例中,所述生成控制条件中还包括服装信息,所述服装信息包括数字服装的边缘、轮廓、法线及色彩;通过所述服装信息使所述生成网络强化对数字服装的局部细节和全局信息的理解和认知,进一步确保模特形象不超出所述模特区域。In a specific embodiment, the generation control conditions also include clothing information, which includes the edges, contours, normals and colors of digital clothing; through the clothing information, the generation network strengthens its understanding and cognition of local details and global information of digital clothing, further ensuring that the model image does not exceed the model area.

在一个具体实施例中,所述模特信息包括姿态、年龄、性别、肤色、脸型、五官以及发型。In a specific embodiment, the model information includes posture, age, gender, skin color, face shape, facial features and hairstyle.

在一个具体实施例中,所述生成网络包括生成对抗网络和扩散模型。In a specific embodiment, the generative network includes a generative adversarial network and a diffusion model.

在一个具体实施例中,通过图像编码器进行编码以将所述解析信息由图像像素空间转换到隐空间;通过与所述图像编码器匹配的图像解码器进行解码以将所述模特形象由隐空间转换到图像像素空间。In a specific embodiment, encoding is performed by an image encoder to convert the parsed information from an image pixel space to a latent space; decoding is performed by an image decoder matched with the image encoder to convert the model image from the latent space to the image pixel space.

在一个具体实施例中,基于图像分割技术分割服装区域,基于人体解析技术提取虚拟模特区域,并基于图像抠图技术对所述模特区域进行 适应不同场景的不同粒度的处理。In a specific embodiment, the clothing area is segmented based on the image segmentation technology, the virtual model area is extracted based on the human body analysis technology, and the model area is Adapt to different granularity processing in different scenarios.

第二部分,本公开提出了一种基于生成网络的虚拟模特真实感增强系统,包括如下模块:In the second part, the present disclosure proposes a virtual model reality enhancement system based on a generative network, including the following modules:

输入单元,用于获取关于穿着数字服装的虚拟模特的渲染图;An input unit, for obtaining a rendering of a virtual model wearing digital clothing;

解析信息获取单元,用于判断所述渲染图中关于所述数字服装的解析信息是否已知,所述解析信息包括只涉及数字服装的服装区域和只涉及人体结构的模特区域;若未知,则基于预设算法提取出服装区域和模特区域;A parsing information acquisition unit, configured to determine whether parsing information about the digital clothing in the rendering is known, the parsing information including a clothing area related only to the digital clothing and a model area related only to the human body structure; if unknown, extracting the clothing area and the model area based on a preset algorithm;

第一转换单元,用于将所述解析信息由图像像素空间转换到隐空间后输入到生成网络中;A first conversion unit, used to convert the parsing information from the image pixel space to the latent space and then input it into the generation network;

生成单元,用于在包括模特信息在内的生成控制条件的控制下,使所述生成网络基于所述模特信息在所述模特区域生成一个真实感更强的模特形象,所述模特形象维持所述虚拟模特的形态且穿着所述数字服装;A generating unit, configured to enable the generating network to generate a model image with a stronger sense of reality in the model area based on the model information under the control of generating control conditions including the model information, wherein the model image maintains the form of the virtual model and wears the digital clothing;

第二转换单元,用于将所述模特形象由隐空间转换到图像像素空间后,得到穿着所述数字服装的模特生成图。The second conversion unit is used to convert the model image from the latent space to the image pixel space to obtain a generated image of the model wearing the digital clothing.

在一个具体实施例中,所述生成控制条件中还包括服装信息,所述服装信息包括数字服装的边缘、轮廓、法线及色彩;In a specific embodiment, the generation control condition further includes clothing information, and the clothing information includes the edge, contour, normal and color of the digital clothing;

通过所述服装信息使所述生成网络强化对数字服装的局部细节和全局信息的理解和认知,进一步确保模特形象不超出所述模特区域。The clothing information enables the generation network to strengthen its understanding and cognition of local details and global information of the digital clothing, further ensuring that the model image does not exceed the model area.

在一个实施例中,所述模特信息包括姿态、年龄、性别、肤色、脸型、五官以及发型;和/或,所述生成网络包括生成对抗网络和扩散模型;In one embodiment, the model information includes posture, age, gender, skin color, face shape, facial features and hairstyle; and/or, the generative network includes a generative adversarial network and a diffusion model;

和/或,基于图像分割技术分割服装区域,基于人体解析技术提取虚拟模特区域,并基于图像抠图技术对所述模特区域进行适应不同场景的不同粒度的处理。And/or, the clothing area is segmented based on the image segmentation technology, the virtual model area is extracted based on the human body analysis technology, and the model area is processed with different granularities adapted to different scenes based on the image cutout technology.

在一个具体实施例中,通过图像编码器进行编码以将所述解析信息由图像像素空间转换到隐空间;通过与所述图像编码器匹配的图像解码器进行解码以将所述模特形象由隐空间转换到图像像素空间。In a specific embodiment, encoding is performed by an image encoder to convert the parsed information from an image pixel space to a latent space; decoding is performed by an image decoder matched with the image encoder to convert the model image from the latent space to the image pixel space.

第三部分,本公开提出了一种基于生成网络的虚拟模特真实感增强 方法,包括如下步骤:In the third part, this disclosure proposes a virtual model realism enhancement method based on a generative network. The method comprises the following steps:

接收穿着数字服装的虚拟模特的渲染图;receiving a rendering of a virtual model wearing the digital garment;

提取所述渲染图的解析信息,所述解析信息包括服装区域和模特区域;Extracting parsed information of the rendering, the parsed information including a clothing area and a model area;

将所述解析信息、模特信息输入生成网络,生成穿着有所述数字服装的且与模特信息对应的模特形象。The analysis information and the model information are input into a generation network to generate a model image wearing the digital clothing and corresponding to the model information.

在一些具体实施例中,所述服装区域为数字服装所在的区域,所述模特区域为虚拟模特裸露于数字服装之外的人体结构,所述模特区域不包括被数字服装所覆盖的人体结构。In some specific embodiments, the clothing area is the area where the digital clothing is located, the model area is the human body structure of the virtual model exposed outside the digital clothing, and the model area does not include the human body structure covered by the digital clothing.

在一些具体实施例中,获取渲染图的解析信息之前,还包括:In some specific embodiments, before obtaining the parsed information of the rendering image, the method further includes:

判断所述渲染图中是否包含解析信息;Determining whether the rendering contains parsing information;

若不包含所述解析信息,则基于预设算法提取服装区域和模特区域,对所述服装区域和所述模特区域的边缘进行精确表达。If the analysis information is not included, the clothing area and the model area are extracted based on a preset algorithm, and the edges of the clothing area and the model area are accurately expressed.

在一些具体实施例中,基于图像分割算法提取所述服装区域,基于人体解析算法提取所述模特区域,并基于图像抠图算法处理所述服装区域和所述模特区域的边缘,以实现对所述服装区域和所述模特区域的边缘进行精确表达。In some specific embodiments, the clothing area is extracted based on an image segmentation algorithm, the model area is extracted based on a human body analysis algorithm, and the edges of the clothing area and the model area are processed based on an image cutout algorithm to achieve accurate expression of the edges of the clothing area and the model area.

In some specific embodiments, the model information is obtained by:

generating a reference model and using the information parameters of that model as the model information; and/or

through configurable custom model input, converting the model configuration text into a feature representation via a text encoder and using that feature representation as the model information.

In some specific embodiments, the model area includes area contour information and human body part information.

In some specific embodiments, when the parsed information and preset model information are input into the generative network, the method further includes inputting clothing information into the generative network, the clothing information including one or more of the clothing's edges, contours, normals, colors, and textures.

In a fourth aspect, the present disclosure proposes an electronic device, comprising: a processor; and a memory for storing instructions executable by the processor;

wherein the processor implements the method of any one of the third aspect by running the executable instructions.

In a fifth aspect, the present disclosure proposes a computer-readable storage medium having computer instructions stored thereon which, when executed by a processor, implement the steps of any method of the third aspect.

Beneficial effects: the present disclosure provides a method and system for enhancing the realism of a virtual model based on a generative network. Built on highly controllable generative AI and making full use of the prior information of the digital clothing and the virtual model, it can significantly improve the realism of clothing models while maintaining high fidelity of the modeled clothing, and adapts well to aligning virtual models in the digital world with real human models in the real world.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flowchart of a realism enhancement method according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of the principle of a realism enhancement method according to an embodiment of the present disclosure;

FIG. 3 is a flowchart of another realism enhancement method according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of the modules of a realism enhancement system according to an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of the structure of an electronic device according to an embodiment of the present disclosure.

Reference numerals: 1 - input unit; 2 - parsed information acquisition unit; 3 - first conversion unit; 4 - generation unit; 5 - second conversion unit; 702 - processor; 704 - internal bus; 706 - network interface; 708 - memory; 710 - non-volatile memory.

DETAILED DESCRIPTION

Various embodiments of the present disclosure are described more fully below. The present disclosure may have various embodiments, and adjustments and changes may be made therein. However, it should be understood that there is no intention to limit the various embodiments of the present disclosure to the specific embodiments disclosed herein; rather, the present disclosure should be understood to cover all adjustments, equivalents, and/or alternatives that fall within the spirit and scope of its various embodiments.

This embodiment discloses a method for enhancing the realism of a virtual model based on a generative network, which can significantly improve the realism of a clothing model while maintaining high fidelity of the modeled clothing. The specific flow of the realism enhancement method is shown in FIG. 1, and the complete flow is shown in FIG. 2.

The specific scheme is as follows:

A method for enhancing the realism of a virtual model based on a generative network comprises the following steps:

101. Obtain a rendering of a virtual model wearing digital clothing.

102. Determine whether parsed information about the digital clothing in the rendering is known, the parsed information including a clothing area involving only the digital clothing and a model area involving only human body structure; if unknown, extract the clothing area and the model area based on a preset algorithm.

103. Convert the parsed information from image pixel space to latent space and input it into the generative network.

104. Under the control of generation control conditions including model information, have the generative network generate a more realistic model image in the model area based on the model information, the model image maintaining the form of the virtual model and wearing the digital clothing.

105. Convert the model image from latent space back to image pixel space to obtain a generated image of the model wearing the digital clothing.

The scheme of this embodiment enhances the realism of the model in the rendering while the clothing retains its original fidelity. A rendering is an image of a virtual scene or object created with computer graphics technology; different visual effects can be produced by adjusting parameters such as models, materials, and lighting. The rendering in this embodiment is an image of a virtual model wearing digital clothing.

The scheme of this embodiment requires refined parsing of the rendering to obtain parsed information about the clothed virtual model. The parsed information includes a clothing area and a model area. The clothing area is the area occupied by the digital clothing. The model area is the human body structure of the virtual model exposed outside the digital clothing. For example, if the rendering shows a model wearing a dress, the entire area occupied by the dress is the clothing area, and the exposed parts of the model's body, such as the head, feet, and arms, form the model area. Note that the model area is not the complete model; rather, the model area and the clothing area together constitute a complete, clothed model. Separating the clothing area from the model area keeps the digital clothing at high fidelity while ensuring that realism enhancement is applied only to the model outside the clothing area, leaving the clothing area in its original form. This embodiment first distinguishes the model area from the clothing area and inputs them into the generative network separately, so that the network can process the two areas differently. This makes the way the whole system processes images more flexible: especially when computing power is tight, the system can more easily concentrate computing resources on the areas the user most wants enhanced, and the generative network can adopt different enhancement strategies for different areas, yielding better results. The split between model area and clothing area is further constrained: in this embodiment the model area only needs to capture the human body parts exposed outside the clothing, so that the generative network can selectively process the model and the clothing separately. Specifically, this embodiment allows the user to choose to enhance the realism of the model area, the clothing area, or both. Under the default preset environment, the system can also select the enhancement target by itself according to the computing resources currently available; if the resources are insufficient to enhance both areas at once, the system prioritizes realism enhancement of the model area and then lets the user decide whether to continue enhancing the clothing. In some cases, the scheme can also provide more detailed configuration options, allowing the user to input different detail information for the model area and the clothing area and to control the generative network to enhance them separately. For example, the user may input many requirements for the model area, such as tattoos, skin color, and scars, while inputting only texture details for the clothing area; the scheme then generates details such as tattoos, skin color, and scars only in the model area, while the clothing area receives only texture enhancement. Under the system of this embodiment, details belonging to the model area will not be mistakenly generated in the clothing area.

For different virtual model sources, this embodiment proposes an adaptive clothing-model separation scheme at different granularities to obtain high-quality clothing-model separation information, making the adaptation between digital clothing and virtual models more complete and reliable while providing data support for enhancing the realism of the virtual model.

1. If the virtual model carries its own parsed information about the digital clothing it wears, subsequent processing proceeds directly.

2. If the parsed information in the rendering is unknown and only the virtual model is available, it can only be extracted algorithmically. Depending on the granularity, the following schemes are adopted, as shown in FIG. 2.

Based on image segmentation technology, the clothing area is segmented, and the model part outside the clothing area is processed in the next step. Image segmentation divides an image into several regions with similar properties for better analysis and processing, and is applied in fields such as image recognition and object detection. The image segmentation techniques in this embodiment include threshold-based, region-based, and edge-based segmentation.

Threshold-based, region-based, and edge-based segmentation are mentioned here only to describe the various technical means this scheme can use to achieve image segmentation; others include deep-learning-based image segmentation, GAN-based salient segmentation, diffusion-model-based image segmentation, and so on. The main role of image segmentation is to separate the clothing area, the model area, and the remaining non-clothing, non-model areas of the image, so that the model area can be processed while the clothing area information is preserved and the model area better matches the characteristics of a real model.
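As a hedged illustration of the simplest of these techniques, the sketch below isolates a crude foreground clothing region with Otsu thresholding and connected components in OpenCV; the input path is a placeholder, and real renderings would typically require the region-, edge-, or learning-based methods mentioned above.

```python
import cv2
import numpy as np

# Assumes the rendering has a near-uniform background so a simple
# intensity threshold can isolate the foreground.
image = cv2.imread("render.png")                      # hypothetical input path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_, fg_mask = cv2.threshold(gray, 0, 255,
                           cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Edge-based variant: Canny edges can delimit the clothing contour.
edges = cv2.Canny(gray, 100, 200)

# Keep only the largest connected component as a crude clothing region.
n, labels, stats, _ = cv2.connectedComponentsWithStats(fg_mask)
largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
clothing_mask = (labels == largest).astype(np.uint8) * 255
```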

Based on human parsing technology, the virtual model area is extracted for the next step of processing. Human parsing is a finer-grained image segmentation technique: besides decomposing the image into clothing area, model area, and other areas, it further segments the model area at higher precision, for example into hair, face, neck, arms, hands, and feet, to achieve fine-grained model understanding and analysis. Human parsing generally employs traditional machine learning or deep learning algorithms, such as instance segmentation algorithms. Typically, a Mask R-CNN-based algorithm performs pixel-level fine-grained separation of each pixel in the image to extract fine-grained semantic information of the model area. A more precisely annotated dataset is likewise important for the result. Overall, there are many kinds of human parsing algorithms; this embodiment lists only some common ones, and those skilled in the art can choose a suitable human parsing algorithm according to the desired effect.
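The following sketch illustrates the instance-segmentation step with torchvision's COCO-pretrained Mask R-CNN as an assumed stand-in; a production human parsing model trained on a part-annotated dataset would return finer labels (hair, face, arms, and so on) rather than a single person mask.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# COCO-pretrained Mask R-CNN; yields instance masks, not body parts.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = to_tensor(Image.open("render.png").convert("RGB"))  # hypothetical path
with torch.no_grad():
    pred = model([img])[0]

# COCO class 1 is "person"; keep confident person-instance masks.
keep = (pred["labels"] == 1) & (pred["scores"] > 0.8)
person_masks = pred["masks"][keep] > 0.5   # (N, 1, H, W) boolean masks
```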

Based on image matting technology, the virtual model area is extracted with refinement for the next step of processing. Image matting usually uses computer vision techniques to separate the part to be extracted from the background by identifying important image features such as color, edges, and texture. Besides pixel-level understanding of the image, a matting algorithm further expresses the edges of the clothing area, the model area, and other areas precisely. Compared with segmentation algorithms, matting produces softer region edges and can express regions with varying transparency, which is especially important for hair-like regions.
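A minimal sketch of how a soft matte is applied once a matting algorithm has produced it: the float alpha values near hair and garment edges blend gradually, instead of the hard boundary a binary segmentation mask would give.

```python
import numpy as np

def composite_with_alpha(foreground: np.ndarray,
                         background: np.ndarray,
                         alpha: np.ndarray) -> np.ndarray:
    """Blend a matted region over a background.

    `alpha` is a float matte in [0, 1] (e.g. from a matting network);
    soft values near hair or garment edges avoid the jagged boundaries
    a binary segmentation mask would produce.
    """
    a = alpha[..., None].astype(np.float32)            # (H, W, 1)
    return (a * foreground + (1.0 - a) * background).astype(np.uint8)
```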

In addition, large-model-based algorithms such as SAM (Segment Anything) borrow the prompt idea from NLP tasks; by learning from large amounts of data, they can quickly segment target objects from prompt inputs, though SAM remains an image segmentation technique.
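A hedged sketch of prompt-driven segmentation with Meta's segment_anything package; the checkpoint path and click coordinates are placeholders.

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load a SAM checkpoint (placeholder path) and wrap it in a predictor.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("render.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single foreground click on the garment acts as the prompt.
point = np.array([[320, 240]])      # (x, y), hypothetical coordinates
label = np.array([1])               # 1 = foreground
masks, scores, _ = predictor.predict(point_coords=point,
                                     point_labels=label,
                                     multimask_output=True)
best_mask = masks[np.argmax(scores)]
```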

For distinguishing the clothing area from the model area and for dividing the human body structure, this embodiment selects three different algorithms, which yields better generation results; and because the processing is not applied to global details, the computing resources consumed are small. This embodiment can enhance the realism of each part of the model in a targeted manner, so a human parsing algorithm is introduced to accurately identify the parts of the human body and thereby support users' needs for customized detail enhancement of the model. For example, an impressive real-world model often has some personalized features on the body or face that happen to echo the clothing; an easy-to-understand example is the interplay between a personal tattoo and trendy streetwear, which improves the overall visual effect. In digital scenes, this scheme therefore also needs to let users design such pairings of model and clothing elements. For example, a user may want to place a tattoo pattern on the model's left arm; the human parsing algorithm helps the system recognize the left arm, so the generative network can be directed to generate the tattoo only on the left arm rather than the right. This embodiment precisely decomposes the image and strengthens its details to process and output high-fidelity try-on renderings, and because high fidelity is required, the system needs to devote computing resources to the areas where realism most needs improving. In addition, the image matting algorithm keeps the image coherent as a whole, improving the visual effect of the entire picture.

After the clothing area and the model area are obtained, they can be sent to the generative network for realistic generation of the virtual model. Before the parsed information is input into the generative network, this embodiment converts the network computation from image pixel space to latent space to achieve dimensionality reduction and compression of the data and reduce the amount of computation, mainly by processing with a VAE encoder. After the generative network finishes generating the model image, the result likewise needs to be converted from latent space back to image pixel space. These two conversion steps significantly reduce the amount of computation. In a specific embodiment, an image encoder performs the encoding that converts the parsed information from image pixel space to latent space, and an image decoder matched with the image encoder performs the decoding that converts the model image from latent space back to image pixel space, as shown in FIG. 2.
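As one concrete possibility, the round trip through latent space can be sketched with a pretrained KL-VAE from the diffusers library; the checkpoint name and tensor shapes are assumptions, not the encoder actually used by this disclosure.

```python
import torch
from diffusers import AutoencoderKL

# Pretrained KL-VAE as an assumed stand-in for the image encoder/decoder pair.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
vae.eval()

pixels = torch.randn(1, 3, 512, 512)        # stand-in for the parsed rendering
with torch.no_grad():
    latents = vae.encode(pixels).latent_dist.sample()  # e.g. (1, 4, 64, 64)
    # ... the generative network operates here, in latent space ...
    recon = vae.decode(latents).sample                 # back to pixel space
```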

Preferably, this embodiment mainly adopts a diffusion-model-based generation method. A diffusion model is a relatively advanced machine learning algorithm that gradually adds random noise to data and then learns the reverse diffusion process, thereby generating high-quality data samples from noise. During generation, repeated diffusion and denoising steps are applied iteratively to produce structurally coherent, richly detailed images. A diffusion model generates images in multiple stages: at each stage, the model refines the noisy input, gradually changing the content of the image, and after many iterations a high-quality image is produced. To better characterize high-quality model generation, a large-scale training dataset is constructed so that realistic model effects can be output more effectively.
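A minimal DDPM-style sampling sketch of the iterative denoising described above, under the assumption of a linear beta schedule; `denoiser` stands in for the trained noise-prediction network and is not defined by this disclosure.

```python
import torch

# Linear noise schedule (an assumption; schedules vary in practice).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample(denoiser, shape=(1, 4, 64, 64)):
    x = torch.randn(shape)                       # start from pure noise
    for t in reversed(range(T)):
        eps = denoiser(x, torch.tensor([t]))     # predicted noise at step t
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise  # one denoising step
    return x
```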

In addition, a GAN (generative adversarial network) can also accomplish the generation task. A GAN is an unsupervised learning model that generates realistic images through adversarial training between a generator network and a discriminator network: the generator tries to produce images similar to the training data, while the discriminator tries to distinguish generated images from real images in the training data. Through this adversarial training, a GAN can generate high-quality images.
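A minimal sketch of one adversarial training step under the generator/discriminator setup just described; the two networks, their optimizers, and the latent dimension are stand-ins supplied by the caller, with the discriminator assumed to output a single logit per image.

```python
import torch
import torch.nn.functional as F

def gan_step(generator, discriminator, g_opt, d_opt, real, z_dim=128):
    """One adversarial training step (hedged sketch; networks are stand-ins)."""
    b = real.size(0)
    z = torch.randn(b, z_dim)

    # Discriminator: push real images toward 1, generated images toward 0.
    fake = generator(z).detach()
    d_loss = (F.binary_cross_entropy_with_logits(
                  discriminator(real), torch.ones(b, 1)) +
              F.binary_cross_entropy_with_logits(
                  discriminator(fake), torch.zeros(b, 1)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: try to make the discriminator predict 1 on fakes.
    fake = generator(z)
    g_loss = F.binary_cross_entropy_with_logits(
        discriminator(fake), torch.ones(b, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```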

The generative network requires additional generation control conditions to control the generation of the model image and ensure the controllability of the generation process. The generation control conditions involve model information and clothing information. The clothing information maintains the overall effect of the clothing, strengthening the generative network's understanding of the local details and global information of the digital clothing and the digital model, so that the digital model is processed in a targeted way while the design details and elements of the digital clothing remain highly faithful. Local details are understood through local region features extracted from the image, such as the clothing area and the model area; global information is a global understanding of the whole image, such as how well the model matches the clothing and how natural the presentation appears.

The model information ensures that the newly generated model is consistent with the model in the input image in posture and position. The clothing information specifically comprises detail information about clothing edges, contours, normals, colors, and textures; the model information covers posture, age, gender, skin color, face shape, facial features, hairstyle, and other aspects, and any feature that can affect the model's appearance can be included in the model information, effectively enhancing the realism of the virtual model. In addition, this embodiment supports more pre-analysis information as generation control conditions, injecting more prior knowledge into the generative AI to further improve its controllability and authenticity. Here, to achieve controllability of the model, two main methods are used to edit features such as posture, age, gender, skin color, face shape, facial features, and hairstyle. One method uses the capabilities of digital modeling software to predefine, during model creation, a digital model meeting the requirements. The other method uses configurable custom model input, such as entered text fields: based on a text encoder, the model configuration text is converted into a feature representation, and feature fusion methods such as feature concatenation and cross-attention are then used to control and adjust the model generation effect. The two methods can be used alternatively or together. Likewise, clothing information can be obtained by generating a reference clothing model and using its information parameters as the clothing information, and/or by configurable custom clothing input that converts the clothing configuration text into a feature representation via a text encoder. Clothing information is not mandatory, however; it depends on whether the user needs it to further improve the realism of the clothing. For example, when computing power is insufficient, the user may choose not to input clothing information and rely only on model information to enhance the model's realism; if the user feels the clothing also needs realism enhancement, the clothing information can be input into the generative network as well. This gives the system better interactivity, letting the user freely choose how control conditions are input. If the user has their own model, or wants to first edit a model of their own, the system offers the ability to use the model's parameters as control conditions, giving the user finer control over the details produced by the generative network; if the user prefers custom model input, there is no need to edit a model in advance, which is easier to operate and produces the desired effect more quickly.

Meanwhile, if the user has high requirements for the authenticity and refinement of some human body details but not others, the two methods above can be used together to control the generative network. For example, if the user has a special personalized design for the tattoo on the model's left arm and wants the generative network to preserve it exactly in the output image, while the tattoo on the right arm has no special requirements and merely needs to exist for visual effect, the user can input the left-arm tattoo into the generative network through model parameters and specify the right-arm tattoo in text form. In the model rendering processed by the generative network, the left-arm tattoo is then preserved in fine detail exactly as designed, while the right arm receives a tattoo that roughly meets the user's expectations. The same applies to realism enhancement of the clothing.

This embodiment also discloses another method for enhancing the realism of a virtual model based on a generative network; the flow is shown in FIG. 3 and specifically includes the following steps:

301. Receive a rendering of a virtual model wearing digital clothing.

302. Extract parsed information from the rendering, the parsed information including a clothing area and a model area.

303. Input the parsed information and model information into a generative network to generate a model image that wears the digital clothing and corresponds to the model information.

This embodiment first distinguishes the model area from the clothing area and then inputs them into the generative network, so that the network can process the two different areas each in its own way. This makes the way the whole system processes images more flexible; especially when computing power is tight, the system can more easily devote computing resources to specific areas. Meanwhile, the generative network can adopt different enhancement strategies for different areas, yielding better enhancement results.

In some specific embodiments, the clothing area is the area occupied by the digital clothing, the model area is the human body structure of the virtual model exposed outside the digital clothing, and the model area does not include human body structure covered by the digital clothing. In some specific embodiments, the model area includes area contour information and human body part information. The split between model area and clothing area is further constrained: the model area in this embodiment only needs to capture the human body parts exposed outside the clothing, so that the generative network can selectively process the model area and the clothing area separately. Specifically, this embodiment allows the user to choose to enhance the realism of the model area, the clothing area, or both; under the default preset environment, the system can also choose the enhancement target by itself according to the computing resources currently available. In this embodiment, if the resources are insufficient to enhance the realism of both areas at once, the system prioritizes the model area and then lets the user decide whether to continue enhancing the clothing. In some cases, the system can also provide more detailed configuration options, letting the user input different detail information for the model area and the clothing area and controlling the generative network to enhance them separately. For example, the user may input many requirements for the model area, such as tattoos, skin color, and scars, while inputting only texture details for the clothing area; the system then generates details such as tattoos, skin color, and scars only in the model area, while the clothing area receives only texture enhancement, and under the system of this embodiment, details belonging to the model area will not be mistakenly generated in the clothing area.

In some specific embodiments, before obtaining the parsed information of the rendering, the method further includes: determining whether the rendering contains parsed information; and if not, extracting the clothing area and the model area based on a preset algorithm and accurately expressing the edges of the clothing area and the model area.

In some specific embodiments, the clothing area is extracted based on an image segmentation algorithm, the model area is extracted based on a human parsing algorithm, and the edges of the clothing area and the model area are processed based on an image matting algorithm, so that the edges are accurately expressed. Three different schemes are chosen for the details involved in the clothing, the model, and the human body structure, which expresses those details better. This scheme precisely decomposes the image and strengthens its details to complete a high-fidelity try-on rendering; imprecise recognition of the human body structure would fail to meet market business needs. From another angle, it is precisely because high fidelity is required that the system must devote its limited computing resources to the most important places; fields involving rendering, such as the game industry, do not face this problem. In addition, the image matting algorithm guarantees the coherence of the image so that the whole picture looks better, whereas optimizing only the face would not need this step, since neither the influence of clothing on the face nor how to enhance the texture and individuality of other body parts would have to be considered.
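Putting the three stages together, the following sketch composes segmentation, human parsing, and matting into the parsing pipeline described above; the three callables are placeholders for whichever concrete algorithms an implementer selects, with body-part masks assumed to be boolean arrays.

```python
import numpy as np

def parse_rendering(image: np.ndarray,
                    segment_clothing,      # image -> binary clothing mask
                    parse_human,           # image -> {part_name: bool mask}
                    refine_matte):         # (image, mask) -> float alpha
    """Hedged sketch of the three-stage pipeline; the callables stand in
    for the segmentation, parsing, and matting algorithms chosen."""
    clothing_mask = segment_clothing(image)
    body_parts = parse_human(image)                 # e.g. "left_arm", "face"

    # Model area = union of exposed body parts, minus anything under clothing.
    model_mask = np.zeros(image.shape[:2], dtype=bool)
    for mask in body_parts.values():
        model_mask |= mask
    model_mask &= ~clothing_mask.astype(bool)

    # Matting softens both boundaries (hair especially) for clean edges.
    return {
        "clothing_alpha": refine_matte(image, clothing_mask),
        "model_alpha": refine_matte(image, model_mask),
        "parts": body_parts,
    }
```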

In some specific embodiments, the model information is obtained by: generating a reference model and using its information parameters as the model information; and/or through configurable custom model input, converting the model configuration text into a feature representation via a text encoder and using that feature representation as the model information. As the core generation element of the model, the model information supports user-defined modeling or semantic input, while the clothing information serves as a reference providing global information, so only its edge information is emphasized.

In some specific embodiments, when the parsed information and preset model information are input into the generative network, the method further includes inputting clothing information into the generative network, the clothing information including one or more of the clothing's edges, contours, normals, colors, and textures.

This embodiment provides a method for enhancing the realism of a virtual model based on a generative network. Built on highly controllable generative AI and making full use of the prior information of the digital clothing and the virtual model, it can significantly improve the realism of clothing models while maintaining high fidelity of the modeled clothing, and adapts well to aligning virtual models in the digital world with real human models in the real world.

The disclosed embodiments further provide a system for enhancing the realism of a virtual model based on a generative network, which systematizes the method of the above embodiment to make it more practical. The overall structure of the system is shown in FIG. 4; the specific scheme is as follows:

A system for enhancing the realism of a virtual model based on a generative network comprises the following modules:

an input unit 1 for obtaining a rendering of a virtual model wearing digital clothing;

a parsed information acquisition unit 2 for determining whether parsed information about the digital clothing in the rendering is known, the parsed information including a clothing area involving only the digital clothing and a model area involving only human body structure;

if unknown, the clothing area and the model area are extracted based on a preset algorithm;

a first conversion unit 3 for converting the parsed information from image pixel space to latent space and inputting it into the generative network;

a generation unit 4 for causing the generative network, under the control of generation control conditions including model information, to generate a more realistic model image in the model area based on the model information, the model image maintaining the form of the virtual model and wearing the digital clothing; and

a second conversion unit 5 for converting the model image from latent space back to image pixel space to obtain a generated image of the model wearing the digital clothing.

The generation control conditions further include clothing information involving the edges, contours, normals, and colors of the digital clothing; the clothing information enables the generative network to strengthen its understanding of the local details and global information of the digital clothing, further ensuring that the model image does not extend beyond the model area.

Preferably, the model information includes age, gender, skin color, face shape, facial features, and hairstyle; and/or the generative network includes a generative adversarial network and a diffusion model; and/or the clothing area is segmented based on image segmentation technology, the virtual model area is extracted based on human parsing technology, and the model area is processed at different granularities for different scenes based on image matting technology.

Preferably, an image encoder performs encoding to convert the parsed information from image pixel space to latent space, and an image decoder matched with the image encoder performs decoding to convert the model image from latent space back to image pixel space.

Those skilled in the art will appreciate that the drawings are only schematic diagrams of a preferred implementation scenario, and the modules or flows in the drawings are not necessarily required for implementing the present disclosure. Those skilled in the art will also appreciate that the modules of an apparatus in an implementation scenario may be distributed in the apparatus as described, or may be changed accordingly and located in one or more apparatuses different from the present implementation scenario.

The embodiments of this specification further provide an electronic device comprising a processor and a memory for storing processor-executable instructions, wherein the processor implements any of the above method embodiments by running the executable instructions.

FIG. 5 is a schematic structural diagram of a device provided by an exemplary embodiment. Referring to FIG. 5, at the hardware level the device includes a processor 702, an internal bus 704, a network interface 706, a memory 708, and a non-volatile storage 710, and may of course also include hardware required by other services. One or more embodiments of this specification may be implemented in software, for example by the processor 702 reading a corresponding computer program from the non-volatile storage 710 into the memory 708 and then running it. Of course, besides software implementations, one or more embodiments of this specification do not exclude other implementations such as logic devices or combinations of software and hardware; that is, the execution subject of the following processing flow is not limited to logic units and may also be hardware or logic devices.

The systems, apparatuses, modules, or units described in the above embodiments may be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer, which may specifically be a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

Memory may include non-persistent storage in computer-readable media, in the form of random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

The embodiments of this specification further provide a computer-readable storage medium having computer instructions stored thereon which, when executed by a processor, implement the steps of any of the above method embodiments.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage media, other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device. As defined herein, computer-readable media exclude transitory media such as modulated data signals and carrier waves.

The present disclosure provides a method and system for enhancing the realism of a virtual model based on a generative network. Built on highly controllable generative AI and making full use of the prior information of the digital clothing and the virtual model, it can significantly improve the realism of clothing models while maintaining high fidelity of the modeled clothing, and adapts well to aligning virtual models in the digital world with real human models in the real world.

It should also be noted that the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. In the absence of further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes that element.

Specific embodiments of this specification have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.

The terms used in one or more embodiments of this specification are for the purpose of describing particular embodiments only and are not intended to limit one or more embodiments of this specification. The singular forms "a", "said", and "the" used in one or more embodiments of this specification and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of this specification to describe various information, the information should not be limited to these terms, which are only used to distinguish information of the same type from each other. For example, without departing from the scope of one or more embodiments of this specification, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".

The above description is merely a preferred embodiment of one or more embodiments of this specification and is not intended to limit them; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of one or more embodiments of this specification shall be included within their scope of protection.

Claims (19)

1. A method for enhancing the realism of a virtual model based on a generative network, comprising the following steps:
obtaining a rendering of a virtual model wearing digital clothing;
determining whether parsed information about the digital clothing in the rendering is known, the parsed information including a clothing area involving only the digital clothing and a model area involving only human body structure;
if unknown, extracting the clothing area and the model area based on a preset algorithm;
converting the parsed information from image pixel space to latent space and inputting it into a generative network;
under the control of generation control conditions including model information, causing the generative network to generate a more realistic model image in the model area based on the model information, the model image maintaining the form of the virtual model and wearing the digital clothing; and
converting the model image from latent space back to image pixel space to obtain a generated image of the model wearing the digital clothing.

2. The method for enhancing the realism of a virtual model according to claim 1, wherein the generation control conditions further include clothing information, the clothing information including the edges, contours, normals, and colors of the digital clothing; and
the clothing information enables the generative network to strengthen its understanding of the local details and global information of the digital clothing, further ensuring that the model image does not extend beyond the model area.

3. The method for enhancing the realism of a virtual model according to claim 1, wherein the model information includes posture, age, gender, skin color, face shape, facial features, and hairstyle.

4. The method for enhancing the realism of a virtual model according to claim 1, wherein the generative network includes a generative adversarial network and a diffusion model.

5. The method for enhancing the realism of a virtual model according to claim 1, wherein an image encoder performs encoding to convert the parsed information from image pixel space to latent space; and
an image decoder matched with the image encoder performs decoding to convert the model image from latent space back to image pixel space.

6. The method for enhancing the realism of a virtual model according to claim 1, wherein the clothing area is segmented based on image segmentation technology, the virtual model area is extracted based on human parsing technology, and the model area is processed at different granularities for different scenes based on image matting technology.
7. A system for enhancing the realism of a virtual model based on a generative network, comprising the following modules:
an input unit for obtaining a rendering of a virtual model wearing digital clothing;
a parsed information acquisition unit for determining whether parsed information about the digital clothing in the rendering is known, the parsed information including a clothing area involving only the digital clothing and a model area involving only human body structure; wherein, if unknown, the clothing area and the model area are extracted based on a preset algorithm;
a first conversion unit for converting the parsed information from image pixel space to latent space and inputting it into a generative network;
a generation unit for causing the generative network, under the control of generation control conditions including model information, to generate a more realistic model image in the model area based on the model information, the model image maintaining the form of the virtual model and wearing the digital clothing; and
a second conversion unit for converting the model image from latent space back to image pixel space to obtain a generated image of the model wearing the digital clothing.

8. The system for enhancing the realism of a virtual model according to claim 7, wherein the generation control conditions further include clothing information, the clothing information including the edges, contours, normals, and colors of the digital clothing; and
the clothing information enables the generative network to strengthen its understanding of the local details and global information of the digital clothing, further ensuring that the model image does not extend beyond the model area.

9. The system for enhancing the realism of a virtual model according to claim 7, wherein the model information includes posture, age, gender, skin color, face shape, facial features, and hairstyle;
and/or the generative network includes a generative adversarial network and a diffusion model;
and/or the clothing area is segmented based on image segmentation technology, the virtual model area is extracted based on human parsing technology, and the model area is processed at different granularities for different scenes based on image matting technology.

10. The system for enhancing the realism of a virtual model according to claim 7, wherein an image encoder performs encoding to convert the parsed information from image pixel space to latent space; and
an image decoder matched with the image encoder performs decoding to convert the model image from latent space back to image pixel space.
11. A method for enhancing the realism of a virtual model based on a generative network, comprising the following steps:
receiving a rendering of a virtual model wearing digital clothing;
extracting parsed information from the rendering, the parsed information including a clothing area and a model area; and
inputting the parsed information and model information into a generative network to generate a model image that wears the digital clothing and corresponds to the model information.

12. The method for enhancing the realism of a virtual model according to claim 11, wherein the clothing area is the area occupied by the digital clothing, the model area is the human body structure of the virtual model exposed outside the digital clothing, and the model area does not include human body structure covered by the digital clothing.

13. The method for enhancing the realism of a virtual model according to claim 11, wherein before obtaining the parsed information of the rendering, the method further comprises:
determining whether the rendering contains parsed information; and
if the parsed information is not contained, extracting the clothing area and the model area based on a preset algorithm, and accurately expressing the edges of the clothing area and the model area.

14. The method for enhancing the realism of a virtual model according to claim 13, wherein the clothing area is extracted based on an image segmentation algorithm, the model area is extracted based on a human parsing algorithm, and the edges of the clothing area and the model area are processed based on an image matting algorithm, so that the edges of the clothing area and the model area are accurately expressed.

15. The method for enhancing the realism of a virtual model according to claim 11, wherein the model information is obtained by:
generating a reference model and using the information parameters of that model as the model information; and/or
through configurable custom model input, converting the model configuration text into a feature representation via a text encoder and using that feature representation as the model information.

16. The method for enhancing the realism of a virtual model according to claim 11, wherein the model area includes area contour information and human body part information.
17. The virtual model reality enhancement method according to claim 11, characterized in that, when the parsing information and the preset model information are input into the generative network, the method further comprises: inputting clothing information into the generative network, the clothing information including one or more of the edge, contour, normal, color and texture features of the clothing.

18. An electronic device, characterized by comprising:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor implements the method according to any one of claims 11 to 17 by running the executable instructions.

19. A computer-readable storage medium, characterized in that computer instructions are stored thereon, and when the instructions are executed by a processor, the steps of the method according to any one of claims 11 to 17 are implemented.
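The clothing information of claim 17 is naturally expressed as per-pixel control maps of the kind ControlNet-style conditioning consumes. A minimal sketch of one such feature follows, assuming OpenCV is available; the file names and Canny thresholds are illustrative. The edge map is restricted to the clothing area so the condition describes garment structure only.

```python
import cv2
import numpy as np

render = cv2.imread("render.png")          # BGR rendering of the dressed model
clothing_mask = cv2.imread("clothing_mask.png", cv2.IMREAD_GRAYSCALE)

# Edge feature of the garment; claim 17 also lists contour, normal, color and
# texture as alternative or additional features.
gray = cv2.cvtColor(render, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)

# Zero out everything outside the clothing area so the generative network is
# conditioned on garment structure only.
mask = np.where(clothing_mask > 127, 255, 0).astype(np.uint8)
clothing_edges = cv2.bitwise_and(edges, edges, mask=mask)
cv2.imwrite("clothing_edges.png", clothing_edges)
```

Analogous maps for normals or colors can be stacked as additional channels of the generation control condition.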
PCT/CN2024/098707 2023-06-30 2024-06-12 Method and system for enhancing sense of reality of virtual model on basis of generative network Pending WO2025001846A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202310802973 2023-06-30
CN202310802973.6 2023-06-30
CN202410746603.X 2024-06-11
CN202410746603.XA CN119228986A (en) 2023-06-30 2024-06-11 A method and system for enhancing the realism of virtual models based on generative networks

Publications (1)

Publication Number Publication Date
WO2025001846A1 true WO2025001846A1 (en) 2025-01-02

Family

ID=93937576

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2024/098707 Pending WO2025001846A1 (en) 2023-06-30 2024-06-12 Method and system for enhancing sense of reality of virtual model on basis of generative network

Country Status (1)

Country Link
WO (1) WO2025001846A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120107461A (en) * 2025-01-21 2025-06-06 江西财经大学 A twin modeling method and system for garden design

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112927368A (en) * 2021-02-20 2021-06-08 北京未澜科技有限公司 Human body clothes picture editing and video generating method based on three-dimensional parameter expression
CN114913388A (en) * 2022-04-24 2022-08-16 深圳数联天下智能科技有限公司 Method for training fitting model, method for generating fitting image and related device
CN115272822A (en) * 2022-07-20 2022-11-01 深圳数联天下智能科技有限公司 Method for training analytic model, virtual fitting method and related device
WO2022234240A1 (en) * 2021-05-05 2022-11-10 Retail Social Limited Systems and methods for the display of virtual clothing
CN115482062A (en) * 2022-09-20 2022-12-16 天津大学 A virtual fitting method and device based on image generation
CN115564871A (en) * 2022-09-29 2023-01-03 深圳数联天下智能科技有限公司 Method for training fitting model, virtual fitting method and related device
CN116071619A (en) * 2023-02-14 2023-05-05 深圳数联天下智能科技有限公司 Training method of virtual fitting model, virtual fitting method and electronic equipment


Similar Documents

Publication Publication Date Title
Liu et al. Generative adversarial networks for image and video synthesis: Algorithms and applications
CN119338938A (en) System and method for face reproduction
Wu et al. Deep portrait image completion and extrapolation
EP4200745A1 (en) Cross-domain neural networks for synthesizing image with fake hair combined with real image
Alkawaz et al. Blend shape interpolation and FACS for realistic avatar
US10891801B2 (en) Method and system for generating a user-customized computer-generated animation
CN118470218B (en) A personalized customized drawing method and system
CN113936086B (en) Method and device for generating hair model, electronic equipment and storage medium
CN114299206B (en) Three-dimensional cartoon face generation method, device, electronic device and storage medium
JP2025525721A (en) Prompt-driven image editing using machine learning
CN117237542B (en) Three-dimensional human body model generation method and device based on text
CN114639161A (en) Training method of multitask model and virtual fitting method of clothes
WO2025001846A1 (en) Method and system for enhancing sense of reality of virtual model on basis of generative network
CN113223128B (en) Method and apparatus for generating image
WO2022256162A1 (en) Image reenactment with illumination disentanglement
US20250061639A1 (en) Hair design and rendering
CN118864676A (en) Three-dimensional model generation method and device, computer program product and electronic device
CN120188198A (en) Dynamically changing avatar bodies in virtual experiences
Zheng Design and Application of Intelligent Processing Technology for Animation Images Based on Deep Learning
CN119228986A (en) A method and system for enhancing the realism of virtual models based on generative networks
US20240378836A1 (en) Creation of variants of an animated avatar model using low-resolution cages
US20250061673A1 (en) Normal-regularized conformal deformation for stylized three dimensional (3d) modeling
Dong et al. Unsupervised Face Frontalization GAN Driven by 3D Rotation and Symmetric Filling
US20250363676A1 (en) Electronic sticker packs generated by artificial intelligence based on user prompt
KR100965622B1 (en) Method and apparatus for generating emotional character and animation

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 24830492

Country of ref document: EP

Kind code of ref document: A1