WO2025060317A1 - Workpiece surface morphology generation method and apparatus based on multimodal image generation

Info

Publication number: WO2025060317A1
Application number: PCT/CN2024/073663
Authority: WO
Inventors: 孙立剑; 曹卫强; 李杨阳
Original assignee: Zhejiang Lab
Current assignee: Zhejiang Lab
Priority date: 2023-09-22
Filing date: 2024-01-23
Publication date: 2025-03-27
Anticipated expiration: 2026-03-22
Also published as: JP2025535208A; CN116977652A; CN116977652B

Abstract

Disclosed in the present invention are a workpiece surface morphology generation method and apparatus based on multimodal image generation, belonging to the technical field of machining data processing. The method comprises: on the basis of multimodal information of historical data of different machining modes, constructing a guide vector; on the basis of a diffusion model, adding noise to a low-dimensional representation of a surface morphology grayscale image to obtain a noise vector, inputting the guide vector, a time step length, and the noise vector into a reverse diffusion process, and reducing noise layer by layer to obtain another low-dimensional representation, thereby training the model; extracting target multimodal information to construct a target guide vector, inputting a random noise hidden variable and the target guide vector into the trained diffusion model to obtain a target low-dimensional representation, and obtaining a target surface morphology grayscale image by means of a decoder; and using an image quality comprehensive evaluation module to perform quality evaluation. The present invention uses a diffusion model to achieve accurate mapping of multimodal information to a surface morphology image, having the characteristics of rapid generation and high fidelity, and having great potential for real-time surface morphology prediction.

Description

Method and apparatus for generating workpiece surface morphology based on multimodal image generation

Technical Field

The present invention belongs to the technical field of machining data processing, and in particular relates to a method and apparatus for generating workpiece surface morphology based on multimodal image generation.

Background Art

Machining processes are the foundation of mechanical manufacturing. Different machining parameters affect the surface quality of a part, including its surface roughness and form error, which in turn determine the mechanical and optical properties of the part surface. Different machining parameters produce different surface morphologies, and practical experiments have already accumulated a large body of historical data pairing machining parameters with morphology measurements. Using this existing data as a prior to inform subsequent machining parameters is one of the key factors in realizing intelligent manufacturing.

Conventional techniques usually rely on expensive inspection equipment to measure the surface morphology of machined parts offline or in situ, and require a repeated machine-inspect-remachine cycle that takes many iterations to converge to the target machining accuracy. This consumes considerable labor and money, and machining efficiency is low. Part surface morphology simulation and surface morphology grayscale image generation are therefore more advantageous. Current surface morphology simulation methods fall into two categories: solving the three-dimensional surface morphology of the workpiece with analytical models, and, in recent years, generating the three-dimensional surface morphology with learning-based methods. However, methods that obtain surface morphology images through surface morphology simulation usually involve complex modeling, heavy computation, and long run times.

Patent document CN112387995A discloses a method for predicting the surface morphology of a free-form surface after ultra-precision turning. The method combines two research directions, tool path planning and surface morphology simulation. Based on a tool path planned with actively controlled machining accuracy, the region L_x × L_y that requires surface morphology simulation is divided into an m × n grid at resolutions d_x and d_y. The coordinates of all grid points in the simulation region are calculated from the geometric relationship between each grid point and the cutter contact points on the planned tool path, and the calculated coordinates are then used to reconstruct the surface. This achieves surface morphology simulation modeling for single-point diamond turning of curved surfaces, and removing the form component of the surface from the resulting simulation model allows the machining error to be predicted. However, the analytical modeling approach used in that invention suffers from complex modeling, heavy computation, and long run times.

Patent document CN116012480A discloses a data-driven method for generating grayscale images of cut surface morphology. It uses the generator and discriminator of a neural-network-based generative adversarial network (GAN) model to convert machining signal spectrum diagrams and tool stiffness data directly into grayscale image data of the cut surface morphology. However, the GAN model used in that invention has insufficient accuracy, the generated data does not simulate the distribution of real data well, and generalization performance is poor.

Summary of the Invention

The purpose of the present invention is to provide a method and apparatus for generating workpiece surface morphology based on multimodal image generation. By adopting a diffusion model, it achieves accurate mapping from multimodal workpiece information to workpiece surface morphology images and makes more effective use of historical data. On the one hand, it enables fast, high-fidelity generation of surface morphology grayscale images; on the other hand, it can accurately predict the current surface morphology from real-time machining information.

To achieve the above purpose, the present invention provides the following technical solutions:

In a first aspect, an embodiment of the present invention provides a method for generating workpiece surface morphology based on multimodal image generation, comprising the following steps:

Step 1: collect surface morphology images and machining signal spectrum diagrams for different machining methods, annotate them to obtain multimodal information, and process the multimodal information to obtain a guide vector;

Step 2: compress the grayscale image corresponding to the surface morphology image into a first low-dimensional representation through an encoder and input it into the diffusion model; add noise to the first low-dimensional representation layer by layer through the forward diffusion process to obtain a noise vector; based on the guide vector, the time step, and the noise vector, recover a second low-dimensional representation through layer-by-layer denoising in the reverse diffusion process; and train the diffusion model;

Step 3: extract the target multimodal information at application time to construct a target guide vector; input a latent variable of randomly generated Gaussian noise, the time step, and the target guide vector into the trained diffusion model; obtain a target low-dimensional representation through the reverse diffusion process; and pass the target low-dimensional representation through a decoder to obtain a target surface morphology grayscale image;

Step 4: input the target surface morphology grayscale image into a comprehensive image quality evaluation module to evaluate the fidelity of the target surface morphology grayscale image.

The present invention adopts a diffusion model. First, based on historical machining process data, multimodal information for different machining methods is collected and a guide vector is constructed. During model training, an encoder compresses the grayscale image corresponding to the surface morphology image into a first low-dimensional representation, the forward diffusion process of the diffusion model adds noise to this representation layer by layer to obtain a noise vector, and the reverse diffusion process then denoises layer by layer to restore the noise vector to a second low-dimensional representation, with the diffusion model being trained in the process. In actual application, a target guide vector is constructed from the target multimodal information; a randomly generated two-dimensional Gaussian noise latent variable, the time step, and the target guide vector are input into the trained diffusion model to obtain a target low-dimensional representation; and a decoder converts the target low-dimensional representation into a surface morphology grayscale image. Finally, a comprehensive image quality evaluation module evaluates the fidelity of the generated image. The proposed method achieves accurate mapping from multimodal information to surface morphology images, makes full use of historical data, enables fast and high-fidelity generation of surface morphology grayscale images, and is well suited to surface morphology image prediction in real-time machining scenarios.

Further, in step 1, annotating the surface morphology images and machining signal spectrum diagrams to obtain multimodal information includes:

annotating the surface morphology image to obtain the corresponding text information, wherein the text information includes the machining method, the feed rate, the workpiece material, the tool geometry, and the vibration between the tool and the workpiece;

annotating the machining signal spectrum diagram to obtain the corresponding machining spectrum signal;

wherein the multimodal information comprises the text information and the machining spectrum signal.

Further, in step 1, processing the multimodal information to obtain the guide vector includes:

converting the text information and the machining spectrum signal into representations through a text encoder and a spectrum signal encoder respectively, and concatenating them; the resulting embedded feature vector serves as the guide vector. The text encoder and the spectrum signal encoder use the Contrastive Language-Image Pre-training (CLIP) model.

Further, the encoder and the decoder together form a variational autoencoder.

Further, in step 2, the reverse diffusion process uses a Unet noise estimation network based on a cross-attention mechanism. The Unet noise estimation network generates the estimated noise, which is used for denoising at each time step.

Further, in step 4, the comprehensive image quality evaluation module comprises a high-dimensional semantic feature extractor, a low-dimensional deformation feature extractor, and a regression model;

the target surface morphology grayscale image is passed through the high-dimensional semantic feature extractor and the low-dimensional deformation feature extractor to extract semantic features and deformation features respectively. The semantic features and deformation features are fused and input into the regression model, which predicts a quality score by logistic regression. The quality score is used to evaluate the fidelity of the target surface morphology grayscale image.

Further, in step 4, the high-dimensional semantic feature extractor includes a pre-trained EfficientNetV2 network, and the low-dimensional deformation feature extractor includes a pre-trained VGG16 network.
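The fusion and scoring stage of the evaluation module can be illustrated with a minimal sketch. The source does not specify the fusion operator or the regression weights, so this example assumes fusion by concatenation and uses hand-picked, unlearned weights; the feature values stand in for the outputs of the EfficientNetV2 and VGG16 extractors, and all function names are hypothetical.

```python
import math

def fuse_features(semantic, deformation):
    """Fuse the two feature vectors by concatenation (an assumed fusion choice)."""
    return list(semantic) + list(deformation)

def quality_score(features, weights, bias):
    """Logistic regression: map the fused features to a score in (0, 1)."""
    logit = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

# Illustrative inputs only; real features would come from the pre-trained extractors.
semantic = [0.8, 0.1, 0.3]                 # stand-in for semantic features
deformation = [0.05, 0.2]                  # stand-in for deformation features
weights = [1.0, -0.5, 0.25, -2.0, -1.0]    # illustrative, not learned
score = quality_score(fuse_features(semantic, deformation), weights, bias=0.0)
```

In practice the weights and bias would be fitted on images with known quality labels; the sketch only shows how a single scalar score arises from two feature streams.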

In a second aspect, to achieve the above purpose, an embodiment of the present invention further provides an apparatus for generating workpiece surface morphology based on multimodal image generation, comprising a guide vector construction unit, a model training unit, a model application unit, and a quality evaluation unit;

the guide vector construction unit is configured to collect surface morphology images and machining signal spectrum diagrams for different machining methods, annotate them to obtain multimodal information, and process the multimodal information to obtain a guide vector;

the model training unit is configured to compress the grayscale image corresponding to the surface morphology image into a first low-dimensional representation through an encoder and input it into the diffusion model, add noise to the first low-dimensional representation layer by layer through the forward diffusion process to obtain a noise vector, recover a second low-dimensional representation through layer-by-layer denoising in the reverse diffusion process based on the guide vector, the time step, and the noise vector, and train the diffusion model;

the model application unit is configured to extract the target multimodal information at application time to construct a target guide vector, input a latent variable of randomly generated Gaussian noise, the time step, and the target guide vector into the trained diffusion model, obtain a target low-dimensional representation through the reverse diffusion process, and pass the target low-dimensional representation through a decoder to obtain a target surface morphology grayscale image;

the quality evaluation unit is configured to input the target surface morphology grayscale image into the comprehensive image quality evaluation module to evaluate the fidelity of the target surface morphology grayscale image.

In a third aspect, to achieve the above purpose, an embodiment of the present invention further provides a device for generating workpiece surface morphology based on multimodal image generation, comprising a memory and a processor, wherein the memory is configured to store a computer program, and the processor is configured to implement, when executing the computer program, the method for generating workpiece surface morphology based on multimodal image generation provided by the embodiment of the first aspect.

In a fourth aspect, to achieve the above purpose, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a computer, implements the method for generating workpiece surface morphology based on multimodal image generation provided by the embodiment of the first aspect.

The beneficial effects of the present invention are as follows:

(1) The proposed method achieves accurate mapping from multimodal information to surface morphology grayscale images. Compared with conventional inspection equipment and surface morphology simulation methods, it greatly reduces data acquisition time and makes more effective use of historical data;

(2) The proposed method involves only the training of the generator, that is, the training of the forward diffusion process and the reverse diffusion process, and this training is relatively stable. The prior art trains both a generator and a discriminator, and both networks must converge. By comparison, the proposed method greatly reduces image generation time and avoids problems that generative adversarial networks encounter during training, such as mode collapse and poor generalization;

(3) In actual application, the proposed method uses randomly generated noise as the input of the diffusion model, which provides randomness and diversity. The method can therefore generate a variety of high-quality, diverse, and realistic surface morphology images;

(4) The proposed method can accurately predict the current surface morphology from real-time machining information. This helps operators obtain surface morphology images promptly during machining, so that they can quickly judge the quality of the machined parts and adjust the machining parameters in time.

Brief Description of the Drawings

FIG. 1 is a flow chart of the method for generating workpiece surface morphology based on multimodal image generation provided by an embodiment of the present invention;

FIG. 2 is a schematic flow diagram of the reverse diffusion process based on the cross-attention mechanism provided by an embodiment of the present invention;

FIG. 3 is a schematic flow diagram of the actual application process based on the diffusion model provided by an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of the comprehensive image quality evaluation module provided by an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of the apparatus for generating workpiece surface morphology based on multimodal image generation provided by an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of the device for generating workpiece surface morphology based on multimodal image generation provided by an embodiment of the present invention.

Detailed Description

To make the purpose, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit its scope of protection.

The inventive concept of the present invention is as follows. Current analytical modeling methods for solving surface morphology involve complex modeling, heavy computation, and long run times, while methods based on generative adversarial networks have insufficient accuracy, generate data that does not simulate the distribution of real data well, and generalize poorly. To address these problems, the present invention proposes a method and apparatus for generating workpiece surface morphology based on multimodal image generation. A diffusion model is used to establish a mapping between surface morphology grayscale images and multimodal information such as the machining signal, machining method, feed rate, workpiece material, tool geometry, and vibration between the tool and the workpiece. Compared with sampling the surface morphology directly with inspection equipment or calculating it with analytical modeling, the present invention greatly reduces data sampling time and image generation time. Compared with generative adversarial networks, its training process is relatively stable and less prone to instability, and it can generate high-quality, diverse, and realistic samples. It offers fast generation and high fidelity, and has great potential for real-time surface morphology prediction.

A diffusion model is a general mathematical model widely applied in probability theory, statistics, and related fields. Specifically, it describes how a phenomenon evolves or spreads from an initial state to another state over a period of time. A diffusion model can be regarded as a form of latent variable model: it uses a Markov chain to add noise to the data step by step and, in doing so, learns the noise distribution underlying the data distribution and the posterior probability distribution of the data. Diffusion models therefore offer an effective solution for complex data distributions such as the multimodal information of workpieces.
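The step-by-step noising performed by the Markov chain can be sketched as follows. This is a minimal illustration under assumptions the source does not state: a linear noise schedule with the commonly used DDPM defaults, and the standard closed-form expression for sampling a noised latent directly at step t. The function names and the four-element toy latent are hypothetical.

```python
import math
import random

def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(z0, t, alpha_bars, rng):
    """Sample z_t from q(z_t | z_0) in closed form; also return the noise used."""
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    a = alpha_bars[t]
    zt = [math.sqrt(a) * z + math.sqrt(1.0 - a) * e for z, e in zip(z0, eps)]
    return zt, eps

def noise_mse(eps, eps_hat):
    """Mean squared error between the true noise and an estimate of it."""
    return sum((e - h) ** 2 for e, h in zip(eps, eps_hat)) / len(eps)

rng = random.Random(0)
alpha_bars = make_alpha_bars(1000)
z0 = [0.5, -1.0, 0.25, 2.0]                      # toy latent, not real data
zT, eps = add_noise(z0, 999, alpha_bars, rng)    # almost pure noise at the last step
baseline = noise_mse(eps, [0.0] * len(eps))      # error of a trivial all-zero estimator
```

A trained noise estimator would aim to drive `noise_mse` well below the baseline of the all-zero estimator; here no model is trained.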

FIG. 1 is a flow chart of the method for generating workpiece surface morphology based on multimodal image generation provided by an embodiment of the present invention. As shown in FIG. 1, the embodiment provides a method for generating workpiece surface morphology based on multimodal image generation, comprising the following steps:

S110: collect surface morphology images and machining signal spectrum diagrams for different machining methods, annotate them to obtain multimodal information, and process the multimodal information to obtain a guide vector.

In this embodiment, taking the cutting process as an example, a three-dimensional scanner is used to collect surface data of parts actually machined in the cutting process, including surface morphology images and machining signal spectrum diagrams. The surface morphology image is projected into image space and converted into a surface morphology grayscale image, which serves as the corresponding target data for supervising the generated data.
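One simple way to turn measured surface heights into a grayscale image is min-max scaling, sketched below. The source does not describe the projection it uses, so this is an assumed conversion; the function name and the height values are illustrative only.

```python
def heights_to_grayscale(height_map):
    """Min-max scale a 2-D height map to 8-bit grayscale values (0 to 255)."""
    flat = [h for row in height_map for h in row]
    low, high = min(flat), max(flat)
    span = (high - low) or 1.0          # avoid division by zero on a flat surface
    return [[round(255 * (h - low) / span) for h in row] for row in height_map]

# Illustrative heights, not measured data.
heights = [[0.0, 0.5, 1.0],
           [0.25, 0.75, 0.5]]
gray = heights_to_grayscale(heights)
```

The lowest point maps to 0 and the highest to 255, so the image contrast always spans the full range regardless of the absolute height scale.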

The text information corresponding to the surface morphology grayscale image is annotated; it includes the machining method, the feed rate, the workpiece material, the tool geometry, and the vibration between the tool and the workpiece. The machining spectrum signal of the machining signal spectrum diagram is also annotated, and the text information and the machining spectrum signal together form the multimodal information. A text encoder and a spectrum signal encoder based on the Contrastive Language-Image Pre-training (CLIP) model convert the text information and the machining spectrum signal into representations, which are concatenated to obtain the embedded feature vector E_T. The embedded feature vector is mapped into a joint space with the surface morphology grayscale image, establishing a semantic relationship between the surface morphology grayscale image and the multimodal information. The embedded feature vector serves as the guide vector: it provides the generation condition for the surface morphology grayscale image and guides the noise estimation network of the diffusion model in performing the reverse diffusion process on the noise vector.
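The concatenation step can be sketched with two toy encoders. These are loudly hypothetical stand-ins for the CLIP-based text and spectrum signal encoders: a hashed bag-of-words and a per-band mean are used only so the example runs without any model weights, and the description string and spectrum values are made up for illustration.

```python
import hashlib
import math

def l2_normalise(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def toy_text_encoder(text, dim=8):
    """Stand-in for the text encoder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    return l2_normalise(vec)

def toy_spectrum_encoder(spectrum, dim=4):
    """Stand-in for the spectrum encoder: mean magnitude per frequency band."""
    band = max(len(spectrum) // dim, 1)
    vec = [sum(spectrum[i * band:(i + 1) * band]) / band for i in range(dim)]
    return l2_normalise(vec)

def build_guide_vector(text, spectrum):
    """Concatenate the two embeddings into a single guide vector E_T."""
    return toy_text_encoder(text) + toy_spectrum_encoder(spectrum)

# Illustrative annotation and spectrum, not taken from the source.
description = "turning, feed 0.1 mm/rev, aluminium alloy, round-nose tool, low vibration"
spectrum = [0.1, 0.4, 0.9, 0.3, 0.2, 0.1, 0.05, 0.02]
E_T = build_guide_vector(description, spectrum)
```

Concatenation keeps the two modalities in separate slices of E_T, which lets the downstream attention layers weight text and spectrum content independently.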

The training stage mainly uses historical machining parameters together with part surface morphology data measured by morphology measuring instruments and three-dimensional scanners, which ensures the accuracy of the model. In the application stage, predictions are made from the actual machining parameters.

S120: compress the grayscale image corresponding to the surface morphology image into a first low-dimensional representation through an encoder and input it into the diffusion model; add noise to the first low-dimensional representation layer by layer through the forward diffusion process to obtain a noise vector; based on the guide vector, the time step, and the noise vector, recover a second low-dimensional representation through layer-by-layer denoising in the reverse diffusion process; and train the diffusion model.

The method is based on a pre-trained variational autoencoder consisting of an encoder and a decoder. The encoder compresses the surface morphology grayscale image from S110 into the first low-dimensional representation z_0, which is input into the diffusion model. The diffusion model comprises a forward diffusion process and a reverse diffusion process. The training stage covers the forward diffusion process and the optimization of the reverse diffusion process, and the application stage uses the trained reverse diffusion process. In the training stage, using randomly generated two-dimensional Gaussian noise ε, the forward diffusion process adds noise to the input first low-dimensional representation z_0 layer by layer, yielding the noise vector z_T after T time steps. The reverse diffusion process uses a Unet noise estimation network built on a multi-head Attention(Q, K, V) structure, which is a cross-attention mechanism. As shown in FIG. 2, the noise vector z_T, the time step T, and the embedded feature vector E_T are input into this Unet noise estimation network to obtain the estimated noise ε_θ. The estimated noise ε_θ is compared with the randomly generated two-dimensional Gaussian noise ε, and a loss function between the two is established. The loss function is expressed as:

$$\nabla_\theta \left\lVert \epsilon - \epsilon_\theta(z_T, T, E_T) \right\rVert^2$$

where ∇_θ denotes the differential with respect to θ, and θ denotes the network parameters. An iterative optimization algorithm minimizes the loss function to obtain an estimated noise ε_θ that satisfies the convergence condition. After multiple rounds of denoising, the second low-dimensional representation is obtained, and the diffusion model is considered trained when the second low-dimensional representation is arbitrarily close to the first low-dimensional representation. The multi-head Attention(Q, K, V) structure is expressed as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right) V$$

where Q denotes the query, K the key, and V the value, with Q = W_Q·z_T, K = W_K·E_T, and V = W_V·E_T. W_Q, W_K, and W_V are the weight matrices corresponding to Q, K, and V respectively, softmax(·) is a normalization function, and d is the embedding dimension of Q, K, and V. Scaling by d mainly serves to narrow the range of the dot product of Q and K and to keep the gradient of the softmax(·) function stable.
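The cross-attention computation described above can be sketched for a single head using plain Python lists. This is an illustrative sketch, not the network used in the invention: the matrices are tiny hand-written values, rows are treated as tokens (so the projections are written as z_T·W_Q rather than W_Q·z_T), and all names are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(z_t, e_t, w_q, w_k, w_v):
    """softmax(Q K^T / sqrt(d)) V with Q from the latent and K, V from the guide."""
    q = matmul(z_t, w_q)   # queries from the noisy latent tokens
    k = matmul(e_t, w_k)   # keys from the guide-vector tokens
    v = matmul(e_t, w_v)   # values from the guide-vector tokens
    d = len(q[0])
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, transpose(k))]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights

# Toy shapes: 2 latent tokens of width 3, 3 guide tokens of width 2, d = 2.
z_t = [[1.0, 0.0, 2.0], [0.5, 1.0, -1.0]]
e_t = [[1.0, 2.0], [0.0, 1.0], [2.0, 0.5]]
w_q = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
w_k = [[1.0, 0.0], [0.0, 1.0]]
w_v = [[0.5, 0.5], [1.0, -1.0]]
out, weights = cross_attention(z_t, e_t, w_q, w_k, w_v)
```

Each row of `weights` sums to 1 and says how strongly one latent token attends to each guide token; a multi-head version would run several such projections in parallel and concatenate the outputs.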

S130，提取应用时目标多模态信息构建目标引导向量，将随机生成的高斯噪声的隐变量、时间步长和所述目标引导向量输入到训练好的扩散模型中，经过逆向扩散过程得到目标低维表征，将所述目标低维表征通过解码器得到目标表面形貌灰度图。S130, extracting the target multimodal information during application to construct a target guidance vector, inputting the latent variable of the randomly generated Gaussian noise, the time step and the target guidance vector into the trained diffusion model, obtaining a low-dimensional representation of the target through a reverse diffusion process, and converting the target low-dimensional representation into a target vector through a solution. The encoder obtains the grayscale image of the target surface morphology.

In the actual application process, the target multimodal information is extracted, including text information (cutting operation, feed rate, workpiece material, tool geometry, and vibration between the tool and the workpiece) and processing spectrum signals. The text information and the processing spectrum signals are converted into representation forms by the text encoder and the spectrum signal encoder respectively and concatenated, and the resulting embedded representation vector is used as the target guidance vector. The randomly generated two-dimensional Gaussian noise signal is converted into a representation vector in the latent space. Specifically, the two-dimensional Gaussian noise signal is convolved to obtain a latent variable of size 64×64. The latent variable and the target guidance vector are input into the trained diffusion model, and the reverse diffusion process is performed to obtain a processed conditional latent variable of size 64×64. After multiple rounds of denoising by the iterative optimization algorithm, the target low-dimensional representation is obtained and sent to the decoder of the pre-trained variational autoencoder. As shown in Figure 3, the decoder restores the implicit target low-dimensional representation to image information, thereby generating the target surface morphology grayscale image.
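The application phase above can be sketched as a conditional reverse-diffusion loop. The following NumPy sketch rests on several assumptions that the description does not fix: a DDPM-style ancestral update with a linear noise schedule, 50 sampling steps, and 8-dimensional embeddings. The functions `toy_noise_estimator` and `toy_decoder` are hypothetical placeholders for the trained Unet and the pre-trained decoder, and the 64×64 latent is sampled directly rather than obtained by convolving a noise signal.

```python
import numpy as np

def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule (betas, alphas, cumulative product of alphas)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)

def build_guidance_vector(text_embedding, spectrum_embedding):
    """Concatenate the text and spectrum embeddings into one target guidance vector."""
    return np.concatenate([text_embedding, spectrum_embedding], axis=-1)

def toy_noise_estimator(z_t, t, guidance):
    """Hypothetical placeholder for the trained Unet; ignores t and is NOT a trained model."""
    return 0.1 * z_t + 0.01 * guidance.mean()

def reverse_diffusion(noise_estimator, guidance, shape, num_steps, rng):
    """Denoise step by step from pure Gaussian noise down to a latent representation."""
    betas, alphas, alpha_bar = make_schedule(num_steps)
    z = rng.standard_normal(shape)                       # random Gaussian latent variable
    for t in range(num_steps - 1, -1, -1):
        eps_hat = noise_estimator(z, t, guidance)
        mean = (z - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else np.zeros(shape)
        z = mean + np.sqrt(betas[t]) * noise             # no noise is added at the final step
    return z

def toy_decoder(z):
    """Hypothetical placeholder for the decoder: min-max rescale to an 8-bit grayscale image."""
    lo, hi = z.min(), z.max()
    return np.round(255 * (z - lo) / (hi - lo + 1e-12)).astype(np.uint8)

rng = np.random.default_rng(0)
guidance = build_guidance_vector(rng.standard_normal(8), rng.standard_normal(8))
latent = reverse_diffusion(toy_noise_estimator, guidance, (64, 64), num_steps=50, rng=rng)
image = toy_decoder(latent)
print(image.shape, image.dtype)
```

With a trained noise estimator and the real decoder substituted for the two placeholders, the same loop structure would yield the target surface morphology grayscale image; with the placeholders it only demonstrates the data flow.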

S140, inputting the target surface morphology grayscale image into an image quality comprehensive evaluation module to evaluate the fidelity of the target surface morphology grayscale image.

In this embodiment, the decoder of the pre-trained variational autoencoder is used to restore the target low-dimensional representation into grayscale images of the target surface morphology from three different viewing angles. The generated multi-view grayscale images are input into an image quality comprehensive evaluation module for feature extraction and distortion assessment, in order to evaluate whether the generated target surface morphology grayscale image meets the requirements in terms of semantic content and fidelity.

As shown in Figure 4, the image quality comprehensive evaluation module includes a high-dimensional semantic feature extractor and a low-dimensional deformation feature extractor. The high-dimensional semantic feature extractor uses the last four layers of a pre-trained EfficientNetV2 network to extract high-dimensional features as high-dimensional semantic distortion features, which include semantic features such as content information, physical properties, and spatiotemporal relationships between contents. The low-dimensional deformation feature extractor uses the first four layers of a pre-trained VGG16 network to extract low-dimensional features as low-dimensional deformation distortion features, which include compression, noise, blur, overexposure or underexposure, color difference, sharpness, and blocking artifacts. After feature fusion, the high-dimensional semantic distortion features and the low-dimensional deformation distortion features are input into an image distortion quality regression model composed of three fully connected layers to obtain the corresponding quality score. The image distortion quality regression model is used to evaluate the quality of the generated image. Combining subjective and objective evaluation indicators, it takes high-dimensional features (physical properties of objects, spatiotemporal relationships between objects, content information of objects, etc.) and low-dimensional features (compression, noise, blur, overexposure or underexposure, color difference, sharpness, blocking artifacts, etc.) as distortion indicators. Taking the distortion processing of high-quality images obtained in real scenes as an example, the distortion index of the high-quality images is set to 0; these images are then distorted to different degrees and scored between 0 and 1, where a score closer to 1 indicates a higher degree of distortion. Therefore, for a surface morphology grayscale image generated by the method of the present invention, the lower the quality score obtained by the logistic regression of the image distortion quality regression model, the less the distortion, the higher the quality and the better the fidelity.
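The scoring stage described above (feature fusion, three fully connected layers, a logistic output in the range 0 to 1) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: fusion is taken to be simple concatenation, the feature sizes (128 and 64) and hidden widths (64 and 16) are hypothetical, the weights are random rather than trained, and random vectors stand in for the outputs of the EfficientNetV2 and VGG16 extractors.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_features(semantic, deformation):
    """Assumed fusion rule: concatenate semantic and deformation features per image."""
    return np.concatenate([semantic, deformation], axis=-1)

def init_head(in_dim, hidden=(64, 16), rng=None):
    """Three fully connected layers: in_dim -> 64 -> 16 -> 1 (hypothetical widths)."""
    if rng is None:
        rng = np.random.default_rng(0)
    dims = (in_dim, *hidden, 1)
    return [(rng.standard_normal((a, b)) * np.sqrt(2.0 / a), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def distortion_score(fused, params):
    """Return a score in (0, 1); values closer to 0 mean less distortion."""
    h = fused
    for w, b in params[:-1]:
        h = relu(h @ w + b)
    w, b = params[-1]
    return sigmoid(h @ w + b).squeeze(-1)        # logistic output layer

rng = np.random.default_rng(0)
semantic = rng.standard_normal((3, 128))         # placeholder features for 3 views
deformation = rng.standard_normal((3, 64))       # placeholder features for 3 views
fused = fuse_features(semantic, deformation)
params = init_head(fused.shape[-1], rng=rng)
scores = distortion_score(fused, params)
print("per-view scores:", scores, "mean:", scores.mean())
```

Because the weights are untrained, the printed scores carry no meaning; in practice the head would be fitted to images whose distortion labels lie between 0 and 1 as described above.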

Based on the same inventive concept, an embodiment of the present invention further provides a workpiece surface morphology generation apparatus 500 based on multimodal image generation which, as shown in Figure 5, comprises a guidance vector construction unit 510, a model training unit 520, a model application unit 530, and a quality evaluation unit 540;

The guidance vector construction unit 510 is used to collect surface morphology images and processing signal spectrum diagrams of different processing methods, annotate them to obtain multimodal information, and process the multimodal information to obtain the guidance vector;

The model training unit 520 is used to compress the grayscale image corresponding to the surface morphology image into a first low-dimensional representation through an encoder, input it into the diffusion model, add noise to the first low-dimensional representation layer by layer through a forward diffusion process to obtain a noise vector, restore a second low-dimensional representation through layer-by-layer denoising in a reverse diffusion process based on the guidance vector, the time step and the noise vector, and train the diffusion model;

The model application unit 530 is used to extract the target multimodal information at application time to construct a target guidance vector, input the latent variable of randomly generated Gaussian noise, the time step and the target guidance vector into the trained diffusion model, obtain the target low-dimensional representation through the reverse diffusion process, and pass the target low-dimensional representation through the decoder to obtain the target surface morphology grayscale image;

The quality evaluation unit 540 is used to input the target surface morphology grayscale image into the image quality comprehensive evaluation module to evaluate the fidelity of the target surface morphology grayscale image.

Since the workpiece surface morphology generation apparatus based on multimodal image generation provided by the embodiment of the present invention basically corresponds to the method embodiment, reference may be made to the relevant description of the method embodiment. The apparatus embodiment described above is merely illustrative: the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present invention. Those of ordinary skill in the art can understand and implement it without creative effort.

Based on the same inventive concept, an embodiment also provides a workpiece surface morphology generation device based on multimodal image generation which, as shown in Figure 6, includes a memory and a processor, wherein the memory is used to store a computer program, and the processor is used to implement the above-mentioned workpiece surface morphology generation method based on multimodal image generation when executing the computer program.

The workpiece surface morphology generation device based on multimodal image generation proposed in the embodiment of the present invention may be a device such as a computer. The device embodiment can be implemented by software, or by hardware or a combination of software and hardware. Taking software implementation as an example, the device is formed by the processor of any device with data processing capability on which it resides reading the corresponding computer program instructions from the non-volatile memory into the internal memory and executing them. From a hardware perspective, Figure 6 is a schematic structural diagram of the workpiece surface morphology generation device based on multimodal image generation provided in the embodiment of the present invention. In addition to the processor, internal memory, network interface, and non-volatile memory shown in Figure 6, the device may also include other hardware according to the actual functions of the device with data processing capability, which will not be described in detail here.

Based on the same inventive concept, an embodiment further provides a computer-readable storage medium on which a computer program is stored. When executed by a computer, the computer program implements the above-mentioned method for generating workpiece surface morphology based on multimodal image generation.

The computer-readable storage medium may be an internal storage unit of any device with data processing capability described in any of the aforementioned embodiments, such as a hard disk or a memory. The computer-readable storage medium may also be an external storage device of the device, such as a plug-in hard disk, a smart media card (SMC), an SD card, or a flash card equipped on the device. Furthermore, the computer-readable storage medium may include both an internal storage unit and an external storage device of any device with data processing capability. The computer-readable storage medium is used to store the computer program and other programs and data required by the device with data processing capability, and may also be used to temporarily store data that has been output or is to be output.

It should be noted that the workpiece surface morphology generation apparatus based on multimodal image generation, the workpiece surface morphology generation device based on multimodal image generation, and the computer-readable storage medium provided in the above embodiments all belong to the same concept as the embodiment of the workpiece surface morphology generation method based on multimodal image generation. For their specific implementation, see the method embodiment; the details are not repeated here.

The above are only preferred embodiments of the present invention and do not limit the present invention in any form. Although the implementation of the present invention has been described in detail above, those skilled in the art can still modify the technical solutions recorded in the foregoing examples, or replace some of their technical features with equivalents. All modifications and equivalent replacements made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims

A method for generating workpiece surface morphology based on multimodal image generation, characterized in that it comprises the following steps:

Step 1: Collect surface morphology images and processing signal spectrum diagrams of different processing methods and annotate them to obtain multimodal information, and process the multimodal information to obtain a guidance vector;

Step 2: Compress the grayscale image corresponding to the surface morphology image into a first low-dimensional representation through an encoder, input it into the diffusion model, add noise to the first low-dimensional representation layer by layer through a forward diffusion process to obtain a noise vector, restore a second low-dimensional representation through layer-by-layer denoising in a reverse diffusion process based on the guidance vector, the time step and the noise vector, and train the diffusion model;

Step 3: Extract the target multimodal information during application to construct a target guidance vector, input the latent variable of the randomly generated Gaussian noise, the time step and the target guidance vector into the trained diffusion model, obtain the target low-dimensional representation through the reverse diffusion process, and obtain the target surface morphology grayscale image through the decoder;

Step 4: Input the target surface morphology grayscale image into the image quality comprehensive evaluation module to evaluate the fidelity of the target surface morphology grayscale image.

The method for generating workpiece surface morphology based on multimodal image generation according to claim 1, characterized in that annotating the surface morphology images and the processing signal spectrum diagrams to obtain multimodal information comprises:

annotating the surface morphology image to obtain corresponding text information, wherein the text information includes a processing method, a feed rate, a workpiece material, a geometric shape of a tool, and vibration between the tool and the workpiece;

annotating the processing signal spectrum diagram to obtain a corresponding processing spectrum signal;

wherein the multimodal information includes the text information and the processing spectrum signal.

The method for generating workpiece surface morphology based on multimodal image generation according to claim 2 is characterized in that the processing of multimodal information to obtain a guidance vector comprises:

The text information and the processing spectrum signal are converted into representation forms by a text encoder and a spectrum signal encoder respectively and concatenated, and the resulting embedded feature vector is used as the guidance vector, wherein the text encoder and the spectrum signal encoder adopt the contrastive language-image pre-training (CLIP) model.

The method for generating workpiece surface morphology based on multimodal image generation according to claim 1 is characterized in that the encoder and the decoder form a variational autoencoder.

The method for generating workpiece surface morphology based on multimodal image generation according to claim 1, characterized in that the reverse diffusion process adopts a Unet noise estimation network based on a cross-attention mechanism, the Unet noise estimation network is used to generate estimated noise, and the estimated noise is used for denoising at each time step.

The workpiece surface morphology generation method based on multimodal image generation according to claim 1 is characterized in that the image quality comprehensive evaluation module comprises a high-dimensional semantic feature extractor, a low-dimensional deformation feature extractor and a regression model;

The target surface morphology grayscale image is respectively subjected to a high-dimensional semantic feature extractor and a low-dimensional deformation feature extractor to extract semantic features and deformation features. The semantic features and deformation features are input into the regression model after feature fusion. The quality score is predicted by the logistic regression of the regression model. The quality score is used to evaluate the fidelity of the target surface morphology grayscale image.

The method for generating workpiece surface morphology based on multimodal image generation according to claim 6, characterized in that the high-dimensional semantic feature extractor includes a pre-trained EfficientNetV2 network, and the low-dimensional deformation feature extractor includes a pre-trained VGG16 network.

A workpiece surface morphology generation apparatus based on multimodal image generation, characterized in that it includes a guidance vector construction unit, a model training unit, a model application unit, and a quality evaluation unit;

The guidance vector construction unit is used to collect surface morphology images and processing signal spectrum diagrams of different processing methods and annotate them to obtain multimodal information, and process the multimodal information to obtain the guidance vector;

The model training unit is used to compress the grayscale image corresponding to the surface morphology image into a first low-dimensional representation through an encoder, input it into the diffusion model, add noise to the first low-dimensional representation layer by layer through a forward diffusion process to obtain a noise vector, restore a second low-dimensional representation through layer-by-layer denoising in a reverse diffusion process based on the guidance vector, the time step and the noise vector, and train the diffusion model;

The model application unit is used to extract the target multimodal information during application to construct a target guidance vector, input the latent variable of the randomly generated Gaussian noise, the time step and the target guidance vector into the trained diffusion model, obtain the target low-dimensional representation through the reverse diffusion process, and obtain the target surface morphology grayscale image through the decoder;

The quality evaluation unit is used to input the target surface morphology grayscale image into the image quality comprehensive evaluation module to evaluate the fidelity of the target surface morphology grayscale image.

A workpiece surface morphology generation device based on multimodal image generation, comprising a memory and a processor, wherein the memory is used to store a computer program, and wherein the processor is used to implement the workpiece surface morphology generation method based on multimodal image generation according to any one of claims 1 to 7 when executing the computer program.

A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a computer, implements the method for generating workpiece surface morphology based on multimodal image generation according to any one of claims 1 to 7.