
CN120525619A - Virtual fitting method and device, storage medium and electronic device - Google Patents

Virtual fitting method and device, storage medium and electronic device

Info

Publication number
CN120525619A
CN120525619A (application CN202511033383.7A)
Authority
CN
China
Prior art keywords
human body
clothing
video
virtual
user
Prior art date
Legal status
Granted
Application number
CN202511033383.7A
Other languages
Chinese (zh)
Other versions
CN120525619B (en)
Inventor
周俊熙
沈建雄
冯诚
田魁
龚宇
柳梦丽
刘繁
Current Assignee
Xiaomang E Commerce Co ltd
Original Assignee
Xiaomang E Commerce Co ltd
Priority date
Application filed by Xiaomang E Commerce Co ltd filed Critical Xiaomang E Commerce Co ltd
Priority to CN202511033383.7A
Publication of CN120525619A
Application granted
Publication of CN120525619B
Status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0641Electronic shopping [e-shopping] utilising user interfaces specially adapted for shopping
    • G06Q30/0643Electronic shopping [e-shopping] utilising user interfaces specially adapted for shopping graphically representing goods, e.g. 3D product representation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00Indexing scheme for image generation or computer graphics
    • G06T2210/16Cloth

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Graphics (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Processing Or Creating Images (AREA)

Abstract


The present application discloses a virtual fitting method and device, a storage medium, and an electronic device, applied in the field of computer image processing. The method includes: processing a target video to obtain human posture data for each video frame; using the human posture data of each video frame to construct a 3D human body model for that frame; for the 3D human body model of each video frame, obtaining clothing rendering information for the model and rendering virtual clothing on it based on a dynamic fitting algorithm and the clothing rendering information, yielding a virtual rendered video frame; and generating an AR fitting video from the virtual rendered video frames and displaying it to the user. In this way, selected clothing can be tried on in AR form, showing the user the try-on effect, addressing the sizing problems that arise because clothing purchased online cannot be tried on, reducing the probability of returns and exchanges, and providing users with a good shopping experience.

Description

Virtual fitting method and device, storage medium and electronic device
Technical Field
The present invention relates to the field of computer image processing technologies, and in particular, to a virtual fitting method and apparatus, a storage medium, and an electronic device.
Background
With the spread of internet technology and the iterative upgrading of mobile terminal devices, the global e-commerce market has grown explosively. Clothing, a major retail category, has become a core battleground for e-commerce platforms owing to its low degree of standardization, strong demand for personalization, and high purchase frequency.
According to statistics, the global clothing e-commerce market has surpassed one trillion dollars in scale. Online shopping transcends the limits of region and time, and has reshaped consumer shopping habits through massive product catalogs, precise marketing recommendations, and convenient payment systems.
However, the traditional online clothing purchase model cannot offer consumers a fitting experience, so problems such as unsuitable sizes and unsatisfactory styles arise easily, leading to high return rates and a poor consumer experience.
Disclosure of Invention
In view of the above, embodiments of the present application provide a virtual fitting method and apparatus, a storage medium, and an electronic device. By applying the solution provided herein, a user can try on a selected garment in AR, view its try-on effect, and decide whether to purchase based on that effect. This effectively addresses the fact that garments purchased online cannot be tried on, reduces the probability of returns, and provides the user with a good shopping experience.
In order to achieve the above object, the embodiment of the present application provides the following technical solutions:
The first aspect of the application discloses a virtual fitting method, comprising the following steps:
Acquiring a target video containing a user to be fitted;
Performing human body posture detection and human body posture tracking processing on the target video to acquire human body posture data of the user in each video frame of the target video;
For each video frame, determining human body model parameters of multiple dimensions based on human body posture data of the video frame, and constructing a 3D human body model of the user in the video frame by using each human body model parameter;
obtaining virtual clothes of target clothes from a preset clothes model library, wherein the target clothes are clothes to be tried on selected by a user;
For the 3D human body model of each video frame, acquiring clothing rendering information of the 3D human body model, and rendering the virtual clothing on the 3D human body model based on a preset dynamic fitting algorithm and the clothing rendering information to obtain a virtual rendering video frame corresponding to the video frame;
and generating AR fitting videos based on the virtual rendering video frames, and displaying the AR fitting videos to the user.
A second aspect of the present application discloses a virtual fitting device comprising:
the first acquisition unit is used for acquiring a target video containing a user to be fitted;
The second acquisition unit is used for carrying out human body posture detection and human body posture tracking processing on the target video and acquiring human body posture data of the user in each video frame of the target video;
a building unit, configured to determine, for each video frame, human body model parameters of multiple dimensions based on the human body posture data of the video frame, and to build a 3D human body model of the user in the video frame using the human body model parameters;
the third acquisition unit is used for acquiring virtual clothes of target clothes from a preset clothes model library, wherein the target clothes are clothes to be tried on selected by the user;
The rendering unit is used for acquiring, for the 3D human body model of each video frame, clothing rendering information of the 3D human body model, and rendering the virtual clothing on the 3D human body model based on a preset dynamic fitting algorithm and the clothing rendering information to obtain a virtual rendering video frame corresponding to the video frame;
And the display unit is used for generating AR fitting videos based on the virtual rendering video frames and displaying the AR fitting videos to the user.
A third aspect of the present application discloses a storage medium comprising stored instructions, wherein the instructions, when executed, control a device in which the storage medium is located to perform a virtual fitting method as described above.
A fourth aspect of the application discloses an electronic device comprising a memory, and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by one or more processors to implement a virtual fitting method as described above.
Compared with the prior art, the application has the following advantages:
The application discloses a virtual fitting method and apparatus, a storage medium, and an electronic device. The method processes a target video of a user to be fitted to obtain human body posture data for each video frame; constructs a 3D human body model for each video frame from that posture data; for the 3D human body model of each video frame, obtains clothing rendering information and renders virtual clothing on the model based on a dynamic fitting algorithm and that information, producing a virtual rendered video frame; and generates an AR fitting video from the virtual rendered video frames, which is displayed to the user. In this way, the selected garment can be tried on in AR, the try-on effect is shown to the user, problems such as unsuitable size and mismatched style that arise because garments purchased online cannot be tried on are effectively addressed, the probability of returns is reduced, and the user is provided with a good shopping experience.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a virtual fitting method according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for acquiring a target video including a user to be dressed according to an embodiment of the present application;
FIG. 3 is a flowchart of rendering virtual clothes on a 3D human model based on a preset dynamic fitting algorithm and clothes rendering information to obtain a virtual rendered video frame corresponding to a video frame according to an embodiment of the present application;
FIG. 4 is a flowchart of another virtual fitting method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a virtual fitting device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the present disclosure, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The present application may be used in an intelligent video dynamic fitting system, or an AR fitting system, built from a number of general or special purpose computing device environments or configurations.
Referring to fig. 1, a flowchart of a virtual fitting method according to an embodiment of the present application is specifically described below:
s101, acquiring a target video containing a user to be fitted.
In the embodiment provided by the application, the target video can be a video after preprocessing, and the preprocessing process comprises scene self-adaptive denoising, fitting scene special color correction, fitting optimization resolution standardization, human motion perception frame rate stabilization, fitting enhancement preprocessing and other aspects of processing.
Referring to fig. 2, a flowchart of a method for acquiring a target video including a user to be dressed according to an embodiment of the present application is specifically described below:
S201, acquiring an initial video containing a user to be fitted.
The initial video can be a video recorded in real time or a video file uploaded by a user.
S202, determining scene complexity in an initial video, and adjusting preset filtering parameters of denoising filtering based on the scene complexity.
The preset denoising filter may be a filter combining an improved bilateral filter with Gaussian filtering, and its filtering parameters are dynamically adjusted according to the complexity of the scene.
S203, processing the initial video by using the denoising filter after the parameter adjustment to obtain a first video.
When the initial video is processed with the parameter-adjusted denoising filter, the video is analyzed over pixel neighborhoods, and filtering of different strengths is applied to the human body region and the background region to keep the human body contour sharp. For the fitting scene of the initial video, noise is removed while human body edge details are preserved; an adaptive window size (5×5 to 9×9) and sigma value (0.6-1.2) may be used.
Therefore, the scene self-adaptive denoising processing of the initial video is completed, and the first video is further obtained.
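As an illustration of S202-S203, the following is a minimal Python sketch of scene-adaptive denoising using OpenCV. The complexity estimator, mask-based blending, and all thresholds are illustrative assumptions, not the patent's exact filter design; only the window-size and sigma ranges come from the text above.

```python
import cv2
import numpy as np

def estimate_scene_complexity(frame: np.ndarray) -> float:
    """Illustrative proxy: normalized variance of the Laplacian (edge density)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return float(np.clip(cv2.Laplacian(gray, cv2.CV_64F).var() / 1000.0, 0.0, 1.0))

def adaptive_denoise(frame: np.ndarray, body_mask: np.ndarray) -> np.ndarray:
    """Blend edge-preserving bilateral filtering (body) with Gaussian filtering
    (background), with parameters interpolated from scene complexity."""
    c = estimate_scene_complexity(frame)
    ksize = 5 + 2 * int(round(2 * c))   # adaptive window: 5x5, 7x7, or 9x9
    sigma = 0.6 + 0.6 * c               # sigma in the 0.6 .. 1.2 range
    body = cv2.bilateralFilter(frame, d=ksize, sigmaColor=40, sigmaSpace=ksize)
    background = cv2.GaussianBlur(frame, (ksize, ksize), sigma)
    mask3 = cv2.merge([body_mask] * 3).astype(np.float32) / 255.0
    # Weaker smoothing on the body region preserves contour sharpness;
    # stronger smoothing on the background suppresses noise.
    return (body * mask3 + background * (1.0 - mask3)).astype(np.uint8)
```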
S204, correcting the scene main tone in the first video by using a preset color correction strategy to obtain a corrected second video.
Correcting the scene's dominant tone in the first video with the color correction strategy may include: automatically detecting and correcting color temperature deviations according to the scene's dominant tone; enhancing local contrast in the video to improve the distinction between garment edges and the human body contour, with the adjustment weight gamma dynamically tuned within 0.85-1.15; and applying color normalization so that garment colors appear consistent under different lighting conditions. This completes the fitting-scene-specific color correction of the first video, yielding the second video.
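A minimal sketch of the S204 color correction step. A gray-world white balance and CLAHE-based local contrast stand in for the patent's unspecified correction strategy (both are assumptions); only the 0.85-1.15 gamma bound comes from the text above.

```python
import cv2
import numpy as np

def correct_fitting_scene_colors(frame: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Gray-world white balance, local contrast enhancement, and a bounded
    gamma adjustment (0.85-1.15)."""
    gamma = float(np.clip(gamma, 0.85, 1.15))
    # Gray-world assumption: scale channels so their means are equal.
    means = frame.reshape(-1, 3).mean(axis=0)
    balanced = np.clip(frame * (means.mean() / means), 0, 255).astype(np.uint8)
    # CLAHE on the luminance channel raises local contrast, sharpening the
    # distinction between garment edges and the body contour.
    lab = cv2.cvtColor(balanced, cv2.COLOR_BGR2LAB)
    lab[..., 0] = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(lab[..., 0])
    contrasted = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
    # Bounded gamma adjustment via a lookup table.
    table = ((np.arange(256) / 255.0) ** (1.0 / gamma) * 255).astype(np.uint8)
    return cv2.LUT(contrasted, table)
```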
S205, performing resolution optimization processing on the second video by using a preset resolution optimization strategy to obtain a third video.
Optimizing the resolution of the second video includes: dynamically adjusting the target resolution according to the proportion of the frame occupied by the human body, avoiding the information loss caused by a fixed resolution; performing region-aware non-uniform scaling, keeping a higher resolution for important human body regions while moderately reducing the resolution of background regions; and applying an edge-preserving interpolation algorithm so that garment edges and texture details are not blurred during resolution conversion. This completes the fitting-optimized resolution normalization of the second video, yielding the third video.
S206, performing frame rate stabilization processing on the third video by using a preset frame rate stabilization strategy, and performing fitting enhancement processing on the processed video based on the preset fitting enhancement strategy to obtain a target video.
Frame-rate stabilization of the third video with the preset strategy includes: an adaptive frame-rate control algorithm based on human motion speed, which raises the sampling rate when the user in the video moves quickly and lowers the frame rate when the user is static or moving slowly; motion-prediction-assisted intelligent frame interpolation, with focused computation on garment folds and fluttering regions; and an inter-frame consistency mechanism that ensures temporal consistency during subsequent clothing rendering and suppresses flicker. This completes the human-motion-aware frame-rate processing and yields the processed video.
The fitting enhancement of the processed video specifically includes: enhancing the garment textures of the user in the video to improve the recognizability of fabric textures and patterns; sharpening the human body contour in the video with directional sharpening to improve the accuracy of subsequent pose recognition; and building a clothing region-of-interest map (ROI map) over the user's image to provide priority guidance for subsequent processing. This completes the fitting enhancement preprocessing and yields the target video.
In the embodiments provided by the application, preprocessing the video improves the accuracy of subsequent human body contour recognition, including under complex backgrounds and low illumination. It also improves the retention of garment texture detail, effectively reduces the error rate of subsequent pose detection, lowers subsequent processing latency, improves the device's real-time processing capability, improves overall system stability, and reduces the rate of abnormal frames.
Compared with conventional video preprocessing, the preprocessing provided by the application achieves higher human body contour recognition accuracy, better texture detail retention, a lower pose detection error rate, less processing delay, fewer abnormal frames, and stronger overall system performance.
S102, human body posture detection and human body posture tracking processing are carried out on the target video, and human body posture data of a user in each video frame of the target video are obtained.
In the embodiment of the application, for each video frame of the target video: a preset pose detection module detects human body keypoints in the user's image to obtain the human skeleton data of the frame; a preset multi-person separation network performs human instance segmentation on the frame to determine the user's human instance; the user's skeletal keypoint data is located within the human skeleton data based on that instance; the user's association between the frame and adjacent frames is established; and the motion trajectories of the user's skeletal keypoints between the frame and adjacent frames are smoothed, producing the human body posture data of the frame.
The human body posture data of a video frame includes, but is not limited to, information on each skeletal keypoint of the user, the user's association information between the frame and adjacent frames, and the motion trajectory information of each skeletal keypoint between the frame and adjacent frames.
The preset pose detection module includes a self-developed dual-path hierarchical pose estimation network (DP-HPENet), which integrates a lightweight improved ResNet-50 with a Transformer hybrid architecture and can identify 17-25 skeletal keypoints including the head, shoulders, elbows, wrists, hips, knees, and ankles. The network not only efficiently identifies the spatial positions of skeletal keypoints, but also evaluates the reliability of each point through a keypoint-confidence self-calibration mechanism, significantly improving detection stability in fast-motion scenes. DP-HPENet can be applied to keypoint detection in every video frame, and an adaptive spatio-temporal attention module dynamically adjusts the receptive field size, improving detection accuracy while reducing computation.
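DP-HPENet is the applicant's own network and is not publicly available; the sketch below only illustrates the expected output format, assuming a COCO-style 17-keypoint layout and an illustrative confidence threshold for the self-calibration mechanism described above.

```python
import numpy as np

# COCO-style 17-keypoint layout (an assumption for this sketch).
COCO17 = ["nose", "l_eye", "r_eye", "l_ear", "r_ear", "l_shoulder", "r_shoulder",
          "l_elbow", "r_elbow", "l_wrist", "r_wrist", "l_hip", "r_hip",
          "l_knee", "r_knee", "l_ankle", "r_ankle"]

def filter_keypoints(keypoints: np.ndarray, conf: np.ndarray, thresh: float = 0.3):
    """Keep only keypoints whose (self-calibrated) confidence clears a threshold;
    low-confidence points would be handed to the occlusion-inference stage instead.
    keypoints: (17, 2) pixel coordinates; conf: (17,) per-point confidences."""
    visible = conf >= thresh
    kept = {COCO17[i]: keypoints[i] for i in range(len(COCO17)) if visible[i]}
    occluded = [COCO17[i] for i in range(len(COCO17)) if not visible[i]]
    return kept, occluded
```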
In another embodiment provided by the application, the pose detection module further includes a hierarchical occlusion inference module (ST-GCN-HOI) based on a spatio-temporal graph convolutional network. It can accurately distinguish three cases: self-occlusion, occlusion by other people, and occlusion by scene objects, and it infers the positions of occluded skeletal keypoints. By adopting a differentiated inference strategy it infers keypoint positions even under severe occlusion, improving the accuracy of the inference results.
When multiple people appear in the video, the obtained human skeleton data includes the skeleton data of all of them. Human instance segmentation must then be performed on the video frame with the multi-person separation network to determine the user's human instance, after which the skeletal keypoint data corresponding to the user's instance is located within the human skeleton data; this keypoint data includes the position information of the user's skeletal keypoints.
It should be noted that the multi-person separation network incorporates boundary awareness and may be referred to as a boundary-aware adaptive multi-person separation network (BA-MPSNet). Using this network, high-precision human instance segmentation can be achieved. Human instance segmentation is a crucial link in the AR fitting system: it is both the basis for identifying the target person (i.e., the user) and the precondition for the accurate fit of the subsequent virtual garment.
BA-MPSNet (the boundary-aware adaptive multi-person separation network) adopts an improved instance segmentation architecture for high-precision human segmentation; the architecture includes an edge-awareness mechanism, a pose-guided segmentation attention mechanism, adaptive resolution processing, and semantic-enhanced segmentation.
The edge-awareness mechanism introduces a dedicated edge enhancement module that strengthens detection of human contour boundaries through multi-stage gradient feature extraction, improving boundary accuracy by 15.3% over conventional methods. The pose-guided segmentation attention mechanism uses previously detected skeleton information to guide segmentation, forming a skeleton-segmentation joint optimization loop that keeps the segmentation result highly consistent with the human skeletal structure. Adaptive resolution processing dynamically allocates computing resources according to the complexity of each body region, applying high resolution to detail-rich regions (such as hair and fingers) and low resolution to flat regions. Semantic-enhanced segmentation can distinguish regions such as clothing and skin, providing semantic information for the accurate fit of the subsequent virtual garment.
Human instance segmentation lets the system accurately distinguish individuals in multi-person scenes, preventing the fitting effect from being misplaced or applied to non-target persons. It provides accurate body boundary information so the virtual garment conforms naturally to the body, and it helps the system understand occlusion relationships among body parts so the virtual garment renders correctly under occlusion. Combining the segmentation result with depth estimation lets the system understand the body's position and pose in 3D space more accurately, and the more precise hand segmentation supports subsequent gesture recognition, improving the user's interaction with the virtual garment. With high-quality human instance segmentation, the system maintains an accurate fitting effect even under complex backgrounds and in multi-person scenes, an advantage that traditional virtual fitting systems based on single static images cannot achieve.
It should be noted that the user to be fitted (the target user) may be determined through interaction between the system and the user. When the user opens the AR fitting application, the target user may be designated through gesture selection (such as pointing to a specific person), a voice command, or a touch-screen operation. This is the most basic identification method and suits scenarios where the user actively selects.
Second, for complex scenes, the system introduces the boundary-aware adaptive multi-person separation network (BA-MPSNet) for high-precision human instance segmentation and determines the target person in combination with position-priority analysis. BA-MPSNet improves contour accuracy through its integrated edge enhancement module, so different individuals can still be accurately distinguished in complex scenes with overlapping figures. Specifically, the system decides according to at least one priority rule: (1) center priority, where a person in the center region of the frame is usually taken as the target user; (2) size priority, where a person occupying a larger proportion of the frame is preferred; and (3) foreground priority, where, based on depth estimation, a foreground person is preferred over background persons as the target user.
The application can also maintain ID tracking of the target person, avoiding target switching when multiple people cross paths. During continuous interaction, the system maintains persistent identification of the target person through a temporal memory mechanism. Even if the target person is temporarily occluded or leaves and re-enters the frame, the system can re-identify the same person based on previously established feature descriptors (including clothing texture, body shape features, facial features, etc.), preserving the continuity of the fitting experience.
Temporal tracking of the user is achieved by establishing associations between a video frame and adjacent frames. Specifically, inter-frame person association can be established with a multi-dimensional-feature-fusion adaptive keypoint association framework (MF-AKA); dynamic feature weight allocation and a two-stage association strategy improve the ID retention rate in complex scenes, reducing the ID switching rate by 31.4%. In addition, a hierarchical skeleton-constrained Kalman filter tracking framework (HS-KF) can be applied for multi-granularity motion prediction.
It should be noted that dynamic feature weight allocation is one of the core innovations of MF-AKA (the multi-dimensional-feature-fusion adaptive keypoint association framework) and is closely tied to skeletal keypoints. Based on real-time scene analysis, the mechanism adaptively adjusts the weight proportion of each feature dimension for different types of skeletal keypoints and motion states. The implementation is as follows:
1) Multidimensional feature extraction: the system first extracts a multidimensional feature vector for each keypoint, including, but not limited to, spatial position features (x, y coordinates and depth estimate), appearance features (visual descriptors of the region around the keypoint), motion features (velocity and acceleration of the keypoint), structural features (relative positions between the keypoint and other keypoints), and temporal features (the keypoint's motion trajectory over previous frames);
2) Scene-adaptive analysis: the system evaluates the characteristics of the current scene in real time;
3) detecting whether the body is moving quickly or slowly;
4) evaluating whether keypoints are in an occluded state;
5) judging whether the ambient light is stable;
6) detecting whether the camera viewing angle has changed significantly;
7) Keypoint-specific weight allocation: different weight strategies for different skeletal keypoints, for example increasing structural feature weights for stable keypoints (such as the shoulders and hips), increasing motion feature weights for fast-moving keypoints (such as the wrists and ankles), and increasing appearance feature weights for easily occluded keypoints (such as the elbows under side viewing angles);
8) Iterative weighted learning: the system continuously optimizes the weight allocation strategy according to the tracking success rate and prediction error of skeletal keypoints, specifically:
w_f(t+1) = w_f(t) + α · SuccessRate_f(t) − β · PredictionError_f(t);
wherein w_f(t) denotes the weight of feature f at time t; α and β are learning-rate parameters whose specific values can be set according to actual requirements; SuccessRate_f(t) denotes the tracking success rate, i.e., the proportion of keypoints successfully associated using feature f during tracking; and PredictionError_f(t) denotes the positional or state deviation when predicting with feature f.
In particular, SuccessRate_f measures the effectiveness of the feature in maintaining the identity continuity of human skeletal keypoints: the higher the value, the more valuable the feature is for stably tracking specific keypoints (such as the wrists and elbows); in practice it is typically computed as the number of correct inter-frame associations of a keypoint divided by the total number of tracked frames. PredictionError_f measures the accuracy of keypoint prediction based on the feature: a lower value indicates that the feature predicts the keypoint's future position more accurately, typically measured in pixel distance or normalized distance.
Dynamic feature weight allocation relates to keypoints in that the system can set an optimal weight strategy for the characteristics of each keypoint (such as joint type, motion characteristics, and occlusion frequency). For example, for a high-speed keypoint such as the wrist, the system may dynamically increase the weight of the motion feature, while for a relatively stable keypoint such as the head, the weight of the appearance feature may be increased. Through this dynamic adjustment of feature weights, the system maintains stable keypoint tracking performance even in complex scenes, effectively reducing the ID switching rate by 31.4%, which is important for the continuity of the fitting experience.
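A minimal sketch of the iterative weight update above; the feature names, default learning rates, and the renormalization step are illustrative assumptions rather than values from the patent.

```python
import numpy as np

FEATURES = ["spatial", "appearance", "motion", "structural", "temporal"]

def update_feature_weights(weights: dict, success_rate: dict, prediction_error: dict,
                           alpha: float = 0.1, beta: float = 0.05) -> dict:
    """w_f(t+1) = w_f(t) + alpha * SuccessRate_f - beta * PredictionError_f,
    followed by renormalization so the weights stay a convex combination."""
    updated = {f: max(1e-6, weights[f] + alpha * success_rate[f] - beta * prediction_error[f])
               for f in FEATURES}
    total = sum(updated.values())
    return {f: w / total for f, w in updated.items()}

# Example: a fast-moving keypoint (wrist) where the motion feature tracks well.
w = {f: 0.2 for f in FEATURES}
w = update_feature_weights(
    w,
    success_rate={"spatial": 0.6, "appearance": 0.5, "motion": 0.9,
                  "structural": 0.7, "temporal": 0.6},
    prediction_error={f: 0.3 for f in FEATURES})
print(w)  # the motion weight rises relative to the others
```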
The application adopts a multi-level association strategy, namely a coarse-to-fine three-layer association method, specifically:
1) Person-instance association: first establishing correspondence between the person instances of consecutive frames at the whole-body level;
2) Body-part association: dividing the body into main parts such as the head, torso, and limbs for sub-region association;
3) Fine keypoint association: finally, accurately associating each keypoint and establishing the complete skeleton correspondence;
4) Spatio-temporal consistency matching: short-term association, mid-term association, and long-term identity descriptors built from person appearance features to support long-term tracking and re-identification; the short-term association predicts each keypoint's likely position in the next frame through Kalman filtering and builds a matching cost matrix;
5) Two-stage association decision: a first stage rapidly associates high-confidence matching pairs using a greedy matching algorithm, and a second stage introduces a multi-hypothesis tracking (MHT) algorithm for ambiguous matches, delaying the decision until sufficient evidence has been collected;
6) Conflict resolution mechanism: conflict detection based on skeletal structure constraints identifies physically impossible matches, and the Hungarian algorithm optimizes the association results globally to achieve a globally optimal solution;
7) Occlusion and re-identification handling: occlusion prediction, where the system activates appearance-descriptor memory in advance when potential occlusion is detected; tracking through occlusion, where the positions of occluded keypoints are inferred from structural constraints; and re-identification, where appearance matching and motion consistency are judged jointly when the person reappears;
8) Temporal memory mechanism: maintaining a cached library of all recently seen person features, applying a sliding-window strategy to balance real-time performance and stability, and continuously updating person feature representations from new observations.
By the method, the system can accurately track the target person in a complex multi-person scene, and can keep stable ID association even under the conditions of partial shielding, person crossing or short-time leaving of the field of view, so as to provide continuous and consistent human skeleton data for AR fitting. This high-precision inter-frame character association is the basis for achieving a natural and smooth dynamic fitting experience.
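The following sketch illustrates the two-stage association decision (stage 5) together with Hungarian-based global optimization (stage 6), assuming SciPy is available. It substitutes a plain Hungarian pass for the multi-hypothesis tracking stage, which is a deliberate simplification; the greedy threshold is illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate_keypoints(cost: np.ndarray, greedy_thresh: float = 0.2):
    """Two-stage association: greedily lock in high-confidence (low-cost) pairs
    first, then resolve the remaining ambiguous pairs globally."""
    n_prev, n_curr = cost.shape
    matches, used_rows, used_cols = [], set(), set()
    # Stage 1: greedy matching of unambiguous, low-cost pairs.
    for r, c in sorted(((r, c) for r in range(n_prev) for c in range(n_curr)),
                       key=lambda rc: cost[rc]):
        if cost[r, c] > greedy_thresh:
            break
        if r not in used_rows and c not in used_cols:
            matches.append((r, c)); used_rows.add(r); used_cols.add(c)
    # Stage 2: Hungarian assignment over the remaining rows/columns.
    rows = [r for r in range(n_prev) if r not in used_rows]
    cols = [c for c in range(n_curr) if c not in used_cols]
    if rows and cols:
        sub = cost[np.ix_(rows, cols)]
        for i, j in zip(*linear_sum_assignment(sub)):
            matches.append((rows[i], cols[j]))
    return matches
```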
The core purpose of multi-granularity motion prediction is to improve the system's ability to accurately understand and predict human motion, and in particular to keep tracking stable across different time scales and motion complexities. In the system this is realized through the hierarchical skeleton-constrained Kalman filter tracking framework (HS-KF). Its main advantages are:
1) Adaptation to different motion types: multi-granularity prediction handles multiple motion types simultaneously, including, but not limited to, slow continuous motion (such as standing and walking slowly), sudden rapid motion (such as turning or waving), periodic motion (such as the cyclic motion of walking and running), and aperiodic complex motion (such as the various posture changes when trying on clothes).
2) Improved tracking stability: by jointly analyzing motion patterns at different time scales, tracking losses caused by sudden movement are reduced and tracking stability improves by 43.2%; the framework remains stable against interference such as camera shake, keeps motion prediction accurate at low frame rates, and adapts to network fluctuation.
3) Higher physical simulation fidelity: accurate motion prediction is crucial for the physical simulation of clothing. Accurately predicting the future positions of keypoints lets the virtual garment follow body motion more naturally; predicting acceleration changes allows inertial effects such as garment flutter and fold changes to be simulated accurately; and capturing subtle posture changes enables finer interaction between garment and body.
4) Computing resource optimization: multi-granularity prediction allows the system to allocate computing resources across levels, for example using high-precision multi-model prediction for important keypoints (such as joints) and whole-body model prediction for secondary regions, reducing the overall computational load; resources are dynamically allocated according to prediction difficulty, improving system responsiveness.
Multi-granularity motion prediction can divide the human skeleton into several levels such as global position, torso, and limbs, with a different prediction model for each level: the global-position model predicts overall displacement and rotation and captures large-range motion; the torso model maintains the stability of the trunk structure and handles posture changes; the limb model handles flexible, variable arm and leg motions; and the detail model captures fine motions such as the fingers and face. Different motion states also use different data models, such as a linear model for uniform motion, an acceleration model for variable motion, a periodic model for recognizing and predicting repetitive motion, and a data-driven model that learns complex motion patterns from historical data. Further, multi-model fusion can be used for prediction, combining the prediction models of different skeleton levels with the corresponding data models. During multi-model fusion prediction, human biomechanical constraints are introduced so the prediction obeys physical laws: joint-angle limits prevent unnatural predicted poses, velocity and acceleration constraints keep motion within the range of human capability, and fixed bone-length constraints maintain the consistency of the body structure.
By using multi-granularity motion prediction, the system can realize accurate human motion understanding in a complex dynamic scene, provide high-quality bone motion data for an AR fitting system, realize natural coordination of virtual clothing and real human motion, and greatly improve the sense of reality and user experience of dynamic fitting.
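As a simplified stand-in for one level of the HS-KF framework, the following sketch implements a constant-acceleration Kalman filter for a single 2D keypoint; the noise covariances and time step are assumptions, and the full hierarchy and biomechanical constraints are omitted.

```python
import numpy as np

class KeypointKalman:
    """Constant-acceleration Kalman filter for one 2D keypoint."""
    def __init__(self, dt: float = 1 / 30):
        # State: [x, y, vx, vy, ax, ay]
        self.x = np.zeros(6)
        self.P = np.eye(6)
        F = np.eye(6)
        for i in range(2):
            F[i, i + 2] = dt              # position += velocity * dt
            F[i, i + 4] = 0.5 * dt * dt   # position += 0.5 * accel * dt^2
            F[i + 2, i + 4] = dt          # velocity += accel * dt
        self.F = F
        self.H = np.zeros((2, 6)); self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = np.eye(6) * 1e-3  # process noise (assumed)
        self.R = np.eye(2) * 1e-2  # measurement noise (assumed)

    def predict(self) -> np.ndarray:
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]  # predicted position, usable in the matching cost matrix

    def update(self, z: np.ndarray) -> None:
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(6) - K @ self.H) @ self.P
```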
In the embodiment provided by the application, when smoothing the motion trajectories of the user's skeletal keypoints between a video frame and adjacent frames, a motion-aware adaptive smoothing algorithm (MA-ASA) can process each keypoint trajectory intelligently, dynamically adjusting the smoothing parameters according to the motion state and preserving key acceleration information through a second-derivative constraint, so that jitter is suppressed while detail retention improves.
In the embodiments provided herein, the trajectories of skeletal keypoints may be obtained by a multi-stage process, such as:
1) Initial track construction
Through the DP-HPENet network, the system identifies the spatial positions of 17-25 human keypoints in each video frame, uses the temporal tracking module (MF-AKA framework) to establish the inter-frame correspondence of each skeletal keypoint, ensuring identity consistency of the same keypoint across frames, and builds an initial trajectory sequence T_i = {p_i(1), p_i(2), ..., p_i(t)} from the corresponding skeletal keypoints in consecutive frames, where p_i(t) represents the spatial location of keypoint i in frame t.
2) Track completion and correction
For trajectory losses caused by occlusion or detection failure, the system uses the ST-GCN-HOI occlusion inference module for position estimation and corrects low-confidence detections with the spatio-temporal graph convolutional network to improve trajectory continuity; short gaps are handled with sliding-window interpolation, and long gaps are inferred in combination with skeletal structure constraints.
3) Multi-scale trajectory representation
The multi-scale track representation comprises a micro track, a macro track and a structured track, wherein the micro track can capture high-frequency and small-amplitude key point motions, is suitable for fine gesture recognition, the macro track focuses on low-frequency and large-amplitude overall motion trends for gesture understanding, and the structured track can combine a plurality of key point tracks into a structural unit (such as an arm consisting of a shoulder, an elbow and a wrist) and analyze the overall motion of the structure.
4) Trajectory feature extraction
The track features comprise position features, speed features, acceleration features, angle features and cooperative features, wherein the position features are used for recording absolute coordinates of key points in each frame, the speed features are used for calculating displacement changes of the key points between adjacent frames, the acceleration features are used for analyzing the speed changes, capturing the start and end of actions, the angle features are used for calculating joint angle changes and describing gesture changes, and the cooperative features are used for analyzing relative motion relations of a plurality of key points.
5) Data smoothing and enhancement
MA-ASA (the motion-aware adaptive smoothing algorithm) can be applied to the raw trajectory data, dynamically adjusting the smoothing strength according to motion intensity: weak smoothing in fast-motion segments preserves dynamic detail, while strong smoothing in stable phases effectively suppresses jitter and noise. Combined with the second-derivative constraint, key acceleration information is not lost during smoothing and action characteristics are preserved. The resulting high-quality keypoint trajectories are the foundation of many subsequent functions.
The motion trajectories of skeletal keypoints supply human motion data to the physical simulation, driving the virtual garment to deform naturally with the body; they support motion recognition so the system understands specific user gestures (such as turning or raising an arm); they provide accurate human motion information for dynamic garment fitting; and they help the system predict the body pose several frames ahead, reducing rendering latency. Through this pipeline the system obtains stable, continuous, and accurate keypoint trajectory data; after MA-ASA processing, the dynamic characteristics of human motion are preserved while detection jitter and noise are effectively suppressed, providing high-quality motion data for the subsequent AR fitting experience.
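A minimal sketch of motion-aware adaptive smoothing in the spirit of MA-ASA: the window-size mapping and normalization are illustrative assumptions, and the second-derivative constraint is only approximated by weakening smoothing where speed (and hence acceleration content) is high.

```python
import numpy as np

def motion_aware_smooth(track: np.ndarray, base_window: int = 9) -> np.ndarray:
    """Motion-adaptive moving-average smoothing of a (T, 2) keypoint trajectory:
    strong smoothing for slow segments, weak smoothing for fast segments."""
    speeds = np.linalg.norm(np.diff(track, axis=0), axis=1)
    speeds = np.concatenate([[speeds[0]], speeds])     # pad to length T
    norm = speeds / (speeds.max() + 1e-9)              # 0 = static, 1 = fastest
    out = track.astype(float).copy()
    for t in range(len(track)):
        # Fast motion -> small window (keep detail); slow motion -> large window.
        w = max(1, int(round(base_window * (1.0 - norm[t]))))
        lo, hi = max(0, t - w), min(len(track), t + w + 1)
        out[t] = track[lo:hi].mean(axis=0)
    return out
```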
Through this series of processing, the human body posture data of each video frame is obtained, including human skeleton data, skeletal keypoint association information, and skeletal keypoint trajectory information. Applying the solution provided by the application, the high-quality per-frame posture data is the key information for understanding multi-person activity and interaction, and ensures that the system achieves stable and accurate multi-person pose analysis even on frame-rate-limited video. Through the synergy of these techniques, the overall performance of the system improves over the prior art, with clear advantages in challenging scenarios such as complex environments, multi-person interaction, and rapid motion.
S103, for each video frame, based on the human body posture data of the video frame, determining human body model parameters of multiple dimensions, and constructing a 3D human body model of the user in the video frame by using each human body model parameter.
The human body posture data includes, but is not limited to, body shape information and body pose information: the body shape information covers features such as the user's height, weight, and build, while the body pose information covers content used to determine the body's action pose, such as the position and angle of each skeletal keypoint.
In the embodiment provided by the application, for each video frame, the index parameter of each human body index is extracted from human body posture data based on each human body index of a preset deformable human body model, each detail description parameter is extracted from the video frame, and each index parameter and each detail description parameter are determined to be each human body model parameter.
The individual body metrics include, but are not limited to, height, shoulder width, waist circumference, hip circumference, and other metrics related to the body's shape. The detail description parameters include, but are not limited to, parameters for details such as clothing wrinkles and muscle contours. It should be noted that the index parameters of the body metrics are the same across different video frames.
In the embodiment provided by the application, the proportions and viewing angle of the person in the video are analyzed, and the index parameters of the real body metrics are computed in combination with a depth estimation algorithm. The application computes these index parameters with a multi-view adaptive depth fusion framework: a spatio-temporal depth field is constructed by fusing human motion information across the video frame sequence, capturing depth information from different angles as the user takes different poses; matching through skeletal keypoints achieves an equivalent multi-view depth reconstruction, yielding an initial depth estimate. A human morphology prior database, containing more than 100,000 accurate 3D scans of different body types, is used to constrain and optimize the initial depth estimate, addressing the depth discontinuities common to conventional depth estimation at the human body contour.
In the embodiment provided by the application, a highly personalized 3D human body model is generated with the parametric human model SMPL (Skinned Multi-Person Linear Model). The SMPL model generates a 3D human body model by adjusting shape parameters and pose parameters; it contains about 6,890 vertices and 13,776 faces, enough to represent the detailed appearance of the human body while remaining computationally efficient.
In constructing the user's 3D human body model in a video frame from the human body model parameters, a base SMPL model is first built from the index parameters among the model parameters, and the surface geometry of the base SMPL model is then adjusted with the detail description parameters to enhance realism. During this geometric adjustment, surface attributes such as surface normals, curvature, and deformation hotspots are computed and used to adjust the surface of the base SMPL model. Temporal consistency of the 3D human body model across video frames is also ensured: by keeping the human body model parameters changing smoothly, abrupt changes in model shape are avoided, so the reconstructed 3D human body model varies naturally with the user's motion.
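A minimal sketch of per-frame SMPL-based reconstruction, assuming the open-source `smplx` package and locally downloaded SMPL model files (neither is part of the patent); the zero-valued parameters stand in for the per-user index parameters and per-frame pose estimates.

```python
import torch
import smplx  # open-source SMPL implementation; model files obtained separately

# Shape parameters (betas) come from the per-video body metrics and stay fixed
# across frames; pose parameters vary frame by frame.
model = smplx.create("models/", model_type="smpl", gender="neutral")

betas = torch.zeros(1, 10)       # shape: height, girth, build, ... (fixed per user)
body_pose = torch.zeros(1, 69)   # 23 joints x 3 axis-angle values (per frame)
global_orient = torch.zeros(1, 3)

output = model(betas=betas, body_pose=body_pose, global_orient=global_orient)
vertices = output.vertices       # (1, 6890, 3) mesh; model.faces has 13776 faces
print(vertices.shape)
```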
The reconstructed 3D human body model accurately expresses the body shape and the posture of the user, provides a necessary digital human body basis for the subsequent virtual clothes fitting, and the digital human body not only contains a static shape, but also can dynamically respond to various posture changes of the user in a video, thereby being a key premise for realizing a realistic AR fitting effect.
S104, obtaining virtual clothes of target clothes from a preset clothes model library, wherein the target clothes are clothes to be tried on selected by a user.
The apparel data of the target apparel selected by the user is acquired, and the virtual garment of the target apparel is determined in the garment model library based on that apparel data.
The apparel data of the target apparel includes, but is not limited to, apparel numbers, apparel types, styles, apparel vending merchants, and the like.
The garment model library comprises virtual garments of a plurality of garments, the virtual garments can be regarded as garment models, the virtual garments can be manufactured by professional 3D design software or obtained by carrying out 3D scanning on actual garments, and the virtual garments comprise information such as geometric shapes, materials, textures, physical parameters and the like of the garments.
The virtual garment of the target garment may be looked up in a garment model library using the garment number in the garment data.
S105, for the 3D human body model of each video frame, clothing rendering information of the 3D human body model is obtained, virtual clothing is rendered on the 3D human body model based on a preset dynamic fitting algorithm and the clothing rendering information, and a virtual rendering video frame corresponding to the video frame is obtained.
The garment rendering information includes garment size information, environment information, motion prediction information, visual perception information, augmented reality texture information, and scene parameter information.
Referring to fig. 3, a flowchart of rendering virtual garments on a 3D mannequin based on a preset dynamic fitting algorithm and garment rendering information to obtain virtual rendered video frames corresponding to video frames according to an embodiment of the present application is specifically described below:
S301, determining the clothing type of the virtual clothing based on the clothing size information of the virtual clothing in the clothing rendering information, and rendering the virtual clothing at a position corresponding to the clothing type in the 3D human model according to the clothing size information.
The clothing size information of the virtual clothing includes information describing the type of clothing of the virtual clothing, and the clothing types include, but are not limited to, trousers, overskirt, dress, short sleeves, suit, and the like.
The garment size information also includes size data corresponding to the garment type. For example, when the garment type is short sleeves, the size data includes, but is not limited to, shoulder width, sleeve length, chest circumference, and waistline; when the garment type is trousers, the size data includes, but is not limited to, thigh circumference and trouser length.
The 3D human body model is divided into 37 functional semantic regions, a nonlinear mapping is established between each functional semantic region and the virtual garment, and the virtual garment is then rendered on the 3D human body model. Different parts of the virtual garment correspond to different parts of the 3D human body model, and some parts of the model have no corresponding garment region; for example, when the garment type is trousers, the corresponding model parts are the legs, hips, and so on.
When the virtual garment is rendered on the 3D human body model, a proper initial rendering position can be determined on the 3D human body model according to the garment type of the virtual garment, then the virtual garment is rendered from the initial rendering position, for example, the initial rendering position of the coat on the 3D human body model is above the shoulder, the initial rendering position of the trousers on the 3D human body model is the waist, and predefined fitting rules and garment semantic information can be used for ensuring the rationality of the initial state.
In the method provided by the embodiment of the application, an adaptive intelligent size adjustment system (AISS) can be used to perform the step of determining the clothing type of the virtual clothing based on the clothing size information and rendering the virtual clothing at the position corresponding to the clothing type in the 3D human body model. When the virtual clothing is rendered on the 3D human body model, a layered adjustment strategy can be adopted, comprising a framework layer (such as shoulder width and sleeve length), a contour layer (such as chest circumference and waistline), and a detail layer (such as neckline and fold distribution), so that multi-level size optimization is performed. The clothing structure is also specifically adjusted to identify and adapt to special body type characteristics (such as a prominent chest, a hunched back, or rounded shoulders) so as to improve the fit quality of the virtual clothing.
S302, constructing a signed distance field of the 3D human body model after virtual clothing rendering, wherein the signed distance field is used to describe the spatial relationship between the virtual clothing and the 3D human body model.
In the embodiment provided by the application, the signed distance field (SDF) is a three-dimensional spatial function describing the shortest distance between each vertex position of the virtual garment and the surface of the 3D human model, where the three-dimensional space contains both the virtual garment and the 3D human model. The signed distance field can be used to rapidly determine the spatial relationship between the virtual garment and the 3D human body model, so that collision detection and fitting calculations are handled efficiently.
In the embodiment provided by the application, a spatially adaptive sampling strategy is introduced into the signed distance field: the sampling density is increased in areas of the human body with significant curvature change (such as joints and the neck) and reduced in flat areas, balancing precision and efficiency. The specific calculation is expressed as follows:
SDF(p) = sign(p) · min_{q ∈ S} ‖p − q‖
where p is a spatial point in three-dimensional space, q is a point on the human body surface S, and sign(p) indicates whether the point p is inside (−1) or outside (+1) the human body. This adaptive SDF technique reduces computational effort compared with traditional uniform sampling methods while maintaining accuracy in critical areas.
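A minimal NumPy sketch of the computation above, assuming the body surface is given as sampled points with outward normals; the curvature threshold and density values in sample_density are illustrative assumptions, not the system's calibrated settings.

```python
import numpy as np

def signed_distance(p, surface_pts, surface_normals):
    """SDF(p) = sign(p) * min_q ||p - q||, with the sign taken from the
    normal of the nearest surface point (outside: +1, inside: -1)."""
    d = np.linalg.norm(surface_pts - p, axis=1)
    i = int(np.argmin(d))
    sign = 1.0 if np.dot(p - surface_pts[i], surface_normals[i]) >= 0 else -1.0
    return sign * d[i]

def sample_density(curvature, low=4, high=16, threshold=0.5):
    """Adaptive sampling: more samples where body curvature is high
    (joints, neck), fewer on flat regions. Values are illustrative."""
    return high if curvature > threshold else low

# Toy surface: points on a unit sphere, whose normals equal the points.
rng = np.random.default_rng(0)
q = rng.normal(size=(500, 3))
q /= np.linalg.norm(q, axis=1, keepdims=True)
print(round(signed_distance(np.array([0.0, 0.0, 2.0]), q, q), 2))  # ~ +1.0
print(round(signed_distance(np.array([0.0, 0.0, 0.0]), q, q), 2))  # ~ -1.0
```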
S303, based on a preset optimization strategy and the signed distance field, the rendering of the virtual garment on the 3D human body model is adjusted until a preset convergence condition is met, at which point adjustment of the virtual garment stops.
According to the embodiment provided by the application, the optimization strategy is an iterative optimization algorithm: the position of each vertex of the virtual garment rendered on the 3D human body model is adjusted using the preset optimization strategy and the signed distance field so that the virtual garment fits the 3D human body model more closely, and the adjustment stops once the preset convergence condition is met.
The vertex positions of the virtual garment refer to coordinates of mesh vertices constituting the garment in a three-dimensional space.
The iterative adjustment process includes:
Step 1: first, the penetration problem between the virtual garment and the 3D human body model is resolved by pushing garment parts that penetrate the 3D human body model back out to its surface; then the internal constraints of the garment are applied to maintain the continuity and structural integrity of the fabric; finally, external forces and internal constraints are balanced to find an energy-minimized state.
Furthermore, the system adopts an innovative hierarchical optimization strategy that decomposes the fitting problem into three sub-problems solved sequentially, namely steps 2 to 4 below.
Step 2: global position alignment.
The whole virtual garment is subjected to rigid body transformation, so that key characteristic points (such as necklines and cuffs) are approximately aligned with corresponding parts of a human body, and a good initial state is provided for local fitting.
Step 3: iterative collision response.
The system uses the signed distance field to rapidly detect penetration between the clothing and the human body, and applies a displacement along the gradient direction to each penetrating vertex:
x_new = x_old − λ · SDF(x_old) · ∇SDF(x_old)
wherein λ is an adaptive step-size parameter that is dynamically adjusted according to the penetration depth; x_new is the new position of the virtual garment vertex; x_old is the original position of the vertex; SDF(x_old) is the SDF value at the vertex's original position; and ∇SDF(x_old) is the SDF gradient direction at the vertex's original position.
Step 4: internal constraint maintenance. After each collision response, the system solves the fabric internal constraint equation:
M · d'' + K · d = f_ext
wherein M is the mass matrix, K is the stiffness matrix, d is the deformation vector (d'' its second time derivative), and f_ext is the external force.
The above steps are executed iteratively until a preset convergence condition is reached, where the convergence condition may be that the maximum displacement falls below a threshold or that a maximum number of iterations is reached. Compared with traditional global optimization methods, this hierarchical iteration converges faster while preserving physical plausibility.
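A sketch of the collision-response projection loop under stated assumptions: sdf and sdf_grad stand in for the signed distance field described above, and the adaptive step rule, iteration cap, and tolerance are illustrative, not the system's settings.

```python
import numpy as np

def resolve_penetrations(verts, sdf, sdf_grad, max_iters=50, tol=1e-4):
    """Iteratively push penetrating vertices (SDF < 0) back toward the body
    surface along the SDF gradient: x' = x - lam * SDF(x) * grad_SDF(x).
    Stops when the max displacement falls below tol or iterations run out."""
    verts = verts.copy()
    for _ in range(max_iters):
        max_disp = 0.0
        for i, x in enumerate(verts):
            phi = sdf(x)
            if phi < 0.0:                       # vertex is inside the body
                lam = min(1.0, 0.5 + abs(phi))  # adaptive step vs. depth
                step = -lam * phi * sdf_grad(x)
                verts[i] = x + step
                max_disp = max(max_disp, float(np.linalg.norm(step)))
        if max_disp < tol:                      # convergence condition
            break
    return verts

# Toy body: unit sphere. SDF(x) = ||x|| - 1, gradient = x / ||x||.
sphere_sdf = lambda x: np.linalg.norm(x) - 1.0
sphere_grad = lambda x: x / np.linalg.norm(x)
cloth = np.array([[0.0, 0.0, 0.9],    # penetrating vertex
                  [0.0, 0.0, 1.2]])   # already outside, untouched
print(resolve_penetrations(cloth, sphere_sdf, sphere_grad).round(3))
```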
In the embodiment provided by the application, a nonlinear elastic model based on tetrahedral finite elements is introduced, so that the physical behavior of anisotropic fabrics under large deformation can be accurately simulated. The system calculates the strain tensor of the fabric in real time, and when potential abnormal stretching is detected, the local mesh structure and physical parameters are automatically adjusted, effectively preventing non-physical deformation or tearing. An intelligent seam processing framework is also provided for implementing seam modeling techniques: it automatically generates pairing constraints by identifying the geometric features of garment opening structures (front flaps, side seams). By introducing the concept of virtual stitching, semi-rigid connections are used to simulate different seam strengths, and seam parameters are dynamically adjusted according to the characteristics of the fabric. For closure structures such as buttons and zippers, the system establishes a parameterized closure model library and can automatically select an appropriate closure state according to the detected human body posture.
Further, the garment rendering information is applied to adjust the rendering details of the virtual garment rendered on the 3D mannequin, and the specific adjustment process refers to S304-S307.
S304, adjusting the material rendering parameters of the virtual garment on the 3D human model based on the environmental information in the garment rendering information.
In the embodiment provided by the application, the environment information includes, but is not limited to, the lighting of the environment, the light source direction, intensity, color temperature, and the like. The environment-aware dynamic material system (EADMS) can dynamically adjust the material rendering parameters of the virtual clothing based on the environment information, so that the virtual clothing presents correct visual effects under different illumination environments, improving realism; BRDF (bidirectional reflectance distribution function) technology is applied to simulate how materials such as silk and embroidery change under different viewing angles, solving the problem of unrealistic materials caused by angle changes in AR.
The material rendering parameters of the virtual garment comprise, but are not limited to, reflectivity parameters for controlling the reflection degree of the material on light rays with different wavelengths, highlight parameters for enhancing the highlight effect of materials such as silk and the like in a strong light environment, scattering parameters for adjusting the scattering behavior of the light rays in the fabric to influence the softness of materials such as wool and the like, and shadow casting intensity parameters for adjusting the shadow representation of folds according to the intensity of the environment light.
By adjusting the material rendering parameters of the virtual clothes, the virtual clothes are rendered more naturally, and the AR fitting experience is more real and natural.
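The sketch below shows how such environment-driven material adjustment might look; the parameter names, scaling rules, and the adjust_material function are assumptions for illustration, not the EADMS implementation.

```python
def adjust_material(material, env):
    """Scale rendering parameters from ambient-light measurements.
    'material' and 'env' are plain dicts; all rules are illustrative."""
    out = dict(material)
    # Stronger light -> more pronounced highlights on silk-like fabrics.
    out["highlight"] = material["highlight"] * min(2.0, 0.5 + env["intensity"])
    # Warm color temperature slightly boosts red reflectance.
    warmth = max(0.0, (6500.0 - env["color_temp_k"]) / 6500.0)
    out["reflectance_r"] = material["reflectance_r"] * (1.0 + 0.2 * warmth)
    # Dim scenes soften the cast shadows of folds.
    out["shadow_strength"] = material["shadow_strength"] * env["intensity"]
    return out

silk = {"highlight": 0.8, "reflectance_r": 0.6, "shadow_strength": 0.7}
indoor = {"intensity": 0.4, "color_temp_k": 3200.0}
print(adjust_material(silk, indoor))
```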
S305, based on the motion prediction information and the visual perception information in the garment rendering information, determining grid attribute data of different areas of the virtual garment on the 3D human model, and adjusting grid resources and grid densities of the areas based on the grid attribute data of the different areas.
The motion prediction information includes information predicting the user's motion trajectory, such as the user raising a hand or turning around, and the visual perception information includes, but is not limited to, visual observation information during the fitting process, such as the user's attention to the collar or to the sleeves.
In the method provided by the embodiment of the application, the virtual garment can be divided into a plurality of areas, each with different grid attribute data. For example, the deformation probability of each area is determined from the motion prediction information, and this deformation probability belongs to the area's grid attribute data; likewise, the visual saliency of each area is determined from the visual perception information and describes the user's gaze attention to that area: the higher the visual saliency, the more gaze attention the user pays to the area, and vice versa. The visual saliency of an area also belongs to its grid attribute data.
According to the method, a dynamic interaction grid optimization framework (DIMOF) is used to determine the grid attribute data of the different areas of the virtual garment on the 3D human body model based on the motion prediction information and the visual perception information, and the grid resources and grid density of each area are adjusted based on its grid attribute data. That is, grid density is increased in areas with high deformation probability and reduced in areas with low deformation probability to lower the computational load; grid resources are increased in areas of high visual saliency so that more detail is preserved there, and reduced in areas of low visual saliency to cut computation at comparable visual quality.
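A minimal sketch of this allocation rule; the region names, attribute values, and scaling factors below are invented for illustration.

```python
def mesh_budget(regions, base_density=8, base_lod=1.0):
    """Allocate grid density and resource budget per garment region from
    deformation probability and visual saliency (both in [0, 1]).
    Scaling factors are illustrative assumptions."""
    plan = {}
    for name, attrs in regions.items():
        density = base_density * (0.5 + 1.5 * attrs["deform_prob"])
        lod = base_lod * (0.5 + 1.5 * attrs["saliency"])
        plan[name] = {"density": round(density, 1), "lod": round(lod, 2)}
    return plan

regions = {
    "elbow":  {"deform_prob": 0.9, "saliency": 0.4},  # bends a lot
    "collar": {"deform_prob": 0.2, "saliency": 0.9},  # user looks here
    "back":   {"deform_prob": 0.1, "saliency": 0.1},  # flat, rarely seen
}
print(mesh_budget(regions))
```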
S306, based on the augmented reality texture information in the garment rendering information, the texture of the virtual garment on the 3D human model is adjusted.
The present application uses an Augmented Reality Texture Projection System (ARTPS) to adjust the texture of virtual clothing on a 3D mannequin based on the augmented reality texture information. The UV space of the 3D human body model can be automatically redistributed along with the human body action, so that the texture is not stretched or compressed unnaturally during large-amplitude movement, the texture deformation is reduced, the texture resolution is dynamically distributed according to the visual angle distance and the importance, the visual quality is maintained, the occupied texture memory is reduced, and the rendering performance is improved.
The scheme provided by the application can intelligently generate detail effects conforming to physical laws according to the local morphological characteristics (such as shoulder contour and waist curve) of the human body and the physical characteristics of cloth. The technology breaks through the limitation of the traditional detail generation method based on simple geometric transformation, can generate natural wrinkles, undulation and shadow effects on mobile equipment with limited computing resources in real time, and improves the sense of reality.
S307, based on scene parameter information in the garment rendering information, the garment physical simulation parameters of the virtual garment on the 3D human model are adjusted.
In the embodiment provided by the application, the clothing physical simulation parameters of the virtual clothing on the 3D human body model are adjusted through a contextual intelligent physical parameter system (CIPPS) based on the scene parameter information. The scene parameter information includes, but is not limited to, description information of the detected usage scene, such as a shopping mall, home, or outdoor scene, and contact-perception dynamic information, such as descriptions of contact between the human body and the clothing, for example the contact between the clothing and the user when the user sits down.
The adjusted clothing physical simulation parameters include, but are not limited to, basic material parameters, surface interaction parameters, environmental response parameters, and local special parameters. The basic material parameters include, but are not limited to, density, which controls the weight feel and sagging of the garment; stretch stiffness, which controls the fabric's resistance when stretched; bending stiffness, which determines how easily folds form; and a damping coefficient, which affects how quickly fabric vibration decays. The surface interaction parameters include, but are not limited to, a static friction coefficient controlling how much the garment adheres when in contact with the human body or other objects, a dynamic friction coefficient affecting how easily the garment slides on the body surface, and a collision restitution coefficient determining how much the garment rebounds after a collision. The environmental response parameters include, but are not limited to, an air resistance coefficient controlling the air resistance of the garment during movement, a gravity response factor adjusting the garment's response to gravity, and a wind influence coefficient determining how much the garment flutters in the wind. The local special parameters include, but are not limited to, seam stiffness controlling the physical properties at garment seams, fold memory simulating the persistence of folds for certain fabrics (such as crinkled fabrics), and shape retention simulating the shape-holding behavior of structured garments such as business wear.
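As a compact illustration of the parameter groups above, the dataclass below collects a representative subset; the field names, default values, and the scene rule in for_scene are assumptions, not the CIPPS parameter set.

```python
from dataclasses import dataclass

@dataclass
class GarmentPhysics:
    """Illustrative subset of garment simulation parameters; all values
    are placeholder defaults, not calibrated fabric data."""
    # Basic material parameters
    density: float = 0.15            # kg/m^2, controls weight and sag
    stretch_stiffness: float = 0.8   # resistance when stretched
    bend_stiffness: float = 0.3      # ease of fold formation
    damping: float = 0.05            # vibration decay speed
    # Surface interaction parameters
    static_friction: float = 0.6
    dynamic_friction: float = 0.4
    restitution: float = 0.1         # rebound after collision
    # Environmental response parameters
    air_drag: float = 0.02
    gravity_response: float = 1.0
    wind_influence: float = 0.5
    # Local special parameters
    seam_stiffness: float = 1.5
    fold_memory: float = 0.0         # persistence of creases

def for_scene(params: GarmentPhysics, scene: str) -> GarmentPhysics:
    """Hypothetical scene-driven tweak: outdoors, wind matters more."""
    if scene == "outdoor":
        params.wind_influence = 1.0
        params.air_drag = 0.05
    return params

print(for_scene(GarmentPhysics(), "outdoor"))
```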
The dynamic adjustment of the parameters is the key for realizing realistic fitting experience, and by dynamically adjusting the parameters, the physical behaviors of the clothing are enabled to be in natural transition, respond to environmental changes just like real clothing, and the immersion and reality of virtual fitting are greatly improved.
The GDTS framework converts static parameter settings into a dynamic adaptive system by integrating the five subsystems above, namely the adaptive intelligent size adjustment system (AISS), the environment-aware dynamic material system (EADMS), the dynamic interaction grid optimization framework (DIMOF), the augmented reality texture projection system (ARTPS), and the contextual intelligent physical parameter system (CIPPS), so that the overall performance of the system is improved compared with traditional methods and user satisfaction is increased. After the prepared virtual garment is processed through the GDTS framework, it contains complete geometric, material, and physical information, providing the necessary data basis for dynamic fitting and improving the realism and naturalness of the final fitting effect.
And rendering the virtual garment on the 3D human body model by using a dynamic fitting algorithm, so that the rendering of the virtual garment on the 3D human body model is more realistic. The output of the dynamic fit algorithm is a preliminary fit garment state that takes into account the geometric relationship of the garment and the human body, but does not yet fully simulate the physical and dynamic behavior of the garment. This initial fit will serve as the starting point for the next physical simulation to further simulate the dynamic effects of the garment under the influence of gravity and human movement.
Accurate dynamic fitting is a technical difficulty and core competitiveness of an AR fitting system, and the naturalness and fidelity of virtual fitting are directly determined. An excellent fit algorithm can make the virtual garment appear truly "worn" on the user, rather than simply "attached" to the user.
And S308, performing physical simulation on the 3D human body model containing the virtual clothes by using a preset physical simulation technology to obtain a virtual rendering video frame corresponding to the video frame.
In order to make the virtual fitting experience more real and natural, a physical simulation technology is used for carrying out physical simulation on a 3D human body model containing virtual clothes, so that the virtual clothes can naturally move along with the human body to generate a vivid dynamic effect.
In the embodiment provided by the application, the physical simulation technology is realized by using an adaptive multi-resolution physical simulation framework, a hybrid solver architecture, a physical parameter automatic calibration system based on deep learning and a physical boundary processing system special to an AR environment.
Wherein, using an adaptive multi-resolution physical simulation framework to dynamically allocate computing resources for a 3D human body model containing virtual clothes according to visual importance and deformation liveness; for example, high-precision physical grids (less than or equal to 5mm grid spacing) are adopted in visual critical areas (such as collars, cuffs and lower hem) and large deformation areas (such as joint bending parts) of the 3D human body model containing the virtual clothes, low-precision grids (more than or equal to 15mm grid spacing) are adopted in visual secondary areas (such as back plane areas) of the 3D human body model containing the virtual clothes, a real-time dynamic subdivision technology is adopted, grid density and physical calculation intensity are automatically adjusted according to human body motion states, and an automatic importance assessment algorithm is realized by the system:
Importance = α · Visibility + β · DeformationActivity + γ · MaterialProperty
wherein Importance is the region importance score; Visibility is the visibility factor; DeformationActivity is the deformation activity; and MaterialProperty is the material property value. Each parameter is dynamically adjusted according to the current viewing angle, motion amplitude, and fabric characteristics. Compared with traditional uniform-grid approaches, this method reduces the amount of computation while maintaining visual quality.
It should be noted that the visibility factor measures the extent to which a given area of the garment is currently visible to the user. For example, if the user tries on a jacket with the front facing the mirror, the visibility factor of the jacket's front is high while that of the back is low. The system tracks the user's viewpoint in real time and calculates the likelihood that each region is seen.
The deformation activity reflects the degree of deformation occurring in a clothing region. For example, when a user bends an arm, the elbow region of the sleeve develops pronounced wrinkles, causing the deformation activity of this region to spike, while the relatively stationary torso portion has low deformation activity.
The material characteristic value reflects how strongly the fabric demands fine simulation. Special materials such as silk and crepe require finer physical simulation to show their characteristics, so their material characteristic values are higher, while ordinary cotton is relatively lower.
α, β, and γ are three weight parameters that determine the relative importance of their corresponding terms. The system dynamically adjusts their specific values for different scenes: in a static display scene (such as standing observation), α (the visibility weight) can be raised so the visual focus area receives more resources; during dynamic try-on (such as walking or turning), β (the deformation weight) can be increased to keep dynamic effects vivid; and when showcasing garments of special fabrics, γ (the material weight) can be raised to highlight the material characteristics.
By applying this technique, computing resources can be allocated sensibly, with more resources assigned to high-importance areas, improving overall visual quality. For example, when the user looks down at their shoes, the system immediately increases the importance score of the lower leg area, allocating more grid and computing resources to it while reducing the precision of the temporarily invisible coat collar. This ensures the visual quality of critical areas while greatly reducing overall computation, so the phone does not overheat and throttle under complex calculations. In practical applications, the system recalculates the scores multiple times per second, so that when the user's viewing angle changes or different actions are made, the computing resource allocation adjusts seamlessly, always maintaining the best balance of visual effect and performance.
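For illustration, a sketch of the importance score with the scene-dependent weights described above; the numeric presets are invented, not the system's calibrated values.

```python
def region_importance(visibility, deform_activity, material_value,
                      scene="static"):
    """Importance = alpha*Visibility + beta*DeformationActivity
    + gamma*MaterialProperty. The per-scene weight presets mirror the
    static/dynamic/fabric examples in the text but are illustrative."""
    weights = {
        "static":  (0.6, 0.2, 0.2),  # standing still: favor what is seen
        "dynamic": (0.3, 0.5, 0.2),  # walking/turning: favor deformation
        "fabric":  (0.3, 0.2, 0.5),  # showcasing special materials
    }
    alpha, beta, gamma = weights[scene]
    return alpha * visibility + beta * deform_activity + gamma * material_value

# Elbow of a silk sleeve while the user bends an arm:
print(round(region_importance(0.7, 0.9, 0.8, scene="dynamic"), 2))  # 0.82
```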
The hybrid physical solver architecture combines the advantages of multiple physical models: an explicit integration solver optimized for mobile devices is designed specifically for mobile-phone GPU architectures; the structural layer uses position-based dynamics (PBD) to handle large deformations; the detail layer uses an energy-minimization method to generate natural wrinkles; and key points use precomputed physical-response lookup tables to accelerate processing.
The hybrid solution process is expressed as:
X(t+Δt) = GlobalSolver(X(t)) + LocalCorrection(X(t)) + DetailEnhancement(X(t))
Wherein X represents the clothing state vector, including information such as position and velocity. X(t+Δt) represents the clothing state at the next moment, i.e., the new clothing state after a small time step Δt, including the position and velocity of each vertex; GlobalSolver(X(t)) represents the global calculation result, i.e., the output of the global solver; LocalCorrection(X(t)) represents local detail correction; and DetailEnhancement(X(t)) represents visual detail enhancement.
It should be noted that GlobalSolver(X(t)) deals with the overall movement and large-scale deformation of the garment, such as the basic physical behavior of the garment swinging as the body turns or a skirt sagging under gravity. It uses the position-based dynamics (PBD) approach, which is particularly suited to real-time computation on mobile devices; think of it as sketching the basic outline and motion of the garment with a broad brush. LocalCorrection(X(t)) is concerned with areas requiring special treatment, such as the contact points between clothing and the human body and interactions between garments. This step corrects clip-through (clothing penetrating the body) or unnatural deformations that may arise in the global calculation, somewhat like an artist correcting errors in the basic outline. DetailEnhancement(X(t)) is responsible for generating details that are small but visually important, such as natural wrinkles in the fabric, slight jitter of silk, and minor texture changes. It adopts an energy-based method and can produce very realistic visual effects, like the fine final strokes a painter adds: small in area but critical to the final result.
This multi-level hybrid computing method both handles large-scale deformation efficiently (suiting the performance limits of mobile devices) and presents fine visual details (meeting users' demand for realism). For example, when a user wearing a silk shirt turns around, the system handles the overall swing of the garment (global), the fit of the cuffs to the wrists (local correction), and the fine wrinkles and gloss changes of the fabric (detail enhancement).
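A toy sketch of one hybrid update following the formula above; all three stage functions are deliberately simplistic stand-ins, not the PBD, correction, or energy-based solvers of the system.

```python
import numpy as np

def hybrid_step(x, dt=1.0 / 60.0):
    """One hybrid update per
    X(t+dt) = GlobalSolver(X(t)) + LocalCorrection(X(t)) + DetailEnhancement(X(t)),
    reading the last two terms as additive corrections."""
    def global_solver(x):            # bulk motion: one gravity step
        dx = np.zeros_like(x)
        dx[:, 1] = -9.8 * dt * dt
        return x + dx

    def local_correction(x):         # push up vertices below the floor y=0
        dx = np.zeros_like(x)
        dx[:, 1] = np.maximum(0.0, -x[:, 1])
        return dx

    def detail_enhancement(x):       # tiny deterministic wrinkle offsets
        return 0.0005 * np.sin(40.0 * x)

    return global_solver(x) + local_correction(x) + detail_enhancement(x)

cloth = np.array([[0.00, -0.002, 0.00],   # slightly under the floor
                  [0.20,  0.500, 0.00]])  # hanging freely
print(hybrid_step(cloth).round(4))
```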
The deep-learning-based automatic physical parameter calibration system breaks through the limitation of traditional methods that rely on manual settings. The system is equipped with a physical parameter identification neural network (PhysicsNet) that can automatically infer multiple key physical parameters from a single clothing image; a physical characteristic database containing 10000+ real clothing samples covers common materials such as cotton, silk, wool, and synthetic fibers; and a bidirectional mapping model between physical parameters and visual appearance ensures that simulation results match the behavior of the actual garment.
The physical parameter extraction flow is as follows:
PhysicalParams = PhysicsNet(I, M, C)
Where I is the garment image (or a video frame of the 3D mannequin with the rendered virtual garment), M is the garment mesh, and C is the garment type information. The extracted parameters include key physical properties such as elastic modulus, damping coefficient, friction coefficient, and mass distribution. With them, the system can accurately simulate characteristics such as the fluidity of silk, the stiffness of denim, and the stretchability of knitted fabrics.
Wherein PhysicalParams denotes the 17 key physical parameters output by the network: (1) tensile elastic modulus, controlling resistance when the fabric stretches; (2) bending elastic modulus, controlling rigidity when the fabric bends; (3) shear elastic modulus, controlling resistance to shear deformation of the fabric; (4) compression elastic modulus, controlling the reaction when the fabric is pressed; (5) mass density, the mass of cloth per unit area; (6) dynamic friction coefficient, controlling the friction characteristics of the fabric in motion; (7) static friction coefficient, controlling the friction characteristics of the fabric at rest; (8) damping coefficient, controlling the vibration decay speed; (9) air resistance coefficient, affecting the fabric's resistance to motion in air; (10) thickness parameter, controlling the thickness characteristics of the fabric; (11) anisotropy coefficient, describing differences in the fabric's physical characteristics along different directions; (12) wrinkle formation threshold, controlling how easily wrinkles form; (13) shape recovery coefficient, describing the material's ability to recover its original shape after being pressed; (14) drape coefficient, controlling the hanging shape of the fabric; (15) gravity sensitivity coefficient, adjusting the fabric's response to gravity; (16) rebound resilience coefficient, controlling the fabric's elastic rebound; and (17) thermal deformation coefficient, representing how the fabric's deformation characteristics vary with temperature. With this complete parameter set, the system can accurately simulate the characteristic behavior of various materials.
For the specific characteristics of the augmented reality environment, the application includes an AR-environment physical boundary processing system that detects and models physical boundaries in the real environment (such as tabletops and walls) in real time, handles physical interaction between virtual clothing and real objects (such as the contact deformation between clothing and a chair when sitting down), and dynamically adjusts shadow and wrinkle appearance in view of the visual influence of ambient lighting on the garment's gravity-induced deformation.
The boundary constraint processing formula in the system is as follows:
Constraint(v) = min(‖v − b‖), ∀ b ∈ B
Where v is a garment vertex (i.e., one of the virtual garment vertex positions above), that is, a small point on the virtual garment; b is a detected environmental boundary point, such as a point on the surface of a real object in the environment (a chair, table, or wall); and B is the set of boundary points, i.e., B holds all real-object surface points identified by the system that might collide with the virtual garment.
The expression ‖v − b‖ represents the distance between the garment vertex v and a boundary point b, in other words, the spatial distance between a given garment vertex and a given boundary point.
min() is the minimum-value operation, here used to find the shortest distance from the garment vertex v to all boundary points; ∀ is the mathematical symbol for "for all", meaning that the distance is computed for each point b in the boundary set and the minimum is then taken.
Constraint(v) is the constraint force ultimately applied to the garment vertex v. In short, for each point on the garment, the system calculates how far it is from the nearest object in the real environment and then, based on this distance, applies an appropriate constraint to prevent the virtual garment from penetrating the real object.
By applying this technique, when a user wearing a virtual skirt sits on a real chair, the skirt naturally drapes over the seat instead of eerily penetrating the chair; and when the user approaches a real wall, the virtual jacket deforms under compression instead of passing through the wall. This innovation enables the system to take real-environment factors into account and create a more convincing mixed-reality experience.
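A minimal sketch of the boundary constraint under stated assumptions: the environment is reduced to a handful of sampled boundary points, and the min_gap threshold and push-out rule are illustrative.

```python
import numpy as np

def constrain_to_environment(verts, boundary_pts, min_gap=0.01):
    """For each garment vertex v, evaluate Constraint(v) = min_b ||v - b||
    over the detected boundary point set and push any vertex closer than
    min_gap back out along the away-from-boundary direction."""
    out = verts.copy()
    for i, v in enumerate(verts):
        d = np.linalg.norm(boundary_pts - v, axis=1)
        j = int(np.argmin(d))                 # nearest real-object point
        if d[j] < min_gap:
            away = (v - boundary_pts[j]) / max(d[j], 1e-9)
            out[i] = boundary_pts[j] + away * min_gap
    return out

# Toy scene: four sampled points on a chair seat, two skirt vertices.
seat = np.array([[0.0, 0.45, 0.0], [0.1, 0.45, 0.0],
                 [0.0, 0.45, 0.1], [0.1, 0.45, 0.1]])
skirt = np.array([[0.001, 0.449, 0.001],   # nearly touching the seat
                  [0.050, 0.600, 0.050]])  # safely above it
print(constrain_to_environment(skirt, seat).round(3))
```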
In the embodiment provided by the application, efficient simulation of basic physical effects is realized. Gravity effect simulation: the system accurately calculates the influence of gravity on different parts of the clothing and produces a natural falling effect; for example, a dress skirt sags naturally, while a garment supported at the shoulders remains in a relatively fixed position. Dynamic wrinkle generation: as body movement deforms the garment, the system computes and generates three typical wrinkle types in real time, namely compression wrinkles (at joint bends), drape wrinkles (such as skirt folds), and stretch wrinkles (the fine tension textures created when fabric is stretched). The system can also handle physical interactions among multiple layers of clothing, such as the layering effect when a coat covers a shirt, ensuring that each layer behaves plausibly without interpenetration. Human body-clothing interaction includes friction calculation, collision response, and inertia effects; when the body moves rapidly, the clothing exhibits a delayed following effect due to inertia, enhancing dynamic realism.
Through the integration of the innovative technology, the physical simulation calculation module of the system realizes the high-efficiency and high-quality clothing dynamic simulation on the mobile equipment, and breaks through the traditional trade-off between the limitation of calculation resources and the reality performance. The system can maintain the stable running performance of 60FPS, and simultaneously provides the clothing physical effect close to the professional level, thereby providing a technical foundation for AR fitting application.
The output of the physical simulation is a series of garment model states that exhibit natural dynamic effects that are synchronized with the body movements and exhibit physical behavior similar to that of a real garment. The dynamic realism is an essential difference between the AR fitting system and the static image synthesis method, and is also a key factor for improving the immersion and experience satisfaction of users.
S106, generating AR fitting videos based on the virtual rendering video frames, and displaying the AR fitting videos to a user.
AR rendering and fusion are key steps in seamlessly integrating virtual apparel into real video, with the goal of creating mixed reality effects that are visually indistinguishable from reality. The application adopts advanced neural rendering technology and combines the traditional graphics method to realize the following core functions:
(1) Illumination estimation and matching, wherein the system analyzes the illumination condition of the real environment from the video frame, and the illumination condition comprises parameters such as light source direction, intensity, color temperature, ambient light and the like. These illumination parameters are applied to virtual garment rendering to ensure that the shadows, highlights and overall brightness of the garment are consistent with the real scene. For example, when a user stands in bright natural light, the virtual garment will exhibit a corresponding daylight effect, while under indoor lighting, it will exhibit a suitable yellow shade and soft shade.
(2) Material rendering: the system adopts physically based rendering (PBR) technology to simulate various material characteristics, including but not limited to subsurface scattering, for effects such as the translucency of thin silk after light penetrates thin fabrics; anisotropic reflection, for the directional luster of fabrics such as silk and satin; micro-surface structure, for simulating the fine textures of rough fabrics (such as wool and denim); and refraction and transparency, for handling the see-through effect of sheer or mesh materials. Edge fusion and occlusion handling are used to accurately compute the depth relationship between the virtual garment and the real human body and solve complex occlusion problems: when the garment is partially occluded by the body (for example, arms crossed in front of the chest), the system correctly handles the occlusion relationship; a dedicated edge-fusion algorithm is applied to garment edges to eliminate jagged, hard boundaries and create natural transitions; and the complex rendering of semi-transparent items (such as gauze) is taken into account so that the garment and the partially occluded body are displayed simultaneously.
(3) Dynamic detail enhancement: the system uses the physical simulation data calculated in the previous step to enhance the visual detail of the garment, including but not limited to (a) dynamic wrinkle textures, where realistic wrinkle texture maps are generated on the garment surface based on the deformation data from physical simulation; (b) stress visualization, where stretched fabric areas show appropriate color change or texture deformation; and (c) dynamic shadows, where self-shadows and mutual shadows between the garment and the body are updated in real time as motion changes.
(4) The system further improves rendering quality by applying deep learning, in the following respects: (a) style migration, ensuring the visual style of the rendered garment is consistent with the video frame; (b) detail synthesis, where a neural network automatically adds the small imperfections and details common in the real world to the rendering result; (c) super-resolution, intelligently upsampling the preliminary rendering result to enrich detail; (d) temporal consistency, ensuring the rendering remains stable across consecutive video frames to avoid flickering or jitter; and (e) real-time performance optimization, achieving a smooth experience on mobile devices through a multi-level rendering strategy that renders visually critical regions (such as the front chest and shoulders) at higher precision, renders secondary regions (such as the back and hard-to-perceive positions) with simplified rendering, and reuses the computation results of previous frames through temporal reprojection to lighten the load.
The output of AR rendering and fusion is a series of visually highly realistic composite video frames in which the virtual clothing blends naturally with the real video scene, giving the user the visual impression of "I am wearing this garment". The visual effect of this seamless fusion is a key indicator of the success of the AR fitting system and a determining factor in whether users accept the technology.
In the embodiment provided by the application, the multi-mode interaction processing module can also collect multi-mode clothing adjustment information of a user after the AR fitting video is displayed to the user, wherein the multi-mode clothing adjustment information comprises at least one of clothing local adjustment information and clothing touch information, and the rendering state of virtual clothing on a 3D human model in the AR fitting video is adjusted by applying the multi-mode clothing adjustment information.
The multi-modal interaction processing module comprises a semantic understanding framework for identifying adjustment information in the user's voice input. The framework includes a domain-specific language (DSL) processing engine that supports understanding and executing professional clothing adjustment terms such as "waist tightening", "drape", and "shoulder-line enhancement". The module further comprises a knowledge-graph-based clothing concept analysis system that automatically converts abstract semantics into precise three-dimensional mesh deformation parameters.
The "user instruction phrase" may be a professional clothing term, such as "waist tightening", "shoulder-line enhancement", or "increase drape", or an everyday expression, such as "more fitted" or "open the neckline a bit more". The system can understand the actual intent of these natural-language expressions. "Garment type" refers to whether the current try-on is a jacket, shirt, skirt, etc., because the same "tighten" instruction may mean adjusting the shoulder line on a suit but contracting the waist on a skirt. The human-body context considers the user's body shape characteristics and posture state, ensuring that the geometric deformation suits the current wearer; for example, for a round-shouldered body type, the specific implementation of "shoulder-line enhancement" may differ from that for a standard body type.
The semantic mapping algorithm precisely quantizes abstract language concepts into specific 3D mesh deformation parameters: which vertices to move, how far, and in what direction. Thus, when the user says "open the neckline a little more", the system does not simply enlarge the neckline; like a professional tailor, it takes the neckline structure, fabric characteristics, and neck shape into account to make a precise adjustment. The user can therefore communicate with the system in the most natural way, without learning complicated technical operations, adjusting the virtual clothing as easily as talking to a real tailor.
Trained on a corpus of 6000+ professional clothing-adjustment dialogues, the system achieves a high understanding rate for professional clothing terminology and can process composite semantic commands, such as "tighten the waist slightly but keep the chest loose", decomposing them into multiple coordinated mesh deformation operations while preserving the overall aesthetics of the garment. This model significantly improves the professionalism and accuracy of the virtual fitting process.
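As a stand-in for the DSL engine and knowledge graph, the lookup-table sketch below shows the shape of a mapping from phrase plus garment type to deformation parameters; every phrase, region, and magnitude in it is invented for illustration.

```python
def parse_adjustment(phrase, garment_type):
    """Map a natural-language adjustment phrase to mesh deformation
    parameters (region, magnitude). A lookup table replaces the DSL
    engine and knowledge graph described in the text."""
    table = {
        ("tighten", "suit"):      {"region": "shoulder_line", "scale": -0.03},
        ("tighten", "skirt"):     {"region": "waist",         "scale": -0.04},
        ("open collar", "shirt"): {"region": "neckline",      "scale": +0.05},
        ("more drape", "dress"):  {"region": "hem", "bend_stiffness": -0.2},
    }
    key = (phrase, garment_type)
    if key not in table:
        raise KeyError(f"no mapping for {key!r}")
    return table[key]

# The same instruction targets different regions per garment type.
print(parse_adjustment("tighten", "suit"))
print(parse_adjustment("tighten", "skirt"))
```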
The multi-modal interaction processing module also comprises a haptic feedback system based on material characteristics. The system integrates haptic feedback into the AR fitting experience, breaking the limitation of traditional purely visual interaction; the application develops a material-characteristic haptic mapping framework that converts the physical properties of fabrics into a perceivable haptic feedback pattern, with the haptic parameters calculated as follows:
HapticProfile(t) = Σ_i [W_i · P_i(materialProperties)]
Wherein P_i denotes the individual haptic characteristics, W_i the corresponding dynamic weight coefficients, HapticProfile(t) the haptic feedback profile at time t, and materialProperties the material properties.
The system is equipped with a haptic characteristic database covering 21 common fabrics and can simulate textures ranging from the smoothness of silk to the roughness of denim. It also realizes force-sensitive interaction: the pressure applied by the user influences the intensity of the haptic feedback and the degree of clothing deformation, with the haptic feedback working in concert with the physical simulation system.
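A minimal sketch of the haptic formula above; the two haptic characteristics, the material property dicts, and the weights are invented for illustration.

```python
def haptic_profile(material_props, weights, characteristics):
    """HapticProfile(t) = sum_i W_i * P_i(materialProperties).
    Each P_i maps material properties to one haptic channel."""
    return sum(w * p(material_props)
               for w, p in zip(weights, characteristics))

# Two toy haptic characteristics: smoothness and texture vibration.
smoothness = lambda m: 1.0 - m["roughness"]
vibration = lambda m: m["roughness"] * m["stiffness"]

silk = {"roughness": 0.1, "stiffness": 0.2}
denim = {"roughness": 0.8, "stiffness": 0.9}
w = [0.7, 0.3]
print(round(haptic_profile(silk, w, [smoothness, vibration]), 3))   # smooth
print(round(haptic_profile(denim, w, [smoothness, vibration]), 3))  # rough
```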
The multi-modal interaction processing module also comprises a progressive interaction system for intent prediction, which realizes anticipatory progressive interaction, for example using a temporal convolutional network (TCN) to analyze user interaction sequences in real time and predict the likely next operation.
Prediction model structure:
IntentProbability = SoftMax(TCN(InteractionSequence[t−n : t]))
In words, the formula reads: intent probability = normalization function(temporal convolutional network(recent interaction sequence)).
It should be noted that the system records the operation sequence of the user for the last period of time, which is the "recent interaction sequence t-n: t". For example, the user adjusts the neckline first, then looks at the side effects, and then adjusts the waist. This sequence of operations is input into a "time sequential convolutional network (TCN)" for analysis. Such networks are particularly adept at discovering patterns in time series, and can identify the user's operating habits and trends from the last few minutes of activity. The network analysis then outputs a set of raw values representing the preliminary likelihood that the user may perform various subsequent operations. But these raw values require further processing to make practical sense. The "SoftMax" function then converts these raw values into a probability distribution such that the sum of the probabilities for all possible operations is 100%. For example, the system may determine that the probability of the user next adjusting the collar is 60%, the probability of adjusting the sleeve length is 30%, and the other operational probabilities are 10%. The resulting "intent probability" is a ranked list of operational possibilities that the system will preferentially prepare for high probability operations.
The application preloads resources according to the prediction results and computes likely deformation results in advance, reducing perceived delay from the standard 120 ms to 38 ms. An interaction fluency optimization engine was developed to convert discrete operation instructions into a continuous clothing adjustment process, realizing a "partially confirmed" interaction mode that allows the system to start responding before the user completes the whole gesture, greatly improving interaction fluency. The system also recognizes and learns the user's specific operating patterns, such as the habitual sequence of "adjust the collar first, then the sleeve length", and proactively optimizes the UI layout for subsequent operations. Experimental data show that progressive interaction reduces the time required to complete common garment adjustment tasks.
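A sketch of the prediction formula under stated assumptions: the TCN itself is replaced by a placeholder score vector, so only the softmax normalization and the resulting probability ranking are shown; the action names and scores are invented.

```python
import numpy as np

def intent_probabilities(raw_scores, actions):
    """IntentProbability = SoftMax(TCN(InteractionSequence[t-n:t])).
    raw_scores stands in for the TCN output over candidate actions."""
    e = np.exp(raw_scores - np.max(raw_scores))  # numerically stable softmax
    probs = e / e.sum()
    return {a: round(float(p), 2) for a, p in zip(actions, probs)}

actions = ["adjust_collar", "adjust_sleeve_length", "change_color"]
tcn_output = np.array([2.0, 1.3, 0.2])  # placeholder network output
print(intent_probabilities(tcn_output, actions))
# {'adjust_collar': 0.6, 'adjust_sleeve_length': 0.3, 'change_color': 0.1}
# -> the system would preload collar-adjustment resources first.
```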
The multi-modal interaction processing module is built on a multi-modal deep signal fusion framework. It integrates gesture, voice, gaze, and expression data at the feature level through a multi-level signal integration architecture; a signal cooperative enhancement algorithm lets different modalities complement and verify one another, improving overall recognition accuracy; a conflict resolution mechanism lets the system dynamically assign weights based on historical accuracy and current confidence when inputs from different modalities contradict each other; and cross-modal intent consistency verification ensures that the user intent understood by the system is accurate and consistent.
For example, when the user looks at the collar region while making a downward gesture and speaking "lower than a little," the system can understand that the user wants to lower the collar depth rather than move the garment down entirely. This depth fusion allows the system to still understand the user's intent correctly in noisy environments or ambiguous gestures.
Through the integration of these technologies, the multi-modal interaction processing module achieves a naturally accurate and personalized interaction experience in the AR virtual fitting scenario. Compared with traditional methods, user satisfaction is improved, the first-use success rate is higher, and the error rate is reduced. The technology not only improves the usability of the system but also markedly enhances the immersion and professionalism of virtual fitting, bringing the virtual fitting experience close to, or even beyond, the quality of physical fitting.
In the method provided by the embodiment of the application, when the AR fitting video is displayed to the user, the following processing is further performed:
(1) Video stream optimization: the system comprehensively optimizes the AR-rendered video frame sequence to ensure final presentation quality. The optimization includes (a) frame-rate matching, ensuring that the output video's frame rate matches the original video (usually 24-30 frames per second) to avoid unnatural motion speed; (b) temporal smoothing, applying temporal filtering algorithms such as temporal anti-aliasing (TAA) to eliminate inter-frame jitter and flicker and improve video stability; (c) color calibration, adjusting color output according to the characteristics of the user device's display to ensure accurate color reproduction; and (d) encoding optimization, selecting appropriate video encoding parameters to optimize file size and transmission efficiency while preserving visual quality.
(2) The detail optimization processing specifically comprises the following steps:
(a) Edge refinement: a dedicated edge detection and optimization algorithm is applied, mainly to the edge areas where clothing contacts the human body, to eliminate possible artifacts. (b) Stability enhancement: additional stabilization is applied to areas prone to instability (such as fluttering cloth) to avoid unrealistic rapid shaking. (c) Visual consistency correction: the color, brightness, and texture of the clothing are kept consistent throughout the video and do not drift with scene changes. (d) User interface integration: the integrated content includes, but is not limited to, real-time control options, i.e., clothing adjustment controls such as color selection and size fine-tuning buttons displayed at suitable screen positions; an information overlay displaying clothing, price, material, and other information on demand; split-screen comparison, supporting comparison views of different garments or different colors of the same garment; and visual gesture guidance when the user is detected to be interacting but operating imprecisely. (e) Multi-device adaptation: the system adapts to the characteristics of different display devices, including but not limited to orientation adjustment and display modes such as full-length, full-screen, and high-resolution views, so that the try-on effect is easy to view on each device. (f) Result export: the export content includes, but is not limited to, video recording, storing the complete fitting-process video including user interaction and motion display; selected frame screenshots, automatically or manually capturing still images with the best fitting effect; social media sharing, with one-tap publishing directly to mainstream social platforms; and shopping integration, seamlessly connecting to e-commerce platforms and supporting adding to cart or ordering directly from the fitting result. (g) Data collection and analysis: with the user's permission, the system collects anonymous usage data for system optimization; the optimized content includes, but is not limited to, recording the user's try-on duration and interaction level for different garments, analyzing the user's expression feedback and clothing adjustment preferences, and providing more accurate personalized recommendations for subsequent users based on the collected data.
Through converting the technical achievement of AR fitting into visual and easy-to-use user experience, the user can obtain unprecedented virtual fitting experience on familiar equipment. Perfect output processing and user-friendly interface design are key factors to ensure that technical innovations can be translated into practical commercial value.
In the embodiment of the application, a target video of a user to be fitted is processed to obtain human body posture data for each video frame in the target video; a 3D human body model is constructed for each video frame using its human body posture data; for the 3D human body model of each video frame, garment rendering information is obtained, and the virtual garment is rendered on the 3D human body model based on the dynamic fitting algorithm and the garment rendering information to obtain the virtual rendered video frame for that frame; an AR fitting video is then generated from the virtual rendered video frames and displayed to the user. In this way, selected clothes can be tried on in AR and the try-on effect shown to the user, effectively alleviating problems such as clothes bought online that cannot be tried on first, unsuitable sizes, and unsuitable styles, reducing the probability of consumer returns and providing users with a good shopping experience.
Referring to fig. 4, a flowchart of another virtual fitting method according to an embodiment of the present application is specifically described as follows:
The method comprises: obtaining the input video and preprocessing it; performing human body posture detection and tracking on the preprocessed video to obtain human body posture data for each video frame; generating a 3D human body model for each video frame based on its posture data; determining the virtual garment of the target clothing selected by the user; rendering the virtual garment onto the 3D human body model of each video frame using the dynamic fitting algorithm; and performing simulation processing on the virtual garment on the 3D human body model using physical simulation technology to obtain the AR fitting video. Further, when the AR fitting video is displayed to the user, the rendering state of the virtual garment in the AR fitting video is adjusted based on multi-modal interaction information, and the adjusted AR fitting video is displayed to the user.
The application adopts dynamic video processing and real-time physical simulation, fundamentally overcoming the limitation that traditional virtual fitting can only handle static images or preset actions, achieving natural fitting effects under arbitrary motion and markedly improving applicability. By combining neural rendering with traditional graphics methods, virtual clothing is seamlessly fused with real video, keeping the garment's lighting, wrinkle changes, and environmental illumination consistent and solving the "pasted-on" look common in existing AR fitting. The dedicated cloth physics engine developed in the application runs efficiently on terminals such as mobile devices while maintaining realistic physical effects, improving computational efficiency by more than 300% over general-purpose physics engines and making a realistic fitting experience possible on ordinary consumer devices for the first time. The 3D human body reconstruction algorithm can accurately estimate the user's full-body measurements from a single video with an accuracy of ±1.5 cm, far better than the ±3-5 cm average error of the prior art, laying the foundation for accurate garment fitting.
With the method, a user can move freely in the video and view clothing effects in real time. The system supports dynamic clothing display under various everyday actions such as turning, walking, and sitting, filling the gap that static fitting cannot assess wearing comfort and dynamic appearance, and enabling immersive dynamic try-on. During fitting, the user can adjust the clothing intuitively through multiple integrated interaction modes such as gestures, voice, and expressions, with an interaction success rate above 95% and efficiency improved by 40% over traditional button-based interfaces. Comprehensive visual assessment is supported, including 360-degree viewing, so the user can observe the fitting effect from different angles, overcoming the limited viewing angles of a traditional fitting mirror and providing a more complete assessment of the wearing effect. The application also provides personalized fitting suggestions: based on analysis of the user's body shape and wearing effect, the system intelligently recommends more suitable sizes and styles, improving accuracy by 35% over traditional size charts and effectively reducing returns caused by mismatched sizes.
By applying the scheme provided by the application, accurate virtual fitting lets consumers assess garment suitability more precisely before purchase; field measurements show the return rate of clothing e-commerce can drop from an average of 30% to below 15%, directly reducing merchant losses and logistics costs. Conversion rates also improve: the interactivity and fun of virtual fitting significantly increase user dwell time and purchase intent, and pilot data show that after deploying the system, the conversion rate of the clothing category on e-commerce platforms rose by 28% on average, far above the industry's average growth. Operating costs can be reduced as well: fitting-room space requirements and sample inventory in physical stores shrink, allowing single-store floor area to be cut by 20% while offering a richer fitting selection, and the clothing wear and labor costs caused by try-ons are significantly reduced. The system also offers multi-channel applicability, suiting scenarios such as online e-commerce, offline smart mirrors, and mobile apps, providing brands with a unified virtual fitting image and consistent experience.
Although the present invention depicts operations in a particular order, this should not be construed as requiring that these operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
Corresponding to fig. 1, an embodiment of the present application provides a virtual fitting device configured to support implementation of the method shown in fig. 1. Referring to fig. 5, a schematic structural diagram of the virtual fitting device provided by the embodiment of the present application, the device is described in detail below:
a first obtaining unit 501, configured to obtain a target video containing a user to be fitted;
a second obtaining unit 502, configured to perform human body posture detection and human body posture tracking processing on the target video, and obtain human body posture data of the user in each video frame of the target video;
a construction unit 503, configured to determine, for each of the video frames, human body model parameters of multiple dimensions based on the human body posture data of the video frame, and construct a 3D human body model of the user in the video frame using the respective human body model parameters;
a third obtaining unit 504, configured to obtain a virtual garment of a target garment from a preset garment model library, where the target garment is the garment to be tried on selected by the user;
a rendering unit 505, configured to obtain, for each 3D human body model of a video frame, clothing rendering information of the 3D human body model, and render the virtual garment on the 3D human body model based on a preset dynamic fitting algorithm and the clothing rendering information, so as to obtain a virtual rendered video frame corresponding to the video frame;
and a display unit 506, configured to generate an AR fitting video based on each virtual rendered video frame, and display the AR fitting video to the user.
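For orientation, the sketch below mirrors the unit layout of fig. 5 as a plain Python class. It is a minimal structural sketch only: the injected callables, method names and signatures are illustrative placeholders, not the application's actual implementation.

from typing import Any, Callable, List

class VirtualFittingDevice:
    """Sketch of the fig. 5 unit layout; every callable is an injected stub."""

    def __init__(self,
                 preprocess: Callable[[Any], List[Any]],       # first obtaining unit 501
                 track_pose: Callable[[List[Any]], List[Any]], # second obtaining unit 502
                 build_body: Callable[[Any], Any],             # construction unit 503
                 fetch_garment: Callable[[str], Any],          # third obtaining unit 504
                 fit_and_render: Callable[[Any, Any], Any]):   # rendering unit 505
        self.preprocess = preprocess
        self.track_pose = track_pose
        self.build_body = build_body
        self.fetch_garment = fetch_garment
        self.fit_and_render = fit_and_render

    def try_on(self, raw_video: Any, garment_id: str) -> List[Any]:
        frames = self.preprocess(raw_video)
        poses = self.track_pose(frames)
        bodies = [self.build_body(p) for p in poses]
        garment = self.fetch_garment(garment_id)
        # Display unit 506: the rendered frames form the AR fitting video.
        return [self.fit_and_render(body, garment) for body in bodies]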
In another embodiment provided by the application, the device further comprises an acquisition unit. The acquisition unit is configured to collect multimodal clothing adjustment information of the user, where the multimodal clothing adjustment information includes at least one of clothing local adjustment information and clothing touch information, and to apply the multimodal clothing adjustment information to adjust the rendering state of the virtual garment on the 3D human body model in the AR fitting video.
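The embodiment does not say how the two information types are merged into the rendering state. The sketch below shows one plausible merge; every field and event name (sleeve_roll, tuck_in, drag_cm, and so on) is an illustrative assumption.

from dataclasses import dataclass
from typing import Optional

@dataclass
class GarmentRenderState:
    sleeve_roll: float = 0.0   # 0 = sleeves down, 1 = fully rolled (illustrative)
    tuck_in: bool = False
    drape_offset: float = 0.0  # cm of local drape shift accumulated from touches

def apply_adjustment(state: GarmentRenderState,
                     local_adjust: Optional[dict] = None,
                     touch: Optional[dict] = None) -> GarmentRenderState:
    """Merge at least one of the two multimodal information types into
    the rendering state, as in the acquisition-unit description."""
    if local_adjust:  # e.g. {"sleeve_roll": 0.5, "tuck_in": True}
        state.sleeve_roll = local_adjust.get("sleeve_roll", state.sleeve_roll)
        state.tuck_in = local_adjust.get("tuck_in", state.tuck_in)
    if touch:         # e.g. {"drag_cm": 2.0} from a touch-drag gesture
        state.drape_offset += touch.get("drag_cm", 0.0)
    return state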
In another embodiment provided by the present application, the first obtaining unit 501 of the apparatus performs the process of obtaining the target video containing the user to be fitted, which includes:
acquiring an initial video containing the user to be fitted;
determining the scene complexity of the initial video, and adjusting the preset filtering parameters of a denoising filter based on the scene complexity;
processing the initial video with the parameter-adjusted denoising filter to obtain a first video;
correcting the dominant scene tone in the first video using a preset color correction strategy to obtain a corrected second video;
performing resolution optimization on the second video using a preset resolution optimization strategy to obtain a third video;
and performing frame rate stabilization on the third video using a preset frame rate stabilization strategy, then performing fitting enhancement on the stabilized video based on a preset fitting enhancement strategy to obtain the target video.
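The embodiment names the processing stages but not the concrete operators. The sketch below is a minimal per-frame version of the chain, assuming OpenCV; the Laplacian-variance complexity proxy, the gray-world color correction, and all thresholds are illustrative assumptions, and the cross-frame stages (frame rate stabilization, fitting enhancement) are omitted.

import cv2
import numpy as np

def preprocess_frame(frame: np.ndarray, target_hw=(720, 1280)) -> np.ndarray:
    """One plausible per-frame version of the preprocessing chain; the
    concrete filters and thresholds here are assumptions, not the patent's."""
    # Scene complexity proxy: variance of the Laplacian (edge density).
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    complexity = cv2.Laplacian(gray, cv2.CV_64F).var()

    # Adapt denoising strength to complexity: busy scenes get gentler
    # filtering so garment texture survives.
    h = 5 if complexity > 500 else 10
    denoised = cv2.fastNlMeansDenoisingColored(frame, None, h, h, 7, 21)

    # Gray-world correction of the dominant scene tone.
    means = denoised.reshape(-1, 3).mean(axis=0)
    gains = means.mean() / np.maximum(means, 1e-6)
    corrected = np.clip(denoised * gains, 0, 255).astype(np.uint8)

    # Resolution optimization to a fixed working size (cv2 expects (w, h)).
    return cv2.resize(corrected, (target_hw[1], target_hw[0]),
                      interpolation=cv2.INTER_AREA)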
In another embodiment of the present application, the second obtaining unit 502 of the apparatus performs human body posture detection and human body posture tracking on the target video and obtains the human body posture data of the user in each video frame of the target video, a process that includes:
For each video frame of the target video, a preset posture detection module detects human body key points in the user's image to obtain the human skeleton data of the video frame. A preset multi-person separation network then performs human instance segmentation on the video frame to determine the user's human instance, and the user's skeletal key point data is extracted from the human skeleton data based on that instance. Finally, the user's identity is associated between the video frame and its adjacent video frames, and the motion trajectory of each skeletal key point across these frames is smoothed, yielding the human body posture data of the video frame.
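The embodiment specifies that key point trajectories are smoothed between adjacent frames but names no smoother. A minimal sketch, assuming an exponential moving average over per-frame keypoint arrays; the random points stand in for the posture detection module's output purely to show the call shape.

import numpy as np

def smooth_trajectories(keypoints_per_frame, alpha=0.6):
    """Exponential moving average over per-frame (K, 2) keypoint arrays.
    The patent only says trajectories are 'smoothed'; EMA is a stand-in."""
    smoothed, prev = [], None
    for kp in keypoints_per_frame:
        prev = kp if prev is None else alpha * kp + (1 - alpha) * prev
        smoothed.append(prev.copy())
    return smoothed

# Hypothetical usage: 30 frames of 17 COCO-style joints.
frames_kp = [np.random.rand(17, 2) for _ in range(30)]
stable_kp = smooth_trajectories(frames_kp)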
In another embodiment provided by the present application, the construction unit 503 of the apparatus performs the process of determining, for each of the video frames, human body model parameters of multiple dimensions based on the human body posture data of the video frame, which includes:
for each video frame, extracting, based on each human body index of a preset deformable human body model, the index parameter of that human body index from the human body posture data;
extracting each detail description parameter from the video frame;
and determining each index parameter and each detail description parameter as the human body model parameters.
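The embodiment leaves the deformable human body model unspecified; a common instance is an SMPL-style split into shape and pose vectors. The sketch below only shows how the two parameter groups might be assembled; all field names and dimensions are assumptions.

from dataclasses import dataclass, field
import numpy as np

@dataclass
class BodyModelParams:
    """Assembles the two parameter groups named in the embodiment; the
    SMPL-style split into shape and pose vectors is an assumption."""
    shape: np.ndarray            # e.g. 10 shape coefficients (heights, girths)
    pose: np.ndarray             # e.g. 24 joints x 3 axis-angle values
    details: dict = field(default_factory=dict)  # per-frame detail descriptors

def build_params(posture_data: dict, frame_details: dict) -> BodyModelParams:
    # Index parameters come from the posture data; detail description
    # parameters come from the frame itself, as in the steps above.
    shape = np.asarray(posture_data.get("shape", np.zeros(10)))
    pose = np.asarray(posture_data.get("pose", np.zeros(72)))
    return BodyModelParams(shape=shape, pose=pose, details=frame_details)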
In another embodiment of the present application, the third obtaining unit 504 of the apparatus performs the process of obtaining the virtual garment of the target garment from the preset garment model library, which includes:
collecting the clothing data of the target garment;
and determining, based on the clothing data, the virtual garment of the target garment in the garment model library.
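The two steps above amount to a keyed lookup. A minimal sketch, assuming an in-memory library and an SKU-based match with a category fallback; the schema and keys are hypothetical.

from typing import Optional

# Hypothetical in-memory stand-in for the preset garment model library;
# keys and fields are illustrative, not the application's schema.
GARMENT_LIBRARY = {
    "shirt-001": {"category": "top", "sizes": ["S", "M", "L"], "mesh": "shirt.obj"},
    "skirt-014": {"category": "bottom", "sizes": ["M", "L"], "mesh": "skirt.obj"},
}

def lookup_virtual_garment(clothing_data: dict) -> Optional[dict]:
    """Resolve collected clothing data (e.g. a scanned SKU) to a library
    entry; fall back to a category match when the SKU is unknown."""
    sku = clothing_data.get("sku")
    if sku in GARMENT_LIBRARY:
        return GARMENT_LIBRARY[sku]
    category = clothing_data.get("category")
    return next((g for g in GARMENT_LIBRARY.values()
                 if g["category"] == category), None)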
In another embodiment of the present application, the rendering unit 505 of the apparatus performs the process of rendering the virtual garment on the 3D human body model based on the preset dynamic fitting algorithm and the clothing rendering information to obtain the virtual rendered video frame corresponding to the video frame, which includes:
determining the clothing category of the virtual garment based on the garment size information in the clothing rendering information, and rendering the virtual garment, according to the garment size information, at the position on the 3D human body model corresponding to that clothing category;
constructing a signed distance field of the 3D human body model after the virtual garment is rendered, the signed distance field describing the spatial relationship between the virtual garment and the 3D human body model;
adjusting the rendering of the virtual garment on the 3D human body model based on a preset optimization strategy and the signed distance field, and stopping the adjustment of the virtual garment once a preset convergence condition is met;
applying the clothing rendering information to adjust the rendering details of the virtual garment rendered on the 3D human body model;
and performing physical simulation on the 3D human body model containing the virtual garment using a preset physical simulation technique to obtain the virtual rendered video frame corresponding to the video frame.
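The embodiment does not disclose the optimization strategy. One common instance of SDF-guided fitting iteratively pushes penetrating garment vertices outward until a no-penetration convergence condition holds; the sketch below shows that instance, where body_sdf and sdf_grad are assumed callables and the margin and iteration cap are illustrative.

import numpy as np

def resolve_penetration(garment_verts: np.ndarray,
                        body_sdf, sdf_grad,
                        margin: float = 0.003,
                        max_iters: int = 20) -> np.ndarray:
    """Push garment vertices outside the body's signed distance field.
    body_sdf(p) < 0 means p is inside the body; sdf_grad(p) is the
    outward gradient. Both callables are assumed inputs -- the patent
    only states that an SDF describes the garment/body relationship."""
    verts = garment_verts.copy()
    for _ in range(max_iters):
        d = np.array([body_sdf(v) for v in verts])
        inside = d < margin
        if not inside.any():        # preset convergence condition met
            break
        for i in np.flatnonzero(inside):
            g = sdf_grad(verts[i])
            verts[i] += (margin - d[i]) * g / (np.linalg.norm(g) + 1e-9)
    return verts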
In another embodiment provided by the present application, the rendering unit 505 of the apparatus performs the process of applying the clothing rendering information to adjust the rendering details of the virtual garment rendered on the 3D human body model, which includes:
adjusting the material rendering parameters of the virtual garment on the 3D human body model based on the environmental information in the clothing rendering information;
determining the mesh attribute data of different regions of the virtual garment on the 3D human body model based on the motion prediction information and visual perception information in the clothing rendering information, and adjusting each region's mesh resources and mesh density based on its mesh attribute data;
adjusting the texture of the virtual garment on the 3D human body model based on the augmented reality texture information in the clothing rendering information;
and adjusting the clothing physical simulation parameters of the virtual garment on the 3D human body model based on the scene parameter information in the clothing rendering information.
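A plausible reading of the mesh-adjustment step is a level-of-detail rule that spends mesh budget where the garment moves fast and is clearly visible. The weighting below is an illustrative assumption, since the embodiment only names the two information sources.

import numpy as np

def assign_mesh_density(motion_mag: np.ndarray,
                        visibility: np.ndarray,
                        base_density: float = 1.0) -> np.ndarray:
    """Per-region subdivision factor from predicted motion and visual
    salience; the weights and clamp range are illustrative assumptions."""
    # Fast-moving, highly visible regions (e.g. a swinging hem) get more
    # mesh budget; occluded or static regions are coarsened.
    score = 0.6 * motion_mag + 0.4 * visibility       # both in [0, 1]
    return base_density * np.clip(0.5 + 1.5 * score, 0.5, 2.0)

# Hypothetical usage for four garment regions: collar, sleeve, torso, hem.
density = assign_mesh_density(np.array([0.1, 0.7, 0.2, 0.9]),
                              np.array([0.8, 0.9, 1.0, 0.9]))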
An embodiment of the invention further provides a storage medium comprising stored instructions, wherein, when the instructions are executed, the device on which the storage medium resides is controlled to perform the virtual fitting method described above.
An embodiment of the invention further provides an electronic device, whose structure is shown in fig. 6 and which specifically includes a memory 601 and one or more instructions 602, where the one or more instructions 602 are stored in the memory 601 and configured to be executed by one or more processors 603 to perform the virtual fitting method described above.
It should be noted that the information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) involved in the present application are information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of such data must comply with the relevant laws, regulations and standards of the relevant region.
The specific implementation process and derivative manner of the above embodiments are all within the protection scope of the present invention.
In this specification, the embodiments are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the system and device embodiments are substantially similar to the method embodiments, their description is relatively brief, and reference may be made to the corresponding parts of the method embodiments. The systems and system embodiments described above are merely illustrative: components described as separate units may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement the invention without inventive effort.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A virtual fitting method, comprising: obtaining a target video containing a user to be fitted; performing human body posture detection and human body posture tracking processing on the target video to obtain human body posture data of the user in each video frame of the target video; for each of the video frames, determining human body model parameters of multiple dimensions based on the human body posture data of the video frame, and constructing a 3D human body model of the user in the video frame using the respective human body model parameters; obtaining a virtual garment of a target garment from a preset garment model library, the target garment being the garment to be tried on selected by the user; for each 3D human body model of a video frame, obtaining clothing rendering information of the 3D human body model, and rendering the virtual garment on the 3D human body model based on a preset dynamic fitting algorithm and the clothing rendering information, to obtain a virtual rendered video frame corresponding to the video frame; and generating an AR fitting video based on each of the virtual rendered video frames, and displaying the AR fitting video to the user.

2. The method according to claim 1, further comprising: collecting multimodal clothing adjustment information of the user, the multimodal clothing adjustment information including at least one of clothing local adjustment information and clothing touch information; and applying the multimodal clothing adjustment information to adjust the rendering state of the virtual garment on the 3D human body model in the AR fitting video.

3. The method according to claim 1, wherein obtaining the target video containing the user to be fitted comprises: acquiring an initial video containing the user to be fitted; determining the scene complexity of the initial video, and adjusting the filtering parameters of a preset denoising filter based on the scene complexity; processing the initial video with the parameter-adjusted denoising filter to obtain a first video; correcting the dominant scene tone in the first video using a preset color correction strategy to obtain a corrected second video; performing resolution optimization on the second video using a preset resolution optimization strategy to obtain a third video; and performing frame rate stabilization on the third video using a preset frame rate stabilization strategy, and performing fitting enhancement on the processed video based on a preset fitting enhancement strategy to obtain the target video.

4. The method according to claim 1, wherein performing human body posture detection and human body posture tracking on the target video to obtain the human body posture data of the user in each video frame of the target video comprises: for each video frame of the target video, detecting human body key points in the user's image using a preset posture detection module to obtain the human skeleton data of the video frame; performing human instance segmentation on the video frame using a preset multi-person separation network to determine the user's human instance in the video frame, and determining the user's skeletal key point data in the human skeleton data based on that instance; and establishing the association of the user between the video frame and adjacent video frames, and smoothing the motion trajectory of each of the user's skeletal key points between the video frame and the adjacent video frames to obtain the human body posture data of the video frame.

5. The method according to claim 1, wherein, for each of the video frames, determining human body model parameters of multiple dimensions based on the human body posture data of the video frame comprises: for each of the video frames, extracting, based on each human body index of a preset deformable human body model, the index parameter of each human body index from the human body posture data; extracting each detail description parameter from the video frame; and determining each index parameter and each detail description parameter as the human body model parameters.

6. The method according to claim 1, wherein rendering the virtual garment on the 3D human body model based on the preset dynamic fitting algorithm and the clothing rendering information to obtain the virtual rendered video frame corresponding to the video frame comprises: determining the clothing category of the virtual garment based on the garment size information of the virtual garment in the clothing rendering information, and rendering the virtual garment, according to the garment size information, at the position on the 3D human body model corresponding to the clothing category; constructing a signed distance field of the 3D human body model after the virtual garment is rendered, the signed distance field describing the spatial relationship between the virtual garment and the 3D human body model; adjusting the rendering of the virtual garment on the 3D human body model based on a preset optimization strategy and the signed distance field, and stopping the adjustment of the virtual garment once a preset convergence condition is met; applying the clothing rendering information to adjust the rendering details of the virtual garment rendered on the 3D human body model; and performing physical simulation on the 3D human body model containing the virtual garment using a preset physical simulation technique to obtain the virtual rendered video frame corresponding to the video frame.

7. The method according to claim 6, wherein applying the clothing rendering information to adjust the rendering details of the virtual garment rendered on the 3D human body model comprises: adjusting the material rendering parameters of the virtual garment on the 3D human body model based on the environmental information in the clothing rendering information; determining the mesh attribute data of different regions of the virtual garment on the 3D human body model based on the motion prediction information and visual perception information in the clothing rendering information, and adjusting each region's mesh resources and mesh density based on its mesh attribute data; adjusting the texture of the virtual garment on the 3D human body model based on the augmented reality texture information in the clothing rendering information; and adjusting the clothing physical simulation parameters of the virtual garment on the 3D human body model based on the scene parameter information in the clothing rendering information.

8. A virtual fitting device, comprising: a first obtaining unit, configured to obtain a target video containing a user to be fitted; a second obtaining unit, configured to perform human body posture detection and human body posture tracking processing on the target video, and obtain the human body posture data of the user in each video frame of the target video; a construction unit, configured to determine, for each of the video frames, human body model parameters of multiple dimensions based on the human body posture data of the video frame, and construct a 3D human body model of the user in the video frame using the respective human body model parameters; a third obtaining unit, configured to obtain a virtual garment of a target garment from a preset garment model library, the target garment being the garment to be tried on selected by the user; a rendering unit, configured to obtain, for each 3D human body model of a video frame, clothing rendering information of the 3D human body model, and render the virtual garment on the 3D human body model based on a preset dynamic fitting algorithm and the clothing rendering information, to obtain a virtual rendered video frame corresponding to the video frame; and a display unit, configured to generate an AR fitting video based on each of the virtual rendered video frames, and display the AR fitting video to the user.

9. A storage medium, comprising stored instructions, wherein, when the instructions are executed, the device on which the storage medium resides is controlled to perform the virtual fitting method according to any one of claims 1-7.

10. An electronic device, comprising a memory and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by one or more processors to implement the virtual fitting method according to any one of claims 1-7.
CN202511033383.7A 2025-07-25 2025-07-25 Virtual fitting method and device, storage medium and electronic equipment Active CN120525619B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202511033383.7A CN120525619B (en) 2025-07-25 2025-07-25 Virtual fitting method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202511033383.7A CN120525619B (en) 2025-07-25 2025-07-25 Virtual fitting method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN120525619A true CN120525619A (en) 2025-08-22
CN120525619B CN120525619B (en) 2025-10-21

Family

ID=96753261

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202511033383.7A Active CN120525619B (en) 2025-07-25 2025-07-25 Virtual fitting method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN120525619B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160155262A1 (en) * 2013-09-23 2016-06-02 Beihang University Method of constructing 3d clothing model based on a single image
US20220270337A1 (en) * 2021-02-24 2022-08-25 Sony Group Corporation Three-dimensional (3d) human modeling under specific body-fitting of clothes
CN113129450A (en) * 2021-04-21 2021-07-16 北京百度网讯科技有限公司 Virtual fitting method, device, electronic equipment and medium
CN116523579A (en) * 2022-01-17 2023-08-01 海信视像科技股份有限公司 Display equipment, virtual fitting system and method
CN114663199A (en) * 2022-05-17 2022-06-24 武汉纺织大学 A dynamic display real-time three-dimensional virtual fitting system and method
CN116187051A (en) * 2023-02-17 2023-05-30 上海百琪迈科技(集团)有限公司 Binding method, binding system and binding equipment for clothing model and human body model
CN116386136A (en) * 2023-03-13 2023-07-04 浙江壹体科技有限公司 Action scoring method, equipment and medium based on human skeleton key points
CN116863110A (en) * 2023-06-29 2023-10-10 北京蜂巢世纪科技有限公司 Virtual fitting method, device, equipment and storage medium based on AR (augmented reality) glasses
CN118486074A (en) * 2024-05-08 2024-08-13 武汉两点十分文化传播有限公司 Method and system for processing role movement track in monocular video dynamic capture
CN120219038A (en) * 2025-03-11 2025-06-27 点云智慧(深圳)科技有限公司 A method for fitting a wearable article model to a human body model

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120726097A (en) * 2025-08-26 2025-09-30 苏州大学 A method for determining the looseness of clothing in motion
CN121033352A (en) * 2025-10-29 2025-11-28 深圳市东信时代信息技术有限公司 Clothing try-on method, equipment and storage medium based on multi-mode collaboration

Also Published As

Publication number Publication date
CN120525619B (en) 2025-10-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant