WO2025122928A1 - Design concept generation with generative adversarial networks - Google Patents
- Publication number: WO2025122928A1
- Application: PCT/US2024/058962
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- diversity
- generated
- sample
- gan
- generator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06N3/0475—Generative networks
- G06F30/17—Mechanical parametric or variational design
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
- G06N3/047—Probabilistic or stochastic networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06N3/094—Adversarial learning
- G06N5/04—Inference or reasoning models
- G06F2111/08—Probabilistic or stochastic CAD
- G06N20/00—Machine learning
Definitions
- the disclosure is generally directed to generative adversarial networks (GANs), and in particular, to an architecture for design concept generation with GANs.
- GANs generative adversarial networks
- DCG-GAN design concept generation with GANs
- a method for training models to generate images may comprise providing an input vector from the latent space of a generative model to a generator of the generative model.
- the method may comprise reading at least one sample generated by the generator based on the input vector.
- the method may comprise providing the at least one sample and at least one corresponding example to a plurality of inspectors of the generative model. Each of the plurality of inspectors may be configured to generate a respective vector characterizing a respective attribute of the at least one sample.
- the method may comprise reading the vectors generated by the plurality of inspectors.
- the method may comprise determining a generator loss value characterizing performance of the generator based on the vectors generated by the plurality of inspectors.
- the method may comprise training the generator based on the generator loss value.
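Taken together, the steps above amount to a loss-aggregation loop around the generator. The sketch below is a minimal NumPy mock-up under stated assumptions: the toy generator, the two inspector functions, the weights, and the rule for combining normalized inspector vectors into a scalar loss are illustrative stand-ins, not the patented implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-ins for the trained networks; names and forms are illustrative only
def generator(z, theta):
    return np.tanh(z @ theta)           # "samples" produced from latent vectors

def realism_inspector(samples):         # e.g. a discriminator score in [0, 1]
    return 1.0 / (1.0 + np.exp(-samples.mean(axis=1)))

def diversity_inspector(samples):       # e.g. mean distance to the other samples
    d = np.linalg.norm(samples[:, None] - samples[None, :], axis=-1)
    return d.mean(axis=1)

def generator_loss(samples, inspectors, weights):
    # each inspector returns one entry per sample; min-max normalize each
    # vector so scales are comparable, then combine into a single scalar
    loss = 0.0
    for inspect, w in zip(inspectors, weights):
        v = inspect(samples)
        v = (v - v.min()) / (v.max() - v.min() + 1e-8)
        loss += w * (1.0 - v).mean()    # higher inspector score -> lower loss
    return loss

z = rng.normal(size=(8, 4))             # batch of latent vectors
theta = rng.normal(size=(4, 16))
samples = generator(z, theta)
loss = generator_loss(samples, [realism_inspector, diversity_inspector], [1.0, 0.5])
```

In an actual training step, `loss` would be backpropagated through the generator's parameters rather than computed on a fixed `theta`.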
- the at least one sample comprises a generated image.
- training the generator comprises backpropagation.
- each of the plurality of vectors is of a uniform size.
- each row of each of the plurality of vectors corresponds to a sample of the at least one sample.
- the method comprises normalizing each of the plurality of vectors using min-max normalization.
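Min-max normalization maps each inspector's vector onto [0, 1] so vectors with different natural scales can be combined. A one-line NumPy version (the small epsilon guarding against a constant vector is an implementation choice, not from the source):

```python
import numpy as np

def min_max_normalize(v, eps=1e-8):
    # rescale a vector to [0, 1] so inspectors' outputs are comparable
    return (v - v.min()) / (v.max() - v.min() + eps)

scores = np.array([2.0, 4.0, 6.0])
normalized = min_max_normalize(scores)   # approximately [0.0, 0.5, 1.0]
```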
- each inspector is associated with a loss function.
- the method comprises determining a loss value for each inspector based on its associated loss function.
- determining the generator loss value comprises computing a result of a generator loss function.
- the generator loss function comprises the loss function associated with each of the inspectors.
- each respective attribute is selected from a set comprising realism, a shape, novelty, diversity, and desirability.
- the plurality of inspectors comprises a discriminator configured to generate a vector characterizing realism of the at least one sample relative to the at least one example.
- the method may further comprise training the discriminator based on the loss value for the discriminator.
- the plurality of inspectors comprises at least one of a diversity inspector, a novelty inspector, a desirability inspector, or a constraint inspector.
- the diversity inspector may be configured to generate a diversity vector characterizing diversity of the at least one sample.
- the diversity vector may be generated using the Covering Radius Upper Bound (CRUB) method.
- the novelty inspector may be configured to generate a novelty vector characterizing novelty of the at least one sample relative to the at least one example.
- the novelty vector may be generated using a Local Outlier Factor (LOF) method.
- LOF Local Outlier Factor
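The Local Outlier Factor compares a query point's local density with that of its nearest reference examples; scores near 1 indicate an inlier, while markedly larger scores flag a novel sample. Below is a from-scratch NumPy sketch for a single query point against a small reference set (production code would more likely use an off-the-shelf LOF implementation such as scikit-learn's; the point set and k are illustrative):

```python
import numpy as np

def knn(train, x, k):
    # k nearest reference points to x (a zero distance is treated as x itself)
    d = np.linalg.norm(train - x, axis=1)
    d = np.where(d == 0, np.inf, d)
    idx = np.argsort(d)[:k]
    return idx, d[idx]

def k_distance(train, i, k):
    # distance from reference point i to its k-th nearest other reference point
    d = np.linalg.norm(train - train[i], axis=1)
    d[i] = np.inf
    return np.sort(d)[k - 1]

def lrd(train, x, k):
    # local reachability density: inverse mean reachability distance to x's neighbors
    idx, dist = knn(train, x, k)
    reach = np.maximum([k_distance(train, i, k) for i in idx], dist)
    return 1.0 / reach.mean()

def lof(train, x, k=3):
    # LOF ~ 1 for inliers; substantially > 1 for novel / outlying samples
    idx, _ = knn(train, x, k)
    return np.mean([lrd(train, train[i], k) for i in idx]) / lrd(train, x, k)

examples = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
```

A query far from the reference cluster, e.g. `lof(examples, np.array([5.0, 5.0]))`, yields a score well above 1, while a query inside the cluster scores close to 1.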
- the desirability inspector may be configured to generate a desirability vector characterizing predicted desirability of the at least one sample.
- the desirability vector may be generated using a Deep Multimodal Design Evaluation (DMDE) model.
- DMDE Deep Multimodal Design Evaluation
- the constraint inspector may be configured to generate a shape vector characterizing adherence of a silhouette for each sample of the at least one sample with a silhouette for each example of the at least one example using a Structural Similarity Index Measure (SSIM).
- SSIM Structural Similarity Index Measure
- Each silhouette may comprise a representation of the outer shape of one or more objects depicted in its associated sample or example.
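SSIM compares two images through their luminance, contrast, and structure statistics. The sketch below implements the single-window (global) form of the index directly from its definition; the constraint inspector presumably applies a windowed SSIM over full silhouettes, and the constants `c1`, `c2` (the standard (0.01)² and (0.03)² for a dynamic range of 1) and the toy silhouettes are illustrative:

```python
import numpy as np

def global_ssim(x, y, c1=1e-4, c2=9e-4):
    # single-window SSIM: 1.0 for identical images, lower as structure diverges
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2*mx*my + c1) * (2*cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

a = np.zeros((8, 8)); a[2:6, 2:6] = 1.0   # toy square "silhouette"
b = a.copy()                              # identical silhouette
shifted = np.roll(a, 3, axis=1)           # displaced silhouette
```

Identical silhouettes score 1.0; the displaced silhouette scores much lower, which is the signal the constraint inspector would penalize.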
- a system comprises a computing node.
- the computing node comprising a computer readable storage medium having program instructions embodied therewith.
- the program instructions may be executable by a processor of the computing node to cause the processor to perform a method comprising any of the aforementioned methods.
- Fig. 1 illustrates a DCG-GAN architecture, in accordance with one or more embodiments of the present disclosure.
- Fig. 2 illustrates examples of StyleGAN2 generated images, in accordance with one or more embodiments of the present disclosure.
- Fig. 3 illustrates examples of DCG-GAN generated images, in accordance with one or more embodiments of the present disclosure.
- Fig. 4A illustrates areas of the design space covered by the original and generated samples of the baseline model’s results, in accordance with one or more embodiments of the present disclosure.
- Fig. 4B illustrates areas of the design space covered by the original and generated samples of the DCG-GAN's results, in accordance with one or more embodiments of the present disclosure.
- Fig. 5A illustrates a distribution function and correlating semi-Gaussian function of the template matching confidence scores based on generated-real comparisons for the baseline model, in accordance with one or more embodiments of the present disclosure.
- Fig. 5B illustrates a distribution function and correlating semi-Gaussian function of the template matching confidence scores based on generated-real comparisons for the DCG-GAN model, in accordance with one or more embodiments of the present disclosure.
- FIGs. 6A and 6B illustrate pairs of exemplary generated samples and their most similar real example from the training dataset, in accordance with one or more embodiments of the present disclosure.
- Fig. 7A illustrates a qualitative assessment of blinded experiments, in accordance with one or more embodiments of the present disclosure.
- Fig. 7B illustrates a qualitative assessment of blinded experiments, in accordance with one or more embodiments of the present disclosure.
- Fig. 8 is a flow diagram depicting an exemplary method for training models to generate images, in accordance with one or more embodiments of the present disclosure.
- Fig. 9 depicts an exemplary computing node according to one or more embodiments of the present disclosure.
- Fig. 10 depicts an exemplary algorithm for matching source and template images, in accordance with one or more embodiments of the present disclosure.
- Design is a complex cognitive process that requires designers to make creative connections across different areas of knowledge. This process includes carefully identifying and solving problems that may not have been dealt with before or have been approached in unique ways in the past. Venturing into new territories within the design realm increases the chances of finding new and inventive solutions. However, this kind of exploration can take a long time and may be influenced by preconceived notions, a fixation on initial ideas, and personal biases. Designers often aspire to navigate the design space uniformly or adapt it to meet specific requirements. Computational technologies, particularly generative Artificial Intelligence (AI) methods, offer a promising avenue to accelerate searching and generating novel design concepts within the solution space.
- AI Artificial Intelligence
- Generative design refers to an automated design exploration process that analyzes all possible solutions to a design problem based on the specified requirements and constraints and then selects the passable ones among them.
- Generative design and design concept generation share a common iterative approach to exploring a broad solution space.
- Design concept generation primarily focuses on generating a multitude of approximate solutions aimed at inspiring designers during the ideation phase, rather than optimizing a design for production. Design concept generation applies a bottom-up approach, in contrast to a traditional designer-based top-down approach, which enables exploration of a wider range of complex solutions. Since there is no single correct answer to a design problem, given the high and even infinite degrees of freedom in product design, searching for all possible solutions could be resource-exhaustive and not practical to be executed by humans. Most of the well-known generative design methods operate on the basis of a set of defined design rules to iteratively evolve, or possibly optimize, an initial (usually randomly selected) solution to satisfy certain requirements. In contrast, GAN models are not limited to predefined rules, but instead attempt to search the design space based on the distribution of the provided dataset. Thus, GANs are a favorable choice for design concept generation.
- the premise of design concept generation is to enhance the efficiency, quality, and consistency of the design process by automatically generating numerous and diverse samples for designers to synthesize, choose from, and edit, thus elevating their roles to “curators” and “empathizers.”
- Disclosed herein are models configured for design concept generation, i.e., for generating a visual representation that captures the fundamental idea behind a product's design.
- the visual representation takes the form of an image.
- GANs are a relatively recent method in modern AI and have demonstrated state-of-the-art performance in many generative modeling tasks.
- GANs have been used to solve a variety of generative design problems, from creating 3D aircraft models in native format for detailed simulation to topology optimization of 2D wheel design and generating realistic fashion apparel style recommendations.
- GANs provide a method of generative modeling that allows new forms of training algorithms.
- AI-driven design concept generation can serve as a powerful and transformative tool for designers to efficiently create more original and useful concepts.
- Advanced data-driven models can be developed to automatically analyze large amounts of product and user data, comprehend intricate patterns, invent new ideas, and evaluate them based on existing performance and user data, as well as other requirements and metrics.
- the designer can shift their focus from dragging and dropping to iterating a design, selecting, integrating, and modifying AI-generated concepts.
- GANs are one of the state-of-the-art generative models capable of generating realistic images according to the distribution of the input dataset to an extent that is not recognizable as synthetic data by human eyes.
- GANs are capable of producing a large number of solutions in a relatively short period.
- a standard GAN architecture consists of two neural networks: a generator G and a discriminator D, which are interactively trained by competing against each other in a minimax game. That is, GANs can be regarded as a two-player zero-sum game between the generator and the discriminator, in which the benefit of each contestant is equal to the detriment of the other.
- the generator attempts to produce realistic samples, while the discriminator attempts to distinguish the fake samples from the real ones.
- the parameters of both networks are updated through backpropagation with the following learning objective: $\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$, where z is a random or encoded vector, p_data is the empirical distribution of training images, and p_z is the prior distribution of z (e.g., a normal distribution).
- z is a random or encoded vector
- p_data is the empirical distribution of training images
- p_z is the prior distribution of z (e.g., a normal distribution).
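For intuition, this objective can be evaluated directly on a toy batch of discriminator outputs (the numbers are illustrative): a maximally confused discriminator, with D = 0.5 everywhere, yields exactly 2·log(0.5), while a confident discriminator pushes the value toward 0.

```python
import numpy as np

def gan_value(d_real, d_fake):
    # V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))], evaluated on batches of
    # discriminator outputs for real and generated samples
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

v_confident = gan_value(np.array([0.9, 0.95]), np.array([0.05, 0.1]))
v_confused = gan_value(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
```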
- StyleGAN2 a cutting-edge GAN architecture for artificial image generation
- StyleGAN, created by NVIDIA, produces high-resolution facial images of unprecedented quality and is capable of synthesizing and mixing non-existent photorealistic images.
- StyleGAN and its extension, StyleGAN2, are characterized by an architecture different from that of the standard GAN generator. Conventional generators feed the latent code through the input layer only; StyleGAN instead maps the input to an intermediate latent space.
- a latent vector z is first normalized and then mapped by a mapping network m to an intermediate vector w.
- a synthesis network g starts with a constant layer B with a base size.
- the style produced by the learned affine transform A modulates the trainable convolution weights, which are then demodulated to reduce the artifacts that arbitrary amplification of certain feature maps would otherwise cause (replacing instance normalization).
- Gaussian noise inputs are scaled by a trainable factor B and added to the convolution output in each style block with bias b.
- Leaky ReLU is deployed as the non-linear activation for all layers. The output is fed into a 1×1 convolutional filter to generate an image at the last layer.
- the loss function used in StyleGAN2 is the logistic loss function with path-length regularization: $\mathbb{E}_{w,\, y \sim \mathcal{N}(0, \mathbf{I})} \left( \lVert \mathbf{J}_w^{\top} y \rVert_2 - a \right)^2$, where $\mathbf{J}_w = \partial g(w)/\partial w$ is the Jacobian of the generator, w is a mapped latent code, g is the generator, y are random images with normally distributed pixel intensities, and a is a dynamic constant learned as the long-running exponential moving average of the first term over the course of optimization. This term regularizes the gradient magnitude of a generated image g(w) projected onto y, adjusting it to be similar to the running exponential average and thus making the latent space w smoother.
- the loss function of the discriminator is the same as the standard logistic loss function with R1 or R2 regularization.
- the StyleGAN/StyleGAN2 architecture is capable of controlling the features of generated images at various scales because its generator consists of two sub-networks (a mapping network and a synthesis network). With the inclusion of path-length regularization, StyleGAN2 improves the generator condition and reduces the representation error. StyleGAN2 provides superior quality and resolution of generated images compared to previous generative models. However, existing applications and extensions of this architecture are predominantly focused on the quality and realism of generated samples without addressing diversity or novelty.
- Creativity as an indispensable element of the design process, may be defined as “the capacity to produce original and valuable items by flair.” Yet, it is often difficult to objectively assess due to its intangible and subjective nature.
- creativity may be translated into maximizing the degree of novelty and usefulness of the design concepts generated. Novelty may be gauged by how different an idea is relative to others. Usefulness may be measured in terms of the quality and performance of the design.
- the quality, performance, and originality of generated designs often correlate with the diversity of the concepts generated and the design space explored.
- an exemplary architecture for generating images may consider diversity and/or novelty as two fundamental criteria for objectively assessing the performance of GAN-based design concept generation in terms of creativity.
- Methods for measuring diversity may comprise subjective rating and the genealogical tree approach.
- Subjective rating of design space diversity may comprise categorizing a set of design ideas into various idea pools based on intuitive categories. Subjective rating is efficient in terms of time and effort, but the results may not be as valid or reliable since the inferences are based on the rater’s mental models.
- a genealogical tree approach adopts deterministic rules derived from design attributes to rate the diversity of a set of design ideas. The genealogical tree approach is repeatable and relatively more objective. However, it lacks sensitivity and accuracy since it uses the same set of formulae for all types of design problems.
- Embodiments of the present disclosure present a systematic and objective assessment of creativity, along with the building and validation of a new architecture that compensates for the limitations of the traditional GAN architecture.
- the new architecture may be based on GAN-based design concept generation.
- the new architecture was evaluated in terms of diversity and novelty. Findings demonstrate that although the trained generator of the baseline model is capable of producing realistic and authentic-looking images of sneakers, the generated samples strongly resemble existing products (i.e., the training dataset). As the baseline generator solely concentrates on outsmarting the discriminator by creating samples that look like the training dataset, it results in a lack of originality and variety, which limits its ability to generate creative designs.
- a generic architecture for GAN-based design concept generation is proposed herein to address the limitations of GANs in terms of creativity and to guide the generative process with a more diverse set of conditions and criteria beyond the generation of merely “realistic” samples, in accordance with one or more embodiments of the present disclosure.
- the proposed approach may involve incorporating additional inspectors’ feedback into the generator’s loss function, alongside the discriminator’s loss. This regularization enables the generator to simultaneously learn multiple domain-specific and domain-agnostic criteria, making it a versatile and effective generative tool for meeting various predefined benchmarks and performance standards.
- DCG-GAN, a customized variant of the proposed generic architecture for design concept generation, may be trained in accordance with one or more embodiments of the present disclosure. This adaptation was specifically designed to meet the demands of generating design concepts that balance aesthetics and functionality.
- the evaluation process involved visual analysis, quantitative metrics, and qualitative assessments in a survey format with 90 participants. This comprehensive approach provided a thorough assessment of DCG-GAN's performance in terms of the diversity and novelty of the generated samples.
- the culmination of computational and subjective assessment methodologies consistently showcased the enhanced capabilities of DCG-GAN when juxtaposed with the baseline model.
- the baseline model refers to a state-of-the-art GAN architecture, StyleGAN2, trained to create 2D visual concepts (i.e., images) of sneakers based on a large training dataset scraped from multiple online footwear stores.
- DCG-GAN exhibited superior performance in generating design concepts that transcend mere realism, embracing attributes of novelty, diversity, desirability, and geometrical proportionality.
- Novelty and diversity are central themes in design and engineering innovation. According to the Osborn rule for brainstorming, the availability of a more diverse set of solutions and the uniqueness of the solutions can increase the chances of proposing a successful design instance. Although novelty has had variations on how to define it, a widely accepted definition is "how unusual or unexpected an idea is compared to other ideas." Other work has centered around newness, or how frequently or infrequently a design solution appears. From the large body of work that defines novelty and evaluates it through various metrics and studies, novelty for the purposes of the panel of inspectors described herein is considered as something different, unique, and new. The novelty of a design concept (also referred to as uniqueness) does not necessarily correspond to the appearance and final results. Rather, the "otherness" or "uncommonness" of any characteristic within the concept or even in the design process can be credited as novelty.
- Diversity or variety is a related but separate construct in design.
- the diversity of the design output has been defined as how different the results are from each other.
- Research in innovation has noted that diversity is a critical component of innovation and has led to a number of strategies to increase the diversity of new knowledge in the design process, such as using communities to seek different design input from a wide variety of contributors.
- Diversity in design can take many forms, from differences in function and working principle, to product architecture configurations, and visual aesthetics.
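One simple, commonly used operationalization of this notion is the mean pairwise distance between design embeddings; the metric and the toy 2-D points below are illustrative, not the disclosure's own diversity measure:

```python
import numpy as np

def mean_pairwise_distance(X):
    # diversity score: average Euclidean distance over all distinct pairs
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    n = len(X)
    return d.sum() / (n * (n - 1))   # diagonal is zero, so this averages off-diagonal pairs

clustered = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
spread = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
```

A spread-out set scores much higher than a tight cluster, matching the intuition that diverse designs differ more from each other.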
- Diversity augmentation has emerged as a vital area of research in GANs, with the primary objective of improving the variety while preserving the quality of the generated outputs.
- Although GANs possess the capability to create realistic data samples from a given distribution, they are often plagued by mode collapse, wherein they produce only a limited subset of samples, failing to encompass the entire diversity of the target distribution.
- Prominent models were selected and categorized based on their strategies for restructuring the traditional GAN architecture. These strategies involve modifying or extending the standard GAN framework by incorporating supplementary components, regularization terms, or loss functions that facilitate the generation of more diverse and novel samples. Table 1 categorizes and provides an overview of selected GAN models that utilize diversity-augmented approaches.
- the loss regularization category comprises several models aimed at mitigating mode collapse in GANs by introducing additional regularization terms into the loss function.
- MS-GAN Mode Seeking GAN
- DS-GAN Diversity Balancing GAN
- DivCo GAN Diversity Conditional GAN
- DivAugGAN Diversity Augmented GAN
- DivCo GAN introduces a contrastive loss that encourages similarity between images generated from adjacent latent codes while promoting dissimilarity between images from distinct latent codes.
- DivAugGAN, another extension of MS-GAN, defines a new regularization term to enhance mode diversity by exploring unseen image space, ensuring relative variation consistency, and maximizing distinction when injecting different noise vectors.
- PAD-GAN (Performance Augmented Diverse GAN) introduces a unique loss function that employs the determinantal point process (DPP) kernel, effectively augmenting quality and diversity simultaneously by establishing a global measure of similarity between pairs of items. This kernel ensures a balanced representation of quality and diversity in generated samples.
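The DPP construction can be illustrated in a few lines: with quality scores q and a similarity matrix S, the kernel L = diag(q) S diag(q) has a determinant that grows when items are both high-quality and mutually dissimilar. The RBF similarity, quality values, and point sets below are illustrative assumptions:

```python
import numpy as np

def dpp_score(features, quality, gamma=1.0):
    # DPP kernel L = diag(q) S diag(q); det(L) is large when items are
    # simultaneously high-quality and mutually dissimilar
    d = np.linalg.norm(features[:, None] - features[None, :], axis=-1)
    S = np.exp(-gamma * d**2)                        # RBF similarity
    L = quality[:, None] * S * quality[None, :]
    return np.linalg.det(L)

q = np.array([1.0, 1.0])
diverse   = np.array([[0.0, 0.0], [3.0, 0.0]])       # well-separated pair
redundant = np.array([[0.0, 0.0], [0.1, 0.0]])       # near-duplicate pair
```

With equal quality, the well-separated pair scores near 1 while the near-duplicate pair scores near 0, which is exactly the pressure the DPP loss places on the generator.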
- DPP determinantal point process
- the Inside Generator Augmentation category incorporates models that promote diversity through manipulations within the generator itself.
- One such model is PD-GAN (Personalized Diversity Promoting GAN).
- the process begins by selecting a set of diverse items from the dataset as the ground truth for diversity. During each iteration, the generator generates samples that are then ranked based on their relevance to each other. The top k items are selected and compared to the ground truth diverse set to be used in the loss function. Diversity is measured based on category coverage, a metric commonly employed in recommendation systems.
- the Data Augmentation category encompasses models that leverage data manipulation techniques to promote diversity in generated outputs.
- Two notable models within this category are GAN+ and EDA+GAN (Easy Data Augmentation Coupled with GAN).
- GAN+ adopts a two-step approach in which the dataset is initially sampled using the Dirichlet method, known for its ability to produce a more diverse set of samples. Subsequently, the model is trained, and the generated samples undergo a filtering process to eliminate low-quality samples. Finally, the qualified generated samples are integrated into the main dataset, augmenting the diversity of the training data.
- EDA+GAN incorporates data augmentation as a preprocessing step before training.
- the Bagging-Inspired category encompasses models that draw inspiration from the principles of bagging methods in machine learning to enhance diversity in GAN-generated samples.
- One such model within this category is EDA+GAN (Easy Data Augmentation Coupled with GAN).
- the utilization of data augmentation techniques serves as a bagging-inspired strategy.
- data augmentation Prior to training the GAN, data augmentation is applied as a preprocessing step on the training set.
- the EDA+GAN model comprises multiple generators operating in parallel, each receiving the same input vector. These generators work independently and are supervised by a single shared discriminator.
- EDA+GAN aims to mimic the concept of ensemble learning in bagging methods.
- the Multi-Step Training category encompasses models that adopt a multistage approach to enhance diversity and quality in GAN-generated samples.
- a prominent model in this category is CLS-R GAN (Classification-Reinforced GAN).
- CLS-R GAN Classification-Reinforced GAN
- This approach introduces an additional discriminator-independent classifier that assesses the quality of the generated images.
- the classifier is first pretrained on the dataset to establish a reliable basis for quality evaluation. Subsequently, during the training process, the model receives feedback from both the discriminator and the classifier to guide the generator to generate higher-quality samples.
- This combination of feedback enables the CLS-R GAN to leverage the strengths of both the discriminator and the classifier in promoting diverse and realistic samples.
- the generator undergoes a self-training phase where it refines its output based on the qualified fake images detected and endorsed by the classifier.
- the Reinforcement-Learning-Inspired category comprises models that draw inspiration from reinforcement learning principles to foster diversity in GAN-generated samples.
- CLS-R GAN Classification-Reinforced GAN
- DP-GAN Diversity Promoting GAN
- DP-GAN utilizes a Long Short-Term Memory (LSTM) network as the discriminator, leveraging the LSTM’s ability to memorize previous records.
- LSTM Long Short-Term Memory
- DP-GAN employs a reinforcement learning paradigm, where the generator’s behavior is guided by rewards and penalties based on the uniqueness of the generated sample. If the sample is novel and has not been seen before, the generator is rewarded, whereas repeated samples lead to penalization.
- rewards and penalties are calculated at both the word-level and sentence-level.
- the discriminator outputs the cross entropy of the last layer of the network instead of a binary real-or-fake score to enhance discrimination accuracy.
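The reward/penalty scheme can be caricatured with a running set of previously produced samples (a toy sketch only; DP-GAN's actual rewards come from its LSTM discriminator at the word and sentence level, and the sample key below is hypothetical):

```python
def novelty_reward(sample_key, seen, reward=1.0, penalty=-1.0):
    # reward a sample the generator has not produced before; penalize repeats
    if sample_key in seen:
        return penalty
    seen.add(sample_key)
    return reward

seen = set()
r1 = novelty_reward("design-042", seen)   # novel sample -> rewarded
r2 = novelty_reward("design-042", seen)   # repeated sample -> penalized
```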
- the Latent Vector Manipulation category includes models that focus on manipulating the latent vectors in GANs to enrich the diversity in generated samples.
- a prominent model in this category is PD-GAN (Probabilistic Diverse GAN), which finds application in image inpainting. In the context of image inpainting, regions near the boundary exhibit lower degrees of freedom for diversity compared to central areas.
- the PD-GAN model addresses this by calculating the dependence of each minor area on the existing content, progressively increasing diversity as it moves towards the center of the image.
- the training process begins with the generator-discriminator model being trained on the dataset. Subsequently, the generator generates a sample for the hole in an image, and the model modulates the latent vectors in areas that allow for high diversity.
- the evaluation algorithms of the present disclosure are powerful, automated, and objective. Further, the evaluation algorithms were used to verify the initial hypothesis regarding the limitations of traditional GANs and to establish a benchmark for the proposed models described herein against the baseline. The evaluation algorithms may possess broader applicability beyond the applications described herein. They may serve as robust tools for assessing the diversity and novelty of new design concepts across various industries and contexts, given a set of ground truth samples for algorithmic evaluation.
- the systems and methods described herein may enhance the GAN’s loss function by integrating multiple inspectors, each specialized in evaluating distinct aspects of the generated images.
- the systems and methods described herein may consider criteria for design concept generation. For example, the criteria comprise realism, novelty, diversity, desirability, and geometrical proportionality.
- Quality properties are typically measured according to two main categories of methods in the design literature, namely qualitative assessment carried out by a human expert and mathematical analysis.
- the qualitative assessment of diversity involves categorizing a set of design ideas into various groups based on intuitive categories.
- a common mathematical approach for diversity analysis is to adopt a genealogical tree for a set of design solutions and to estimate the degree of relatedness among the under-review concept and other instances accordingly.
- the GAN diversity assessment approach seeks to capture and quantify the diversity of generated solutions for a given design problem.
- the VGG16 model was initially proposed for image classification and object detection, achieving 92.7% accuracy on the ImageNet dataset.
- CNN convolutional neural network
- VGG16 is a very powerful model for feature extraction and image coding. Therefore, VGG16 is used to embed the dataset before feeding it to PCA.
- VGG16 is a 16-layer deep neural network model that contains stacked convolutional layers using the smallest possible receptive field of 3×3 that can capture the notions of up/down, left/right, and center.
- An optional linear transformation layer of the input channel can be added to the top of the network in the form of a 1×1 convolution filter.
- the 13 convolutional layers are followed by max-pooling layers to implement spatial pooling with a pooling window of size 2×2 and a stride of size 2.
- the convolution stride is set to 1, but the padding is specified according to the receptive field to preserve the spatial resolution.
- the convolutional layers are then followed by three fully connected layers, with the first two layers containing 4096 channels each, and the last one depending on the number of classes.
- the topmost output layer is a softmax layer. The layers usually do not contain normalization, which avoids high memory consumption and time complexity while preserving model performance.
- PCA is a multivariate statistical technique utilized in this paper to reduce the dimensionality of high-dimensional data from the intercorrelated feature space.
- because the dataset on which PCA was used was a high-dimensional set of dependent features extracted from an image set, this method is convenient for assessing the diversity of generated samples.
- PCA may be applied to analyze and interpret complex data by disentangling the most representative features. This task may be carried out by computing values of the data table corresponding to a new set of orthogonal variables; thus, PCA can geometrically be viewed as the projection of the data samples onto the principal components’ space.
- These variables, which are called principal components, are acquired as a linear combination of the original variables.
- the first principal component may be calculated so that it has the largest possible variance.
- the number of principal components calculated depends on the data structure and how much dimension reduction is required. For diversity evaluation, since the areas of the design space that are explored by the original and generated datasets need to be compared, a two-dimensional representation of the samples may be the most informative for visual analysis.
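- As a non-limiting illustration, the two-dimensional projection described above may be sketched as follows; the feature matrix is random stand-in data in place of actual VGG16 embeddings, and the SVD-based implementation is one of several equivalent ways to compute principal components.

```python
import numpy as np

def pca_project(features, n_components=2):
    """Project high-dimensional feature vectors onto the top principal components."""
    centered = features - features.mean(axis=0)   # remove the mean offset
    # Right singular vectors of the centered matrix are the principal components,
    # ordered by the variance they explain.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
original = rng.normal(size=(100, 512))    # stand-in for VGG16 embeddings
generated = rng.normal(size=(50, 512))
# Fit the components on the combined set so both datasets share one 2D space.
coords = pca_project(np.vstack([original, generated]))
print(coords.shape)   # (150, 2)
```

Plotting the first 100 rows against the last 50 then visualizes which regions of the design space the generated set covers relative to the original dataset.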
- Novelty is a multifaceted concept that encompasses various dimensions that require careful evaluation. When assessing the novelty of GAN-generated design concepts, it is essential to consider different aspects to gain comprehensive insights into their uniqueness and distinctiveness. To gain a comprehensive understanding of novelty in GAN-based concept generation, various complementary aspects of novelty are analyzed as follows. These models have the capacity to identify similarities between images at various levels, ranging from low-level (i.e., pixel-based) detection to high-level (feature-based) detection.
- RMSE Root Mean Square Error
- RMSE may be used to score the extent of similarity between two images in a pixel-wise manner using the following formula:

$$\mathrm{RMSE} = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(I_{\mathrm{org}}(i,j) - I_{\mathrm{gen}}(i,j)\right)^{2}}$$

where $I_{\mathrm{org}}$ and $I_{\mathrm{gen}}$ denote samples from the original and generated datasets, respectively; and each $i$ and $j$ represents the pixel associated with the $i$-th row and the $j$-th column of the images, in which there are $M$ rows and $N$ columns in total. This method checks for absolute repetitions.
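- A minimal sketch of the pixel-wise RMSE comparison, using small synthetic arrays in place of real image samples:

```python
import numpy as np

def rmse(img_org, img_gen):
    """Pixel-wise root-mean-square error between two equally sized images."""
    diff = np.asarray(img_org, dtype=float) - np.asarray(img_gen, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

a = np.zeros((4, 4))
b = np.full((4, 4), 0.5)
print(rmse(a, a))   # 0.0 for identical images (an absolute repetition)
print(rmse(a, b))   # 0.5
```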
- PSNR Peak Signal-to-Noise Ratio
- PSNR may be employed. PSNR is defined as the ratio of the maximum possible power of a signal to the power of applied distortion. This definition can be translated to the ratio of the maximum pixel value to the error between the corresponding pixels in the generated and original images. PSNR is usually reported in logarithmic scale to characterize the high ratio stemming from high dynamic ranges of pixel values.
- PSNR is denoted by the following expression:

$$\mathrm{PSNR} = 10\,\log_{10}\left(\frac{L^{2}}{\frac{1}{3}\sum_{c}\mathrm{MSE}_{c}}\right)$$

where $L^{2}$ denotes the maximum potential power of the mentioned pixel and $c$ denotes the color channel in an RGB image, with $\mathrm{MSE}_{c}$ being the mean squared error computed over channel $c$.
- PSNR may be applied due to its high noise-tolerance and also due to its sensitivity to brightness/color alterations.
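- The PSNR computation may be sketched as follows, assuming 8-bit pixel values (a maximum of 255) and averaging the squared error over all rows, columns, and color channels:

```python
import numpy as np

def psnr(img_org, img_gen, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    diff = np.asarray(img_org, dtype=float) - np.asarray(img_gen, dtype=float)
    mse = np.mean(diff ** 2)      # averages over rows, columns, and channels
    if mse == 0:
        return float("inf")       # identical images
    return float(10 * np.log10(max_val ** 2 / mse))

org = np.full((8, 8, 3), 100.0)
gen = org + 5.0                   # uniform distortion of 5 gray levels
print(round(psnr(org, gen), 2))   # 34.15
```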
- SSIM Structural Similarity Index Measure
- this model operates similarly to the human visual system by modeling perceived changes in structural information as the combination of three components, namely structure comparison, contrast comparison, and luminance comparison.
- SSIM is defined as follows:

$$\mathrm{SSIM}(x,y) = \frac{(2\mu_{x}\mu_{y} + C_{1})(2\sigma_{xy} + C_{2})}{(\mu_{x}^{2} + \mu_{y}^{2} + C_{1})(\sigma_{x}^{2} + \sigma_{y}^{2} + C_{2})}$$

where $\mu_{x}$, $\sigma_{x}$, and $\sigma_{xy}$ represent the mean of $x$, the standard deviation of $x$, and the covariance between $x$ and $y$, respectively, and $C_{1}$ and $C_{2}$ are constants.
- SSIM is leveraged as an interdependency aware model to detect contextual dissimilarities.
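- A simplified, single-window sketch of the SSIM computation; production implementations typically apply the measure over local sliding windows with Gaussian weighting, which is omitted here, and the constants c1 and c2 are illustrative placeholders:

```python
import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Simplified SSIM: luminance, contrast, and structure terms computed over
    the whole image rather than over local sliding windows."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    return float(((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) /
                 ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))

rng = np.random.default_rng(1)
img = rng.random((16, 16))
print(ssim_global(img, img))              # identical images score 1.0
print(ssim_global(img, 1.0 - img) < 0.0)  # inverted structure scores negative
```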
- SRE Signal to Reconstruction Error
- the Spectral Angle Mapper (SAM), a spectrum-based pixel-wise metric, may be used to calculate the spectral similarity of the GAN-generated and original 2D images.
- the calculated angles between each pixel vector and the end-member spectrum vector indicate the degree of similarity to the reference spectrum. As the presence of a dot product suggests, the smaller the angle, the more similar the spectra will be.
- the reference spectrums are extracted from the original images and the test spectra are based on the generated samples. SAM may be used due to its capability to mitigate the impact of shading effects, thus emphasizing the desired reflectance characteristics of the target.
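- The spectral-angle computation may be sketched as follows; the example uses a uniformly brightened copy of the reference image to illustrate SAM's tolerance to shading effects:

```python
import numpy as np

def spectral_angle_map(img_test, img_ref):
    """Per-pixel spectral angle (radians) between test and reference RGB vectors."""
    test = np.asarray(img_test, dtype=float).reshape(-1, 3)
    ref = np.asarray(img_ref, dtype=float).reshape(-1, 3)
    dots = np.sum(test * ref, axis=1)
    norms = np.linalg.norm(test, axis=1) * np.linalg.norm(ref, axis=1)
    cos = np.clip(dots / norms, -1.0, 1.0)
    return np.arccos(cos)   # smaller angle means more similar spectra

ref = np.full((4, 4, 3), [0.2, 0.5, 0.9])
bright = ref * 2.0          # shading change: same spectral direction, scaled
angles = spectral_angle_map(bright, ref)
print(float(angles.max()))  # ~0: SAM ignores uniform brightness scaling
```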
- the template matching algorithm may be applied to search a source image (generated images) for areas similar to a template image (original images). Template matching utilizes a sliding-window approach to compare different areas of the template with the source. The comparison method depends on the content of the images and the goal.
- the most frequently used similarity scoring methods for this technique include squared difference, cross-correlation, and cosine coefficient, as well as their normalized versions, which usually provide more accurate results.
- the normalized cross-correlation may be selected as the best, as it yielded slightly more accurate matching results.
- the matching process creates a two-dimensional result matrix R with similarity scores associated with each area of the image, and searches for the highest/lowest value depending on the comparison method.
- Template matching can be used to identify the most similar part or determine the location of that part. In this paper, however, template matching may be applied to find the generated image that is most similar to the source image from a set of real images. This method is simple to implement and computationally efficient.
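- A naive sketch of the sliding-window comparison using a zero-mean normalized cross-correlation score; optimized implementations (e.g., FFT-based) would be preferred in practice:

```python
import numpy as np

def match_template_ncc(source, template):
    """Slide the template over the source and return the 2D result matrix R of
    normalized cross-correlation scores plus the best-matching location."""
    sh, sw = source.shape
    th, tw = template.shape
    t = template - template.mean()
    r = np.zeros((sh - th + 1, sw - tw + 1))
    for i in range(r.shape[0]):
        for j in range(r.shape[1]):
            window = source[i:i + th, j:j + tw]
            w = window - window.mean()
            denom = np.sqrt((w ** 2).sum() * (t ** 2).sum())
            r[i, j] = (w * t).sum() / denom if denom > 0 else 0.0
    best = np.unravel_index(np.argmax(r), r.shape)   # highest score wins
    return r, best

rng = np.random.default_rng(2)
source = rng.random((20, 20))
template = source[5:9, 7:11].copy()   # plant the template inside the source
r, best = match_template_ncc(source, template)
print(best)                           # (5, 7): location of the planted patch
```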
- Fig. 10 An exemplary procedure to match one source image and one template image is presented in Fig. 10.
- the steps proposed herein may necessitate extensive preprocessing steps, such as background removal, for optimal performance. A combination of these measures was employed as described herein to ensure a comprehensive evaluation that captures various dimensions of novelty.
- An exemplary architecture designed in accordance with one or more embodiments of the present disclosure may replace and/or supplement the discriminating network with an evaluation panel of multiple benchmarks.
- the benchmarks are not necessarily global and domain-agnostic; but instead, they can be chosen with respect to domain-specific requirements.
- the evaluation panel may assess a degree of realism.
- metrics such as diversity, novelty, desirability, and compatibility with geometric constraints (e.g., silhouettes) are often of the utmost importance.
- This feature can be integrated into the GAN structure using a large set of hand-evaluated concepts or using arithmetic methods to analyze the generated samples. To avoid the necessity of annotating a large dataset for each product category and benefit from user feedback on new design concepts, these terms may be mathematically modeled and integrated into the GAN architecture.
- FIG. 1 shows a schematic illustration of an exemplary system 100 configured for design concept generation and/or any of the GAN models described herein.
- System 100 may be modeled after a traditional GAN model and/or any of the GAN models described herein.
- discriminator 108 is modeled after and trained in the same way as a discriminator of a traditional GAN and/or any of the GAN models described herein.
- generator 104 is modeled after a generator of a traditional GAN.
- System 100 may address the limitations of the traditional GAN architecture for design concept generation.
- System 100 may be particularly configured to meet the needs of visual design recommendation.
- Architecture 100 may incorporate design-related specifications and limitations into the evaluation network.
- Architecture 100 may be configured to consider diversity, novelty, and desirability as essential requirements in visual design, along with the need for compatibility with a given silhouette as a constraint.
- System 100 may comprise panel of inspectors 106.
- An input vector 102 may be provided to generator 104 as input.
- Input vector(s) 102 may comprise one or more latent vectors from a latent space of generator 104.
- the number of vectors of input vector(s) 102 is equal to the batch size.
- the batch size may indicate a number of sample(s) 126 to be generated by generator 104.
- a pool of random codes is generated. The most diverse random codes may be selected from the pool. The number of random codes selected from the pool may correspond in quantity to the batch size.
- Executing the selection from the pool may comprise adopting a stratified sampling method.
- the latent space of generator 104 may be partitioned. For example, the latent space is partitioned into distinct hypercubes. Stratified sampling may ensure that each subgroup contributes proportionately to the overall diversity.
- the partitioned space may facilitate the allocation of each input vector 102 to a unique subgroup, from which an equal number of points are drawn.
- Generator 104 may explore uncharted regions of the latent space by virtue of input vector(s) 102 being diverse.
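- The stratified selection of latent codes may be sketched with a Latin-hypercube-style scheme, a simplification in which each axis, rather than each full hypercube cell, is split into equal strata (the number of full cells grows exponentially with dimension); the function name and the [-1, 1] latent range are illustrative assumptions:

```python
import numpy as np

def stratified_latent_batch(batch_size, dim, rng):
    """Latin-hypercube-style stratified sampling of latent vectors in [-1, 1]^dim:
    each coordinate axis is split into `batch_size` equal strata and every
    vector draws from a different stratum on each axis."""
    samples = np.empty((batch_size, dim))
    edges = np.linspace(-1.0, 1.0, batch_size + 1)
    for d in range(dim):
        # One draw from each stratum, shuffled so strata pair up randomly across axes.
        points = rng.uniform(edges[:-1], edges[1:])
        samples[:, d] = rng.permutation(points)
    return samples

rng = np.random.default_rng(3)
z = stratified_latent_batch(batch_size=8, dim=4, rng=rng)
print(z.shape)   # (8, 4): one latent vector per sample in the batch
# Every axis has exactly one sample in each of the 8 strata.
print(all(len(set(np.digitize(z[:, d], np.linspace(-1, 1, 9)[1:-1]))) == 8
          for d in range(4)))   # True
```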
- Generator 104 may be configured to generate one or more samples 126 and/or one or more other samples based on input vector 102.
- Input vector 102 may be an n-dimensional vector.
- Sample(s) 126, one or more examples 124, and/or other information may be provided to panel of inspectors 106.
- Example(s) 124 may comprise one or more real examples from the training dataset.
- example(s) 124 comprises one or more real images.
- Panel of inspectors 106 may comprise one or more of discriminator 108, diversity inspector 110, novelty inspector 112, desirability inspector 114, constraint inspector 116, and/or another inspector.
- Each inspector may be configured to generate a respective vector characterizing a respective attribute of one or more samples 126.
- Each inspector may be designed to assess generated samples against a specific criterion.
- Providing input to panel of inspectors 106 may comprise providing the input to each inspector of panel of inspectors 106.
- Each inspector may have associated weights and an associated loss function.
- the associated weights for each inspector may be adjusted according to the associated loss function of that inspector.
- Generator 104 may be trained on the basis of feedback from each inspector of panel of inspectors 106.
- the feedback from inspectors may comprise the vectors generated by the inspectors.
- Cost function 118 may be configured to compute an aggregation of the feedback from the inspectors.
- the result of the computation may comprise generator loss 122.
- Generator 104 may be trained using generator loss 122.
- Discriminator loss 120 may be the same as generator loss 122, another output of cost function 118, the output of discriminator 108, and/or another value.
- Discriminator 108 may be trained based on discriminator loss 120.
- Each inspector is optimized to score the generator’s output with respect to a single criterion for which it was designed. The generator may be optimized to satisfy all criteria scored by panel of inspectors 106.
- Discriminator 108 may be configured to generate a vector characterizing realism of the at least one sample relative to the at least one example.
- Diversity inspector 110 may be configured to receive sample 126 as input.
- Diversity inspector 110 may be configured to gauge the diversity of the batch.
- Diversity inspector 110 may be configured to generate a diversity vector.
- the diversity vector may characterize diversity of sample(s) 126.
- the diversity of the batch may be characterized by a diversity score.
- Diversity inspector 110 may be configured to guide the generator towards more varied sample generation.
- Determining the diversity score may comprise using the Covering Radius Upper Bound (CRUB) method to generate the diversity vector.
- Diversity inspector 110 may be configured to determine the CRUB by computing the maximal distance between any Voronoi vertex and its nearest neighbor in the set of points.
- Diversity inspector 110 may be configured to determine the upper bound of CR
- the upper bound of CR is the diversity score.
- CRUB plays a pivotal role in global optimization by bounding the worst-case error approximation of the global optimum.
- a distinctive attribute of CRUB lies in its ability to maximize the shortest distance between all samples within the batch, rather than relying on the average distance that encompasses all pair-wise distances.
- Diversity inspector 110 may encourage the dispersion of all samples away from one another, thus contributing to greater diversity among the generated outcomes.
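- As a simplified stand-in for the full Voronoi-based CRUB computation, the quantity being encouraged, namely the shortest pairwise distance within the batch, may be sketched as follows:

```python
import numpy as np

def min_pairwise_distance(batch):
    """Shortest distance between any two samples in the batch, the quantity a
    covering-radius-style diversity score rewards when it is large."""
    flat = np.asarray(batch, dtype=float).reshape(len(batch), -1)
    dists = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)   # ignore zero self-distances
    return float(dists.min())

clustered = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
spread = np.array([[0.0, 0.0], [3.0, 0.0], [5.0, 5.0]])
print(min_pairwise_distance(clustered))   # 0.1: two samples nearly coincide
print(min_pairwise_distance(spread))      # 3.0: samples dispersed from one another
```

A batch whose minimum pairwise distance is larger is considered more diverse, which is what pushes the generated samples away from one another.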
- Novelty inspector 112 may be configured to receive sample(s) 126 as input. Novelty inspector 112 may be configured to generate a novelty vector characterizing novelty of sample(s) 126 relative to examples 124.
- the novelty vector may be generated using Local Outlier Factor (LOF).
- LOF serves as a density-based anomaly detection technique, designed to assess the distinctiveness of a generated sample in relation to the input dataset. This method identifies potential anomalies by evaluating the local deviation of a given data point concerning its neighboring points. LOF operates by comparing the local density of a data point with the densities of its k-nearest neighbors. If the density of the point is considerably lower than that of its neighbors, it is deemed an outlier.
- the core idea revolves around measuring the typical distance at which a point can be reached from its neighbors, known as the reachability distance. This measurement involves calculating the reachability distance between two objects, ensuring that it does not fall below the k-distance of the second object, as defined by:

$$\text{reach-dist}_{k}(A, B) = \max\{k\text{-distance}(B),\ d(A, B)\}$$
- novelty inspector 112 is configured to determine the local reachability density (LRD) for each sample of sample(s) 126.
- novelty inspector 112 is configured to determine the LRD for each of a subset of sample(s) 126.
- the novelty vector may comprise each LRD determined by novelty inspector 112.
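- A compact sketch of the LOF computation described above; it follows the reachability-distance and local-reachability-density definitions with minor simplifications (e.g., reference points count themselves among their own neighbors), and the synthetic data is illustrative:

```python
import numpy as np

def lof_scores(reference, queries, k=3):
    """Local Outlier Factor of each query point relative to a reference set.
    Scores well above 1 flag points in sparser regions, i.e. novel samples."""
    ref = np.asarray(reference, dtype=float)

    def k_neighbors(p, pts):
        d = np.linalg.norm(pts - p, axis=1)
        idx = np.argsort(d)[:k]
        return idx, d[idx]

    def k_distance(i):
        d = np.linalg.norm(ref - ref[i], axis=1)
        return np.sort(d)[k]   # skip the zero self-distance

    def lrd(p):
        idx, dist = k_neighbors(p, ref)
        # Reachability distance: never below the k-distance of the neighbor.
        reach = np.maximum(dist, [k_distance(i) for i in idx])
        return k / reach.sum()

    scores = []
    for q in np.asarray(queries, dtype=float):
        idx, _ = k_neighbors(q, ref)
        neighbor_lrds = [lrd(ref[i]) for i in idx]
        scores.append(float(np.mean(neighbor_lrds) / lrd(q)))
    return scores

rng = np.random.default_rng(4)
cluster = rng.normal(0.0, 0.5, size=(40, 2))   # dense stand-in training data
inlier, outlier = np.array([0.1, 0.0]), np.array([6.0, 6.0])
in_score, out_score = lof_scores(cluster, [inlier, outlier])
print(out_score > in_score)   # True: the far-away sample is more novel
```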
- Desirability inspector 114 may be configured to receive sample(s) 126 as input. Desirability inspector 114 may be configured to generate a desirability vector characterizing predicted desirability of the at least one sample. Desirability inspector 114 may comprise a Deep Multimodal Design Evaluation (DMDE) model. DMDE may be configured to assess user satisfaction with generated samples; this inclusion strengthens the DCG-GAN's ability to ensure desirability in the produced visual designs. Generating the desirability vector may comprise providing sample(s) 126 as input to the DMDE model. The desirability vector may comprise the output of the DMDE model.
- DMDE Deep Multimodal Design Evaluation
- Deep multimodal design evaluation is an AI-driven model configured to provide an estimate of the desirability of a design concept without having to release the product and aggregate market results.
- DMDE is capable of performing design evaluation at the general level, attribute level, or both in any field that requires the provision of images, textual descriptions, and user reviews.
- the training workflow on this platform consists of four main parts: attribute-level sentiment analysis, image features extraction, description features extraction, and multimodal predictor.
- attribute-level sentiment intensities are extracted from online reviews, which serve as ground truth for training. Subsequently, visual and textual features are simultaneously extracted using a finetuned ResNet50 model and a fine-tuned BERT language model, respectively. Finally, the extracted features are processed by a multimodal deep regression model to predict desirability.
- the DMDE model is one example of an AI-driven tool that can help guide generator 104 towards creating samples that are user-centered and desirable.
- Designers or mathematical methods can use DMDE to predict the performance of a new design concept from the perspective of end users simply by feeding the renderings and descriptions to the model. This platform eases the process of evaluating design concepts, which is one of the most challenging tasks in the design of competitive design concepts.
- Constraint inspector 116 may be configured to receive sample(s) 126 as input. Constraint inspector 116 may be configured to generate a shape vector characterizing adherence of a silhouette for each sample of the at least one sample with a silhouette for each example of the at least one example using a Structural Similarity Index Measure (SSIM). Each silhouette may comprise a representation of the outer shape of one or more objects depicted in its associated sample or example. For example, each silhouette is a two-dimensional black image that illustrates the body shape of a group of products, leaving out the details.
- SSIM Structural Similarity Index Measure
- Sample(s) 126 should bear a noticeable similarity to one of the predefined silhouettes.
- the silhouette of an individual sample 126 meant to resemble a shoe should have the general shape of a shoe. This objective may not be attainable using traditional GAN architecture, since (1) GANs generate samples based on a noise vector from the latent space allowing no control over the features of the final concept, and (2) we aim to enlarge the input dataset for diversity enhancement, resulting in hundreds of silhouettes existing in the dataset.
- generating the shape vector may comprise extracting the image’s contours.
- Generating the shape vector may comprise comparing these contours to the corresponding contours of a designated set of silhouettes. The corresponding contours of the designated set may be generated based on examples 124. This comparison may yield a set of similarity scores, each indicating how closely the generated image’s contours match those of the silhouettes.
- the shape vector comprises the set of similarity scores. In some implementations, the shape vector comprises the highest similarity score.
- the highest similarity score serves as a geometrical constraint score in the broader loss function. In such an example, only the highest similarity score may be considered in the broader loss function to characterize the adherence of sample(s) 126.
- the contours of the generated images are extracted. These extracted contours are then compared with the contours derived from the set of benchmark silhouettes.
- the comparison process may comprise using the SSIM metric.
- the SSIM metric may be well suited for generating the shape vector because of its ability to quantify the compatibility of structural changes between images.
- SSIM's emphasis on the overall image structure is in harmony with the aim of extracting detail-free body shapes from the generated concepts.
- its incorporation of perceptual phenomena, such as luminance masking and contrast masking, enhances its capability to precisely identify structural disparities between images.
- SSIM’s focus on interdependence among neighboring pixels effectively captures vital information about object structure, particularly in terms of luminance masking’s influence on dissimilarities in bright regions and contrast masking’s ability to detect differences in textured or active areas.
- the inspectors of panel of inspectors 106 may be configured to yield vectors of uniform size, aligned with the batch size (i.e., B×1). Each row within these vectors may correspond to a distinct sample 126 within the batch.
- Cost function 118 may be configured to normalize the outputs of panel of inspectors 106 using min-max normalization techniques. Cost function 118 may be configured to compute the weighted summation of the normalized vectors, preserving the order of samples (or rows).
- Generator loss 122 may be the weighted summation. Generator loss 122 may be backpropagated through the generative network to adjust parameters of generator 104.
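- The min-max normalization and order-preserving weighted summation performed by cost function 118 may be sketched as follows; the inspector score vectors and weights are hypothetical:

```python
import numpy as np

def aggregate_inspector_losses(inspector_vectors, weights):
    """Min-max normalize each inspector's per-sample score vector, then take the
    weighted sum row by row, preserving sample order (one loss per sample)."""
    normalized = []
    for v in inspector_vectors:
        v = np.asarray(v, dtype=float)
        span = v.max() - v.min()
        normalized.append((v - v.min()) / span if span > 0 else np.zeros_like(v))
    return sum(w * n for w, n in zip(weights, normalized))

# Hypothetical per-sample scores from four inspectors for a batch of 3 samples.
realism = [0.2, 0.9, 0.5]
diversity = [1.0, 3.0, 2.0]
novelty = [10.0, 30.0, 20.0]
shape = [0.0, 0.5, 1.0]
loss = aggregate_inspector_losses([realism, diversity, novelty, shape],
                                  weights=[0.4, 0.2, 0.2, 0.2])
print(loss.shape)   # (3,): one aggregated loss value per sample in the batch
```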
- the augmented loss function is formulated as follows: where S represents the silhouette.
- the associated loss functions are defined as follows:
- max(SSIM), max(LOF), max(CRUB), and max(DMDE) represent the optimal or maximum values achievable by each function.
- the frameworks described herein serve as bespoke platforms for automated design concept recommendation, ensuring the quality of generated samples in terms of realism, outer shape geometry, novelty, diversity, and desirability.
- Fig. 8 a flowchart illustrating an exemplary method 800 for design concept generation with a generative model is depicted.
- the operations of method 800 presented below are intended to be illustrative. In some implementations, method 800 is accomplished with one or more additional operations not described and/or without one or more of the operations discussed. The operations of method 800 may be performed in another order. Additionally, the order in which the operations of method 800 are illustrated in Fig. 8 and described below is not intended to be limiting.
- method 800 is implemented in one or more processing devices (e.g., a digital processor, an analog processor, a digital circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information).
- the one or more processing devices may include one or more devices configured through hardware, firmware, and/or software to be specifically designed for execution of one or more of the operations of method 800.
- the operations of method 800 are performed with regard to layers of an artificial neural network rather than individual nodes.
- Operation 802 may comprise providing an input vector of a latent space of the generative model to a generator of the generative model.
- Operation 804 may comprise reading at least one sample generated by the generator based on the input vector.
- the at least one sample may comprise an image.
- Operation 806 may comprise providing the at least one sample and at least one corresponding example to a plurality of inspectors of the generative model.
- Each of the plurality of inspectors may be configured to generate a respective vector characterizing a respective attribute of the at least one sample.
- Each respective attribute may be selected from a set comprising realism, a shape, novelty, diversity, and desirability.
- Each inspector may be associated with a corresponding loss function.
- Each inspector may be configured to compute its corresponding loss function based on its input to generate a loss value.
- the generated vectors may be the loss values.
- Each example may comprise an image.
- the vectors of the plurality of vectors may have a uniform size.
- Each row of an individual vector of the plurality of vectors may correspond to a sample of the at least one sample.
- Operation 808 may comprise reading the vectors generated by the plurality of inspectors.
- the plurality of inspectors may comprise one or more of a discriminator, a diversity inspector, a novelty inspector, a desirability inspector, a constraint inspector, and/or another inspector.
- Operation 810 may comprise determining a generator loss value characterizing performance of the generator based on the vectors generated by the plurality of inspectors.
- determining the generator loss may comprise normalizing each vector of the plurality of vectors.
- Determining the generator loss value may comprise computing a result of a generator loss function.
- the generator loss function may comprise the loss functions associated with the inspectors.
- the vectors are normalized using min-max normalization. By way of non-limiting example, the vectors are normalized prior to determining the generator loss value.
- Operation 812 may comprise training the generator and/or the discriminator.
- the generator may be trained based on the generator loss value. Training the generator may comprise performing backpropagation.
- the discriminator may be trained based on the vector generated by the discriminator and/or another loss value. Training the discriminator may comprise performing backpropagation.
- a case study was conducted on a large-scale dataset extracted from multiple online footwear stores to test the results of system 100 depicted in Fig. 1 and described herein. Subsequently, we present and analyze the results in depth to benchmark DCG-GAN against StyleGAN2 in terms of diversity and novelty.
- StyleGAN2-generated samples presented in Fig. 2 bear a strong resemblance to established shoe models and brands. Specifically, the five depicted concepts closely mirror existing Adidas, Basics, Nike, Reebok, and Adidas shoe designs available in the market. This outcome aligns with the inherent limitation of GAN generators, which tend to replicate patterns and characteristics present in the training dataset.
- Fig. 3 illustrates several design concepts generated by DCG-GAN that demonstrate various visually discernible features.
- the features comprise novelty, higher diversity, aesthetic appeal, minimal brand-specific features, high visual quality, and compatibility with geometric constraints.
- the DCG-GAN architecture excels in generating design concepts imbued with abundant novel attributes. This stark contrast becomes more pronounced compared to the concepts generated from the baseline. Although the latter typically mirror existing products from the training set, lacking distinctive and unique features, the DCG-GAN consistently introduces innovative elements.
- the efficacy of DCG-GAN in fostering diversity is evident through the wide- ranging spectrum of generated designs, which span different styles, patterns, and structural arrangements.
- aesthetic appeal the generated design concepts boast captivating forms and harmonious color palettes.
- the model’s ability to produce visually appealing outcomes is indicative of its capacity to capture intricate design aesthetics.
- the VGG16 model was trained on RGB images with dimensions of 224x224 and included 3 fully connected layers at the top of the network, without pooling layers, and employed a softmax activation function.
- Fig. 4A depicts the 2D representation of the mapped data points, where the red points represent the original data set, and the green points represent the generated data set.
- This visualization illustrates that the StyleGAN2 model (and other traditional GAN architectures) did not explore the entirety of the original data space, resulting in designs that are constrained to a specific and incomplete range of styles.
- the green dots cover only a subset of the space occupied by the red dots, highlighting a limitation in GANs’ ability to learn the complete distribution of the dataset.
- the scatter plot reveals the model’s inadequacy, especially in regions where there are fewer original samples, indicating that the model’s capacity to learn a subspace is reliant on the presence of an adequate number of data samples from that specific subspace.
- Fig. 4B depicts the visualization results.
- the visualization results indicate that the images generated by DCG-GAN possess the capacity to traverse uncharted areas within the solution space. Remarkably, this exploration is not only superior to the baseline model’s performance, but also extends beyond the confines of the original dataset.
- the DCG-GAN model effectively expands the boundaries of the solution space that the original dataset occupies from several directions. Notably, an interesting observation demonstrates that unlike the baseline model, DCG-GAN adeptly learns and captures areas within the solution space that lacked adequate representation within the original dataset.
- Table 2 provides statistical information on the similarity of the original and the GAN-generated samples calculated by the five different diversity measures, as follows:
- RMSE RMSE.
- the comparison results show the lowest RMSE of 0.0023 and the highest RMSE of 0.0173, with a mean RMSE of 0.008594, a standard deviation (SD) of 0.0031, and a median RMSE of 0.0083.
- SD standard deviation
- A good baseline range for RMSE is (0.5, 1). Any values lower than this range suggest a very accurate model, or a very similar pair of samples, indicating the high similarity of low-level pixel-wise features between the GAN-generated and the original samples.
- PSNR PSNR.
- the results also show the highest PSNR of 30.9093 and the lowest PSNR of 28.0060, with a mean PSNR of 29.1585, a standard deviation of 0.6401, and a median PSNR of 29.0317.
- a PSNR value of 30 dB or higher is considered a good baseline for PSNR.
- the exact value may vary depending on the application and the quality of the original image. Consequently, the PSNR results also suggest a relatively high similarity between the original and GAN-generated samples.
- SSIM SSIM.
- the results of SSIM assessment show the highest SSIM of 0.9968 and the lowest SSIM of 0.8177, with a mean SSIM of 0.9470, a standard deviation of 0.0374, and a median SSIM of 0.9575.
- the acceptable range for SSIM is generally considered to be between 0.8 and 1.0, with higher values indicating greater similarity between the two images. Given this baseline for SSIM, the results strongly designate a high structural similarity between the two datasets.
- SRE SRE.
- the results show the highest SRE of 0.6185 and the lowest SRE of 0.4989, with a mean SRE of 0.5530, a standard deviation of 0.0256, and a median SRE of 0.5541.
- the acceptable SRE range is generally between 0.5 and 1.5.
- a score of 1.0 is considered optimal.
- the results indicate a high degree of similarity between the original and GAN-generated samples.
- SAM It is also shown that the model had the highest SAM of 89.9505 and the lowest SAM of 0.8127, with a mean SAM of 0.8962, a standard deviation of 0.0084, and a median SAM of 0.8986.
- Template Matching. To quantify the novelty of the GAN-generated sets, a novelty evaluation algorithm was employed, incorporating various similarity detection methods, including template matching. The process involved identifying the most similar design instance from the original dataset for each generated sample within each generated set. Subsequently, these results were consolidated into a similarity distribution function, enabling a statistical assessment of novelty in the generated outputs.
- The template matching method operates by searching a source image to identify areas that closely resemble a provided template image.
- Confidence scores for different areas within the source image are computed and compared using a sliding window approach.
- The algorithm gauges the similarity between the source image and the template image by considering the confidence within the rectangle whose dimensions match those of the template image and whose top-left corner corresponds to a specific point.
- In this context, the source image is the generated (i.e., created) sample and the template image is the original sample.
- The confidence score in this context operates within a numerical range of 0 to 1, where a score of 0 indicates an absence of similarity, while a score of 1 signifies complete identity.
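The sliding-window confidence computation described above can be sketched in plain NumPy; the score below is a normalized squared difference mapped so that 1 means an exact match and 0 means no similarity. This is an illustrative stand-in, not the disclosed implementation; optimized equivalents exist in OpenCV's `cv2.matchTemplate`:

```python
import numpy as np

def match_template(source: np.ndarray, template: np.ndarray):
    """Slide the template over the source; return the best confidence in [0, 1]
    and the top-left corner of the best-matching window."""
    sh, sw = source.shape
    th, tw = template.shape
    best_conf, best_pos = 0.0, (0, 0)
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            window = source[y:y + th, x:x + tw]
            denom = np.sqrt(np.sum(window ** 2) * np.sum(template ** 2))
            if denom == 0:
                continue  # all-zero window: no signal to compare
            score = np.sum((window - template) ** 2) / denom
            conf = max(0.0, 1.0 - score)  # 1.0 for an exact match
            if conf > best_conf:
                best_conf, best_pos = conf, (y, x)
    return best_conf, best_pos

# A 2x2 patch of ones embedded at row 2, column 1 is found with confidence 1.0.
source = np.zeros((5, 5))
source[2:4, 1:3] = 1.0
template = np.ones((2, 2))
conf, pos = match_template(source, template)
print(conf, pos)
```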
- Fig. 5A presents a distribution graph accompanied by a semi-Gaussian function, which has been fitted with a mean of 0.8385 and a variance of 0.0075.
- This distribution suggests that the majority of the design samples generated by StyleGAN2 bear a very close resemblance to those found in the original dataset. This outcome aligns with the central hypothesis, affirming that GANs have limitations in generating novel design concepts.
- The presence of samples with relatively low confidence scores can be attributed to two factors: (1) some generated images exhibit such an unrealistic nature that they cannot readily be recognized as sneakers; and (2) template matching tends to differentiate between sneakers with the same pattern but different colors, resulting in lower similarity scores.
- Fig. 6A illustrates the most similar pair from the original dataset for a generated image, and it is visually apparent that the generated image closely resembles an item already present in the original dataset, devoid of any discernibly novel attributes.
- Fig. 5B illustrates, for DCG-GAN, a distribution function and the correlating semi-Gaussian function of the matching confidence scores based on the generated-real comparisons.
- The evaluation results reveal DCG-GAN's proficiency in generating design concepts with increased novelty.
- Fig. 6B illustrates the most similar pair from the original dataset for an image generated by DCG-GAN.
- The dissimilarity between a sample generated by the DCG-GAN model and its closest counterpart from the original dataset is prominently evident, emphasizing DCG-GAN's ability to generate design concepts that deviate significantly from existing instances.
- The survey was extended to individuals from diverse professional and demographic backgrounds.
- The pool of participants included sneaker designers, designers specializing in other diverse applications, engineering students, and individuals with limited familiarity with design, engineering, and AI models.
- The survey comprised three main sections:
- Demographic questions. Participants responded to inquiries about their age group, gender, highest level of education, occupation, ethnicity/race, and familiarity with generative AI models.
- Novelty assessment. Following the presentation of both academic and simplified definitions of novelty, participants were tasked with individually rating the novelty of the 20 randomly ordered concepts generated by the DCG-GAN and baseline models.
- Diversity assessment. Similar to the novelty assessment, participants were provided with a definition of diversity before rating two sets of concepts. One set contained concepts generated by DCG-GAN, while the other featured concepts generated by the baseline model.
- Quantitative outcomes depicted in Fig. 7A show the minimum, first quartile, median, third quartile, maximum values, as well as the average derived from the participant ratings for novelty assessment. These values represented the novelty assessment for each individual design concept.
- The survey results on novelty indicated an average novelty assessment of 5.5257 for DCG-GAN, marking a 15% improvement compared to the baseline average score of 4.0757.
- The minimum novelty score for DCG-GAN surpassed the minimum for the baseline.
- The standard deviation for DCG-GAN (0.4583) was considerably lower than that of the baseline (1.0395), which signifies enhanced consistency and suggests that increased novelty was exhibited not by a subset of the DCG-GAN-generated samples but by all of them.
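The box-plot quantities reported for Fig. 7A (minimum, quartiles, median, maximum, and mean) can be derived from per-concept rating arrays in the usual way; the ratings below are hypothetical:

```python
import numpy as np

def rating_summary(ratings):
    """Five-number summary plus mean and sample SD for one concept's ratings."""
    r = np.asarray(ratings, dtype=float)
    q1, med, q3 = np.percentile(r, [25, 50, 75])
    return {"min": float(r.min()), "q1": float(q1), "median": float(med),
            "q3": float(q3), "max": float(r.max()),
            "mean": float(r.mean()), "sd": float(r.std(ddof=1))}

summary = rating_summary([4, 5, 6, 7, 8])  # hypothetical ratings for one concept
print(summary)
```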
- Several challenges remain that GANs face in the context of design concept generation and generative design.
- GAN variants necessitate large training datasets, and when dealing with diverse design images, the generator's ability to capture various modes may suffer, particularly when the dataset is insufficient, leading to overlooked modes with fewer data.
- Mode collapse, a common issue, arises during training when the generator tends to produce a narrow set of samples associated with a limited subset of the distribution, which is especially problematic for high-dimensional inputs such as images and videos.
- The generator's lack of reward for diversification during training exacerbates this issue, causing over-optimization on a single output.
- GANs can struggle with diversity, novelty, and desirability, as their objective encourages mimicry of the input data, potentially leading to an overly emulative generator. Pushing for greater diversity and creativity may compromise sample quality and utility.
- Evaluating GAN performance remains challenging, with a lack of standardized methods to assess generated versus real distributions against different criteria.
- Existing evaluation metrics (e.g., image quality, stable training, and image diversity) focus primarily on image quality and training stability, leaving room for improvement in other evaluation criteria such as image diversity.
- The limitations inherent in traditional GAN architectures were considered, with a particular focus on StyleGAN2, chosen for quantitative validation due to its status as a state-of-the-art GAN model with photo-realistic outputs.
- The findings derived from this analysis are applicable to all traditional GAN models, as they share a common evaluation framework.
- The model architecture described herein demonstrated a significant enhancement in terms of diversity and novelty, confirming it as a valuable tool for design concept generation.
- The approaches described herein address the initial challenges of GANs, demonstrating their capacity to explore design spaces with limited real samples, comprehensively cover the real dataset distribution, and produce outputs that excel across multiple criteria.
- This research helps advance the transition of emerging technologies into useful tools for the designer.
- The architectures described herein may enable large numbers of novel and diverse concepts to be presented to the human designer, together with fast concept evaluation frameworks in terms of diversity and novelty, leveraging the speed and efficiency of computer-generated design knowledge while maintaining the critical eye and decision making of the human. This augmented approach may ultimately yield generated samples that are radically improved with respect to design, efficiency, and quality.
- Referring now to Fig. 9, a schematic of an example of a computing node is shown.
- Computing node 10 is only one example of a suitable computing node and is not intended to suggest any limitation as to the scope of use or functionality of embodiments described herein. Regardless, computing node 10 is capable of being implemented and/or performing any of the functionality set forth hereinabove.
- In computing node 10 there is a computer system/server 12, which is operational with numerous other general purpose or special purpose computing system environments or configurations.
- Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with computer system/server 12 include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems or devices, and the like.
- Computer system/server 12 may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system.
- Program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types.
- Computer system/server 12 may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network.
- Program modules may be located in both local and remote computer system storage media including memory storage devices.
- Computer system/server 12 in computing node 10 is shown in the form of a general-purpose computing device.
- the components of computer system/server 12 may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including system memory 28 to processor 16.
- Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
- Such bus architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, Peripheral Component Interconnect (PCI) bus, Peripheral Component Interconnect Express (PCIe), and Advanced Microcontroller Bus Architecture (AMBA).
- Computer system/server 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer system/server 12, and it includes both volatile and non-volatile media, removable and nonremovable media.
- System memory 28 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32.
- Computer system/server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
- Storage system 34 can be provided for reading from and writing to a non-removable, non-volatile magnetic medium (not shown and typically called a "hard drive"). Although not shown, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media, can also be provided. In such instances, each can be connected to bus 18 by one or more data media interfaces.
- Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the disclosure.
- Program/utility 40 having a set (at least one) of program modules 42, may be stored in memory 28 by way of example, and not limitation, as well as an operating system, one or more application programs, other program modules, and program data. Each of the operating system, one or more application programs, other program modules, and program data or some combination thereof, may include an implementation of a networking environment.
- Program modules 42 generally carry out the functions and/or methodologies of embodiments as described herein.
- Computer system/server 12 may also communicate with one or more external devices 14 such as a keyboard, a pointing device, a display 24, etc.; one or more devices that enable a user to interact with computer system/server 12; and/or any devices (e.g., network card, modem, etc.) that enable computer system/server 12 to communicate with one or more other computing devices. Such communication can occur via Input/Output (I/O) interfaces 22. Still yet, computer system/server 12 can communicate with one or more networks such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet) via network adapter 20. As depicted, network adapter 20 communicates with the other components of computer system/server 12 via bus 18.
- The present disclosure may be embodied as a system, a method, and/or a computer program product.
- The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.
- The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
- The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
- A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
- A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
- Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
- The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
- A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
- Computer readable program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
- The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
- The remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
- Electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.
- the term “exemplary” is used in the sense of “example,” rather than “ideal.” Moreover, the terms “a” and “an” herein do not denote a limitation of quantity, but rather denote the presence of one or more of the referenced items.
- These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
- The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
- Each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
- The functions noted in the block may occur out of the order noted in the figures.
- Two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
Abstract
Systems, methods, and computer program products for training models to generate images are described herein. The method may comprise providing an input vector of a latent space of a generative model to a generator of the generative model; reading at least one sample generated by the generator based on the input vector; providing the at least one sample and at least one corresponding example to a plurality of inspectors of the generative model; reading vectors generated by the plurality of inspectors; determining a generator loss value characterizing performance of the generator based on the vectors generated by the plurality of inspectors; and training the generator based on the generator loss value.
Description
DESIGN CONCEPT GENERATION WITH GENERATIVE ADVERSARIAL NETWORKS
RELATED APPLICATION(S)
[0001] This application claims the benefit of priority of U.S. Provisional Application No. 63/606,914, filed December 6, 2023, which is hereby incorporated by reference in its entirety.
GOVERNMENT SUPPORT
[0002] This invention was made with government support under Grant No. 2050052 awarded by the National Science Foundation. The government has certain rights in the invention.
TECHNICAL FIELD
[0003] The disclosure is generally directed to generative adversarial networks (GANs), and in particular, to an architecture for design concept generation with GANs.
BACKGROUND OF THE DISCLOSURE
[0004] Generative adversarial networks (GANs) have been proposed as a potentially disruptive approach to generative design due to their remarkable ability to generate visually appealing and realistic samples. Current generator-discriminator architectures inherently limit the ability of GANs as a design concept generation tool. The results of a series of comprehensive and objective assessments conducted using a large dataset reveal that while the traditional GAN architecture can generate realistic samples, the generated and style-mixed samples closely resemble the training dataset, exhibiting significantly low creativity. Embodiments of the present disclosure build and validate an architecture for design concept generation with GANs (DCG-GAN) that enables GAN-based generative processes to be guided by geometric conditions and criteria such as novelty, diversity, and desirability.
SUMMARY
[0005] According to embodiments of the present disclosure, systems, methods of, and computer program products for training models to generate images are provided. A method for training models to generate images may comprise providing an input vector of a latent space of a generative model to a generator of the generative model. The method may comprise reading at least one sample generated by the generator based on the input vector. The method may comprise providing the at least one sample and at least one corresponding example to a plurality of inspectors of the generative model. Each of the plurality of inspectors may be configured to generate a respective vector characterizing a respective attribute of the at least one sample. The method may comprise reading the vectors generated by the plurality of inspectors. The method may comprise determining a generator loss value characterizing performance of the generator based on the vectors generated by the plurality of inspectors. The method may comprise training the generator based on the generator loss value.
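The recited training step can be sketched end to end. The generator, inspector interface, and loss aggregation below are hypothetical stand-ins for the framework-specific components (e.g., a deep-learning generator and the inspectors described later), intended only to show the control flow:

```python
import numpy as np

def training_step(generator, inspectors, aggregate_loss, examples, latent_dim, batch, rng):
    """One generator update per the recited method: sample latent vectors,
    generate samples, run all inspectors, aggregate into a generator loss."""
    z = rng.standard_normal((batch, latent_dim))                # input vector(s) of the latent space
    samples = generator(z)                                      # read generated sample(s)
    vectors = [insp(samples, examples) for insp in inspectors]  # one vector per inspector
    loss = aggregate_loss(vectors)                              # generator loss from all vectors
    # In a real framework, this loss would drive backpropagation through the generator.
    return loss

# Dummy components to exercise the control flow (all hypothetical).
rng = np.random.default_rng(0)
gen = lambda z: np.tanh(z)                              # stand-in generator
inspectors = [lambda s, e: np.abs(s - e).mean(axis=1),  # stand-in "realism" vector
              lambda s, e: s.var(axis=1)]               # stand-in "diversity" vector
agg = lambda vs: float(sum(v.mean() for v in vs))
examples = np.zeros((8, 4))
loss = training_step(gen, inspectors, agg, examples, latent_dim=4, batch=8, rng=rng)
print(loss)
```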
[0006] In some implementations, the at least one example comprises an image.
[0007] In some implementations, the at least one sample comprises a generated image.
[0008] In some implementations, training the generator comprises backpropagation.
[0009] In some implementations, each of the plurality of vectors is of a uniform size.
[0010] In some implementations, each row of each of the plurality of vectors corresponds to a sample of the at least one sample.
[0011] In some implementations, the method comprises normalizing each of the plurality of vectors using min-max normalization.
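The min-max normalization step maps each inspector's vector into [0, 1] so that differently scaled attribute scores can be combined; a minimal sketch (the epsilon guard is an implementation assumption):

```python
import numpy as np

def min_max_normalize(v: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Rescale a score vector to [0, 1]; eps guards against a constant vector."""
    v = np.asarray(v, dtype=float)
    return (v - v.min()) / (v.max() - v.min() + eps)

scores = np.array([2.0, 4.0, 6.0])
print(min_max_normalize(scores))  # approximately [0.0, 0.5, 1.0]
```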
[0012] In some implementations, each inspector is associated with a loss function.
[0013] In some implementations, the method comprises determining a loss value for each inspector based on its associated loss function.
[0014] In some implementations, determining the generator loss value comprises computing a result of a generator loss function.
[0015] In some implementations, the generator loss function comprises the loss function associated with each of the inspectors.
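One plausible composition of the per-inspector losses into a single generator loss is a weighted sum; the inspector names and weights below are hypothetical, as the disclosure does not fix a specific aggregation:

```python
def generator_loss(inspector_losses: dict, weights: dict) -> float:
    """Weighted sum of per-inspector loss values (hypothetical weighting)."""
    return float(sum(weights[name] * loss for name, loss in inspector_losses.items()))

# Hypothetical normalized loss values and weights.
losses = {"discriminator": 0.9, "diversity": 0.4, "novelty": 0.6, "desirability": 0.3}
weights = {"discriminator": 1.0, "diversity": 0.5, "novelty": 0.5, "desirability": 0.25}
print(generator_loss(losses, weights))
```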
[0016] In some implementations, each respective attribute is selected from a set comprising realism, a shape, novelty, diversity, and desirability.
[0017] In some implementations, the plurality of inspectors comprises a discriminator configured to generate a vector characterizing realism of the at least one sample relative to the at least one example. The method may further comprise training the discriminator based on the loss value for the discriminator.
[0018] In some implementations, the plurality of inspectors comprises at least one of a diversity inspector, a novelty inspector, a desirability inspector, or a constraint inspector. The diversity inspector may be configured to generate a diversity vector characterizing diversity of the at least one sample. The diversity vector may be generated using the Covering Radius Upper Bound (CRUB) method. The novelty inspector may be configured to generate a novelty vector characterizing novelty of the at least one sample relative to the at least one example. The novelty vector may be generated using a Local Outlier Factor (LOF) method. The desirability inspector may be configured to generate a desirability vector characterizing predicted desirability of the at least one sample. The desirability vector may be generated using a Deep Multimodal Design Evaluation (DMDE) model. The constraint inspector may be configured to generate a shape vector characterizing adherence of a silhouette for each sample of the at least one sample with a silhouette for each example of the at least one example using a Structural Similarity Index Measure (SSIM). Each silhouette may comprise a representation of the outer shape of one or more objects depicted in its associated sample or example.
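As a concrete illustration of the Local Outlier Factor method named for the novelty inspector, a compact plain-NumPy LOF is sketched below (scores near 1 indicate points inside the training distribution; larger scores indicate novel outliers). In practice the feature vectors would be embeddings of the generated and original images; the 2-D points here are toy data:

```python
import numpy as np

def local_outlier_factor(X: np.ndarray, k: int = 3) -> np.ndarray:
    """Plain-NumPy LOF: scores near 1 are in-distribution; larger means more novel."""
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)                 # exclude self-distance
    knn = np.argsort(D, axis=1)[:, :k]          # indices of k nearest neighbors
    d_knn = np.take_along_axis(D, knn, axis=1)  # distances to those neighbors
    k_dist = d_knn[:, -1]                       # k-distance of every point
    reach = np.maximum(k_dist[knn], d_knn)      # reachability distances
    lrd = 1.0 / reach.mean(axis=1)              # local reachability density
    return lrd[knn].mean(axis=1) / lrd          # LOF score per point

# A tight cluster plus one far-away "novel" point (toy stand-ins for embeddings).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [5, 5]], dtype=float)
lof = local_outlier_factor(X, k=3)
print(np.round(lof, 2))  # the last point scores far above 1
```

scikit-learn's `LocalOutlierFactor` offers an optimized equivalent of this computation.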
[0019] In some implementations, a system comprises a computing node. The computing node comprising a computer readable storage medium having program instructions embodied therewith. The program instructions may be executable by a processor of the computing node to cause the processor to perform a method comprising any of the aforementioned methods.
[0020] In some implementations, a computer program product for training models to generate images. The computer program product may comprise a computer readable storage medium having program instructions embodied therewith. The program instructions may be executable by a processor to cause the processor to perform a method comprising any of the aforementioned methods.
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate various exemplary embodiments and together with the description, serve to explain the principles of the disclosed embodiments.
[0022] Fig. 1 illustrates a DCG-GAN architecture, in accordance with one or more embodiments of the present disclosure.
[0023] Fig. 2 illustrates examples of StyleGAN2 generated images, in accordance with one or more embodiments of the present disclosure.
[0024] Fig. 3 illustrates examples of DCG-GAN generated images, in accordance with one or more embodiments of the present disclosure.
[0025] Fig. 4A illustrates areas of the design space covered by the original and generated samples of the baseline model’s results, in accordance with one or more embodiments of the present disclosure.
[0026] Fig. 4B illustrates areas of the design space covered by the original and generated samples of the DCG-GAN’ s results, in accordance with one or more embodiments of the present disclosure.
[0027] Fig. 5A illustrates a distribution function and correlating semi-Gaussian function of the template matching confidence scores based on generated-real comparisons for the baseline model, in accordance with one or more embodiments of the present disclosure.
[0028] Fig. 5B illustrates a distribution function and correlating semi-Gaussian function of the template matching confidence scores based on generated-real comparisons for the DCG- GAN model, in accordance with one or more embodiments of the present disclosure.
[0029] Figs. 6A and 6B illustrate pairs of exemplary generated samples and their most similar real example from the training dataset, in accordance with one or more embodiments of the present disclosure.
[0030] Fig. 7A illustrates a qualitative assessment of blinded experiments, in accordance with one or more embodiments of the present disclosure.
[0031] Fig. 7B illustrates a qualitative assessment of blinded experiments, in accordance with one or more embodiments of the present disclosure.
[0032] Fig. 8 is a flow diagram depicting an exemplary method for training models to generate images, in accordance with one or more embodiments of the present disclosure.
[0033] Fig. 9 depicts an exemplary computing node according to one or more embodiments of the present disclosure.
[0034] Fig. 10 depicts an exemplary algorithm for matching source and template images, in accordance with one or more embodiments of the present disclosure.
DETAILED DESCRIPTION
[0035] Reference will now be made in detail to the exemplary embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
[0036] Design is a complex cognitive process that requires designers to make creative connections across different areas of knowledge. This process includes carefully identifying and solving problems that may not have been dealt with before or have been approached in unique ways in the past. Venturing into new territories within the design realm increases the chances of finding new and inventive solutions. However, this kind of exploration can take a long time and may be influenced by preconceived notions, a fixation on initial ideas, and personal biases. Designers often aspire to navigate the design space uniformly or adapt it to meet specific requirements. Computational technologies, particularly generative Artificial Intelligence (AI) methods, offer a promising avenue to accelerate searching and generating novel design concepts within the solution space.
[0037] Generative design refers to an automated design exploration process that analyzes all possible solutions to a design problem based on the specified requirements and constraints and then selects the passable ones among them. Generative design and design concept generation share a common iterative approach to exploring a broad solution space.
Nevertheless, they diverge in their respective objectives and applications. Design concept generation primarily focuses on generating a multitude of approximate solutions aimed at inspiring designers during the ideation phase, rather than optimizing a design for production. Design concept generation applies a bottom-up approach that, in contrast to a traditional designer-based top-down approach, enables exploration of a wider range of complex solutions. Since there is no single correct answer to a design problem, given the high and even infinite degrees of freedom in product design, searching for all possible solutions could be resource-exhaustive and not practical to be executed by humans. Most of the well-known generative design methods operate on the basis of a set of defined design rules to iteratively evolve, or possibly optimize, an initial (usually randomly selected) solution to satisfy certain requirements. In contrast, GAN models are not limited to predefined rules, but instead attempt to search the design space based on the distribution of the provided dataset. Thus, GANs are a favorable choice for design concept generation.
[0038] Existing generative design approaches can be categorized into five main classes: shape grammars, L-systems, cellular automata, genetic algorithms, and swarm intelligence. These approaches typically enhance design generation through the application of mathematical functions or physics-based simulations. The generative design capabilities of commercial CAD packages focus on a limited set of conditions (e.g., spatial constraints) and criteria (e.g., optimizing mass or structural strength). Current methodologies and tools are aimed at creating optimized production-ready designs rather than fostering unique and
innovative design concepts for faster and more efficient ideation during the early stages of the design process. The premise of design concept generation, by contrast, is to enhance the efficiency, quality, and consistency of the design process by automatically generating numerous and diverse samples for designers to synthesize, choose from, and edit, thus elevating their roles to “curators” and “empathizers.” Disclosed herein are models configured for design concept generation, i.e., generation of a visual representation that captures the fundamental idea behind a product’s design. In some implementations, the visual representation takes the form of an image.
[0039] With the growing abundance of publicly available data (e.g., product data, user reviews) and recent advances in AI methods such as generative adversarial networks (GANs), there has been a recent surge in the adoption of AI-driven approaches for design automation. GANs are a relatively recent method in modern AI and have demonstrated state-of-the-art performance in many generative modeling tasks. GANs have been used to solve a variety of generative design problems, from creating 3D aircraft models in native format for detailed simulation to topology optimization of 2D wheel design and generating realistic fashion apparel style recommendations. GANs provide a method of generative modeling that allows new forms of training algorithms. These algorithms use indirect guidance, where a “discriminator” network is used as a source of feedback for a “generator” network tasked with gaining knowledge of the intricate distribution of the training set. The primary focus of GAN models is to create “realistic” outputs with the ability to mimic the latent region of the input dataset. This makes GANs potentially useful for generating visually appealing and realistic concepts. However, the existing architecture of GANs limits their suitability for design concept generation, which requires divergent thinking and imagination, since aesthetics and creativity are crucial in design.
[0040] AI-driven design concept generation can serve as a powerful and transformative tool for designers to efficiently create more original and useful concepts. Advanced data-driven models can be developed to automatically analyze large amounts of product and user data, comprehend intricate patterns, invent new ideas, and evaluate them based on existing performance and user data, as well as other requirements and metrics. As a result, the designer can shift their focus from dragging, dropping, and iterating a design to selecting, integrating, and modifying AI-generated concepts. GANs are one of the state-of-the-art generative models capable of generating realistic images according to the distribution of the input dataset, to an extent that is not recognizable as synthetic data by human eyes. Moreover, GANs are capable of producing a large number of solutions in a relatively short period.
These properties make GANs a potentially disruptive approach to generate myriad design concepts with little effort. To illuminate the capabilities and limitations of GANs for design concept generation, this section provides the background necessary to understand the general logic of the standard GAN model, followed by a description of the StyleGAN architecture. Subsequently, this discussion covers recent developments in GAN-based generative design, explores emerging challenges in the field, and introduces a data-driven design evaluation method that has the potential to address some of the key limitations associated with GANs in design concept generation.
[0041] The existing AI-driven design automation literature lacks a generic computational framework to conduct design concept generation studies guided by various design conditions and criteria to augment the creative design process. Despite the potential of GANs to produce realistic design outcomes, it is not yet clear how existing GAN architectures can support creativity, since they inherently tend to replicate the training dataset with the same characteristics due to their sole focus on generating samples that “look real.” The lack of creativity is due to the fact that during the training process, the GAN generator is urged to produce samples close to the training data distribution to deceive the discriminator in a minimax game, which ultimately constrains design output, particularly in terms of variety and originality. The systems and methods described herein address the above gaps in AI-driven design concept generation knowledge by first conducting a thorough quantitative analysis of the limitations of state-of-the-art models, then proposing a generic GAN-based architecture for multi-criteria sample generation, and finally customizing it for design concept generation.
[0042] A standard GAN architecture consists of two neural networks: a generator G and a discriminator D, which are interactively trained by competing against each other in a minimax game. That is, GANs can be regarded as a two-player game of chance between the generator and the discriminator, in which the benefit of each contestant is equal to the detriment of the other. The generator attempts to produce realistic samples, while the discriminator attempts to distinguish the fake samples from the real ones.
The parameters of both networks are updated through backpropagation with the following learning objective:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]$$

where z is a random or encoded vector, $p_{\text{data}}$ is the empirical distribution of training images, and $p_z$ is the prior distribution of z (e.g., a normal distribution).
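As a non-limiting illustration, the value function above may be estimated from batches of discriminator outputs. The following NumPy sketch (all names hypothetical; not the claimed implementation) computes a Monte Carlo estimate of the two expectations:

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of the minimax value
    E[log D(x)] + E[log(1 - D(G(z)))] from batches of
    discriminator outputs on real and fake samples."""
    d_real = np.clip(d_real, eps, 1 - eps)   # guard against log(0)
    d_fake = np.clip(d_fake, eps, 1 - eps)
    return np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake))

# A fully fooled discriminator outputs 0.5 everywhere, giving 2*log(0.5);
# a confident discriminator pushes the value toward its maximum of 0.
fooled = gan_value(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
sharp = gan_value(np.array([0.99, 0.98]), np.array([0.02, 0.01]))
```

The generator descends this value while the discriminator ascends it, which is the minimax dynamic described above.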
[0043] In the standard GAN model, there is no control over the modes of the data being generated. GANs are notoriously difficult to train and are often unstable due to mode collapse, one of the main problems in generative modeling. Accordingly, the standard model is not a good choice for generating realistic designs, especially in light of subsequent developments in GANs that established a new state of the art in generating high-quality, high-resolution images. This work builds on a cutting-edge GAN architecture for artificial image generation, called StyleGAN2. StyleGAN, created by NVIDIA, produces high-resolution facial images with unprecedented quality and is capable of synthesizing and mixing non-existent photorealistic images.
[0044] StyleGAN and its extension, StyleGAN2, are characterized by an architecture different from that of the standard GAN generator. Conventional generators feed the latent code through the input layer only.
However, in StyleGAN2, a latent vector z is first normalized and then mapped by a mapping network m to an intermediate vector w. A synthesis network g starts with a constant layer of a base size. The style produced by a learned affine transform A modulates the trainable convolution weights, which are then demodulated to reduce artifacts caused by arbitrary amplification of certain feature maps, taking the place of instance normalization. Gaussian noise inputs are scaled by a trainable factor B and added to the convolution output in each style block along with a bias b. Leaky ReLU is deployed as the non-linear activation for all layers. At the last layer, the output is fed into a 1×1 convolutional filter to generate an image. The loss function used in StyleGAN2 is the logistic loss function with path-length regularization:
$$\mathbb{E}_{\mathbf{w},\,\mathbf{y}}\left( \left\| \mathbf{J}_{\mathbf{w}}^{T}\, \mathbf{y} \right\|_{2} - a \right)^{2}, \qquad \mathbf{J}_{\mathbf{w}} = \partial g(\mathbf{w}) / \partial \mathbf{w},$$

where w is a mapped latent code, g is the generator, y are random images with normally distributed pixel intensities, and a is a dynamic constant learned as the long-running exponential moving average of the path lengths $\|\mathbf{J}_{\mathbf{w}}^{T}\mathbf{y}\|_{2}$ over the course of optimization. This term regularizes the gradient magnitude of a generated image g(w) projected onto y, adjusting it to be similar to the running exponential average and thereby making the latent space w smoother. The loss function of the discriminator is the same standard logistic loss function with R1 or R2 regularization.
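The path-length penalty may be illustrated with a toy linear generator whose Jacobian is known explicitly. The sketch below is hypothetical: it forms the Jacobian directly, whereas StyleGAN2 obtains the product $\mathbf{J}_{\mathbf{w}}^{T}\mathbf{y}$ by backpropagation. It shows that the penalty is smallest when the constant a tracks the average path length, which is why a is learned as a moving average:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "generator": a fixed linear map g(w) = J @ w, so its Jacobian J_w is
# the constant matrix J. (In StyleGAN2 the product J_w^T y is obtained by
# backpropagating sum(g(w) * y) rather than by forming the Jacobian.)
J = rng.normal(size=(64, 8))

def path_length_penalty(a, n_samples=256):
    """Estimate E[(||J_w^T y||_2 - a)^2] for images y ~ N(0, I)."""
    lengths = np.array([np.linalg.norm(J.T @ rng.normal(size=J.shape[0]))
                        for _ in range(n_samples)])
    return np.mean((lengths - a) ** 2), lengths.mean()

# The penalty shrinks when a tracks the average path length, leaving only
# the variance of the lengths -- i.e., the smoothness term being minimized.
penalty_at_zero, mean_len = path_length_penalty(a=0.0)
penalty_at_mean, _ = path_length_penalty(a=mean_len)
```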
[0045] The StyleGAN/StyleGAN2 architecture is capable of controlling the features of generated images at various scales thanks to its generator, which consists of two sub-networks. With the inclusion of path-length regularization, StyleGAN2 improves the generator's conditioning and reduces representation error. StyleGAN2 provides superior quality and resolution of
generated images compared to previous generative models. However, existing applications and extensions of this architecture are predominantly focused on the quality and realism of generated samples without addressing diversity or novelty.
[0046] Creativity, as an indispensable element of the design process, may be defined as “the capacity to produce original and valuable items by flair.” Yet, it is often difficult to objectively assess due to its intangible and subjective nature. In the context of engineering design, the definition of creativity may be translated into maximizing the degree of novelty and usefulness of the design concepts generated. Novelty may be gauged by how different an idea is relative to others. Usefulness may be measured in terms of the quality and performance of the design. In addition, the quality, performance, and originality of generated designs often correlate with the diversity of the concepts generated and the design space explored. As such, an exemplary architecture for generating images may consider diversity and/or novelty as two fundamental criteria for objectively assessing the performance of GAN-based design concept generation in terms of creativity.
[0047] Methods for measuring diversity may comprise subjective rating and the genealogical tree approach. Subjective rating of design space diversity may comprise categorizing a set of design ideas into various idea pools based on intuitive categories. Subjective rating is efficient in terms of time and effort, but the results may not be as valid or reliable since the inferences are based on the rater’s mental models. A genealogical tree approach adopts deterministic rules derived from design attributes to rate the diversity of a set of design ideas. The genealogical tree approach is repeatable and relatively more objective. However, it lacks sensitivity and accuracy since it applies the same set of formulae to all types of design problems.
[0048] One of the most challenging tasks in the design process is evaluating the novelty of concepts and, ideally, distinguishing instances with the highest probability of success. However, there is no widely used method for assessing or improving GAN models in terms of novelty. An assessment of novelty may characterize the degree to which the GAN generator is capable of generating realistic samples that fool the discriminator and are significantly distinct from the training dataset.
[0049] Embodiments of the present disclosure present a systematic and objective assessment of the creativity of traditional GAN architectures, together with the building and validation of a new architecture that compensates for their limitations. The new architecture may be based on GAN-based design concept generation. The new architecture was evaluated in terms of diversity and novelty. Findings demonstrate that although the trained generator of the
baseline model is capable of producing realistic and authentic-looking images of sneakers, the generated samples strongly resemble existing products (i.e., the training dataset). As the baseline generator solely concentrates on outsmarting the discriminator by creating samples that look like the training dataset, it results in a lack of originality and variety, which limits its ability to generate creative designs.
[0050] A generic architecture for GAN-based design concept generation is proposed herein to address the limitations of GANs in terms of creativity and to guide the generative process with a more diverse set of conditions and criteria beyond the generation of merely “realistic” samples, in accordance with one or more embodiments of the present disclosure. The proposed approach may involve incorporating additional inspectors’ feedback into the generator’s loss function, alongside the discriminator’s loss. This regularization enables the generator to simultaneously learn multiple domain-specific and domain-agnostic criteria, making it a versatile and effective generative tool for meeting various predefined benchmarks and performance standards.
[0051] DCG-GAN, a customized variant of the proposed generic architecture for design concept generation, may be trained in accordance with one or more embodiments of the present disclosure. This adaptation was specifically designed to meet the demands of generating design concepts that balance aesthetics and functionality. The evaluation process involved visual analysis, quantitative metrics, and qualitative assessments in a survey format with 90 participants. This comprehensive approach provided a thorough assessment of DCG-GAN’s performance in terms of the diversity and novelty of the generated samples. The culmination of computational and subjective assessment methodologies consistently showcased the enhanced capabilities of DCG-GAN when juxtaposed with the baseline model. As used herein, the baseline model refers to a state-of-the-art GAN architecture, StyleGAN2, trained to create 2D visual concepts (i.e., images) of sneakers based on a large training dataset scraped from multiple online footwear stores. Evidently, DCG-GAN exhibited superior performance in generating design concepts that transcend mere realism, embracing attributes of novelty, diversity, desirability, and geometrical proportionality.
[0052] Novelty and diversity are central themes in design and engineering innovation. According to the Osborn rule for brainstorming, the availability of a more diverse set of solutions and the uniqueness of those solutions can increase the chances of proposing a successful design instance. Although definitions of novelty vary, a widely accepted definition describes novelty as “how unusual or unexpected an idea is compared to other ideas.” Other work has centered around the newness
or how frequent or infrequent a design solution appears. From the large body of work that defines novelty and evaluates it through various metrics and studies, novelty for the purposes of the panel of inspectors described herein is considered as something different, unique, and new. The novelty of a design concept (also referred to as uniqueness) does not necessarily correspond to the appearance and final results. Rather, the “otherness” or “uncommonness” of any characteristic within the concept or even in the design process can be credited as novelty.
[0053] Diversity or variety is a related but separate construct in design. The diversity of the design output has been defined as how different the results are from each other. Research in innovation has noted that diversity is a critical component of innovation and has led to a number of strategies to increase the diversity of new knowledge in the design process, such as using communities to seek design input from a wide variety of contributors. Diversity in design can take many forms, from differences in function and working principle, to product architecture configurations, to visual aesthetics.
[0054] There are differences in the type of innovation output as it relates to the novelty and diversity of a design. This output can be segmented into radical innovation versus incremental innovation. Radical innovations are fundamental and revolutionary changes that typically accompany the adoption of new technology. Incremental innovations are minor improvements or adjustments to current technology or product design instantiations. One can view radical versus incremental innovation as a spectrum, with most existing products falling on the incremental side of the continuum.
[0055] In the examples of sneakers and shoes, new designs and market entrants are, in the main, examples of incremental innovation. Fundamentally, the architecture of athletic shoes has not changed in more than 100 years. The product architecture comprises a rubber-based sole, with an upper constructed from different materials such as leather, canvas, or a combination of both. As such, novelty and diversity in this segment have a significant relationship with fast changes and adaptations of physical appearance, such as the shapes used to construct the lower and upper, material variations, and graphics and colors, rather than significant changes in the architecture or overall structure of the sneaker.
[0056] Sneakers have become a social and cultural signifier, rather than products focusing solely on function. Therefore, the main thrust of the design of these products is changes and variations in appearance.
[0057] In addition to creativity, the design process must address other key metrics such as viability, feasibility, and desirability. The latter is of utmost importance in GAN-based
design concept generation, as it is directly correlated with the incorporation of user needs and feedback in the generative process. For a GAN structure to generate unique and desirable design concepts, it must be equipped with a design evaluator so that the model is trained not only according to the realism of the outputs but also according to their desirability.
[0058] Diversity augmentation has emerged as a vital area of research in GANs, with the primary objective of improving the variety while preserving the quality of the generated outputs. Although GANs possess the capability to create realistic data samples from a given distribution, they are often plagued by mode collapse, wherein they produce only a limited subset of samples, failing to encompass the entire diversity of the target distribution.
Consequently, the generated outputs lack variety and do not accurately represent the full data manifold. Prominent models were selected and categorized based on their strategies for restructuring the traditional GAN architecture. These strategies involve modifying or extending the standard GAN framework by incorporating supplementary components, regularization terms, or loss functions that facilitate the generation of more diverse and novel samples. Table 1 categorizes and provides an overview of selected GAN models that utilize diversity-augmented approaches.
[0059] The loss regularization category comprises several models aimed at mitigating mode collapse in GANs by introducing additional regularization terms into the loss function. Within this category, MS-GAN (Mode Seeking GAN) and its extensions were identified, namely DS-GAN (Diversity Sensitive Conditional GAN), Diversity Balancing GAN, DivCo GAN (Diversity Conditional GAN), and DivAug GAN (Diversity Augmented GAN). MS-GAN, operating on conditional GAN principles, proposes a novel regularization term to maximize the ratio of distances between generated images and their corresponding latent codes, thereby promoting the exploration of diverse minor modes. Extending MS-GAN, DivCo GAN introduces a contrastive loss that encourages similarity between images generated from adjacent latent codes while promoting dissimilarity between images from distinct latent codes. DivAug GAN, another extension of MS-GAN, defines a new regularization term to enhance mode diversity by exploring unseen image space, ensuring
relative variation consistency, and maximizing distinction when injecting different noise vectors. Furthermore, PAD-GAN (Performance Augmented Diverse GAN) introduces a unique loss function that employs a determinantal point process (DPP) kernel, effectively augmenting quality and diversity simultaneously by establishing a global measure of similarity between pairs of items. This kernel ensures a balanced representation of quality and diversity in generated samples.
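One hypothetical reading of a DPP-style set score may be sketched as follows: a kernel L combines per-item quality scores with pairwise cosine similarity, and the log-determinant of L rates a set higher when its items are both high-quality and mutually dissimilar. The construction below is illustrative only, not the PAD-GAN loss itself:

```python
import numpy as np

def dpp_log_det(features, quality):
    """Score a set of items via the log-determinant of a DPP kernel
    L_ij = q_i * s_ij * q_j, where s_ij is the cosine similarity between
    item features and q_i is an item quality score. A higher log-det means
    the set is jointly high-quality and mutually dissimilar."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    L = np.outer(quality, quality) * (f @ f.T)
    # A small ridge keeps the determinant finite for near-duplicate items.
    _, logdet = np.linalg.slogdet(L + 1e-6 * np.eye(len(L)))
    return logdet

rng = np.random.default_rng(1)
quality = np.ones(4)
diverse = rng.normal(size=(4, 16))                 # near-orthogonal items
redundant = np.tile(diverse[:1], (4, 1)) + 0.01 * rng.normal(size=(4, 16))
```

Maximizing such a score as part of the generator loss rewards batches that cover the design space rather than clustering around a single mode.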
[0060] The Inside Generator Augmentation category incorporates models that promote diversity through manipulations within the generator itself. One such model, PD-GAN (Personalized Diversity Promoting GAN), adopts a personalized approach to enhance diversity. The process begins by selecting a set of diverse items from the dataset as the ground truth for diversity. During each iteration, the generator generates samples that are then ranked based on their relevance to each other. The top k items are selected and compared to the ground truth diverse set to be used in the loss function. Diversity is measured based on category coverage, a metric commonly employed in recommendation systems. By diversifying the generation process within the generator and leveraging personalized ranking mechanisms, PD-GAN effectively encourages the production of a more diverse range of high-quality samples.
[0061] The Data Augmentation category encompasses models that leverage data manipulation techniques to promote diversity in generated outputs. Two notable models within this category are GAN+ and EDA+GAN (Easy Data Augmentation Coupled with GAN). GAN+ adopts a two-step approach in which the dataset is initially sampled using the Dirichlet method, known for its ability to produce a more diverse set of samples. Subsequently, the model is trained, and the generated samples undergo a filtering process to eliminate low-quality samples. Finally, the qualified generated samples are integrated into the main dataset, augmenting the diversity of the training data. EDA+GAN, on the other hand, incorporates data augmentation as a preprocessing step before training.
[0062] The Bagging-Inspired category encompasses models that draw inspiration from the principles of bagging methods in machine learning to enhance diversity in GAN-generated samples. One such model within this category is EDA+GAN (Easy Data Augmentation Coupled with GAN). In this approach, the utilization of data augmentation techniques serves as a bagging-inspired strategy. Prior to training the GAN, data augmentation is applied as a preprocessing step on the training set. The EDA+GAN model comprises multiple generators operating in parallel, each receiving the same input vector. These generators work independently and are supervised by a single shared discriminator. By employing data
augmentation and parallel generator structures, EDA+GAN aims to mimic the concept of ensemble learning in bagging methods.
[0063] The Multi-Step Training category encompasses models that adopt a multistage approach to enhance diversity and quality in GAN-generated samples. A prominent model in this category is CLS-R GAN (Classification-Reinforced GAN). This approach introduces an additional discriminator-independent classifier that assesses the quality of the generated images. The classifier is first pretrained on the dataset to establish a reliable basis for quality evaluation. Subsequently, during the training process, the model receives feedback from both the discriminator and the classifier to guide the generator to generate higher-quality samples. This combination of feedback enables the CLS-R GAN to leverage the strengths of both the discriminator and the classifier in promoting diverse and realistic samples. Moreover, the generator undergoes a self-training phase where it refines its output based on the qualified fake images detected and endorsed by the classifier.
[0064] The Reinforcement-Learning-Inspired category comprises models that draw inspiration from reinforcement learning principles to foster diversity in GAN-generated samples. Within this category, two notable models are CLS-R GAN (Classification-Reinforced GAN) and DP-GAN (Diversity Promoting GAN). DP-GAN utilizes a Long Short-Term Memory (LSTM) network as the discriminator, leveraging the LSTM’s ability to memorize previous records. When a new sample is generated, DP-GAN employs a reinforcement learning paradigm, where the generator’s behavior is guided by rewards and penalties based on the uniqueness of the generated sample. If the sample is novel and has not been seen before, the generator is rewarded, whereas repeated samples lead to penalization. To achieve accurate and specific diversity promotion, rewards and penalties are calculated at both the word level and the sentence level. Furthermore, the discriminator outputs the cross-entropy of the last layer of the network instead of a binary real-or-fake score to enhance discrimination accuracy.
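The reward-and-penalty scheme may be illustrated with a minimal sketch. All names are hypothetical, and samples are treated as hashable tokens for simplicity; DP-GAN's actual signal is computed at the word and sentence levels:

```python
def novelty_reward(sample, memory, reward=1.0, penalty=-1.0):
    """Reward the generator for a previously unseen sample and penalize
    a repeat, mimicking the reinforcement signal described above. The
    'memory' set stands in for the discriminator's record of past samples."""
    if sample in memory:
        return penalty
    memory.add(sample)
    return reward

memory = set()
r_first = novelty_reward("concept-a", memory)    # novel: rewarded
r_repeat = novelty_reward("concept-a", memory)   # repeated: penalized
```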
[0065] The Latent Vector Manipulation category includes models that focus on manipulating the latent vectors in GANs to enrich the diversity in generated samples. A prominent model in this category is PD-GAN (Probabilistic Diverse GAN), which finds application in image inpainting. In the context of image inpainting, regions near the boundary exhibit lower degrees of freedom for diversity compared to central areas. The PD- GAN model addresses this by calculating the dependence of each minor area on the existing content, progressively increasing diversity as it moves towards the center of the image. The training process begins with the generator-discriminator model being trained on the dataset.
Subsequently, the generator generates a sample for the hole in an image, and the model modulates the latent vectors in areas that allow for high diversity.
[0066] While the previously mentioned studies have demonstrated noteworthy advancements in enhancing diversity, it is important to note that all of the approaches within the loss function regularization category have relied on some form of intra-batch pair-to-pair distance averaging as a diversity measure. This approach shifts the overall diversity average rather than achieving diversification across all generated samples. In this study, the minimum distance among all pairs (i.e., the worst-case scenario) may be considered as the measure of diversity, effectively promoting diversity across all generated samples by forcing the generator to diversify any chosen subset of the generated images. Furthermore, drawing inspiration from prior work that diversifies input noise vectors across various categories, the diversity of the input noise vectors may be enhanced through extensive sampling of a pool of vectors and then selecting the most diverse subset using stratified sampling, enabling exploration of uncharted areas.
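The difference between average-based and worst-case diversity measures may be illustrated as follows (a hypothetical sketch operating on low-dimensional feature vectors rather than images):

```python
import numpy as np
from itertools import combinations

def worst_case_diversity(samples):
    """Diversity of a batch as the minimum pairwise Euclidean distance:
    a single pair of near-duplicates drives the score toward zero,
    unlike an intra-batch average."""
    return min(np.linalg.norm(a - b) for a, b in combinations(samples, 2))

# Three well-separated samples plus one near-duplicate: the average
# pairwise distance stays high, but the worst-case measure exposes the
# duplication that average-based regularizers can hide.
batch = np.array([[0.0, 0.0], [0.0, 0.1], [10.0, 0.0], [0.0, 10.0]])
avg = np.mean([np.linalg.norm(a - b) for a, b in combinations(batch, 2)])
worst = worst_case_diversity(batch)
```

Penalizing the generator on `worst` forces every pair in the batch apart, which is the intent of the worst-case formulation described above.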
[0067] The evaluation algorithms of the present disclosure are powerful, automated, and objective. Further, the evaluation algorithms were used to verify the initial hypothesis regarding the limitations of traditional GANs and to establish a benchmark for the proposed models described herein against the baseline. The evaluation algorithms may possess broader applicability beyond the applications described herein. They may serve as robust tools for assessing the diversity and novelty of new design concepts across various industries and contexts, given a set of ground-truth samples for algorithmic evaluation. The systems and methods described herein may enhance the GAN’s loss function by integrating multiple inspectors, each specialized in evaluating distinct aspects of the generated images. The systems and methods described herein may consider several criteria for design concept generation; for example, the criteria may comprise realism, novelty, diversity, desirability, and geometrical proportionality.
[0068] Quality properties are typically measured according to two main categories of methods in the design literature, namely qualitative assessment carried out by a human expert and mathematical analysis. The qualitative assessment of diversity involves categorizing a set of design ideas into various idea pools based on intuitive categories. A common mathematical approach for diversity analysis is to adopt a genealogical tree for a set of design solutions and to estimate the degree of relatedness between the under-review concept and other instances accordingly.
[0069] In alignment with the established definition of diversity in the design literature, which refers to the extent of dissimilarity among design concepts compared to each other, the GAN diversity assessment approach seeks to capture and quantify the diversity of generated solutions for a given design problem. When employing GANs to produce a batch of design solutions, a thorough analysis of the (dis)similarities among the generated outputs was conducted. By employing equations and mathematical models, the degree of diversity within the batch was precisely assessed. The approach not only adheres to the conceptual understanding of diversity as stated in the literature, but also implements it through a mathematical strategy, allowing for an objective evaluation of the diversity.
[0070] Diversity assessment within the context of GANs entails a meticulous evaluation of the dissimilarity inherent in the generated design concepts, adhering to the design literature’s notion of diversity. The methodology for diversity evaluation revolves around the generation of a substantial batch of design samples, facilitating a comprehensive inter-sample comparison. A particularly effective approach to elucidate the diversity of this sample ensemble involves visual representation, providing a tangible depiction of the inherent variations among the data points.
[0071] Visualizing this diversity within a two-dimensional space offers a pragmatic solution that aligns with the cognitive mechanisms of the human perception system. Such a visualization strategy enables an enhanced discernment of the intricate relationships, patterns, and distinctions encapsulated within the dataset. Among the various techniques available, Principal Component Analysis (PCA) may be the method of choice for projecting high-dimensional data points onto a two-dimensional plane. This preference stems from PCA’s capability to retain crucial information and structural attributes that might otherwise be compromised in the transformation process. To facilitate the visualization of diversity, a pivotal step involves the transformation of design concepts from their raw image format into a more structured feature format. This conversion ensures the preservation of the most pertinent information that governs the diversity inherent within the concepts generated. To this end, the adoption of a neural network becomes imperative, given its capacity to discern intricate patterns and features within complex image data. The VGG16 neural network is employed to extract these features.
[0072] The VGG16 model was initially proposed for image classification and object detection and achieved 92.7% accuracy on the ImageNet dataset. As a state-of-the-art convolutional neural network (CNN) model, VGG16 is a very powerful model for feature extraction and image coding. Therefore, VGG16 is used to embed the dataset before feeding
it to PCA. VGG16 is a 16-layer deep neural network model that contains stacked convolutional layers using the smallest possible receptive field of 3×3, which can capture notions of up/down, left/right, and center. An optional linear transformation of the input channels can be added at the top of the network in the form of a 1×1 convolution filter. Among the 13 convolutional layers, 5 are followed by max-pooling layers that implement spatial pooling with a pooling window of size 2×2 and a stride of 2. The convolution stride is set to 1, and the padding is chosen according to the receptive field to preserve the spatial resolution. The convolutional layers are followed by three fully connected layers, the first two containing 4096 channels each and the last having a size that depends on the number of classes. The topmost output layer is a softmax layer. The layers do not usually contain normalization, to avoid high memory consumption and time complexity, as well as to preserve model performance.
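The spatial resolutions produced by this layer stack may be traced with a short sketch (a hypothetical helper; it follows the resolution-preserving 3×3 convolutions and five 2×2 max-pools described above):

```python
def vgg16_feature_sizes(input_size=224):
    """Trace the spatial resolution through VGG16's five convolutional
    blocks: 3x3 convolutions with stride 1 and resolution-preserving
    padding keep the size, and each block's 2x2 max-pool (stride 2)
    halves it."""
    convs_per_block = [2, 2, 3, 3, 3]   # 13 convolutional layers in total
    sizes, size = [], input_size
    for _ in convs_per_block:
        size //= 2                      # only the pooling changes the size
        sizes.append(size)
    return sizes

# For a 224x224 input: 224 -> 112 -> 56 -> 28 -> 14 -> 7 before the
# fully connected layers.
sizes = vgg16_feature_sizes(224)
```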
[0073] PCA is a multivariate statistical technique utilized herein to reduce the dimensionality of high-dimensional data from the intercorrelated feature space. As the dataset on which PCA was used was a high-dimensional set of dependent features extracted from an image set, it is convenient to use this method to assess the diversity of generated samples. In embodiments of this disclosure, PCA may be applied to analyze and interpret complex data by disentangling the most representative features. This task may be carried out by computing values of the data table corresponding to a new set of orthogonal variables; thus, PCA can geometrically be viewed as the projection of the data samples onto the principal components’ space. These variables, which are called principal components, are acquired as a linear combination of the original variables. The first principal component may be calculated so that it has the largest possible variance. The first weight vector, based on which the first principal component may be calculated, satisfies the following expression:

w_1 = arg max_{||w||=1} Σ_i (x_i · w)^2 = arg max_{||w||=1} (w^T X^T X w) / (w^T w),
where x_i is a row vector of the original data table X, and w is a coefficient vector set to be a unit vector. Equivalently, in closed form, the mapping weight vector of the first component w_1 can be calculated using the second part of the equation, where w is the eigenvector of X^T X associated with the largest corresponding eigenvalue. The kth component is obtained under the constraint of being orthogonal to the k−1 previous components and having the kth largest possible variance. Thus, the first k−1 components are subtracted from X, and the residual matrix is used in the following equation:
w_k = arg max_{||w||=1} ||X̂_k w||^2 = arg max_{||w||=1} (w^T X̂_k^T X̂_k w) / (w^T w), where X̂_k = X − Σ_{j=1}^{k−1} X w_j w_j^T. The number of principal components calculated depends on the data structure and how much dimension reduction is required. For diversity evaluation, since the areas of the design space that are explored by the original and generated datasets need to be compared, a two-dimensional representation of the samples may be the most informative for visual analysis.
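By way of illustration only, the projection of extracted feature vectors onto the first two principal components may be sketched as follows; the eigendecomposition route and the function name `pca_2d` are illustrative choices rather than part of the disclosed system (NumPy is assumed to be available):

```python
import numpy as np

def pca_2d(features: np.ndarray) -> np.ndarray:
    """Project an (n_samples, n_features) matrix onto its first two principal components."""
    # Center the data so each feature has zero mean.
    X = features - features.mean(axis=0)
    # Eigendecomposition of X^T X; eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)
    # The weight vectors w_1, w_2 are the eigenvectors with the two largest eigenvalues.
    W = eigvecs[:, ::-1][:, :2]
    return X @ W
```

Such a two-dimensional embedding may then be scatter-plotted to compare the regions of the design space covered by the original and generated datasets.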
[0074] For novelty evaluation, a natural and convenient approach may be to assess the similarity of a design instance to existing concepts, either by human judges, who develop mental connections between various knowledge sets to score dissimilarities, or by using predefined rules based on design attributes. This is also the fundamental approach taken by some of the existing novelty assessment work based on the FBS (Function-Behavior-Structure) and SAPPhIRE (State-Action-Part-Phenomenon-Input-oRgan-Effect) models, which assess novelty through comparison with previous designs. The qualitative assessment of both diversity and novelty, despite being more accurate, is hard to explain and depends on the rater’s mental models. On the other hand, algorithmic assessment suffers from a lack of sensitivity and generalizability, although it is relatively more repeatable and objective.
[0075] The definition of novelty in the design literature entails various methods to assess novelty, including the “a priori” and “a posteriori” approaches. The former requires identifying a reference solution or a set of solutions to determine the novelty of the examined ideas, whereas the latter calculates novelty based on a specific framework with respect to existing systems. Leveraging this conceptual understanding, the GAN novelty assessment approach may be meticulously aligned with the mathematical approach suggested in the literature. Drawing inspiration from the “a priori” approach, each GAN-generated design concept may be thoroughly compared with an extensive repository of previous solutions pertaining to the same design problem. This comparison may be facilitated by employing a set of rigorous mathematical models that enable ascertaining the novelty value of each design solution based on its unexpectedness within the generated design space, providing a reliable and comprehensive assessment of its novelty in relation to existing design solutions.
[0076] Novelty is a multifaceted concept that encompasses various dimensions that require careful evaluation. When assessing the novelty of GAN-generated design concepts, it is essential to consider different aspects to gain comprehensive insights into their uniqueness and distinctiveness. To gain a comprehensive understanding of novelty in GAN-based concept generation, various complementary aspects of novelty are analyzed as follows. These models have the capacity to identify similarities between images at various levels, ranging from low-level (i.e., pixel-based) detection to high-level (feature-based) detection.
[0077] Given that GANs sample from a learned distribution, it is common to encounter duplicate points within the generated samples. This aspect involves detecting instances where GANs generate exact replicas of existing design samples. The Root Mean Square Error (RMSE) may be applied as the most common difference metric to evaluate the error between a predicted sample and its corresponding observed sample by calculating the quadratic mean of the differences. To calculate the similarity between the GAN-generated images and the original image set, RMSE may be used to score the extent of similarity between two images in a pixel-wise manner using the following formula:
RMSE = sqrt( (1/(M·N)) Σ_{i=1}^{M} Σ_{j=1}^{N} (I_org(i, j) − I_gen(i, j))^2 ),

where I_org and I_gen denote samples from the original and generated datasets, respectively; and each i and j represents the pixel associated with the ith row and the jth column of the images, in which there are M rows and N columns in total. This method checks for absolute repetitions.
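By way of illustration only, the pixel-wise RMSE check may be sketched as follows (NumPy is assumed; the function name is illustrative):

```python
import numpy as np

def rmse(img_org: np.ndarray, img_gen: np.ndarray) -> float:
    """Quadratic mean of pixel-wise differences between two equally sized images."""
    diff = img_org.astype(np.float64) - img_gen.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))
```

A generated sample whose RMSE against some training image is at or near zero would be flagged as an exact replica.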
[0078] GANs may produce noisy images due to the inherent stochasticity of the generative process. Assessing noise tolerance may be crucial to ensure that novelty evaluation is not compromised by noisy instances and to capture unique design solutions even in the presence of noise. The Peak Signal-to-Noise Ratio (PSNR) may be employed. PSNR is defined as the ratio of the maximum possible power of a signal to the power of applied distortion. This definition can be translated to the ratio of the maximum pixel value to the error between the corresponding pixels in the generated and original images. PSNR is usually reported in logarithmic scale to characterize the high ratio stemming from high dynamic ranges of pixel values. PSNR is denoted by the following expression:
PSNR = 10 · log10( L^2 / ((1/3) Σ_c MSE_c) ),

where L denotes the maximum potential pixel value, c denotes the color channel in an RGB image, and MSE_c denotes the mean squared error computed over channel c. PSNR may be applied due to its high noise tolerance and also due to its sensitivity to brightness/color alterations.
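By way of illustration only, a PSNR computation may be sketched as follows (NumPy is assumed; averaging the squared error over pixels and channels together is an illustrative simplification):

```python
import numpy as np

def psnr(img_org: np.ndarray, img_gen: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in decibels."""
    diff = img_org.astype(np.float64) - img_gen.astype(np.float64)
    mse = np.mean(diff ** 2)  # averaged over pixels (and channels, if present)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))
```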
[0079] Generated samples often undergo alterations in brightness, color, or other visual aspects. To capture the full spectrum of theme mutations, assessing sensitivity to alterations enables the recognition of even subtle changes. To evaluate all types of similarities, more sophisticated similarity detection methods are considered that are capable of identifying higher-level features, such as edges and corners, in an image. The Structural Similarity Index Measure (SSIM) is an image similarity measure, which models and compares the structural contents between images. Instead of bottom-up error-sensitivity simulation approaches, this model operates similarly to the human visual system by modeling perceived changes in structural information as the combination of three components, namely structure comparison, contrast comparison, and luminance comparison. SSIM is defined as follows:
SSIM(x, y) = ((2 μ_x μ_y + C_1)(2 σ_xy + C_2)) / ((μ_x^2 + μ_y^2 + C_1)(σ_x^2 + σ_y^2 + C_2)),

where μ_x, σ_x, and σ_xy represent the mean of x, the standard deviation of x, and the covariance between x and y, respectively, and C_1 and C_2 are stabilizing constants. SSIM is leveraged as an interdependency-aware model to detect contextual dissimilarities.
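By way of illustration only, a single-window (global) simplification of SSIM may be sketched as follows; practical implementations typically compute SSIM over local windows and average the resulting map (NumPy is assumed; the constants k1 and k2 follow common usage):

```python
import numpy as np

def ssim_global(x: np.ndarray, y: np.ndarray, L: float = 255.0,
                k1: float = 0.01, k2: float = 0.03) -> float:
    """Single-window SSIM combining luminance, contrast, and structure terms."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2  # stabilizing constants
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return float(((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2))
                 / ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))
```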
[0080] Spatial shifts and minor changes in the generated samples can easily be overlooked by some assessment methods. Therefore, considering contextual dissimilarities helps to detect variations in the overall structure of designs, avoiding unstructured differences being considered novel. Signal to Reconstruction Error (SRE) is applied to compute the similarity of the GAN-generated RGB images and the original RGB images with respect to the signal power of the original image, as follows:
SRE = 10 · log10( μ_org^2 / (||I_org − I_gen||^2 / n) ),

where μ_org indicates the average intensity of the original image values and n is the number of image values. This model may be used to implement the error standardization aspect.
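By way of illustration only, the SRE computation may be sketched as follows (NumPy is assumed; normalizing the squared error by the total number of values is an illustrative choice):

```python
import numpy as np

def sre(img_org: np.ndarray, img_gen: np.ndarray) -> float:
    """Signal-to-reconstruction-error ratio (dB) relative to the original image power."""
    org = img_org.astype(np.float64)
    gen = img_gen.astype(np.float64)
    mse = np.mean((org - gen) ** 2)
    if mse == 0.0:
        return float("inf")
    mu_org = org.mean()  # average intensity of the original image
    return float(10.0 * np.log10(mu_org ** 2 / mse))
```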
[0081] For shading effects repression, the evaluation of novelty should include the ability to convert errors to a comparable scale, allowing for a more standardized and objective comparison between design concepts. The Spectral Angle Mapper (SAM), a spectrum-based pixel-wise metric, may be used to calculate the spectral similarity of the GAN-generated and original 2D images. Considering spectra as vectors of an N-dimensional spectral space, where N is the number of bands, SAM maps the image spectra to the reference spectra by calculating the N-dimensional angle between them using the following equation:
α = arccos( (Σ_{i=1}^{N} T_i R_i) / ( sqrt(Σ_{i=1}^{N} T_i^2) · sqrt(Σ_{i=1}^{N} R_i^2) ) ),

where T_i and R_i represent the ith components of the test spectrum and reference spectrum, respectively. The calculated angles between each pixel vector and the end-member spectrum vector indicate the degree of similarity to the reference spectrum. As the presence of a dot product suggests, the smaller the angle, the more similar the spectra will be. In this study, the reference spectra are extracted from the original images and the test spectra are based on the generated samples. SAM may be used due to its capability to mitigate the impact of shading effects, thus emphasizing the desired reflectance characteristics of the target.
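By way of illustration only, the spectral angle between one test spectrum and one reference spectrum may be sketched as follows (NumPy is assumed; the function name is illustrative):

```python
import numpy as np

def spectral_angle(t, r) -> float:
    """N-dimensional angle (radians) between a test spectrum t and a reference spectrum r."""
    t = np.asarray(t, dtype=np.float64)
    r = np.asarray(r, dtype=np.float64)
    cos_a = np.dot(t, r) / (np.linalg.norm(t) * np.linalg.norm(r))
    return float(np.arccos(np.clip(cos_a, -1.0, 1.0)))  # clip guards against rounding
```

Because the angle is invariant to uniform scaling of a spectrum, a darkened (shaded) copy of a spectrum scores the same angle as the original, which is the shading-mitigation property noted above.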
[0082] To gain more nuanced insights into design concepts, detailed information extraction plays a vital role. This aspect allows for a comprehensive analysis of design objects, enabling a deeper understanding of their unique characteristics and attributes. Furthermore, novelty in design concepts may not be solely determined by their overall appearance, but also by the arrangement and localization of individual design elements. Evaluating the flexibility of object localization ensures that GANs can identify occurrences of design templates regardless of orientation and local brightness. Hence, the template matching algorithm may be applied to search for similar areas of a template image (original images) in a source image (generated images), called a training image. Template matching utilizes a sliding-window approach to compare different areas of the template with the source. The comparison method depends on the content of the images and the goal.
[0083] The most frequently used similarity scoring methods for this technique include squared difference, cross-correlation, and cosine coefficient, as well as their normalized versions, which usually provide more accurate results. After testing the normalized versions of the three methods, normalized cross-correlation may be selected as the best, as it yields slightly more accurate matching results. The matching process creates a two-dimensional result matrix R with similarity scores associated with each area of the image, and searches for the highest/lowest value depending on the comparison method. Template matching can be used to identify the most similar part or determine the location of that part. Herein, however, template matching may be applied to find the generated image that is most similar to the source image from a set of real images. This method is simple to implement and computationally efficient. An exemplary procedure to match one source image and one template image is presented in Fig. 10.
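By way of illustration only, the sliding-window comparison under zero-mean normalized cross-correlation may be sketched as follows; production code would typically use an optimized routine (e.g., OpenCV's template matching), and the function name is illustrative (NumPy is assumed):

```python
import numpy as np

def match_template_ncc(source: np.ndarray, template: np.ndarray):
    """Slide the template over the source and return the (row, col) top-left corner
    and score of the best match under zero-mean normalized cross-correlation."""
    src = source.astype(np.float64)
    tpl = template.astype(np.float64)
    th, tw = tpl.shape
    tpl = tpl - tpl.mean()
    tpl_norm = np.sqrt((tpl ** 2).sum())
    best_score, best_pos = -np.inf, (0, 0)
    for i in range(src.shape[0] - th + 1):
        for j in range(src.shape[1] - tw + 1):
            win = src[i:i + th, j:j + tw]
            win = win - win.mean()
            denom = np.sqrt((win ** 2).sum()) * tpl_norm
            if denom == 0.0:
                continue  # flat region: correlation undefined
            score = float((win * tpl).sum() / denom)
            if score > best_score:
                best_score, best_pos = score, (i, j)
    return best_pos, best_score
```

The zero-mean normalization is what makes the comparison tolerant of local brightness offsets, as noted above.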
[0084] The steps proposed herein may necessitate extensive preprocessing steps, such as background removal, for optimal performance. A combination of these measures was employed as described herein to ensure a comprehensive evaluation that captures various dimensions of novelty.
[0085] An exemplary architecture designed in accordance with one or more embodiments of the present disclosure may replace and/or supplement the discriminating network with an evaluation panel of multiple benchmarks. The benchmarks are not necessarily global and domain-agnostic; instead, they can be chosen with respect to domain-specific requirements. The evaluation panel may assess a degree of realism. In generating design concepts, metrics such as diversity, novelty, desirability, and compatibility with geometric constraints (e.g., silhouettes) are often of the utmost importance. This feature can be integrated into the GAN structure using a large set of hand-evaluated concepts or using arithmetic methods to analyze the generated samples. To avoid the necessity of annotating a large dataset for each product category and to benefit from user feedback on new design concepts, these terms may be mathematically modeled and integrated into the GAN architecture.
[0086] FIG. 1 shows a schematic illustration of an exemplary system 100 configured for design concept generation and/or any of the GAN models described herein. System 100 may be modeled after a traditional GAN model and/or any of the GAN models described herein. By way of non-limiting example, discriminator 108 is modeled after and trained in the same way as a discriminator of a traditional GAN and/or any of the GAN models described herein. By way of non-limiting example, generator 104 is modeled after a generator of a traditional GAN. System 100 may address the limitations of the traditional GAN architecture for design concept generation. System 100 may be particularly configured to meet the needs of visual design recommendation. Architecture 100 may incorporate design-related specifications and limitations into the evaluation network. Architecture 100 may be configured to consider diversity, novelty, and desirability as essential requirements in visual design, along with the need for compatibility with a given silhouette as a constraint.
[0087] System 100 may comprise panel of inspectors 106. An input vector 102 may be provided to generator 104 as input. Input vector(s) 102 may comprise one or more latent vectors from a latent space of generator 104. In some implementations, the number of vectors of input vector(s) 102 is equal to the batch size. The batch size may indicate a number of sample(s) 126 to be generated by generator 104. In some implementations, a pool of random codes is generated. The most diverse random codes may be selected from the
pool. The number of random codes selected from the pool may correspond in quantity to the batch size.
[0088] Executing the selection from the pool may comprise adopting a stratified sampling method. The latent space of generator 104 may be partitioned. For example, the latent space is partitioned into distinct hypercubes. Stratified sampling may ensure that each subgroup contributes proportionately to the overall diversity. The partitioned space may facilitate the allocation of each input vector 102 to a unique subgroup, from which an equal number of points are drawn. Generator 104 may explore uncharted regions of the latent space by virtue of input vector(s) 102 being diverse.
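By way of illustration only, one practical realization of such stratified sampling may be sketched as follows. A literal partition into hypercubes is combinatorially large in a high-dimensional latent space, so this sketch stratifies each latent dimension independently (a Latin-hypercube-style scheme); the function name and the uniform latent range are illustrative assumptions (NumPy is assumed):

```python
import numpy as np

def stratified_latent_batch(batch_size: int, dim: int, seed=None) -> np.ndarray:
    """Draw `batch_size` latent vectors so that, along every dimension,
    exactly one point falls in each of `batch_size` equal strata."""
    rng = np.random.default_rng(seed)
    # One uniform draw inside each stratum [k/B, (k+1)/B) of the unit interval.
    u = (np.arange(batch_size) + rng.random((dim, batch_size))) / batch_size
    # Decouple the dimensions by shuffling the stratum order independently per dimension.
    for row in u:
        rng.shuffle(row)
    # Map [0, 1) to a symmetric latent range; a Gaussian inverse-CDF transform could be
    # substituted when the generator expects normally distributed latents.
    return 2.0 * u.T - 1.0
```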
[0089] Generator 104 may be configured to generate one or more samples 126 and/or one or more other samples based on input vector 102. Input vector 102 may be an n-dimensional vector. Sample(s) 126, one or more examples 124, and/or other information may be provided to panel of inspectors 106. Example(s) 124 may comprise one or more real examples from the training dataset. For example, example(s) 124 comprises one or more real images. Panel of inspectors 106 may comprise one or more of discriminator 108, diversity inspector 110, novelty inspector 112, desirability inspector 114, constraint inspector 116, and/or another inspector. Each inspector may be configured to generate a respective vector characterizing a respective attribute of one or more samples 126. Each inspector may be designed to assess generated samples against a specific criterion. Providing input to panel of inspectors 106 may comprise providing the input to each inspector of panel of inspectors 106.
[0090] Each inspector may have associated weights and an associated loss function. The associated weights for each inspector may be adjusted according to the associated loss function of that inspector. Generator 104 may be trained on the basis of feedback from each inspector of panel of inspectors 106. For example, the feedback from inspectors may comprise the vectors generated by the inspectors. Cost function 118 may be configured to compute an aggregation of the feedback from the inspectors. The result of the computation may comprise generator loss 122. Generator 104 may be trained using generator loss 122. Discriminator loss 120 may be the same as generator loss 122, another output of cost function 118, the output of discriminator 108, and/or another value. Discriminator 108 may be trained based on discriminator loss 120. Each inspector is optimized to score the generator’s output with respect to a single criterion for which it was designed. The generator may be optimized to satisfy all criteria scored by panel of inspectors 106.
[0091] Discriminator 108 may be configured to generate a vector characterizing realism of the at least one sample relative to the at least one example. Diversity inspector 110 may be
configured to receive sample 126 as input. Diversity inspector 110 may be configured to gauge the diversity of the batch. Diversity inspector 110 may be configured to generate a diversity vector. The diversity vector may characterize diversity of sample(s) 126. The diversity of the batch may be characterized by a diversity score. Diversity inspector 110 may be configured to guide the generator towards more varied sample generation. The score may apply to each of one or more samples 126. As such, the diversity score may characterize the diversity among one or more samples 126. Gauging the diversity of the batch may comprise determining the diversity score. Determining the diversity score may comprise using the Covering Radius Upper Bound (CRUB) method to generate the diversity vector. Diversity inspector 110 may be configured to determine the CRUB by computing the maximal distance between any Voronoi vertex and its nearest neighbor in the set of points. The covering radius of a generated point set X_gen = {X_1, ..., X_N} ⊂ S may be calculated as follows:

CR(X_gen) = sup_{s ∈ S} min_{1 ≤ i ≤ N} ||s − X_i||.
[0092] Diversity inspector 110 may be configured to determine the upper bound of CR. The upper bound of CR may be calculated as follows:
CRUB = max{ sup_{s ∈ S_i} ||s − x_i|| : i = 1, ..., N }. Here, S_i represents the ith stratum, where ∪_{i=1}^{N} S_i = S and x_i ∈ S_i represents the sample point within that stratum. In some implementations, the upper bound of CR is the diversity score. CRUB plays a pivotal role in global optimization by bounding the worst-case error approximation of the global optimum. A distinctive attribute of CRUB lies in its ability to maximize the shortest distance between all samples within the batch, rather than relying on the average distance that encompasses all pair-wise distances. Diversity inspector 110 may encourage the dispersion of all samples away from one another, thus contributing to greater diversity among the generated outcomes.
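By way of illustration only, the covering radius has no cheap closed form for an arbitrary point set, but it may be estimated by Monte-Carlo probing. The sketch below assumes the design space S is the unit hypercube; the function name and probe count are illustrative (NumPy is assumed):

```python
import numpy as np

def covering_radius_mc(samples: np.ndarray, n_probes: int = 20000, seed=None) -> float:
    """Monte-Carlo estimate of the covering radius over the unit cube: the largest
    distance from any point of the space to its nearest sample."""
    rng = np.random.default_rng(seed)
    probes = rng.random((n_probes, samples.shape[1]))
    # Distance from every probe point to every sample, then nearest-sample distance.
    d = np.linalg.norm(probes[:, None, :] - samples[None, :, :], axis=-1)
    return float(d.min(axis=1).max())
```

A batch whose samples are well dispersed yields a smaller covering radius, which is the dispersion behavior the diversity inspector rewards.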
[0093] Novelty inspector 112 may be configured to receive sample(s) 126 as input. Novelty inspector 112 may be configured to generate a novelty vector characterizing novelty of sample(s) 126 relative to examples 124. The novelty vector may be generated using the Local Outlier Factor (LOF). LOF serves as a density-based anomaly detection technique, designed to assess the distinctiveness of a generated sample in relation to the input dataset. This method identifies potential anomalies by evaluating the local deviation of a given data point concerning its neighboring points. LOF operates by comparing the local density of a data point with the densities of its k-nearest neighbors. If the density of the point is considerably lower than that of its neighbors, it is deemed an outlier.
[0094] In the LOF algorithm, the core idea revolves around measuring the typical distance at which a point can be reached from its neighbors, known as the reachability distance. This measurement involves calculating the reachability distance between two objects, ensuring that it does not fall below the k distance of the second object, as defined by:
RD(X_gen, X_j) = max{ distance(X_gen, X_j), k-distance(X_j) }.
[0095] The local reachability distance (LRD) of a point is then defined as the inverse of the average reachability distance from its neighbors, calculated using:
LRD(X_gen) = ( (1/|N_k|) Σ_{X_j ∈ N_k} RD(X_gen, X_j) )^{−1},

where N_k represents the set of k neighbors of the generated sample X_gen. Having the LRD defined, the LOF score is formulated as the ratio of the average LRD of neighboring points to the LRD of the generated sample itself, expressed as:
LOF(X_gen) = ( (1/|N_k|) Σ_{X_j ∈ N_k} LRD(X_j) ) / LRD(X_gen).

LOF values below 1 generally indicate inliers, representing data points within denser regions, while values significantly greater than 1 signal outliers, indicating points that are distinct from their neighbors. The utilization of LOF as part of the evaluation framework provides a mechanism to effectively capture and quantify the novelty of generated design concepts. By way of non-limiting example, novelty inspector 112 is configured to determine the LRD for each sample of sample(s) 126. By way of non-limiting example, novelty inspector 112 is configured to determine the LRD for each of a subset of sample(s) 126. The novelty vector may comprise each LRD determined by novelty inspector 112.
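By way of illustration only, the three LOF quantities above (reachability distance, LRD, and the final ratio) may be sketched from scratch as follows; production code would typically use an optimized implementation (e.g., scikit-learn's LocalOutlierFactor), and k and the helper names are illustrative (NumPy is assumed):

```python
import numpy as np

def lof_scores(data: np.ndarray, queries: np.ndarray, k: int = 5) -> np.ndarray:
    """Local outlier factor of each query point with respect to `data`."""
    def knn(points, k, skip_first=False):
        # Pairwise distances from `points` to every reference point in `data`.
        d = np.linalg.norm(points[:, None, :] - data[None, :, :], axis=-1)
        order = np.argsort(d, axis=1)
        cols = slice(1, k + 1) if skip_first else slice(0, k)
        idx = order[:, cols]
        return idx, np.take_along_axis(d, idx, axis=1)

    # k-distance of every reference point (its own row excluded).
    _, d_ref = knn(data, k, skip_first=True)
    k_dist = d_ref[:, -1]

    def lrd(points, skip_first=False):
        idx, dist = knn(points, k, skip_first=skip_first)
        # Reachability distance: at least the k-distance of the neighbor.
        reach = np.maximum(dist, k_dist[idx])
        return idx, 1.0 / reach.mean(axis=1)

    _, lrd_data = lrd(data, skip_first=True)
    nbr_idx, lrd_q = lrd(queries)
    # LOF: average neighbor LRD divided by the query's own LRD.
    return lrd_data[nbr_idx].mean(axis=1) / lrd_q
```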
[0096] Desirability inspector 114 may be configured to receive sample(s) 126 as input. Desirability inspector 114 may be configured to generate a desirability vector characterizing predicted desirability of the at least one sample. Desirability inspector 114 may comprise a Deep Multimodal Design Evaluation (DMDE) model. DMDE may be configured to assess user satisfaction with generated samples; this inclusion strengthens the DCG-GAN’s ability to ensure desirability in the produced visual designs. Generating the desirability vector may comprise providing sample(s) 126 as input to the DMDE model. The desirability vector may comprise the output of the DMDE model.
[0097] Deep multimodal design evaluation (DMDE) is an AI-driven model configured to provide an estimate of the desirability of a design concept without having to release the product and aggregate market results. DMDE is capable of performing design evaluation at the general level, the attribute level, or both, in any field that requires the provision of images, textual descriptions, and user reviews. The training workflow on this platform consists of four main parts: attribute-level sentiment analysis, image feature extraction, description feature extraction, and a multimodal predictor.
[0098] First, attribute-level sentiment intensities are extracted from online reviews, which serve as ground truth for training. Subsequently, visual and textual features are simultaneously extracted using a fine-tuned ResNet50 model and a fine-tuned BERT language model, respectively. Finally, the extracted features are processed by a multimodal deep regression model to predict desirability.
[0099] The DMDE model is one example of an AI-driven tool that can help guide generator 104 towards creating samples that are user-centered and desirable. Designers or mathematical methods can use DMDE to predict the performance of a new design concept from the perspective of end users simply by feeding the renderings and descriptions to the model. This platform eases the process of evaluating design concepts, which is one of the most challenging tasks in developing competitive design concepts.
[00100] Constraint inspector 116 may be configured to receive sample(s) 126 as input. Constraint inspector 116 may be configured to generate a shape vector characterizing adherence of a silhouette for each sample of the at least one sample with a silhouette for each example of the at least one example using a Structural Similarity Index Measure (SSIM). Each silhouette may comprise a representation of the outer shape of one or more objects depicted in its associated sample or example. For example, each silhouette is a two- dimensional black image that illustrates the body shape of a group of products, leaving out the details.
[00101] Product designers are often tasked with generating concepts that maintain visual consistency with their respective product lines, adhering to a defined silhouette. Thus, the apparent geometry of the concepts must be preserved with respect to a contour provided in the design concept generation process. Sample(s) 126 should bear a noticeable similarity to one of the predefined silhouettes. For example, the silhouette of an individual sample 126 meant to resemble a shoe should have the general shape of a shoe. This objective may not be attainable using traditional GAN architecture, since (1) GANs generate samples based on a noise vector from the latent space allowing no control over the features of the final concept, and (2) we aim to enlarge the input dataset for diversity enhancement, resulting in hundreds of silhouettes existing in the dataset. Consequently, a generated concept is likely to be of a non-target contour or a combination of them.
[00102] Preserving apparent geometry with respect to a provided contour is imperative to maintain the outer shape consistency of products within a designated product line. Silhouettes may serve as reference contours to preserve constraints. Within each iteration, for each sample 126, generating the shape vector may comprise extracting the image’s contours. Generating the shape vector may comprise comparing these contours to the corresponding contours of a designated set of silhouettes. The corresponding contours of the designated set may be generated based on examples 124. This comparison may yield a set of similarity scores, each indicating how closely the generated image’s contours match those of the silhouettes. In some implementations, the shape vector comprises the set of similarity scores. In some implementations, the shape vector comprises the highest similarity score. By way of non-limiting example, the highest similarity score serves as a geometrical constraint score in the broader loss function. In such an example, only the highest similarity score may be considered in the broader loss function to characterize the adherence of sample(s) 126.
[00103] In each training iteration and for every generated image in the batch, the contours of the generated images are extracted. These extracted contours are then compared with the contours derived from the set of benchmark silhouettes. The comparison process may comprise using the SSIM metric. The SSIM metric may be well suited for generating the shape vector because of its ability to quantify the compatibility of structural changes between images. SSIM’s emphasis on the overall image structure is in harmony with the aim of extracting detail-free body shapes from the generated concepts. Moreover, its incorporation of perceptual phenomena, such as luminance masking and contrast masking, enhances its capability to precisely identify structural disparities between images. SSIM’s focus on interdependence among neighboring pixels effectively captures vital information about object structure, particularly in terms of luminance masking’s influence on dissimilarities in bright regions and contrast masking’s ability to detect differences in textured or active areas.
[00104] The inspectors of panel of inspectors 106 may be configured to yield vectors of uniform size, aligned with the batch size (i.e., B×1). Each row within these vectors may correspond to a distinct sample 126 within the batch. Cost function 118 may be configured to normalize the outputs of panel of inspectors 106 using min-max normalization techniques. Cost function 118 may be configured to compute the weighted summation of the normalized vectors, preserving the order of samples (or rows). Generator loss 122 may be the weighted summation. Generator loss 122 may be backpropagated through the generative network to adjust parameters of generator 104. As stochastic algorithms such as LOF and SSIM are
employed, except for the discriminator, the other inspectors do not necessitate updates via specific loss functions during the training process. The augmented loss function is formulated as follows:
where s represents the silhouette. The associated loss functions are defined as follows:
[00105] In these equations, max(SSIM), max(LOF), max(CRUB), and max(DMDE) represent the optimal or maximum values achievable by each function. Overall, the frameworks described herein serve as bespoke platforms for automated design concept recommendation, ensuring the quality of generated samples in terms of realism, outer shape geometry, novelty, diversity, and desirability.
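By way of illustration only, the min-max normalization and weighted aggregation performed by cost function 118 may be sketched as follows; the weights and function name are illustrative hyperparameters, not values prescribed by this disclosure (NumPy is assumed):

```python
import numpy as np

def aggregate_generator_loss(inspector_scores, weights):
    """Min-max normalize each inspector's per-sample score vector, then return the
    weighted per-sample sum and the scalar batch loss backpropagated to the generator."""
    normalized = []
    for v in inspector_scores:
        v = np.asarray(v, dtype=np.float64)
        span = v.max() - v.min()
        # A constant vector carries no ranking information; map it to zeros.
        normalized.append((v - v.min()) / span if span > 0 else np.zeros_like(v))
    stacked = np.stack(normalized)               # (n_inspectors, batch_size)
    per_sample = np.asarray(weights) @ stacked   # weighted sum, preserving sample order
    return per_sample, float(per_sample.mean())
```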
[00106] Referring now to Fig. 8, a flowchart illustrating an exemplary method 800 for design concept generation using a generative model with a panel of inspectors is depicted. The operations of method 800 presented below are intended to be illustrative. In some implementations, method 800 is accomplished with one or more additional operations not described and/or without one or more of the operations discussed. The operations of method 800 may be performed in another order. Additionally, the order in which the operations of method 800 are illustrated in Fig. 8 and described below is not intended to be limiting.
[00107] In some implementations, method 800 is implemented in one or more processing devices (e.g., a digital processor, an analog processor, a digital circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information). The one or more processing devices may include one or more devices configured through hardware, firmware, and/or software to be specifically designed for execution of one or more of the operations of method 800.
[00108] Operation 802 may comprise providing an input vector of a latent space of a generative model to a generator of the generative model.
[00109] Operation 804 may comprise reading at least one sample generated by the generator based on the input vector. The at least one sample may comprise an image.
[00110] Operation 806 may comprise providing the at least one sample and at least one corresponding example to a plurality of inspectors of the generative model. Each of the plurality of inspectors may be configured to generate a respective vector characterizing a respective attribute of the at least one sample. Each respective attribute may be selected from a set comprising realism, a shape, novelty, diversity, and desirability. Each inspector may be associated with a corresponding loss function. Each inspector may be configured to compute its corresponding loss function based on its input to generate a loss value. By way of non-limiting example, the generated vectors may be the loss values. Each example may comprise an image. The vectors of the plurality of vectors may have a uniform size. Each row of an individual vector of the plurality of vectors may correspond to a sample of the at least one sample.
[00111] Operation 808 may comprise reading the vectors generated by the plurality of inspectors. The plurality of inspectors may comprise one or more of a discriminator, a diversity inspector, a novelty inspector, a desirability inspector, a constraint inspector, and/or another inspector.
[00112] Operation 810 may comprise determining a generator loss value characterizing performance of the generator based on the vectors generated by the plurality of inspectors. In some implementations, determining the generator loss value may comprise normalizing each vector of the plurality of vectors. Determining the generator loss value may comprise computing a result of a generator loss function. The generator loss function may comprise the loss functions associated with the inspectors. In some implementations, the vectors are normalized using min-max normalization. By way of non-limiting example, the vectors are normalized prior to determining the generator loss value.
[00113] Operation 812 may comprise training the generator and/or the discriminator. The generator may be trained based on the generator loss value. Training the generator may comprise performing backpropagation. The discriminator may be trained based on the vector generated by the discriminator and/or another loss value. Training the discriminator may comprise performing backpropagation.
[00114] A case study was conducted on a large-scale dataset extracted from multiple online footwear stores to test the results of system 100 depicted in Fig. 1 and described herein. Subsequently, we present and analyze the results in depth to benchmark DCG-GAN against StyleGAN2 in terms of diversity and novelty.
[00115] To evaluate and validate the performance of DCG-GAN versus StyleGAN2 in generating novel and diverse concepts, a large-scale dataset comprising 6745 images was scraped from a major online footwear store. The images included were side views of shoes from several footwear brands, such as Adidas, ASICS, Converse, Crocs, Champion, FILA, PUMA, Lacoste, New Balance, Nike, and Reebok, to avoid mode collapse and increase the diversity of the dataset. The neural network models were trained using the PyTorch implementation of StyleGAN2. The training process employed four Tesla V100-SXM2 GPUs, PyTorch version 1.8, and Python version 3.7. The configuration settings remained consistent throughout, including a latent code represented by both z and w of dimension 512, the use of 8 fully connected layers in the mapping network, leaky ReLU activation functions with a slope parameter α = 0.2, bilinear filtering in all up/down sampling layers, equalized learning rates for all trainable parameters, incorporation of a minibatch standard deviation layer at the conclusion of the discriminator, implementation of an exponential moving average for generator weights, utilization of a non-saturating logistic loss function with R1 regularization, and the Adam optimizer with specific hyperparameters: β1 = 0.5, β2 = 0.9, ε = 10^-8, and a minibatch size of 64.
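For reference, the training configuration recited above can be collected into a single structure. The key names below are hypothetical; the values are taken from the text.

```python
# Case-study training configuration, gathered into one structure.
# Key names are illustrative; values come from the description above.
stylegan2_config = {
    "latent_dim": 512,               # dimension of both z and w
    "mapping_layers": 8,             # fully connected layers in the mapping network
    "leaky_relu_slope": 0.2,         # activation slope parameter
    "resampling_filter": "bilinear", # all up/down sampling layers
    "equalized_learning_rate": True, # for all trainable parameters
    "minibatch_stddev_layer": True,  # at the end of the discriminator
    "generator_weight_ema": True,    # exponential moving average of weights
    "loss": "non-saturating logistic + R1 regularization",
    "optimizer": {"name": "Adam", "beta1": 0.5, "beta2": 0.9, "eps": 1e-8},
    "minibatch_size": 64,
}
```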
[00116] The StyleGAN2-generated samples presented in Fig. 2 bear a strong resemblance to established shoe models and brands. Specifically, the five depicted concepts closely mirror existing Adidas, ASICS, Nike, Reebok, and Adidas shoe designs available in the market. This outcome aligns with the inherent limitation of GAN generators, which tend to replicate patterns and characteristics present in the training dataset.
[00117] Fig. 3 illustrates several design concepts generated by DCG-GAN that demonstrate various visually discernible features. The features comprise novelty, higher diversity, aesthetic appeal, minimal brand-specific features, high visual quality, and compatibility with geometric constraints. With respect to novelty, the DCG-GAN architecture excels in generating design concepts imbued with abundant novel attributes. This stark contrast becomes more pronounced compared to the concepts generated from the baseline. Although the latter typically mirror existing products from the training set, lacking distinctive and unique features, the DCG-GAN consistently introduces innovative elements. With respect to higher diversity, the efficacy of DCG-GAN in fostering diversity is evident through the wide-
ranging spectrum of generated designs, which span different styles, patterns, and structural arrangements. With respect to aesthetic appeal, the generated design concepts boast captivating forms and harmonious color palettes. The model’s ability to produce visually appealing outcomes is indicative of its capacity to capture intricate design aesthetics.
[00118] With respect to having minimal brand-specific features, a notable observation is the limited inclusion of logos within the generated sneaker concepts. The model leans toward designs that prioritize other design elements over logo placement. With respect to high visual quality, despite reducing the discriminator’s weight to 20% in the loss function (with all inspectors sharing equal weights), the samples generated in DCG-GAN demonstrate remarkable quality. This outcome is particularly surprising, as a diminished discriminator weight might suggest a compromise in realism. However, the generated designs retain high-quality characteristics. In fact, the DCG-GAN’s generated samples exhibit a pronounced reduction in unrealistic appearances (e.g., lacking recognizable attributes characteristic of sneakers) compared to the baseline model. With respect to compatibility with geometrical constraints, the generated design concepts align with a predetermined set of silhouettes, serving as a benchmark for geometrical constraints.
[00119] To assess diversity of the generated samples, two sets of 1500 design concept images were generated using DCG-GAN and the baseline model. To ensure diversity and avoid redundancy, a random seed was introduced as a numerical element by creating two separate sets of 1500 random variables, each of which was individually injected into the model. To evaluate the diversity of the generated samples, essential visual attributes were first extracted from the images using VGG16. This process produced a vector of 1000 features for each image. Given the complexity of interpreting a 1000-dimensional space, PCA was applied to these vectors, reducing their dimensionality to a 2D space. This transformation enabled a more accessible comparison of the solution space coverage achieved by each model. For the feature extraction step, the VGG16 model was utilized due to its capability to embed images while excluding the top output layer. To ensure that the model could better identify design-specific features rather than general ones from broader datasets like ImageNet, it was initially trained on a combined dataset consisting of both the original and generated images. The VGG16 model was trained on RGB images with dimensions of 224x224, included 3 fully connected layers at the top of the network without pooling layers, and employed a softmax activation function.
[00120] Fig. 4A depicts the 2D representation of the mapped data points, where the red points represent the original data set, and the green points represent the generated data set.
This visualization illustrates that the StyleGAN2 model (and other traditional GAN architectures) did not explore the entirety of the original data space, resulting in designs that are constrained to a specific and incomplete range of styles. Notably, the green dots cover only a subset of the space occupied by the red dots, highlighting a limitation in GANs’ ability to learn the complete distribution of the dataset. Additionally, the scatter plot reveals the model’s inadequacy, especially in regions where there are fewer original samples, indicating that the model’s capacity to learn a subspace is reliant on the presence of an adequate number of data samples from that specific subspace. These findings provide partial validation of expected results concerning the limited diversity of GAN-generated design samples.
[00121] Fig. 4B depicts the visualization results. The visualization results indicate that the images generated by DCG-GAN possess the capacity to traverse uncharted areas within the solution space. Remarkably, this exploration is not only superior to the baseline model’s performance, but also extends beyond the confines of the original dataset. The DCG-GAN model effectively expands the boundaries of the solution space that the original dataset occupies from several directions. Notably, an intriguing observation demonstrates that unlike the baseline model, DCG-GAN adeptly learns and captures areas within the solution space that lacked adequate representation within the original dataset.
[00122] The novelty assessment for DCG-GAN compared to the baseline model was performed on the same sets of 1500 generated images described with respect to the diversity assessment.
[00123] Comparison experiments were performed on all data points using the RMSE, PSNR, SSIM, SRE, and SAM methods. In each experiment, the similarity metric was computed for each generated sample against all original samples. As a result, for each generated sample, the most similar original design concept was identified. The samples identified by the different methods were frequently not the same, because the methods consider similarities of different patterns and structures. Among the methods described herein, higher values indicate greater similarity between the images, except for the RMSE metric, which computes the Euclidean error. Squaring errors and then averaging them produces an undesirable outcome when outliers are present, and outliers are likely given the wide range of pixel differences. To address this concern, instead of averaging the squared errors, the mean absolute errors were calculated, so that highly different pairs of pixels do not have a disproportionate effect on the similarity metric compared to the average data points.
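The outlier argument above can be demonstrated numerically: replacing the squared error inside RMSE with a mean absolute error keeps a single large pixel difference from dominating the score. The function name below is hypothetical.

```python
import numpy as np

def mean_absolute_error(img_a, img_b):
    """Pixel-wise mean absolute error.  Unlike the squared error
    inside RMSE, a single outlier pixel difference is not amplified
    by squaring."""
    a = np.asarray(img_a, dtype=float)
    b = np.asarray(img_b, dtype=float)
    return float(np.abs(a - b).mean())

# One outlier pixel among 100 identical pixels: the RMSE is inflated
# ten-fold relative to the MAE.
a = np.zeros(100)
b = np.zeros(100)
b[0] = 10.0
mae = mean_absolute_error(a, b)               # 10 / 100 = 0.1
rmse = float(np.sqrt(((a - b) ** 2).mean()))  # sqrt(100 / 100) = 1.0
```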
[00124] There are several formats for colored image analysis; here, a chromatic image format that also includes luma information was used, so that the extracted similarities are more compatible with the human perception system. Accordingly, the images were first converted to the YCbCr color space, where Y represents a weighted average of the R, G, and B channels, with the highest weight assigned to G because it is most easily recognized by the human eye. The SSIM metric is always between -1 and 1, with 1 meaning 100% similarity, 0 meaning no similarity, and -1 meaning 100% dissimilarity. SSIM can leverage a sliding Gaussian window of size 11x11, moved pixel-by-pixel, to create the similarity map of the test image. However, to minimize computational complexity, one window was applied to the whole image.
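The two steps above can be sketched as follows, assuming the BT.601 luma weights (in which green indeed carries the largest coefficient) and the standard SSIM constants, with a single window applied to the whole image rather than an 11x11 sliding Gaussian window. Function names are illustrative.

```python
import numpy as np

def rgb_to_y(rgb):
    """Luma (Y) channel of the BT.601 YCbCr transform; green carries
    the largest weight, matching human visual sensitivity."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def global_ssim(x, y, data_range=1.0):
    """SSIM computed with one window covering the whole image (the
    computational shortcut described above) instead of an 11x11
    sliding Gaussian window; uses the standard SSIM constants."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

img = np.random.default_rng(1).random((8, 8, 3))
y = rgb_to_y(img)
score = global_ssim(y, y)  # identical images yield an SSIM of 1.0
```

The single-window version trades the per-pixel similarity map for one scalar score, which is why it is cheaper but coarser than the sliding-window variant.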
[00125] Table 2 provides statistical information on the similarity of the original and the GAN-generated samples calculated by the five different diversity measures, as follows:
RMSE. The comparison results show the lowest RMSE of 0.0023 and the highest RMSE of 0.0173, with a mean RMSE of 0.008594, a standard deviation (SD) of 0.0031, and a median RMSE of 0.0083. A good baseline range for RMSE is (0.5, 1). Any values lower than this range suggest a very accurate model, or a very similar pair of samples, indicating the high similarity of low-level pixel-wise features between the GAN-generated and the original samples.
PSNR. The results also show the highest PSNR of 30.9093 and the lowest PSNR of 28.0060, with a mean PSNR of 29.1585, a standard deviation of 0.6401, and a median PSNR of 29.0317. Generally, a PSNR value of 30 dB or higher is considered a good baseline for PSNR. However, the exact value may vary depending on the application and the quality of the original image. Consequently, the PSNR results also suggest a relatively high similarity between the original and GAN-generated samples.
SSIM. The results of SSIM assessment show the highest SSIM of 0.9968 and the lowest SSIM of 0.8177, with a mean SSIM of 0.9470, a standard deviation of 0.0374, and a median SSIM of 0.9575. The acceptable range for SSIM is generally considered to be between 0.8 and 1.0, with higher values indicating greater similarity between the two images. Given this baseline for SSIM, the results strongly designate a high structural similarity between the two datasets.
SRE. For SRE, the results show the highest SRE of 0.6185 and the lowest SRE of 0.4989, with a mean SRE of 0.5530, a standard deviation of 0.0256, and a median SRE of 0.5541. The acceptable SRE range is generally between 0.5 and 1.5. Generally, a score of 1.0 is considered optimal. The results indicate a high degree of similarity between the original and GAN-generated samples.
SAM. It is also shown that the model had the highest SAM of 0.8995 and the lowest SAM of 0.8127, with a mean SAM of 0.8962, a standard deviation of 0.0084, and a median SAM of 0.8986. The range of values for SAM is generally considered to be between 0 and 1, with values closer to 0 indicating a better match and vice versa. Values between 0.85 and 0.95 are generally considered excellent baselines for SAM, with which the results are aligned. [00126] Template Matching. To quantify the novelty of GAN-generated sets, we employed a novelty evaluation algorithm incorporating various similarity detection methods, including Template Matching. The process involved identifying the most similar design instance from the original dataset for each generated sample within each generated set. Subsequently, these results were consolidated into a similarity distribution function, enabling a statistical assessment of novelty in the generated outputs. The template matching method operates by searching a source image to identify areas that closely resemble a provided template image. Confidence scores for different areas within the source image are computed and compared using a sliding window approach. The algorithm gauges the similarity between the source image and the template image by considering the confidence within the rectangle whose dimensions match those of the template image and whose top-left corner corresponds to a specific point. To ensure that template matching aligns with the objectives set out herein, we utilized source (i.e., generated) and template (i.e., original) images of identical dimensions, ensuring equal weighting for all sections of the images in the confidence score calculation. The confidence score in this context operates within a numerical range of 0 to 1, where a score of 0 indicates an absence of similarity, while a score of 1 signifies complete identity.
Table 2. Statistical results of running the algebraic methods on all generated samples.
Method Min Max Mean SD Median Similarity
RMSE 0.0023 0.0173 0.008594 0.0031 0.0083 High
PSNR 28.0060 30.9093 29.1585 0.6401 29.0317 High
SSIM 0.8177 0.9968 0.9470 0.0374 0.9575 High
SRE 0.4989 0.6185 0.5530 0.0256 0.5541 High
SAM 0.8127 0.8995 0.8962 0.0084 0.8986 High
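The equal-size template matching described in paragraph [00126] can be sketched as a single-window normalized cross-correlation (analogous in spirit to OpenCV's TM_CCOEFF_NORMED mode). Note that this formulation can also yield negative values for anticorrelated images, whereas the text considers scores in [0, 1]; the function name is hypothetical.

```python
import numpy as np

def match_confidence(source, template, eps=1e-12):
    """Single-window normalized cross-correlation between two images
    of identical dimensions: 1.0 means identical up to brightness and
    contrast; values near 0 mean no similarity.  (Anticorrelated
    images can produce negative values, a case the text's 0-to-1
    range does not distinguish.)"""
    s = source - source.mean()
    t = template - template.mean()
    denom = np.sqrt((s * s).sum() * (t * t).sum()) + eps
    return float((s * t).sum() / denom)

rng = np.random.default_rng(2)
original = rng.random((16, 16))                    # template (original design)
score_same = match_confidence(original, original)  # ~1.0: complete identity
score_diff = match_confidence(rng.random((16, 16)), original)  # near 0
```

Computing this score between each generated sample and its closest original counterpart, and histogramming the results, yields the confidence-score distributions analyzed in Figs. 5A and 5B.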
[00127] Fig. 5A presents a distribution graph accompanied by a semi-Gaussian function, which has been fitted with a mean of 0.8385 and a variance of 0.0075. This distribution
suggests that the majority of the design samples generated by StyleGAN2 bear a very close resemblance to those found in the original dataset. This outcome aligns with the central hypothesis, affirming that GANs have limitations in generating novel design concepts. However, some samples have relatively low confidence scores, which can be attributed to two factors: (1) some generated images are so unrealistic that they cannot be readily recognized as sneakers; and (2) template matching tends to differentiate between sneakers with the same pattern but different colors, resulting in lower similarity scores. Fig. 6A illustrates the most similar pair from the original dataset for a generated image; it is visually apparent that the generated image closely resembles an item already present in the original dataset, devoid of any discernibly novel attributes. To comprehensively assess the novelty of traditional GAN models, we additionally computed the distribution function for the confidence scores pertaining to the generated samples.
[00128] Fig. 5B illustrates, for DCG-GAN, a distribution function and the corresponding semi-Gaussian function fitted to the confidence scores based on the generated-real comparisons. The evaluation results reveal DCG-GAN's proficiency in generating design concepts with increased novelty. Fig. 6B illustrates the most similar pair from the original dataset for an image generated by DCG-GAN. The dissimilarity between a sample generated by the DCG-GAN model and its closest counterpart from the original dataset is prominently evident, emphasizing DCG-GAN's ability to generate design concepts that deviate significantly from existing instances. This contrast in similarity is supported by the confidence scores, with the DCG-GAN pair exhibiting a significantly lower similarity score (0.79) compared to the baseline pair (0.84), reinforcing the enhanced novelty of DCG-GAN-generated concepts. The distribution of confidence scores depicted in Fig. 5B further underscores these findings. Notably, this distribution shows a considerable leftward shift of approximately 10% compared to the baseline, indicative of a substantial 10% enhancement in novelty for the median, most novel, and least novel samples. The mean and variance of the DCG-GAN distribution stand at 0.7680 and 0.0116, respectively. This mean value substantiates an average improvement of 7% in terms of novelty, while the reduced variance indicates a higher prevalence of generated concepts by DCG-GAN that possess novel features.
[00129] In summary, the above findings indicate that there is a lack of novelty in the GAN-generated samples according to different perspectives toward similarity evaluation and different levels of feature spaces.
[00130] To comprehensively evaluate the performance of the DCG-GAN model, an additional layer of qualitative assessment was introduced through a quantitative survey. This
survey aimed to garner human insights into the novelty and diversity of the design concepts produced by the DCG-GAN model, contrasting them with those generated by the baseline model. The summative evaluation process centered on the rating of two distinct sets, each comprising 20 sneaker design concepts. Within these sets, ten designs were generated by the DCG-GAN model, while the remaining ten were generated by the baseline model.
Participants were instructed to provide ratings for each concept’s novelty and diversity using a scale ranging from 0 to 10, where 0 represented “no novelty/diversity,” 5 indicated a “neutral” viewpoint, and 10 signified “high novelty/diversity.” Importantly, participants were kept unaware of the generative model linked to each concept.
[00131] To ensure a diverse array of perspectives and minimize potential biases, the survey was extended to individuals from diverse professional and demographic backgrounds. The pool of participants included sneaker designers, designers specializing in other diverse applications, engineering students, and individuals with limited familiarity with design, engineering, and AI models. A total of 89 individuals actively participated in the survey. The survey comprised three main sections:
1. Demographic questions: Participants responded to inquiries about their age group, gender, highest level of education, occupation, ethnicity/race, and familiarity with generative Al models.
2. Novelty assessment: Following the presentation of both academic and simplified definitions of novelty, participants were tasked with individually rating the novelty of the 20 randomly ordered concepts, generated by the DCG-GAN and baseline models.
3. Diversity assessment: Similar to the novelty assessment, participants were provided with a definition of diversity before rating two sets of concepts. One set contained concepts generated by DCG-GAN, while the other featured concepts generated by the baseline model.
[00132] Quantitative outcomes depicted in Fig. 7A show the minimum, first quartile, median, third quartile, and maximum values, as well as the average derived from the participant ratings for the novelty assessment. These values represented the novelty assessment for each individual design concept. The survey results on novelty indicated an average novelty assessment of 5.5257 for DCG-GAN, marking a 15% improvement compared to the baseline average score of 4.0757. Notably, the minimum novelty score for DCG-GAN surpassed the minimum for the baseline. Additionally, the standard deviation for DCG-GAN (0.4583) was considerably reduced compared to the baseline (1.0395), which signifies enhanced consistency and suggests that all DCG-GAN-generated samples, not merely a subset, exhibited increased novelty.
[00133] Quantitative outcomes depicted in Fig. 7B show the minimum, first quartile, median, third quartile, maximum values, as well as the average derived from the participant ratings for diversity assessment. These values represented the diversity assessment for each individual design concept. In the diversity assessment, DCG-GAN achieved an average score of 6.4492, indicating a 7% improvement over the baseline average score of 5.7971. These survey results consistently affirm the findings from both objective and visual analyses, collectively showcasing the effectiveness of the DCG-GAN model in generating design concepts that are not only visually appealing but also novel and diverse.
[00134] There are several key limitations that GANs face in the context of design concept generation and generative design. First, GAN variants necessitate large training datasets, and when dealing with diverse design images, the generator’s ability to capture various modes may suffer, particularly when the dataset is insufficient, leading to overlooked modes with fewer data. Second, mode collapse, a common issue, arises during training, where the generator tends to produce a narrow set of samples associated with a limited distribution subset, especially problematic in high-dimensional inputs like images and videos. The generator’s lack of reward for diversification during training exacerbates this issue, causing over-optimization on a single output. Third, GANs can struggle with diversity, novelty, and desirability, as their objective encourages mimicry of input data, potentially leading to an overly emulative generator. Pushing for greater diversity and creativity may compromise sample quality and utility. Fourth, evaluating GAN performance remains challenging, with a lack of standardized methods to assess generated versus real distributions regarding different criteria. Among widely recognized evaluation metrics (e.g., image quality, stable training, and image diversity) in the literature, the focus primarily centers on image quality and training stability, leaving room for improvement in other evaluation criteria such as image diversity. In this work, the limitations inherent in traditional GAN architectures were considered, with a particular focus on StyleGAN2 chosen for quantitative validation due to its status as a state-of-the-art GAN model with photo-realistic outputs. The findings derived from this analysis are applicable to all traditional GAN models, as they share a common evaluation framework.
[00135] An extensive and systematic examination of GANs was conducted as a potential tool for design concept generation, with a specific focus on assessing creativity in terms of novelty and diversity. The approaches described herein began with the mathematical
modeling of diversity and novelty, as defined within the design literature, addressing the fourth limitation of GANs. Subsequently, these models were applied to evaluate the output of traditional GAN architectures, revealing a deficiency in creativity. To overcome this limitation, a novel and versatile multi-criteria GAN architecture as described herein was built, adaptable to various domains. Specifically, for the generation of design concepts, the architecture was extended to encompass four additional criteria alongside realism: diversity, novelty, desirability, and geometrical proportionality. To facilitate the assessment of these criteria, dedicated evaluation algorithms were used and appropriate implementation techniques were recommended for efficient training. Through visual inspection and both quantitative and qualitative assessments, the model architecture described herein demonstrated a significant enhancement in terms of diversity and novelty. This outcome confirmed it as a valuable tool for design concept generation. Furthermore, the approaches described herein address the initial challenges of GANs, demonstrating the capacity to explore design spaces with limited real samples, to comprehensively explore the real dataset distribution, and to produce outputs that excel across multiple criteria. Lastly, this research helps advance the transition of emerging technologies into useful tools for the designer. The architectures described herein may enable large numbers of novel and diverse concepts to be presented to the human designer, as well as fast concept evaluation frameworks in terms of diversity and novelty, leveraging the speed and efficiency of computer-generated design knowledge while maintaining the critical eye and decision making of the human. This augmented approach may ultimately enable radical improvements in generated samples with respect to design, efficiency, and quality.
[00136] Referring now to Fig. 9, a schematic of an example of a computing node is shown. Computing node 10 is only one example of a suitable computing node and is not intended to suggest any limitation as to the scope of use or functionality of embodiments described herein. Regardless, computing node 10 is capable of being implemented and/or performing any of the functionality set forth hereinabove.
[00137] In computing node 10 there is a computer system/server 12, which is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with computer system/server 12 include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems,
mainframe computer systems, and distributed cloud computing environments that include any of the above systems or devices, and the like.
[00138] Computer system/server 12 may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types. Computer system/server 12 may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
[00139] As shown in Fig. 9, computer system/server 12 in computing node 10 is shown in the form of a general-purpose computing device. The components of computer system/server 12 may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including system memory 28 to processor 16.
[00140] Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, Peripheral Component Interconnect (PCI) bus, Peripheral Component Interconnect Express (PCIe), and Advanced Microcontroller Bus Architecture (AMBA).
[00141] Computer system/server 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer system/server 12, and it includes both volatile and non-volatile media, removable and nonremovable media.
[00142] System memory 28 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Computer system/server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 34 can be provided for reading from and writing to a non-removable, non-volatile magnetic media (not shown and typically called a “hard drive”). Although not shown, a
magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a “floppy disk”), and an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media can be provided. In such instances, each can be connected to bus 18 by one or more data media interfaces. As will be further depicted and described below, memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the disclosure.
[00143] Program/utility 40, having a set (at least one) of program modules 42, may be stored in memory 28 by way of example, and not limitation, as well as an operating system, one or more application programs, other program modules, and program data. Each of the operating system, one or more application programs, other program modules, and program data or some combination thereof, may include an implementation of a networking environment. Program modules 42 generally carry out the functions and/or methodologies of embodiments as described herein.
[00144] Computer system/server 12 may also communicate with one or more external devices 14 such as a keyboard, a pointing device, a display 24, etc.; one or more devices that enable a user to interact with computer system/server 12; and/or any devices (e.g., network card, modem, etc.) that enable computer system/server 12 to communicate with one or more other computing devices. Such communication can occur via Input/Output (I/O) interfaces 22. Still yet, computer system/server 12 can communicate with one or more networks such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet) via network adapter 20. As depicted, network adapter 20 communicates with the other components of computer system/server 12 via bus 18. It should be understood that although not shown, other hardware and/or software components could be used in conjunction with computer system/server 12. Examples include, but are not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data archival storage systems, etc.
[00145] The present disclosure may be embodied as a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.
[00146] The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic
storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD- ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiberoptic cable), or electrical signals transmitted through a wire.
[00147] Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
[00148] Computer readable program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user’s computer, partly on the user’s computer, as a stand-alone software package, partly on the user’s computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user’s computer through any type of network,
including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.
[00149] The systems, devices, and methods disclosed herein are described in detail by way of examples and with reference to the figures. The examples discussed herein are examples only and are provided to assist in the explanation of the apparatuses, devices, systems, and methods described herein. None of the features or components shown in the drawings or discussed below should be taken as mandatory for any specific implementation of any of these devices, systems, or methods unless specifically designated as mandatory.
[00150] Also, for any methods described, regardless of whether the method is described in conjunction with a flow diagram, it should be understood that unless otherwise specified or required by context, any explicit or implicit ordering of steps performed in the execution of a method does not imply that those steps must be performed in the order presented but instead may be performed in a different order or in parallel.
[00151] As used herein, the term “exemplary” is used in the sense of “example,” rather than “ideal.” Moreover, the terms “a” and “an” herein do not denote a limitation of quantity, but rather denote the presence of one or more of the referenced items.
[00152] Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
[00153] These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing
apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
[00154] The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
[00155] The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
[00156] The descriptions of the various embodiments of the present disclosure have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims
1. A method for training models to generate images, the method comprising:
providing, to a generator of a generative model, an input vector of a latent space of the generative model;
reading at least one sample generated by the generator based on the input vector;
providing the at least one sample and at least one corresponding example to a plurality of inspectors of the generative model, each of the plurality of inspectors being configured to generate a respective vector characterizing a respective attribute of the at least one sample;
reading the vectors generated by the plurality of inspectors;
determining a generator loss value characterizing performance of the generator based on the vectors generated by the plurality of inspectors; and
training the generator based on the generator loss value.
2. The method of claim 1, wherein the at least one example comprises an image, wherein the at least one sample comprises a generated image.
3. The method of claim 1 or 2, wherein training the generator comprises backpropagation.
4. The method of any one of claims 1-3, wherein each of the plurality of vectors is of a uniform size.
5. The method of any one of claims 1-4, wherein each row of each of the plurality of vectors corresponds to a sample of the at least one sample.
6. The method of any one of claims 1-5, wherein the method further comprises normalizing each of the plurality of vectors using min-max normalization.
7. The method of any one of claims 1-6, wherein each inspector is associated with a loss function, wherein the method further comprises determining a loss value for each inspector based on its associated loss function.
8. The method of claim 7, wherein determining the generator loss value comprises computing a result of a generator loss function, the generator loss function comprising the loss function associated with each of the inspectors.
9. The method of any one of claims 1-8, wherein each respective attribute is selected from a set comprising realism, a shape, novelty, diversity, and desirability.
10. The method of any one of claims 1-9, wherein the plurality of inspectors comprises a discriminator configured to generate a vector characterizing realism of the at least one sample relative to the at least one example, wherein the method further comprises training the discriminator based on the loss value for the discriminator.
11. The method of any one of claims 1-10, wherein the plurality of inspectors comprises at least one of:
a diversity inspector configured to generate a diversity vector characterizing diversity of the at least one sample, wherein the diversity vector is generated using the Covering Radius Upper Bound (CRUB) method,
a novelty inspector configured to generate a novelty vector characterizing novelty of the at least one sample relative to the at least one example, wherein the novelty vector is generated using a Local Outlier Factor (LOF) method,
a desirability inspector configured to generate a desirability vector characterizing predicted desirability of the at least one sample, wherein the desirability vector is generated using a Deep Multimodal Design Evaluation (DMDE) model, or
a constraint inspector configured to generate a shape vector characterizing adherence of a silhouette for each sample of the at least one sample with a silhouette for each example of the at least one example using a Structural Similarity Index Measure (SSIM), wherein each silhouette comprises a representation of the outer shape of one or more objects depicted in its associated sample or example.
12. A computer program product, the computer program product comprising a computer-readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform a method comprising any one of the methods of claims 1-11.
13. A system comprising a computing node, the computing node comprising: a processor set; and one or more computer-readable storage media having program instructions embodied therewith, the program instructions executable by the processor set to cause the computing node to perform a method comprising any one of the methods of claims 1-11.
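The claims above leave the inspector-aggregation details open; the following is a minimal sketch of how per-inspector score vectors (one row per generated sample, claims 4-5) might be min-max normalized (claim 6) and combined into a single generator loss value (claims 7-8). The function names, the equal default weighting, and the convention that higher scores mean worse samples are all illustrative assumptions, not part of the claimed method.

```python
import numpy as np

def min_max_normalize(v, eps=1e-8):
    """Scale a vector of inspector scores into [0, 1] (cf. claim 6)."""
    lo, hi = v.min(), v.max()
    return (v - lo) / (hi - lo + eps)

def generator_loss(inspector_scores, weights=None):
    """Combine per-inspector score vectors into a scalar generator loss.

    Each entry of `inspector_scores` is a vector with one row per
    generated sample.  Scores are normalized so no single inspector
    dominates, then averaged; here higher scores are assumed to mean
    worse samples, so the weighted mean is the loss to minimize.
    """
    normalized = [min_max_normalize(np.asarray(s, dtype=float))
                  for s in inspector_scores]
    if weights is None:
        weights = np.ones(len(normalized)) / len(normalized)
    return float(sum(w * n.mean() for w, n in zip(weights, normalized)))

# Toy scores from three hypothetical inspectors over four samples.
realism   = np.array([0.2, 0.9, 0.4, 0.6])
novelty   = np.array([1.5, 0.1, 0.8, 0.3])
diversity = np.array([0.0, 0.5, 0.9, 0.2])
loss = generator_loss([realism, novelty, diversity])
```

In a full training loop, this scalar would be backpropagated through the generator (claim 3), with each inspector's contribution governed by its weight.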
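Claim 11's novelty inspector scores samples by a Local Outlier Factor method. As one possible realization, not necessarily the one contemplated by the application, scikit-learn's `LocalOutlierFactor` with `novelty=True` can be fit on feature embeddings of the training examples and then score unseen generated samples; the toy embeddings below are invented for illustration.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Feature embeddings of training examples (a tight cluster) -- toy data.
examples = rng.normal(loc=0.0, scale=0.1, size=(200, 8))

# novelty=True lets the fitted model score previously unseen samples.
lof = LocalOutlierFactor(n_neighbors=20, novelty=True)
lof.fit(examples)

# One generated sample near the training data, one far from it.
familiar = np.zeros((1, 8))
novel = np.full((1, 8), 5.0)

# score_samples returns the negative LOF: lower means more outlying,
# i.e. more novel relative to the examples.
print(lof.score_samples(familiar))  # near -1 (inlier)
print(lof.score_samples(novel))     # well below -1 (novel)
```

A novelty vector in the sense of claim 1 would simply stack such scores, one row per generated sample, before normalization.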
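Claim 11's constraint inspector compares sample and example silhouettes with a Structural Similarity Index Measure. The sketch below extracts a silhouette by simple intensity thresholding (one of many possible outer-shape extractions) and computes a single-window SSIM over the whole image; the standard SSIM averages the same statistic over local sliding windows, so this is a deliberate simplification.

```python
import numpy as np

def silhouette(img, threshold=0.5):
    """Binary outer-shape mask: foreground where intensity exceeds threshold."""
    return (img > threshold).astype(float)

def global_ssim(a, b, data_range=1.0):
    """Single-window SSIM over whole images; the standard measure
    averages this statistic over local sliding windows."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

rng = np.random.default_rng(1)
sample = rng.random((32, 32))
example = sample.copy()
# Identical silhouettes yield the maximum SSIM of 1.0.
score = global_ssim(silhouette(sample), silhouette(example))
```

A shape vector per claim 11 would collect one such similarity score for each (sample, example) silhouette pair.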
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US 63/606,914 (US202363606914P) | 2023-12-06 | 2023-12-06 | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025122928A1 (en) | 2025-06-12 |
Family
ID=95980474
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/058962 (WO2025122928A1, pending) | Design concept generation with generative adversarial networks | 2023-12-06 | 2024-12-06 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025122928A1 (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20200372308A1 (en) * | 2019-05-22 | 2020-11-26 | Lawrence Livermore National Security, Llc | Mimicking of corruption in images |
| US20210012162A1 (en) * | 2019-07-09 | 2021-01-14 | Shenzhen Malong Technologies Co., Ltd. | 3d image synthesis system and methods |
| US20230162330A1 (en) * | 2021-11-19 | 2023-05-25 | Adobe Inc. | Techniques for image attribute editing using neural networks |
| US20230186598A1 (en) * | 2021-12-15 | 2023-06-15 | Robert Bosch Gmbh | Rating of generators for generating realistic images |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24901651; Country of ref document: EP; Kind code of ref document: A1 |