WO2024191658A1 - Responsible artificial intelligence controller with tunable parameters for model optimization across multiple variables - Google Patents
- Publication number
- WO2024191658A1 (PCT/US2024/018508)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- output
- model
- variable
- descriptive
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F40/30 — Semantic analysis (G06F40/00 Handling natural language data)
- G06N3/044 — Recurrent networks, e.g. Hopfield networks
- G06N3/045 — Combinations of networks
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/0475 — Generative networks
- G06N3/08 — Learning methods
- G06N3/09 — Supervised learning
- G06N7/01 — Probabilistic graphical models, e.g. probabilistic networks
Definitions
- the present disclosure relates generally to variable-aware generative model output. More particularly, the present disclosure relates to leveraging control tokens and/or parameter tuning to generate generative model outputs based at least in part on particular variable levels associated with factuality, creativity, and/or safety.
- Natural language processing model systems are being utilized for a large variety of uses, which include question-and-answer implementations, story writing, and instruction generation.
- the language models can have billions of learned parameters, which can provide complex responses.
- different tasks may be reliant on model fine tuning, which can be computationally expensive.
- AI models can be optimized for performance and accuracy.
- Performance, accuracy, and a plurality of other variables may be related to one another such that the optimization in one field may cause a decline in one or more other variables.
- the tradeoffs in optimization factors for the other variables are often made without conscious consideration. For example, training and/or retraining an Al model for fairness can cause deterioration in other variables that may also be important to the user.
- One example aspect of the present disclosure is directed to a computing system.
- the system can include one or more processors and one or more non-transitory computer-readable media that collectively store instructions that, when executed by the one or more processors, cause the computing system to perform operations.
- the operations can include obtaining a training dataset.
- the training dataset can include a plurality of training examples.
- each training example can include a training input example, a respective training output example, and a respective variable label.
- the training input example can include text data.
- the respective training output example can include content data descriptive of the text data.
- the respective variable label can be descriptive of one or more variable values associated with the content data.
- the respective variable label can be associated with one or more interdependent variables of a plurality of interdependent variables.
- the operations can include training a generative model with the training dataset.
- the generative model can be trained to process a text input and generate a content output.
- the generative model can be further trained to learn a variable distribution for each respective interdependent variable of the plurality of interdependent variables.
- Each variable distribution can include a learned output space associated with outputs descriptive of the respective interdependent variable.
- the operations can include storing the generative model.
- the operations can include obtaining input data from a user computing system.
- the input data can include one or more text strings.
- the operations can include processing the input data with the generative model to generate one or more variable-level specific outputs and providing the one or more variable-level specific outputs to the user computing system.
- the operations can include obtaining input data.
- the input data can include one or more text strings.
- the operations can include processing the input data to determine a particular task associated with input data, determining a set of interdependent variable values associated with the particular task, and processing the input data and the set of interdependent variable values with the generative model to generate one or more variable-level specific outputs.
- the one or more variable-level specific outputs can be descriptive of content associated with the set of interdependent variable values.
- processing the input data to determine the particular task associated with input data can include processing the input data with a classification model to determine a particular task associated with input data. In some implementations, processing the input data to determine the particular task associated with input data can include processing the input data with a natural language processing model to determine a particular task associated with input data.
- the generative model can include one or more transformer models.
- the content data can include at least one of generated text data, generated audio data, generated image data, or generated latent encoding data.
- the generative model can include an autoregressive language model.
- the content data can include generated text data.
- the generated text data can be descriptive of one or more sentences.
- the generative model can include a text-to-image diffusion model.
- the content data can include generated image data.
- the generated image data can be descriptive of one or more images comprising one or more subjects.
- the plurality of interdependent variables can include a factuality variable, a creativity variable, and a safety variable.
- the one or more interdependent variables of the plurality of interdependent variables can include a safety variable.
- the safety variable can be descriptive of a level of safety provided to the user by the respective training output example.
- the respective variable label can be descriptive of at least one of a high level of safety, a medium level of safety, or a low level of safety.
- Another example aspect of the present disclosure is directed to a computer-implemented method. The method can include obtaining, by a computing system including one or more processors, input data.
- the input data can include text data descriptive of a plurality of text characters.
- the method can include processing, by the computing system, the input data with a language model to generate output data.
- the output data can be descriptive of a plurality of output words associated with a particular task and the plurality of text characters.
- the method can include determining, by the computing system, a factuality value associated with the language model based on the output data.
- the factuality value can be descriptive of a determined level of factuality in the output data.
- the method can include determining, by the computing system, a creativity value associated with the language model based on the output data.
- the creativity value can be descriptive of a determined level of creativity in the output data.
- the method can include comparing, by the computing system, the factuality value to a first target threshold to generate a first evaluation output.
- the method can include comparing, by the computing system, the creativity value to a second target threshold to generate a second evaluation output and adjusting, by the computing system, one or more parameters of the language model based on the first evaluation output and the second evaluation output.
- adjusting, by the computing system, the one or more parameters of the language model based on the first evaluation output and the second evaluation output can include parameter efficient tuning.
- Determining the factuality value can include processing, by the computing system, the output data with a factuality classification model to determine the factuality value.
- the factuality classification model may have been trained on labeled output examples.
- determining the creativity value can include processing, by the computing system, the output data with a creativity classification model to determine the creativity value.
- the method can include determining, by the computing system, a safety value associated with the language model based on the output data.
- the safety value can be descriptive of a determined level of safety associated with the output data.
- the method can include comparing, by the computing system, the safety value to a third target threshold to generate a third evaluation output. The one or more parameters can be adjusted based at least in part on the third evaluation output.
- Another example aspect of the present disclosure is directed to one or more non-transitory computer-readable media that collectively store instructions that, when executed by one or more computing devices, cause the one or more computing devices to perform operations.
- the operations can include obtaining input data.
- the input data can include text data descriptive of a prompt for content generation.
- the operations can include determining a particular task associated with the input data.
- the particular task can be associated with a type of response for responding to the prompt.
- the operations can include determining a set of interdependent variable values associated with the particular task.
- the set of interdependent variable values can be descriptive of a relative value for each of a plurality of interdependent variables for generating a response to the input data.
- the operations can include processing the input data and the set of interdependent variable values with a generative model to generate output data.
- the generative model may have been trained on a training dataset.
- the training dataset can include a plurality of input examples, a plurality of respective output examples, and a plurality of respective interdependent variable labels associated with the plurality of respective output examples.
- the output data can include content data responsive to the text data.
- the operations can include providing the output data as an output.
- the particular task can be descriptive of generating fictional content.
- the set of interdependent variable values can be descriptive of a creativity value being higher than the factuality value.
- the content data can be descriptive of text associated with a learned creative output distribution.
- the particular task can be descriptive of generating objectively true information.
- the set of interdependent variable values can be descriptive of a factuality value being higher than the creativity value.
- the content data can be descriptive of text associated with a factuality control token.
- the generative model may have been trained to generate a plurality of control tokens based on the plurality of respective interdependent variable labels.
- Figure 1 depicts a block diagram of an example generative model system according to example embodiments of the present disclosure.
- Figure 2 depicts a block diagram of an example task-aware output generation according to example embodiments of the present disclosure.
- Figure 3 depicts a block diagram of an example generative model training/monitoring system according to example embodiments of the present disclosure.
- Figure 4 depicts a block diagram of an example generative model training system according to example embodiments of the present disclosure.
- Figure 5 depicts a block diagram of an example generative model system according to example embodiments of the present disclosure.
- Figure 6 depicts a flow chart diagram of an example method to perform generative model training according to example embodiments of the present disclosure.
- Figure 7 depicts a flow chart diagram of an example method to perform variable-aware training according to example embodiments of the present disclosure.
- Figure 8 depicts a flow chart diagram of an example method to perform task-aware output generation according to example embodiments of the present disclosure.
- Figure 9A depicts a block diagram of an example computing system that performs variable-aware output generation according to example embodiments of the present disclosure.
- Figure 9B depicts a block diagram of an example computing device that performs variable-aware output generation according to example embodiments of the present disclosure.
- Figure 9C depicts a block diagram of an example computing device that performs variable-aware output generation according to example embodiments of the present disclosure.
- the present disclosure is directed to systems and methods for controlling generative model output based on optimization of multiple variables.
- generative models may be trained and/or configured to optimize accuracy and efficiency; however, other variables (or criteria) may additionally be pertinent for determining a particular model output.
- certain tasks may be associated with a request for factual information, while other tasks may be associated with a request for creativity and/or safety.
- the systems and methods disclosed herein can simultaneously optimize the generative model across multiple variables (or criteria). Additionally and/or alternatively, the systems and methods disclosed herein may be able to adjust the generative model optimization on a query-by-query basis.
- Another approach can be controlled decoding, where the model decoder is augmented to encourage or avoid generating certain types of output by reranking possible next tokens (e.g., Liu et al., “DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts,” ARXIV (Jun. 3, 2021), <https://arxiv.org/abs/2105.03023>).
- Another approach can include the use of control tokens. Control tokens can be prefixed during training and/or fine tuning. At inference time, a control token can be added before generation to steer the model output (e.g., Krause et al., “GeDi: Generative Discriminator Guided Sequence Generation,” ARXIV).
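The control-token conditioning described above can be sketched as follows. This is an illustrative assumption about one possible scheme: the token names, the value range, and the three-way bucketing are hypothetical, not the exact tokens used by any cited system.

```python
# Hypothetical sketch: map target variable levels (factuality, creativity,
# safety) to discrete control tokens and prefix them to the prompt, the same
# way during training/fine tuning and at inference time.

def variable_control_tokens(factuality, creativity, safety):
    """Map variable values in [0, 1] to discrete control tokens."""
    def bucket(name, value):
        level = "high" if value >= 0.66 else "med" if value >= 0.33 else "low"
        return f"<{name}:{level}>"
    return [bucket("factuality", factuality),
            bucket("creativity", creativity),
            bucket("safety", safety)]

def prepend_control_tokens(prompt, factuality, creativity, safety):
    """Build the conditioned input used at training or inference time."""
    tokens = variable_control_tokens(factuality, creativity, safety)
    return " ".join(tokens) + " " + prompt

# A creative, low-factuality request conditioned for generation.
conditioned = prepend_control_tokens("Write a short story.", 0.2, 0.9, 0.8)
```

At inference, steering the model toward a different variable mix only requires changing the prefixed tokens, not retraining the model.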
- Another approach can be to apply evaluation tokens at the end of the text during fine tuning (e.g., Thoppilan et al., “LaMDA: Language Models for Dialog Applications,” ARXIV (Feb. 10, 2022), <https://arxiv.org/pdf/2201.08239.pdf>) that the model uses to interpret what is produced. Then, at inference time, the model can produce several generations that are filtered to remove undesirable ones.
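The generate-then-filter pattern above can be sketched as follows. The `<eval:…>` tag format and the allowed-tag set are illustrative assumptions, not the cited papers' exact scheme.

```python
# Illustrative sketch: each candidate generation carries a trailing
# evaluation tag (appended during fine tuning), and undesirable candidates
# are filtered out before a response is returned.

def parse_eval_tag(generation):
    """Split a generation of the form 'text <eval:safe>' into (text, tag)."""
    text, _, tag = generation.rpartition(" <eval:")
    return text, tag.rstrip(">")

def filter_generations(generations, allowed_tags=("safe",)):
    """Keep only candidates whose evaluation tag is in the allowed set."""
    kept = []
    for generation in generations:
        text, tag = parse_eval_tag(generation)
        if tag in allowed_tags:
            kept.append(text)
    return kept

candidates = [
    "Here is a helpful answer. <eval:safe>",
    "Here is a risky answer. <eval:unsafe>",
]
kept = filter_generations(candidates)
```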
- the systems and methods disclosed herein can condition a generative model for task-specific prioritization across multiple variables.
- the systems and methods can condition the generative model based on one or more of a variety of techniques, which may include the use of control tokens, filter tokens, controlled decoding, fine tuning, prompting or parameterefficient tuning, and/or other techniques, individually and/or in combination.
- different tasks can be associated with different variables.
- a plurality of different interdependent variables may be associated with the generation of outputs with a generative model.
- the plurality of interdependent variables can include factuality, creativity, and safety.
- the plurality of interdependent variables can affect one another.
- the systems and methods disclosed herein can learn output and/or parameter associations (e.g., relationships) for different interdependent variables, which can then be utilized to condition the output generation of the generative model based on a desired variable set associated with a particular task.
- the systems and methods can include obtaining a training dataset.
- the training dataset can include a plurality of training examples.
- each training example can include a training input example, a respective training output example, and a respective variable label.
- the training input example can include text data.
- the respective training output example can include content data that may include text data.
- the respective variable label can be descriptive of one or more variable names (e.g., factuality, creativity, and/or safety). The respective variable label can be associated with one or more interdependent variables of a plurality of interdependent variables.
- the respective variable label can be descriptive of one or more variable names and one or more variable values associated with the variable names and may include a triplet set (e.g., a factuality label descriptive of a factuality value, a creativity label descriptive of a creativity value, and/or a safety label descriptive of a safety value).
- the variable value(s) can be descriptive of a likelihood that the response would be viewed as factual.
- the variable value(s) can be a binary value descriptive of whether the particular variable threshold is met (e.g., is the response factual?).
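One labeled training example of the kind described above can be sketched as a simple data structure. The field names, value ranges, and the 0.5 default threshold are illustrative assumptions.

```python
# Minimal sketch of a labeled training example: an input, a target output,
# and a triplet of interdependent variable labels.
from dataclasses import dataclass

@dataclass
class VariableLabel:
    factuality: float  # likelihood the response would be viewed as factual
    creativity: float
    safety: float

    def meets(self, variable, threshold=0.5):
        """Binary view: does this label meet the threshold for a variable?"""
        return getattr(self, variable) >= threshold

@dataclass
class TrainingExample:
    input_text: str
    output_text: str
    label: VariableLabel

example = TrainingExample(
    input_text="Who wrote Hamlet?",
    output_text="William Shakespeare wrote Hamlet.",
    label=VariableLabel(factuality=0.95, creativity=0.1, safety=0.9),
)
```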
- the systems and methods can include training a generative model with the training dataset.
- the generative model can be trained to process a text input and generate a content output.
- the generative model can be trained to learn a variable distribution for each respective interdependent variable of the plurality of interdependent variables.
- Each variable distribution can include a learned output space associated with outputs descriptive of the respective interdependent variable.
- the systems and methods can include storing the generative model.
- the training dataset may include one or more training examples that are unlabeled.
- one or more training examples may include a training input example and/or a training output example without a respective variable label.
- the unlabeled training examples may be utilized for training the generative model without the variable training loop.
- the unlabeled training example may be associated with a learned variable distribution. The unlabeled training example can then be associated with one or more variable values based on the association with the learned variable distribution.
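The pseudo-labeling step above can be sketched as follows. Representing each learned variable distribution by a mean feature vector and assigning values by nearest distance are illustrative simplifications standing in for a learned output space.

```python
# Hedged sketch: an unlabeled training example is matched against learned
# per-variable output distributions (here, simple mean feature vectors) and
# inherits the variable values of the closest distribution.
import math

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_variable_values(example_features, learned_distributions):
    """learned_distributions maps a variable-values tuple to a mean vector;
    return the values of the distribution nearest the example."""
    return min(learned_distributions,
               key=lambda values: distance(example_features,
                                           learned_distributions[values]))

# Illustrative distributions keyed by (factuality, creativity) values.
distributions = {
    (0.9, 0.1): [1.0, 0.0],  # factual cluster
    (0.1, 0.9): [0.0, 1.0],  # creative cluster
}
values = assign_variable_values([0.9, 0.2], distributions)
```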
- the systems and methods can obtain a training dataset.
- the training dataset can include a plurality of training examples.
- each training example can include a training input example, a respective training output example, and a respective variable label.
- the training input example can include text data.
- the respective training output example can include content data descriptive of the text data.
- the respective variable label can be descriptive of one or more variable values associated with the content data.
- the content data can include generated text data.
- the generated text data can be descriptive of one or more sentences. Additionally and/or alternatively, the content data may include generated image data.
- the respective variable label can be associated with one or more interdependent variables of a plurality of interdependent variables.
- the plurality of interdependent variables can include a factuality variable, a creativity variable, and a safety variable.
- the one or more interdependent variables of the plurality of interdependent variables can include a safety variable.
- the safety variable can be descriptive of a level of safety provided to the user.
- the respective variable label may be descriptive of a high level of safety, a low level of safety, or a medium level of safety associated with the respective training output example.
- a generative model can be trained with the training dataset.
- the generative model can be trained to process a text input and generate a content output.
- the generative model can be trained to learn a variable distribution for each respective interdependent variable of the plurality of interdependent variables.
- Each variable distribution can include a learned output space associated with outputs descriptive of the respective interdependent variable.
- the generative model can include an autoregressive language model.
- the generative model can include a text-to-image diffusion model.
- the generated image data can be descriptive of one or more images comprising one or more subjects.
- the generative model can be trained to process input data to generate output data.
- the input data can include text data, image data, audio data, latent encoding data, and/or other input data, which may include multimodal data.
- the output data can include text data, image data, audio data, latent encoding data, and/or other output data.
- the generative model can include a language model (e.g., a natural language processing model). The generative model may be trained to respond to input data in a conversational manner. In some implementations, the generative model may process context data with the input data to generate a context-aware response.
- the generative model may be trained to adjust a determined factuality variable level, a creativity variable level, and/or a safety variable level for generating the response based at least in part on the context data and/or user data (e.g., user profile data, user preference data, and/or historical data associated with the user).
- the context data can include previous inputs and outputs, a time, trending topics, a location, and/or other context information.
- the generative model can include an image generation model (e.g., a text-to-image diffusion model).
- the generative model can be trained to process text data to generate image data.
- the image data can be descriptive of the subject and/or details associated with the text data.
- the image data can depict a new image that differs from the training data.
- the generative model can process multimodal data to generate the image data, which can include image data, text data, content data, audio data, and/or latent encoding data.
- the generative model can then be stored.
- the generative model can be stored locally and/or on a server computing system.
- the generative model may be stored to be leveraged by one or more web platforms.
- the trained generative model may be utilized for generating one or more search results for a search engine.
- the generated one or more search results can be provided as a summary and/or example in a separate panel of a search results page and/or adjacent to web search results and/or image search results.
- the generative model can be stored to be utilized by a chat bot and/or an image-generation bot.
- the systems and methods can obtain input data from a user computing system.
- the input data can include one or more text strings.
- the input data can be processed with the generative model to generate one or more variable-level specific outputs.
- the one or more variable-level specific outputs can then be provided to the user computing system.
- the input data may include text data, image data, audio data, latent encoding data, and/or multimodal data.
- the output data may include text data, image data, audio data, latent encoding data, and/or multimodal data.
- the systems and methods can obtain input data.
- the input data can include one or more text strings.
- the input data can be processed to determine a particular task associated with input data.
- the particular task can be associated with a creation task (e.g., writing a poem and/or generating a painting style image), a knowledge task (e.g., responding to a knowledge query with factual information), and/or a conversational task (e.g., responding to user messages that are associated with a mix of user experiences, emotions, and/or facts).
- processing the input data to determine the particular task associated with input data can include processing the input data with a classification model to determine a particular task associated with input data.
- the classification model may be trained to determine a particular task associated with the input data, which may include determining a prompt preamble, semantic analysis with one or more machine-learned models, and/or natural language processing.
- processing the input data to determine the particular task associated with input data may include processing the input data with a natural language processing model to determine a particular task associated with input data.
- the particular task determination may be based on heuristics and/or learned prompt structure.
- a set of interdependent variable values associated with the particular task can then be determined. The determination can be based on an index of values associated with the particular task. Alternatively and/or additionally, the set of interdependent variable values may be determined by one or more machine-learned models (e.g., the generative model).
- The input data and the set of interdependent variable values can be processed with the generative model to generate one or more variable-level specific outputs. The one or more variable-level specific outputs can be descriptive of content associated with the set of interdependent variable values.
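The task-to-variable-values flow described above can be sketched as follows. The keyword-based classifier and the value index are illustrative assumptions standing in for a learned classification model and a learned or curated index.

```python
# Sketch: classify the input into a task, look up that task's interdependent
# variable values in an index, then condition generation on those values.

TASK_VARIABLE_INDEX = {
    "creation":     {"factuality": 0.2, "creativity": 0.9, "safety": 0.8},
    "knowledge":    {"factuality": 0.9, "creativity": 0.1, "safety": 0.8},
    "conversation": {"factuality": 0.6, "creativity": 0.5, "safety": 0.9},
}

def determine_task(input_text):
    """Toy keyword stand-in for a task classification model."""
    text = input_text.lower()
    if any(w in text for w in ("write a poem", "story", "paint")):
        return "creation"
    if any(w in text for w in ("who", "what", "when", "where", "why")):
        return "knowledge"
    return "conversation"

def variable_values_for(input_text):
    """Index lookup of interdependent variable values for the input's task."""
    return TASK_VARIABLE_INDEX[determine_task(input_text)]
```

A real system would replace `determine_task` with a classification or natural language processing model, as the disclosure describes.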
- the systems and methods can include variable-based parameter tuning.
- the systems and methods can include obtaining input data.
- the input data can include text data descriptive of a plurality of text characters.
- the systems and methods can include processing the input data with a language model to generate output data.
- the output data can be descriptive of a plurality of output words associated with a particular task and the plurality of text characters.
- the systems and methods can include determining a factuality value associated with the language model based on the output data.
- the factuality value can be descriptive of a determined level of factuality in the output data.
- the systems and methods can include determining a creativity value associated with the language model based on the output data.
- the creativity value can be descriptive of a determined level of creativity in the output data.
- the systems and methods can include comparing the factuality value to a first target threshold to generate a first evaluation output and comparing the creativity value to a second target threshold to generate a second evaluation output.
- the systems and methods can include adjusting one or more parameters of the language model based on the first evaluation output and the second evaluation output.
- Input data can be obtained from a user.
- the input data can include text data descriptive of a plurality of text characters.
- the plurality of text characters can be descriptive of a prompt.
- the prompt can include a message to respond to with a model-generated response (e.g., a question and/or a command).
- the input data can be processed with a language model to generate output data.
- the output data can be descriptive of a plurality of output words associated with a particular task and the plurality of text characters.
- the language model can include a transformer model.
- the language model may include one or more autoregressive language models.
- the output data can include text strings that may be specifically tailored based on the wording of the prompt input.
- the output data may be generated based on a plurality of learned features associated with a plurality of training examples.
- the input data can be processed with the language model to generate a plurality of candidate output datasets. Each of the plurality of candidate output datasets may be associated with different variable values.
- the output data can be selected from and/or generated based on the plurality of candidate output datasets. For example, a particular candidate output dataset may be selected by a user and/or may be automatically selected based on a determined task and/or a determined classification.
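Selecting among candidates with different variable values can be sketched as follows. The mismatch-based scorer is an assumption; as noted above, the selection may instead be made by a user or by a classification model.

```python
# Sketch: generate several candidates at different variable settings, then
# pick the one whose variable values best match the target set determined
# for the task.

def select_candidate(candidates, target_values):
    """candidates: list of (output_text, variable_values) pairs.
    Return the text whose values have the smallest total deviation from
    the target variable values."""
    def mismatch(item):
        _, values = item
        return sum(abs(values[k] - target_values[k]) for k in target_values)
    return min(candidates, key=mismatch)[0]

candidates = [
    ("A strictly factual summary.", {"factuality": 0.9, "creativity": 0.1}),
    ("A whimsical retelling.",      {"factuality": 0.3, "creativity": 0.9}),
]
chosen = select_candidate(candidates, {"factuality": 0.9, "creativity": 0.2})
```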
- the systems and methods can determine a factuality value associated with the language model output data.
- the factuality value can be descriptive of a determined level of factuality in the output data.
- determining the factuality value can include processing the output data with a factuality classification model to determine the factuality value.
- the factuality classification model may have been trained on labeled output examples.
- the factuality can be determined by and/or based on a search query. For example, a search query can be generated, and a search may be performed to verify the information of the response, condition the generation of a response, and/or to determine an attribution (e.g., a source of the facts).
- the systems and methods can determine a creativity value associated with the language model based on the output data.
- the creativity value can be descriptive of a determined level of creativity in the output data.
- determining the creativity value can include processing the output data with a creativity classification model to determine the creativity value. Additionally and/or alternatively, the creativity classification model may have been trained on labeled output examples.
- the factuality value can be compared to a first target threshold to generate a first evaluation output.
- a plurality of outputs can be requested for generation by the generative model.
- the plurality of outputs can then be evaluated individually and/or in combination to generate the one or more first evaluation outputs.
- the first target threshold may be a predetermined value.
- the first target threshold may be determined based on a task associated with the input data. For example, the particular task may be associated with a set of variable values (e.g., high factuality may be requested for informational responses, while the threshold may be lower for creative generation tasks).
- the first target threshold may be a learned threshold.
- the creativity value can be compared to a second target threshold to generate a second evaluation output.
- a plurality of outputs can be requested for generation by the generative model.
- the plurality of outputs can then be evaluated individually and/or in combination to generate the one or more second evaluation outputs.
- a particular candidate output of the plurality of outputs may be selected based on the first evaluation output and/or the second evaluation output.
- the second target threshold may be a predetermined value.
- the second target threshold may be determined based on a task associated with the input data. For example, the particular task may be associated with a set of variable values (e.g., high creativity may be requested for fictional prose generation, while the threshold may be lower for informative responses).
- the second target threshold may be a learned threshold.
- One or more parameters of the language model can be adjusted based on the first evaluation output and the second evaluation output. Adjusting the one or more parameters of the language model based on the first evaluation output(s) and the second evaluation output(s) may include parameter efficient tuning. Alternatively and/or additionally, one or more outputs from a set of candidate outputs generated with the generative model may be selected based on the first evaluation output(s) and/or the second evaluation output(s).
- the systems and methods can determine a safety value associated with the language model based on the output data.
- the safety value can be descriptive of a determined level of safety associated with the output data.
- the safety value can be compared to a third target threshold to generate a third evaluation output.
- the one or more parameters can be adjusted based at least in part on the third evaluation output.
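The threshold comparisons described above can be sketched as follows. This is a minimal illustration; the task names, threshold values, and function names are assumptions for demonstration, not details from the disclosure.

```python
# Per-task target thresholds (hypothetical values) for the interdependent
# variables: factuality, creativity, and safety.
TARGET_THRESHOLDS = {
    "informational": {"factuality": 0.9, "creativity": 0.3, "safety": 0.8},
    "fictional_prose": {"factuality": 0.2, "creativity": 0.9, "safety": 0.7},
}

def evaluate_output(values: dict, task: str) -> dict:
    """Compare each measured variable value against its target threshold.

    Returns one evaluation output per variable: True if the value meets or
    exceeds the task's target threshold, else False.
    """
    thresholds = TARGET_THRESHOLDS[task]
    return {var: values[var] >= thr for var, thr in thresholds.items()}

# Example: an output scored by (hypothetical) classification models.
scores = {"factuality": 0.95, "creativity": 0.4, "safety": 0.85}
evaluation = evaluate_output(scores, "informational")
# evaluation -> {"factuality": True, "creativity": True, "safety": True}
```

An evaluation that comes back False for some variable would then drive the parameter adjustment or candidate selection described above.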
- the systems and methods can leverage a trained generative model to process input data and generate output data.
- the systems and methods can include obtaining input data.
- the input data can include text data descriptive of a prompt for content generation.
- the systems and methods can include determining a particular task associated with the input data.
- the particular task can be associated with a type of response for responding to the prompt.
- the systems and methods can include determining a set of interdependent variable values associated with the particular task.
- the set of interdependent variable values can be descriptive of a relative value for each of a plurality of interdependent variables for generating a response to the input data.
- the systems and methods can include processing the input data and the set of interdependent variable values with a generative model to generate output data.
- the generative model may have been trained on a training dataset.
- the training dataset can include a plurality of input examples, a plurality of respective output examples, and a plurality of respective interdependent variable labels associated with the plurality of respective output examples.
- the output data can include content data responsive to the text data.
- the systems and methods can include providing the output data as an output.
- the systems and methods can obtain input data.
- the input data can include text data descriptive of a prompt for content generation.
- the prompt can be descriptive of a particular task, a particular subject, one or more details, and/or a style.
- the input data may include multimodal data (e.g., one or more words and/or one or more images).
- a particular task associated with the input data can be determined.
- the particular task can be associated with a type of response for responding to the prompt.
- the particular task may be determined by a language model (e.g., a natural language processing model, a semantic understanding model, and/or a sentiment analysis model).
- the particular task may be determined by processing the input data with a classification model.
- a set of interdependent variable values associated with the particular task can be determined.
- the set of interdependent variable values can be descriptive of a relative value for each of a plurality of interdependent variables for generating a response to the input data.
- the set of interdependent variable values can be determined based on learned correlations. Alternatively and/or additionally, the set of interdependent variable values can be determined based on an index of values associated with particular tasks.
- the set of interdependent variable values can include a level of factuality, a level of creativity, and/or a level of safety to utilize when generating the output data (e.g., the response).
- the set of interdependent variable values can be associated with a set of thresholds, a range of values, a learned distribution, embedding values, and/or control tokens.
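The index-based determination mentioned above can be sketched as a simple lookup from a particular task to a set of interdependent variable values. The task names, level labels, and fallback default are hypothetical choices for illustration.

```python
# Hypothetical index mapping particular tasks to interdependent variable values.
TASK_VARIABLE_INDEX = {
    "fictional_prose": {"factuality": "low", "creativity": "high", "safety": "medium"},
    "informational": {"factuality": "high", "creativity": "low", "safety": "high"},
    "instructional": {"factuality": "high", "creativity": "low", "safety": "high"},
}

def variable_values_for_task(task: str) -> dict:
    """Return the set of interdependent variable values for a determined task."""
    # Fall back to a conservative default when the task is not in the index.
    default = {"factuality": "high", "creativity": "medium", "safety": "high"}
    return TASK_VARIABLE_INDEX.get(task, default)
```

In a learned-correlation variant, the table would be replaced by a model that predicts the variable values from the task (or directly from the input data).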
- the input data and the set of interdependent variable values can be processed with a generative model to generate output data.
- the generative model may have been trained on a training dataset.
- the training dataset can include a plurality of input examples, a plurality of respective output examples, and a plurality of respective interdependent variable labels associated with the plurality of respective output examples.
- the output data can include content data responsive to the text data.
- the generative model may have been trained to generate a plurality of control tokens based on the plurality of respective interdependent variable labels.
- the generative model can then utilize the control tokens to generate the output data, which can include selecting response data based on the control token.
- the response data can be utilized to determine how to structure the output data.
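One way the control tokens could condition generation is by prepending a token per interdependent variable to the prompt, assuming the model was trained to recognize tokens of that form. The token format shown here is an assumption for illustration.

```python
def build_conditioned_prompt(prompt: str, variable_values: dict) -> str:
    """Prepend one control token per interdependent variable to the prompt."""
    tokens = "".join(
        f"<{var}={level}>" for var, level in sorted(variable_values.items())
    )
    return tokens + " " + prompt

conditioned = build_conditioned_prompt(
    "Write a short story about elves learning how to share supplies.",
    {"creativity": "high", "factuality": "low", "safety": "medium"},
)
# conditioned begins with "<creativity=high><factuality=low><safety=medium>"
```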
- the particular task can be descriptive of generating fictional prose.
- the set of interdependent variable values can be descriptive of a creativity value being higher than the factuality value.
- the content data can be descriptive of text associated with a learned creative output distribution.
- the creativity value of the set of variables can be descriptive of creativity being important in generation.
- the particular task can be descriptive of generating objectively true information.
- the set of interdependent variable values can be descriptive of a factuality value being higher than the creativity value.
- the content data can be descriptive of text associated with a factuality control token.
- the factuality value of the set of variables can be descriptive of factuality being important in generation.
- the output data can then be provided as an output.
- the output data can include text data, image data, audio data, video data, and/or multimodal data.
- the output data can be provided via a user interface.
- the user interface can provide the output data for display with the input data.
- the output data may be provided in a search results page, in a user interface of a dedicated generative web platform, in a suggestions/discovery page, and/or in a viewer window.
- Generative models can be trained to optimize performance and accuracy; however, other variables may provide desired benefits for providing optimal responses.
- the other variables can include factuality, creativity, and safety.
- the safety variable may include fairness, bias, robustness, toxicity, misinformation, transparency, explainability, and privacy.
- the variables in combination with accuracy and performance may be treated as independent factors.
- the different variables can have a powerful impact on each other, which may not be readily apparent. For example, optimizing a model for accuracy may decrease its fairness and privacy, while optimizing for privacy may decrease transparency, fairness, and accuracy.
- the systems and methods disclosed herein can leverage control tokens and/or parameter efficient tuning to provide well performing, accurate generative models that may be tailored to adjust for factuality, creativity, and/or safety variables.
- the systems and methods disclosed herein may include a holistic approach to optimization by taking into account the entire suite of variables that may be deemed pertinent for one or more tasks. To account for the multivariate set of factors, the systems and methods may be configured based on a determined state of equilibrium where each competing variable reaches a target threshold as a holistic system for any particular application.
- a set of target thresholds may be determined, and a tunable set of levers and/or knobs that measure the relative impact of increasing or decreasing a parameter within the set of interdependent variables may be utilized to track the relative levels of different variables.
- the systems and methods may include generative model monitoring. Once a trained generative model is deployed, a system of dynamic detection scripts can be run periodically or continuously to produce measurements of the parameters and compare them to thresholds. When one or more variables fall below a specified threshold, an alert and/or notification may be triggered, and the variable may be adjusted in a way that maintains equilibrium across the set of variables. The adjustments may be manual.
- the systems and methods disclosed herein may utilize a self-correcting design, which may minimize manual intervention.
- the adjustments may include parameter efficient tuning, retraining, variable level adjustments, and/or label-specific adjustments.
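The monitoring described above can be sketched as a loop that runs probe queries through the deployed model, measures each variable, and raises an alert when a measurement falls below its threshold. The model and classifier here are stand-in callables, not the real components.

```python
def monitor(generate, classify, probe_queries, thresholds):
    """Return (query, variable, value) alerts for every threshold violation."""
    alerts = []
    for query in probe_queries:
        output = generate(query)          # deployed generative model
        measurements = classify(output)   # e.g. {"factuality": 0.5, ...}
        for variable, value in measurements.items():
            if value < thresholds[variable]:
                alerts.append((query, variable, value))
    return alerts

# Demonstration with stand-in model and classifier callables.
alerts = monitor(
    generate=lambda q: q,
    classify=lambda o: {"factuality": 0.5, "safety": 0.9},
    probe_queries=["probe query"],
    thresholds={"factuality": 0.8, "safety": 0.8},
)
# alerts -> [("probe query", "factuality", 0.5)]
```

In the self-correcting design, a non-empty alert list would trigger the adjustments listed above (parameter efficient tuning, retraining, variable level adjustments, and/or label-specific adjustments) rather than just a notification.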
- the monitoring may be based on processing queries (e.g., manually generated queries and/or synthetic data generated via synthetic generation techniques) with the generative model to generate an output that can then be classified.
- the classification may be performed by processing the output with one or more machine-learned classification models.
- the classification can be processed to determine if the output met the target threshold(s).
- the classification can be processed to determine if the output is descriptive of the appropriate variable levels for the particular task.
- adjustments may be performed, which can include obtaining and/or generating training examples that can be utilized to reach the state of equilibrium.
- the systems and methods may determine model parameters associated with each variable and may adjust parameter values based on the particular use case and/or based on variable levels.
- the systems and methods may include learning multi-dimensional tagging (e.g., an output may have a tag descriptive of factuality high, safety low, and/or creativity medium).
- the learned tags can be utilized to determine responses that are associated with a particular variable set.
- Parameter-efficient tuning can include only adjusting a subset of the generative model parameters instead of adjusting all parameters of the generative model. For example, during training, a relationship between one or more particular parameters and one or more particular variables can be learned. The learned relationship can be stored. The relationship can then be utilized to determine which particular parameters will be adjusted in response to determining an output is descriptive of a factuality level, a creativity level, and/or a safety level that is below a target threshold.
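A minimal illustration of the idea: only the parameters whose learned relationship links them to the violated variable receive an update, while every other parameter stays frozen. The parameter names and the variable-to-parameter mapping below are hypothetical.

```python
# Hypothetical learned relationship between variables and parameter groups.
VARIABLE_TO_PARAMS = {
    "factuality": ["decoder.layer_11.bias"],
    "creativity": ["decoder.layer_3.bias"],
}

def tune_subset(params: dict, gradients: dict, variable: str, lr: float = 0.1) -> dict:
    """Apply a gradient step only to parameters associated with `variable`."""
    tunable = set(VARIABLE_TO_PARAMS[variable])
    return {
        name: (value - lr * gradients[name]) if name in tunable else value
        for name, value in params.items()
    }

params = {"decoder.layer_11.bias": 1.0, "decoder.layer_3.bias": 2.0}
grads = {"decoder.layer_11.bias": 0.5, "decoder.layer_3.bias": 0.5}
updated = tune_subset(params, grads, "factuality")
# Only the factuality-linked parameter moves (1.0 - 0.1 * 0.5); the
# creativity-linked parameter is left untouched.
```

In a real model the parameter groups would be tensors and the frozen/tunable split would typically be expressed through the training framework's gradient machinery, but the selection logic is the same.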
- Control tokens can include learned embedding tokens and/or learned output distributions.
- the control tokens can be generated and/or learned based on output label training, which can include learning which outputs and/or which inputs are associated with specific variable levels (e.g., creativity high, factuality low, and/or safety high).
- the control tokens can be utilized to determine which output features are associated with a response with a specific variable set.
- the systems and methods of the present disclosure provide a number of technical effects and benefits.
- the systems and methods can provide a variable-aware response generation system that can determine a user query is associated with a particular task, can determine a set of variable priorities associated with the particular task, and can condition the generative model to generate a response with variable-specific prioritization.
- the systems and methods can utilize one or more control tokens and/or parameter tuning to adjust variable prioritization (or variable levels) based on a determined task, which can lead to the generation of outputs with a desired level of factuality, creativity, and/or safety.
- Another technical benefit of the systems and methods of the present disclosure is the ability to leverage one or more label training datasets to learn various output distributions associated with differing interdependent variables, which can include factuality, creativity, and/or safety.
- the learned distributions and/or learned parameter dependencies can be utilized to efficiently leverage a generative model for a plurality of different downstream tasks.
- Another example of technical effect and benefit relates to improved computational efficiency and improvements in the functioning of a computing system.
- the systems and methods disclosed herein can leverage control tokens and/or parameter efficient tuning to reduce the computational cost of utilizing a generative model for different downstream tasks by replacing retraining methods that may involve retraining billions of parameters.
- Figure 1 depicts a block diagram of an example generative model system 10 according to example embodiments of the present disclosure.
- the generative model system 10 can process input data 12 with a generative model 14 to generate output data 16.
- input data 12 can be obtained from a user.
- the input data 12 can include text data, image data, audio data, and/or latent encoding data.
- the input data 12 can be descriptive of a prompt.
- the prompt can include a task, a subject, and/or one or more details (e.g., a style and/or one or more attributes).
- the generative model 14 can include one or more transformer models.
- the generative model can include a language model (e.g., a natural language processing model).
- the generative model 14 can include an image generation model (e.g., a text-to-image model).
- the generative model 14 can be trained and/or configured to process an input and generate novel content, merging features learned from a training dataset to produce content specifically responsive to that particular input.
- the generative model 14 may be trained to determine a task associated with the input data 12 and generate a response with a particular structure and/or particular levels of factuality, creativity, and/or safety based on the determined task.
- the output data 16 generated by the generative model 14 in response to processing the input data 12 can include content data responsive to and/or descriptive of the prompt.
- the prompt may include “write a short story about elves learning how to share supplies.”
- the output data 16 may then be descriptive of one or more paragraphs that describe elves interacting with one another in a series of events that leads to them learning how to share supplies.
- the generative model 14 may have leveraged learned features and/or relationships associated with a plurality of different training datasets associated with a variety of topics including storytelling, children's stories, sharing stories, and/or elf-related informational pieces.
- the generative model system 10 may determine the task associated with the input data 12 is associated with a high level of creativity (e.g., a specific creativity threshold) and a medium level of safety (e.g., a specific safety threshold).
- the parameters of the generative model 14 may be adjusted based on this determination.
- one or more control tokens can be utilized to condition the generation to be based on output features and/or relationships associated with a specific variable set.
- FIG. 2 depicts a block diagram of an example task-aware output generation 200 according to example embodiments of the present disclosure.
- the task-aware output generation 200 can include processing input data 202 to determine a particular task, which can then be utilized to determine a set of interdependent variable values 204 for output generation.
- the set of interdependent variable values and the input data 202 can then be processed with the generative model 206 to generate output data 208 that is generated based on the variable values associated with the determined task.
- input data 202 may be received via a user interface.
- the input data can include text data, image data, and/or multimodal data.
- the input data 202 can include a prompt descriptive of a question, a comment, a request, and/or a problem.
- the input data can be processed to determine a particular task associated with the input data 202 (e.g., a prose generation task, a poem generation task, an informative response, an instructional response, an emotion based response, and/or a summarization response).
- a set of interdependent variable values can be determined 204. For example, in response to a fictional prose generation task determination, a set of interdependent variable values associated with high creativity and low factuality may be determined. Alternatively and/or additionally, in response to an instructional task determination, a set of interdependent variable values associated with high factuality and medium-to-high safety may be determined.
- the set of interdependent variable values may be determined based on an index and/or based on learned relationships.
- the input data 202 and the determined set of interdependent variable values can then be processed with a trained generative model 206 to generate output data 208.
- the set of interdependent variable values may be utilized for parameter tuning of the generative model 206 before processing the input data 202.
- the generative model 206 can process the set of interdependent variable values and may then utilize control tokens to condition the output data 208 generation based on learned relationships between output features and specific variables.
- the output data 208 can include text data, image data, audio data, and/or multimodal data.
- the output data can be descriptive of a prompt and/or descriptive of a response to a prompt. Additionally and/or alternatively, the output data 208 may include information that meets one or more variable thresholds.
- Figure 3 depicts a block diagram of an example generative model training/monitoring system 300 according to example embodiments of the present disclosure.
- the generative model training/monitoring system 300 can be utilized to train and/or retrain a generative model 314 based on one or more target thresholds.
- the generative model training/monitoring system 300 can include continuously and/or periodically monitoring outputs to determine whether retraining is to be performed.
- input data 312 (e.g., text data and/or image data) can be obtained.
- the input data 312 can be processed with a generative model 314 (e.g., an autoregressive language model and/or a text-to-image diffusion model) to generate output data 316 (e.g., text data and/or image data) responsive to the input data 312.
- the output data 316 may then be evaluated to monitor the performance of the generative model 314.
- the evaluation can include determining a factuality value 320 associated with the output data 316, determining a creativity value 322 associated with the output data 316, and/or determining a safety value 324 associated with the output data 316.
- the factuality value can be compared to a first target threshold 330 to determine a first evaluation output.
- the creativity value can be compared to a second target threshold 332 to determine a second evaluation output.
- the safety value can be compared to a third target threshold 334 to determine a third evaluation output.
- the first evaluation output, the second evaluation output, and/or the third evaluation output may include one or more gradient descents, which can be backpropagated to the generative model 314 to adjust one or more parameters of the generative model.
- prompt tuning and/or control token adjustment may be performed based on the first evaluation output, the second evaluation output, and/or the third evaluation output.
- the input data 312 can include a request for a plurality of outputs.
- the input data 312 can then be processed with the generative model 314 to generate output data 316.
- the output data 316 can be descriptive of a plurality of outputs, which may be associated with different variable values.
- the plurality of outputs may be evaluated individually and/or in combination.
- the plurality of outputs may be processed to determine a plurality of respective factuality values, a plurality of respective creativity values, and/or a plurality of respective safety values.
- the plurality of respective factuality values may be utilized to generate a plurality of respective first evaluation outputs and/or a collective first evaluation output.
- the plurality of respective creativity values may be utilized to generate a plurality of respective second evaluation outputs and/or a collective second evaluation output.
- the plurality of respective safety values may be utilized to generate a plurality of respective third evaluation outputs and/or a collective third evaluation output.
- the determined evaluation outputs may be utilized to determine a specific output to utilize and/or may be utilized to adjust one or more parameters of the generative model 314.
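Selecting a specific output from the plurality of outputs based on their evaluation outputs can be sketched as below. Scoring a candidate by the number of thresholds its evaluation satisfies is an illustrative choice, not a detail specified in the disclosure.

```python
def select_candidate(candidates, evaluations):
    """Pick the candidate whose evaluation satisfies the most thresholds."""
    def score(evaluation):
        return sum(1 for passed in evaluation.values() if passed)
    best_index = max(range(len(candidates)), key=lambda i: score(evaluations[i]))
    return candidates[best_index]

outputs = ["draft A", "draft B"]
evals = [
    {"factuality": True, "creativity": False, "safety": True},
    {"factuality": True, "creativity": True, "safety": True},
]
best = select_candidate(outputs, evals)
# best -> "draft B", the candidate meeting all three thresholds
```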
- the parameters and/or the variables may be evaluated individually and/or in combination.
- the parameters associated with the different variables may be examined in combination to generate a broader evaluation.
- the parameters may be examined on an individual basis to generate a specific and/or targeted evaluation.
- FIG. 4 depicts a block diagram of an example generative model training system 400 according to example embodiments of the present disclosure.
- a training dataset with a plurality of training example sets can be obtained to train and/or retrain a generative model 414.
- Each training example set can include an input training example 412, a respective output training example 418 (e.g., ground truth text and/or image data responsive to the input training example), and one or more respective variable labels 422 (e.g., one or more variable values associated with the output training example 418).
- an input training example 412 (e.g., text descriptive of a prompt) can be processed with the generative model 414 (e.g., a language model) to generate a generated output 416 (e.g., one or more sentences responsive to the prompt).
- a first loss function 420 may be evaluated based on a comparison between the generated output 416 and the output training example 418.
- the evaluation can be utilized to generate a first gradient descent, which can be backpropagated to the generative model to adjust one or more parameters of the generative model.
- the generated output 416 can be processed with one or more classification models 424 to determine one or more variable value labels 426 for the generated output 416.
- the determined labels 426 can be descriptive of variable values associated with the generated output 416.
- a second loss function 428 can then be evaluated based on a comparison between the determined labels 426 and the respective variable labels 422 to generate a second gradient descent.
- the second gradient descent can be backpropagated to the generative model 414 to adjust one or more parameters of the generative model 414.
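The two-loss training step above can be sketched as follows. The simple token-overlap and label-agreement scores are hypothetical stand-ins for the real loss functions, which the disclosure does not specify.

```python
def combined_loss(generated, target, determined_labels, target_labels, weight=0.5):
    """Combine an output-matching loss with a variable-label loss."""
    # First loss: fraction of target tokens missing from the generated output
    # (stand-in for the comparison between generated output 416 and output
    # training example 418).
    target_tokens = target.split()
    missed = sum(1 for tok in target_tokens if tok not in generated.split())
    first_loss = missed / max(len(target_tokens), 1)
    # Second loss: fraction of variable labels where the classifier-determined
    # label 426 disagrees with the respective variable label 422.
    disagreements = sum(
        1 for var in target_labels if determined_labels.get(var) != target_labels[var]
    )
    second_loss = disagreements / max(len(target_labels), 1)
    return first_loss + weight * second_loss

loss = combined_loss(
    generated="the elves shared supplies",
    target="the elves shared supplies",
    determined_labels={"creativity": "high", "factuality": "low"},
    target_labels={"creativity": "high", "factuality": "high"},
)
# first_loss = 0.0, second_loss = 0.5, so loss = 0.0 + 0.5 * 0.5 = 0.25
```

In the actual system each loss would instead produce its own gradient, backpropagated separately as the first and second gradient descents described above.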
- Figure 5 depicts a block diagram of an example generative model system 500 according to example embodiments of the present disclosure.
- the generative model system 500 can process input data 512 with a generative model 514 to generate output data 516.
- the output data 516 can be generated based on a determined task and/or a determined context.
- the output data 516 may be a first output 520 if one or more first tasks are determined, may be a second output 522 if one or more second tasks are determined, and/or may include features of a third output 524, a fourth output 526, and a fifth output 528 in response to determining a specific task.
- a plurality of candidate outputs can be generated that can then be provided for selection (e.g., task-based selection, user selection, classification-based selection, etc.).
- Each of the various possible outputs can be associated with various variable values (e.g., differing factuality values, differing creativity values, and/or differing safety values).
- input data 512 can be obtained from a user.
- the input data 512 can include text data, image data, audio data, and/or latent encoding data.
- the input data 512 can be descriptive of a prompt.
- the prompt can include a task, a subject, and/or one or more details (e.g., a style and/or one or more attributes).
- the generative model 514 can include one or more transformer models.
- the generative model can include a language model (e.g., a natural language processing model).
- the generative model 514 can include an image generation model (e.g., a text-to-image model).
- the generative model 514 can be trained and/or configured to process an input and generate novel content, merging features learned from a training dataset to produce content specifically responsive to that particular input.
- the generative model 514 may be trained to determine a task associated with the input data 512 and generate a response with a particular structure and/or particular levels of factuality, creativity, and/or safety based on the determined task.
- the output data 516 generated by the generative model 514 in response to processing the input data 512 can include content data responsive to and/or descriptive of the prompt.
- the prompt may include “write a short story about elves learning how to share supplies.”
- the output data 516 may then be descriptive of one or more paragraphs that describe elves interacting with one another in a series of events that leads to them learning how to share supplies.
- the generative model 514 may have leveraged learned features and/or relationships associated with a plurality of different training datasets associated with a variety of topics including storytelling, children's stories, sharing stories, and/or elf-related informational pieces.
- the generative model system 500 may determine the task associated with the input data 512 is associated with a high level of creativity (e.g., a specific creativity threshold) and a medium level of safety (e.g., a specific safety threshold).
- the parameters of the generative model 514 may be adjusted based on this determination.
- one or more control tokens can be utilized to condition the generation to be based on output features and/or relationships associated with a specific variable set.
- the first output 520, the second output 522, the third output 524, the fourth output 526, and/or the fifth output 528 can be descriptive of candidate outputs that may be selected and/or generated based on a determined task and/or a determined context.
- the candidate outputs can be associated with varying levels of factuality, creativity, and/or safety. Different value sets may be pre-associated with particular tasks and/or may be determined based on one or more machine-learned associations.
- the factuality variable can be associated with how fact heavy an output is.
- the creativity variable can be associated with how much of the output is associated with novel content generation.
- the safety variable can be associated with a level of safety provided to a user, which can include a level of misinformation, a level of toxicity, a level of bias, a level of fairness, a level of robustness, a level of transparency, and/or a level of explainability.
- Figure 6 depicts a flow chart diagram of an example method to perform according to example embodiments of the present disclosure. Although Figure 6 depicts steps performed in a particular order for purposes of illustration and discussion, the methods of the present disclosure are not limited to the particularly illustrated order or arrangement. The various steps of the method 600 can be omitted, rearranged, combined, and/or adapted in various ways without deviating from the scope of the present disclosure.
- a computing system can obtain a training dataset.
- the training dataset can include a plurality of training examples.
- each training example can include a training input example, a respective training output example, and a respective variable label.
- the training input example can include text data.
- the respective training output example can include content data descriptive of the text data.
- the respective variable label can be descriptive of one or more variable values associated with the content data.
- the content data can include generated text data.
- the generated text data can be descriptive of one or more sentences. Additionally and/or alternatively, the content data may include generated image data.
- the respective variable label can be associated with one or more interdependent variables of a plurality of interdependent variables.
- the plurality of interdependent variables can include a factuality variable, a creativity variable, and a safety variable.
- the one or more interdependent variables of the plurality of interdependent variables can include a safety variable.
- the safety variable can be descriptive of a level of safety provided to the user.
- the respective variable label can be descriptive of a high level of safety associated with the respective training output example.
- the computing system can train a generative model with the training dataset.
- the generative model can be trained to process a text input and generate a content output.
- the generative model can be trained to learn a variable distribution for each respective interdependent variable of the plurality of interdependent variables. Each variable distribution can include a learned output space associated with outputs descriptive of the respective interdependent variable.
- the generative model can include an autoregressive language model. Alternatively and/or additionally, the generative model can include a text-to-image diffusion model.
- the generated image data can be descriptive of one or more images comprising one or more subjects.
- the generative model can be trained to process input data to generate output data.
- the input data can include text data, image data, audio data, latent encoding data, and/or other input data, which may include multimodal data.
- the output data can include text data, image data, audio data, latent encoding data, and/or other input data.
- the generative model can include a language model (e.g., a natural language processing model).
- the generative model may be trained to respond to input data in a conversational manner.
- the generative model may process context data with the input data to generate a context-aware response.
- the generative model may be trained to adjust a determined factuality variable level, a creativity variable level, and/or a safety variable level for generating the response based at least in part on the context data and/or user data (e.g., user profile data, user preference data, and/or historical data associated with the user).
- the context data can include previous inputs and outputs, a time, trending topics, a location, and/or other context information.
- the generative model can include an image generation model (e.g., a text-to-image diffusion model).
- the generative model can be trained to process text data to generate image data.
- the image data can be descriptive of the subject and/or details associated with the text data.
- the image data can depict a new image that differs from the training data.
- the generative model can process multimodal data to generate the image data, which can include image data, text data, content data, audio data, and/or latent encoding data.
- the computing system can store the generative model.
- the generative model can be stored locally and/or on a server computing system.
- the generative model may be stored to be leveraged by one or more web platforms.
- the trained generative model may be utilized for generating one or more search results for a search engine.
- the generated one or more search results can be provided as a summary and/or example in a separate panel of a search results page and/or adjacent to web search results and/or image search results.
- the generative model can be stored to be utilized by a chat bot and/or an image-generation bot.
- the computing system can obtain input data from a user computing system.
- the input data can include one or more text strings.
- the input data can be processed with the generative model to generate one or more variable-level specific outputs.
- the one or more variable-level specific outputs can then be provided to the user computing system.
- the input data may include text data, image data, audio data, latent encoding data, and/or multimodal data.
- the output data may include text data, image data, audio data, latent encoding data, and/or multimodal data.
- the computing system can obtain input data.
- the input data can include one or more text strings.
- the input data can be processed to determine a particular task associated with the input data.
- the particular task can be associated with a creation task (e.g., writing a poem and/or generating a painting style image), a knowledge task (e.g., responding to a knowledge query with factual information), and/or a conversational task (e.g., responding to user messages that are associated with a mix of user experiences, emotions, and/or facts).
- processing the input data to determine the particular task associated with the input data can include processing the input data with a classification model to determine a particular task associated with the input data.
- the classification model may be trained to determine a particular task associated with the input data, which may include determining a prompt preamble, semantic analysis with one or more machine-learned models, and/or natural language processing.
- processing the input data to determine the particular task associated with the input data may include processing the input data with a natural language processing model to determine a particular task associated with the input data.
- the particular task determination may be based on learned heuristics and/or learned prompt structure.
- a set of interdependent variable values associated with the particular task can then be determined. The determination can be based on an index of values associated with the particular task. Alternatively and/or additionally, the set of interdependent variable values may be determined by one or more machine-learned models (e.g., the generative model).
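- As an illustration, the index-based determination of variable values can be sketched as follows. The task names and numeric values here are hypothetical examples, not values prescribed by the disclosure:

```python
# Hypothetical index mapping each determined task to a set of
# interdependent variable values (factuality, creativity, safety).
VARIABLE_INDEX = {
    "knowledge": {"factuality": 0.9, "creativity": 0.2, "safety": 0.8},
    "creation": {"factuality": 0.3, "creativity": 0.9, "safety": 0.8},
    "conversational": {"factuality": 0.6, "creativity": 0.5, "safety": 0.9},
}

def determine_variable_values(task: str) -> dict:
    """Look up the set of interdependent variable values for a particular task."""
    return VARIABLE_INDEX[task]
```

In a full system, the lookup above could be replaced by a machine-learned model (e.g., the generative model itself) predicting the variable values.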
- Figure 7 depicts a flow chart diagram of an example method to perform according to example embodiments of the present disclosure. Although Figure 7 depicts steps performed in a particular order for purposes of illustration and discussion, the methods of the present disclosure are not limited to the particularly illustrated order or arrangement. The various steps of the method 700 can be omitted, rearranged, combined, and/or adapted in various ways without deviating from the scope of the present disclosure.
- a computing system can obtain input data.
- the input data can include text data descriptive of a plurality of text characters.
- the plurality of text characters can be descriptive of a prompt.
- the prompt can include a message to respond to with a model-generated response (e.g., a question and/or a command).
- the computing system can process input data with a language model to generate output data.
- the output data can be descriptive of a plurality of output words associated with a particular task and the plurality of text characters.
- the language model can include a transformer model.
- the language model may include one or more autoregressive language models.
- the output data can include text strings that may be specifically tailored based on the wording of the prompt input.
- the output data may be generated based on a plurality of learned features associated with a plurality of training examples.
- the generative model may be conditioned and/or instructed (e.g., via a request associated with the input data) to generate output data descriptive of a plurality of candidate outputs. The candidate outputs may be provided to the user and/or may be processed (e.g., with a dialogue management model, such as a heuristic model and/or a machine-learned model for controlling dialogue outputs that may be trained on sequence data) to determine a particular output of the plurality of candidate outputs to utilize as the output provided to the user.
- the computing system can determine a factuality value associated with the language model based on the output data.
- the factuality value can be descriptive of a determined level of factuality in the output data.
- determining the factuality value can include processing the output data with a factuality classification model to determine the factuality value.
- the factuality classification model may have been trained on labeled output examples.
- the computing system can determine a creativity value associated with the language model based on the output data.
- the creativity value can be descriptive of a determined level of creativity in the output data.
- determining the creativity value can include processing the output data with a creativity classification model to determine the creativity value. Additionally and/or alternatively, the creativity classification model may have been trained on labeled output examples.
- the computing system can compare the factuality value to a first target threshold to generate a first evaluation output and compare the creativity value to a second target threshold to generate a second evaluation output.
- the first target threshold may be a predetermined value.
- the first target threshold may be determined based on a task associated with the input data. For example, the particular task may be associated with a set of variable values (e.g., high factuality may be requested for informational responses, while the threshold may be lower for creative generation tasks).
- the first target threshold may be a learned threshold.
- the second target threshold may be a predetermined value. Alternatively and/or additionally, the second target threshold may be determined based on a task associated with the input data. For example, the particular task may be associated with a set of variable values (e.g., high creativity may be requested for fictional prose generation, while the threshold may be lower for informative responses).
- the second target threshold may be a learned threshold.
- the computing system can adjust one or more parameters of the language model based on the first evaluation output and the second evaluation output. Adjusting the one or more parameters of the language model based on the first evaluation output and the second evaluation output may include parameter efficient tuning.
- the computing system can determine a safety value associated with the language model based on the output data.
- the safety value can be descriptive of a determined level of safety associated with the output data.
- the safety value can be compared to a third target threshold to generate a third evaluation output.
- the one or more parameters can be adjusted based at least in part on the third evaluation output.
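- As an illustration, the threshold comparisons of method 700 can be sketched as follows. The specific values, thresholds, and the signed-deviation form of the evaluation output are hypothetical assumptions, not details fixed by the disclosure:

```python
# Sketch of comparing determined variable values (factuality, creativity,
# safety) to target thresholds to produce evaluation outputs that can
# drive parameter adjustment. All numeric values are hypothetical.

def evaluate(value: float, target: float) -> float:
    """Evaluation output: signed deviation of the determined value from its target."""
    return value - target

def evaluation_outputs(factuality, creativity, safety, thresholds):
    return {
        "factuality": evaluate(factuality, thresholds["factuality"]),
        "creativity": evaluate(creativity, thresholds["creativity"]),
        "safety": evaluate(safety, thresholds["safety"]),
    }

outs = evaluation_outputs(0.7, 0.4, 0.95,
                          {"factuality": 0.8, "creativity": 0.3, "safety": 0.9})
# A negative deviation indicates the variable fell short of its target,
# signaling that the model's parameters may need adjustment.
```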
- Figure 8 depicts a flow chart diagram of an example method to perform according to example embodiments of the present disclosure. Although Figure 8 depicts steps performed in a particular order for purposes of illustration and discussion, the methods of the present disclosure are not limited to the particularly illustrated order or arrangement. The various steps of the method 800 can be omitted, rearranged, combined, and/or adapted in various ways without deviating from the scope of the present disclosure.
- a computing system can obtain input data.
- the input data can include text data descriptive of a prompt for content generation.
- the prompt can be descriptive of a particular task, a particular subject, one or more details, and/or a style.
- the input data may include multimodal data (e.g., one or more words and/or one or more images).
- the computing system can determine a particular task associated with the input data.
- the particular task can be associated with a type of response for responding to the prompt.
- the particular task may be determined by a language model (e.g., a natural language processing model, a semantic understanding model, and/or a sentiment analysis model).
- the particular task may be determined by processing the input data with a classification model.
- the computing system can determine a set of interdependent variable values associated with the particular task.
- the set of interdependent variable values can be descriptive of a relative value for each of a plurality of interdependent variables for generating a response to the input data.
- the set of interdependent variable values can be determined based on learned correlations. Alternatively and/or additionally, the set of interdependent variable values can be determined based on an index of values associated with particular tasks.
- the set of interdependent variable values can include a level of factuality, a level of creativity, and/or a level of safety to utilize when generating the output data (e.g., the response).
- the set of interdependent variable values can be associated with a set of thresholds, a range of values, a learned distribution, embedding values, and/or control tokens.
- the computing system can process the input data and the set of interdependent variable values with a generative model to generate output data.
- the generative model may have been trained on a training dataset.
- the training dataset can include a plurality of input examples, a plurality of respective output examples, and a plurality of respective interdependent variable labels associated with the plurality of respective output examples.
- the output data can include content data responsive to the text data.
- the generative model may have been trained to generate a plurality of control tokens based on the plurality of respective interdependent variable labels.
- the generative model can then utilize the control tokens to generate the output data, which can include selecting response data based on the control token.
- the response data can be utilized to determine how to structure the output data.
- the particular task can be descriptive of generating fictional prose.
- the set of interdependent variable values can be descriptive of a creativity value being higher than the factuality value.
- the content data can be descriptive of text associated with a learned creative output distribution.
- the creativity value of the set of variables can be descriptive of creativity being important in generation.
- the particular task can be descriptive of generating objectively true information.
- the set of interdependent variable values can be descriptive of a factuality value being higher than the creativity value.
- the content data can be descriptive of text associated with a factuality control token.
- the factuality value of the set of variables can be descriptive of factuality being important in generation.
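- As an illustration, conditioning generation on control tokens derived from the set of interdependent variable values might be sketched as follows. The token format and the 0.5 cutoff are hypothetical assumptions:

```python
# Sketch of deriving control tokens from interdependent variable values
# and prepending them to the prompt so the generative model produces
# variable-level-specific output. Token names are illustrative only.

def to_control_tokens(values: dict, cutoff: float = 0.5) -> list:
    """Emit a control token for each variable at or above the cutoff."""
    return [f"<{name}:high>" for name, v in values.items() if v >= cutoff]

def build_prompt(text: str, values: dict) -> str:
    """Prepend control tokens to the input text before model processing."""
    return " ".join(to_control_tokens(values) + [text])

prompt = build_prompt("Write a short story about the sea.",
                      {"factuality": 0.2, "creativity": 0.9, "safety": 0.8})
# prompt == "<creativity:high> <safety:high> Write a short story about the sea."
```

During training, such tokens would accompany the labeled output examples so the model learns to associate each token with its respective output distribution.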
- the computing system can provide the output data as an output.
- the output data can include text data, image data, audio data, video data, and/or multimodal data.
- the output data can be provided via a user interface.
- the user interface can provide the output data for display with the input data.
- the output data may be provided in a search results page, in a user interface of a dedicated generative web platform, in a suggestions/discovery page, and/or in a viewer window.
- Figure 9A depicts a block diagram of an example computing system 100 that performs variable-aware output generation according to example embodiments of the present disclosure.
- the system 100 includes a user computing device 102, a server computing system 130, and a training computing system 150 that are communicatively coupled over a network 180.
- the user computing device 102 can be any type of computing device, such as, for example, a personal computing device (e.g., laptop or desktop), a mobile computing device (e.g., smartphone or tablet), a gaming console or controller, a wearable computing device, an embedded computing device, or any other type of computing device.
- the user computing device 102 includes one or more processors 112 and a memory 114.
- the one or more processors 112 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, a FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected.
- the memory 114 can include one or more non-transitory computer-readable storage mediums, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof.
- the memory 114 can store data 116 and instructions 118 which are executed by the processor 112 to cause the user computing device 102 to perform operations.
- the user computing device 102 can store or include one or more generative models 120 (e.g., a language model and/or a text-to-image model).
- the generative models 120 can be or can otherwise include various machine-learned models such as neural networks (e.g., deep neural networks) or other types of machine-learned models, including non-linear models and/or linear models.
- Neural networks can include feed-forward neural networks, recurrent neural networks (e.g., long short-term memory recurrent neural networks), convolutional neural networks or other forms of neural networks.
- Example generative models 120 are discussed with reference to Figures 1-5.
- Machine-learned model(s) can be or include one or multiple machine-learned models or model components.
- Example machine-learned models can include neural networks (e.g., deep neural networks).
- Example machine-learned models can include non-linear models or linear models.
- Example machine-learned models can use other architectures in lieu of or in addition to neural networks.
- Example machine-learned models can include decision tree based models, support vector machines, hidden Markov models, Bayesian networks, linear regression models, k-means clustering models, etc.
- Example neural networks can include feed-forward neural networks, recurrent neural networks (RNNs), including long short-term memory (LSTM) based recurrent neural networks, convolutional neural networks (CNNs), diffusion models, generative-adversarial networks, or other forms of neural networks.
- Example neural networks can be deep neural networks.
- Some example machine-learned models can leverage an attention mechanism such as self-attention.
- some example machine-learned models can include multiheaded self-attention models.
- Machine-learned model(s) can include a single or multiple instances of the same model configured to operate on data from input(s).
- Machine-learned model(s) can include an ensemble of different models that can cooperatively interact to process data from input(s).
- machine-learned model(s) can employ a mixture-of-experts structure. See, e.g., Zhou et al., Mixture-of-Experts with Expert Choice Routing, arXiv:2202.09368v2 (Oct. 14, 2022).
- Input(s) can generally include or otherwise represent various types of data. Input(s) can include one type or many different types of data. Output(s) can be data of the same type(s) or of different types of data as compared to input(s). Output(s) can include one type or many different types of data.
- Example data types for input(s) or output(s) include natural language text data, software code data (e.g., source code, object code, machine code, or any other form of computer-readable instructions or programming languages), machine code data (e.g., binary code, assembly code, or other forms of machine-readable instructions that can be executed directly by a computer's central processing unit), assembly code data (e.g., low-level programming languages that use symbolic representations of machine code instructions to program a processing unit), genetic data or other chemical or biochemical data, image data, audio data, audiovisual data, haptic data, biometric data, medical data, financial data, statistical data, geographical data, astronomical data, historical data, sensor data generally (e.g., digital or analog values, such as voltage or other absolute or relative level measurement values from a real or artificial input, such as from an audio sensor, light sensor, displacement sensor, etc.), and the like.
- Data can be raw or processed and can be in any format or schema.
- example combinations of data types include image data and audio data, image data and natural language data, natural language data and software code data, image data and biometric data, sensor data and medical data, etc. It is to be understood that any combination of data types in an input or an output can be present.
- An example input can include one or multiple data types, such as the example data types noted above.
- An example output can include one or multiple data types, such as the example data types noted above.
- the data type(s) of input can be the same as or different from the data type(s) of output. It is to be understood that the example data types noted above are provided for illustrative purposes only. Data types contemplated within the scope of the present disclosure are not limited to those examples noted above.
- the one or more generative models 120 can be received from the server computing system 130 over network 180, stored in the user computing device memory 114, and then used or otherwise implemented by the one or more processors 112.
- the user computing device 102 can implement multiple parallel instances of a single generative model 120 (e.g., to perform parallel novel output generation across multiple instances of user prompt input).
- the generative model 120 can be trained to process a prompt input and generate an output.
- the generative model 120 can leverage learned associations (e.g., relationships) to generate outputs that are novel (e.g., not in the training dataset) and are prompt input specific. Additionally and/or alternatively, the generative model 120 can generate outputs that are variable-specific based on control tokens and/or prompt tuning.
- one or more generative models 140 can be included in or otherwise stored and implemented by the server computing system 130 that communicates with the user computing device 102 according to a client-server relationship.
- the generative models 140 can be implemented by the server computing system 130 as a portion of a web service (e.g., a generation service).
- one or more models 120 can be stored and implemented at the user computing device 102 and/or one or more models 140 can be stored and implemented at the server computing system 130.
- the user computing device 102 can also include one or more user input components 122 that receive user input.
- the user input component 122 can be a touch-sensitive component (e.g., a touch-sensitive display screen or a touch pad) that is sensitive to the touch of a user input object (e.g., a finger or a stylus).
- the touch-sensitive component can serve to implement a virtual keyboard.
- Other example user input components include a microphone, a traditional keyboard, or other means by which a user can provide user input.
- the server computing system 130 includes one or more processors 132 and a memory 134.
- the one or more processors 132 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, a FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected.
- the memory 134 can include one or more non-transitory computer-readable storage mediums, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof.
- the memory 134 can store data 136 and instructions 138 which are executed by the processor 132 to cause the server computing system 130 to perform operations.
- the server computing system 130 includes or is otherwise implemented by one or more server computing devices. In instances in which the server computing system 130 includes plural server computing devices, such server computing devices can operate according to sequential computing architectures, parallel computing architectures, or some combination thereof.
- the server computing system 130 can store or otherwise include one or more machine-learned generative models 140.
- the models 140 can be or can otherwise include various machine-learned models.
- Example machine-learned models include neural networks or other multi-layer non-linear models.
- Example neural networks include feed forward neural networks, deep neural networks, recurrent neural networks, and convolutional neural networks.
- Example models 140 are discussed with reference to Figures 1-5.
- the user computing device 102 and/or the server computing system 130 can train the models 120 and/or 140 via interaction with the training computing system 150 that is communicatively coupled over the network 180.
- the training computing system 150 can be separate from the server computing system 130 or can be a portion of the server computing system 130.
- the training computing system 150 includes one or more processors 152 and a memory 154.
- the one or more processors 152 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, a FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected.
- the memory 154 can include one or more non-transitory computer-readable storage mediums, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof.
- the memory 154 can store data 156 and instructions 158 which are executed by the processor 152 to cause the training computing system 150 to perform operations.
- the training computing system 150 includes or is otherwise implemented by one or more server computing devices.
- the training computing system 150 can include a model trainer 160 that trains the machine-learned models 120 and/or 140 stored at the user computing device 102 and/or the server computing system 130 using various training or learning techniques, such as, for example, backwards propagation of errors.
- a loss function can be backpropagated through the model(s) to update one or more parameters of the model(s) (e.g., based on a gradient of the loss function).
- Various loss functions can be used such as mean squared error, likelihood loss, cross entropy loss, hinge loss, and/or various other loss functions.
- Gradient descent techniques can be used to iteratively update the parameters over a number of training iterations.
- performing backwards propagation of errors can include performing truncated backpropagation through time.
- the model trainer 160 can perform a number of generalization techniques (e.g., weight decays, dropouts, etc.) to improve the generalization capability of the models being trained.
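- As an illustration, the iterative gradient-descent parameter updates described above can be sketched in miniature. A single scalar parameter on a mean squared error loss stands in for a full model's weights; the data and learning rate are hypothetical:

```python
# Minimal sketch of gradient-descent training: compute the gradient of a
# mean squared error loss and iteratively update the parameter, as the
# model trainer does (at much larger scale) via backpropagation.

def mse_loss(w, xs, ys):
    """Mean squared error of the linear predictor w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w, xs, ys):
    """Analytic derivative of mse_loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # ground truth relation: y = 2x
w, lr = 0.0, 0.05
for _ in range(200):  # a number of training iterations
    w -= lr * grad(w, xs, ys)
# w converges toward the ground-truth parameter 2.0
```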
- the model trainer 160 can train the generative models 120 and/or 140 based on a set of training data 162.
- the training data 162 can include, for example, a plurality of training input examples, a plurality of respective training output examples, and/or a plurality of respective variable labels associated with the plurality of respective training output examples.
- the training examples can be provided by the user computing device 102.
- the model 120 provided to the user computing device 102 can be trained by the training computing system 150 on user-specific data received from the user computing device 102. In some instances, this process can be referred to as personalizing the model.
- the model trainer 160 includes computer logic utilized to provide desired functionality.
- the model trainer 160 can be implemented in hardware, firmware, and/or software controlling a general purpose processor.
- the model trainer 160 includes program files stored on a storage device, loaded into a memory and executed by one or more processors.
- the model trainer 160 includes one or more sets of computer-executable instructions that are stored in a tangible computer-readable storage medium such as RAM, a hard disk, or optical or magnetic media.
- An example machine-learned model can include a generative model (e.g., a large language model, a foundation model, a vision language model, an image generation model, a text-to-image model, an audio generation model, and/or other generative models).
- Training and/or tuning the machine-learned model can include obtaining a training instance.
- a set of training data can include a plurality of training instances divided between multiple datasets (e.g., a training dataset, a validation dataset, or a testing dataset).
- a training instance can be labeled or unlabeled.
- the runtime inferences can form training instances when a model is trained using an evaluation of the model's performance on that runtime instance (e.g., online training/learning).
- Example data types for the training instance and various tasks associated therewith are described throughout the present disclosure.
- Training and/or tuning can include processing, using one or more machine- learned models, the training instance to generate an output.
- the output can be directly obtained from the one or more machine-learned models or can be a downstream result of a chain of processing operations that includes an output of the one or more machine-learned models.
- Training and/or tuning can include receiving an evaluation signal associated with the output.
- the evaluation signal can be obtained using a loss function. Various determinations of loss can be used, such as mean squared error, likelihood loss, cross entropy loss, hinge loss, contrastive loss, or various other loss functions.
- the evaluation signal can be computed using known ground-truth labels (e.g., supervised learning), predicted or estimated labels (e.g., semi- or self-supervised learning), or without labels (e.g., unsupervised learning).
- the evaluation signal can be a reward (e.g., for reinforcement learning).
- the reward can be computed using a machine-learned reward model configured to generate rewards based on output(s) received.
- the reward can be computed using feedback data describing human feedback on the output(s).
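- As an illustration, computing a reward from human feedback data might be sketched as follows. The feedback encoding (+1 approve, -1 reject) is a hypothetical assumption, not a format specified by the disclosure:

```python
# Sketch of deriving a scalar reward signal from human feedback on
# model output(s), as one way the evaluation signal can be obtained.

def reward_from_feedback(feedback: list) -> float:
    """Average feedback score in [-1, 1]: +1 = approve, -1 = reject."""
    return sum(feedback) / len(feedback)

r = reward_from_feedback([1, 1, -1, 1])
# r == 0.5
```

A learned reward model would replace this averaging with a machine-learned prediction, but the resulting scalar plays the same role in the update step.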
- Training and/or tuning can include updating the machine-learned model using the evaluation signal.
- values for parameters of the machine-learned model(s) can be learned, in some embodiments, using various training or learning techniques, such as, for example, backwards propagation.
- the evaluation signal can be backpropagated from the output (or another source of the evaluation signal) through the machine-learned model(s) to update one or more parameters of the model(s) (e.g., based on a gradient of the evaluation signal with respect to the parameter value(s)).
- system(s) containing one or more machine-learned models can be trained in an end-to-end manner.
- Gradient descent techniques can be used to iteratively update the parameters over a number of training iterations.
- performing backwards propagation of errors can include performing truncated backpropagation through time.
- Training and/or tuning can include implementing a number of generalization techniques (e.g., weight decays, dropouts, etc.) to improve the generalization capability of the models being trained.
- the above training loop can be implemented for training a machine-learned model from an initialized state to a fully trained state (e.g., when the model exhibits a desired performance profile, such as based on accuracy, precision, recall, etc.).
- the above training loop can be implemented for particular stages of a training procedure.
- the above training loop can be implemented for pre-training a machine-learned model.
- Pre-training can include, for instance, large-scale training over potentially noisy data to achieve a broad base of performance levels across a variety of tasks/data types.
- the above training loop can be implemented for fine-tuning a machine-learned model.
- Fine- tuning can include, for instance, smaller-scale training on higher-quality (e.g., labeled, curated, etc.) data. Fine-tuning can affect all or a portion of the parameters of a machine- learned model.
- various portions of the machine-learned model can be “frozen” for certain training stages.
- parameters associated with an embedding space can be “frozen” during fine-tuning (e.g., to retain information learned from a broader domain(s) than present in the fine-tuning dataset(s)).
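- As an illustration, freezing a subset of parameters during fine-tuning can be sketched as excluding those parameters from the update step. The parameter names and values are hypothetical:

```python
# Sketch of parameter freezing during fine-tuning: parameters in the
# frozen set are skipped by the gradient update, so they retain the
# values learned during pre-training.

params = {"embedding": 1.0, "head": 0.5}
frozen = {"embedding"}  # retain broadly learned representations

def apply_update(params, grads, lr=0.1):
    """Gradient step that skips any parameter marked as frozen."""
    for name, g in grads.items():
        if name not in frozen:
            params[name] -= lr * g
    return params

apply_update(params, {"embedding": 0.3, "head": 0.2})
# params["embedding"] is unchanged; params["head"] moves to about 0.48
```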
- An example fine-tuning approach includes reinforcement learning. Reinforcement learning can be based on user feedback on model performance during use.
- the generative model may include language models (e.g., large language models and/or vision language models), image generation models (e.g., text-to-image generation models and/or image augmentation models), audio generation models, video generation models, graph generation models, and/or other data generation models (e.g., other content generation models).
- the one or more generative models can include one or more transformer models, one or more convolutional neural networks, one or more recurrent neural networks, one or more feedforward neural networks, one or more generative adversarial networks, one or more self-attention models, one or more embedding models, one or more encoders, one or more decoders, and/or one or more other models.
- the one or more generative models can include one or more autoregressive models (e.g., a machine-learned model trained to generate predictive values based on previous behavior data) and/or one or more diffusion models (e.g., a machine-learned model trained to generate predicted data based on generating and processing distribution data associated with the input data).
- the one or more generative models can be trained to process input data and generate model-generated content items, which may include a plurality of predicted words, pixels, signals, and/or other data.
- the model-generated content items may include novel content items that are not the same as any pre-existing work.
- the one or more generative models 90 can leverage learned representations, sequences, and/or probability distributions to generate the content items, which may include phrases, storylines, settings, objects, characters, beats, lyrics, and/or other aspects that are not included in pre-existing content items.
- the one or more generative models may include a vision language model.
- the vision language model can be trained, tuned, and/or configured to process image data and/or text data to generate a natural language output.
- the vision language model may leverage a pre-trained large language model (e.g., a large autoregressive language model) with one or more encoders (e.g., one or more image encoders and/or one or more text encoders) to provide detailed natural language outputs that emulate natural language composed by a human.
- the vision language model may be utilized for zero-shot image classification, few-shot image classification, image captioning, multimodal query distillation, multimodal question answering, and/or may be tuned and/or trained for a plurality of different tasks.
- the vision language model can perform visual question answering, image caption generation, feature detection (e.g., content monitoring (e.g., for inappropriate content)), object detection, scene recognition, and/or other tasks.
- the vision language model may leverage a pre-trained language model that may then be tuned for multimodality. Training and/or tuning of the vision language model can include image-text matching, masked-language modeling, multimodal fusing with cross attention, contrastive learning, prefix language model training, and/or other training techniques.
- the vision language model may be trained to process an image to generate predicted text that is similar to ground truth text data (e.g., a ground truth caption for the image).
- the vision language model may be trained to replace masked tokens of a natural language template with textual tokens descriptive of features depicted in an input image.
- the training, tuning, and/or model inference may include multi-layer concatenation of visual and textual embedding features.
- the vision language model may be trained and/or tuned via jointly learning image embedding and text embedding generation, which may include training and/or tuning the system to map text features and image features into a shared joint feature embedding space.
- the joint training may include image-text pair parallel embedding and/or may include triplet training.
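The shared-embedding-space idea can be sketched numerically. In the toy example below, the dimensions, the random "features," and the small perturbation standing in for a learned text encoder are all illustrative assumptions, not the claimed training procedure; the point is the property that contrastive objectives encourage, namely that matched image-text pairs score highest along the diagonal of the similarity matrix:

```python
import numpy as np

# Toy shared embedding space: 3 images and their 3 matched captions,
# L2-normalized so dot products are cosine similarities.
rng = np.random.default_rng(1)

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

image_emb = l2norm(rng.normal(size=(3, 8)))
# Matched text embeddings: close to their image (small perturbation
# stands in for a trained text encoder).
text_emb = l2norm(image_emb + 0.05 * rng.normal(size=(3, 8)))

similarity = image_emb @ text_emb.T   # rows: images, cols: captions
# Contrastive training pushes the diagonal (matched pairs) above the
# off-diagonal (mismatched pairs); retrieval then picks the diagonal.
best_caption = similarity.argmax(axis=1)
```

Triplet training would instead enforce a margin between an anchor's distance to its positive pair and to a negative pair, but both objectives shape the same shared space.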
- the images may be utilized and/or processed as prefixes to the language model.
- the machine-learned models can include machine- learned sequence processing models.
- An example system can pass inputs to sequence processing models.
- Sequence processing models can include one or more machine-learned components.
- Sequence processing models can process the data from inputs to obtain an input sequence.
- Input sequence can include one or more input elements obtained from inputs.
- the sequence processing model can process input sequence using prediction layers to generate an output sequence.
- the output sequence can include one or more output elements generated based on input sequence.
- the system can generate outputs based on output sequence.
- Sequence processing models can include one or multiple machine-learned model components configured to ingest, generate, or otherwise reason over sequences of information.
- some example sequence processing models in the text domain are referred to as "Large Language Models," or LLMs. See, e.g., PaLM 2 Technical Report, Google, https://ai.google/static/documents/palm2techreport.pdf (n.d.).
- Other example sequence processing models can operate in other domains, such as image domains. See, e.g., Dosovitskiy et al., An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, arXiv:2010.11929v2 (Jun.
- Sequence processing models can process one or multiple types of data simultaneously. Sequence processing models can include relatively large models (e.g., more parameters, computationally expensive, etc.), relatively small models (e.g., fewer parameters, computationally lightweight, etc.), or both.
- In general, sequence processing models can obtain input sequence using data from inputs.
- input sequence can include a representation of data from inputs 2 in a format understood by sequence processing models.
- One or more machine-learned components of sequence processing models can ingest the data from inputs, parse the data into pieces compatible with the processing architectures of sequence processing models (e.g., via “tokenization”), and project the pieces into an input space associated with prediction layers (e.g., via “embedding”).
- Sequence processing models can ingest the data from inputs and parse the data into a sequence of elements to obtain input sequence. For example, a portion of input data from inputs can be broken down into pieces that collectively represent the content of the portion of the input data. The pieces can provide the elements of the sequence.
- processing the input data can include tokenization.
- a tokenizer may process a given portion of an input source and output a series of tokens (e.g., corresponding to input elements) that represent the portion of the input source.
- Various approaches to tokenization can be used.
- textual input sources can be tokenized using a byte-pair encoding (BPE) technique.
- See, e.g., SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (System Demonstrations), pages 66-71 (October 31-November 4, 2018), https://aclanthology.org/D18-2012.pdf.
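The merge-based intuition behind BPE can be sketched in a few lines. The toy corpus and the number of merge steps below are illustrative assumptions; production tokenizers learn thousands of merges from large corpora and operate on bytes or Unicode-normalized text:

```python
from collections import Counter

# Toy byte-pair encoding: repeatedly merge the most frequent adjacent
# pair of symbols into a single new symbol.
def most_frequent_pair(tokens):
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def apply_merge(tokens, pair):
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])  # fuse the pair
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = list("low lower lowest")   # start from character symbols
for _ in range(2):                  # two merges: "l"+"o", then "lo"+"w"
    tokens = apply_merge(tokens, most_frequent_pair(tokens))
```

After two merges the frequent subword "low" has become a single token, which is the mechanism by which BPE vocabularies capture common word pieces.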
- Image-based input sources can be tokenized by extracting and serializing patches from an image.
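Patch serialization of the kind used by vision transformers can be sketched with reshape/transpose operations; the image size and patch size below are illustrative assumptions (real pipelines also project each flattened patch through a learned embedding):

```python
import numpy as np

# "Tokenize" a toy 8x8 RGB image into four non-overlapping 4x4 patches,
# each flattened into one vector element of the input sequence.
image = np.arange(8 * 8 * 3).reshape(8, 8, 3)
P = 4                                          # patch size
patches = (image.reshape(8 // P, P, 8 // P, P, 3)
                .transpose(0, 2, 1, 3, 4)      # patch-grid indices first
                .reshape(-1, P * P * 3))       # one flat token per patch
```

The result is a sequence of four 48-dimensional "tokens," one per patch, ready for embedding and prediction layers.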
- Prediction layers can predict one or more output elements based on the input elements.
- Prediction layers can include one or more machine-learned model architectures, such as one or more layers of learned parameters that manipulate and transform the inputs to extract higher-order meaning from, and relationships between, input elements.
- example prediction layers can predict new output elements in view of the context provided by input sequence.
- Prediction layers can evaluate associations between portions of input sequence and a particular output element. These associations can inform a prediction of the likelihood that a particular output follows the input context. For example, consider the textual snippet, "The carpenter's toolbox was small and heavy. It was full of ____." Example prediction layers can identify that "It" refers back to "toolbox" by determining a relationship between the respective embeddings. Example prediction layers can also link "It" to the attributes of the toolbox, such as "small" and "heavy." Based on these associations, prediction layers can, for instance, assign a higher probability to the word "nails" than to the word "sawdust."
- A transformer is an example architecture that can be used in prediction layers.
- a transformer is an example of a machine-learned model architecture that uses an attention mechanism to compute associations between items within a context window.
- the context window can include a sequence that contains input sequence and potentially one or more output elements.
- a transformer block can include one or more attention layers and one or more post-attention layers (e.g., feedforward layers, such as a multi-layer perceptron).
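The attention layer of such a block can be sketched as follows. The context length, model width, and random weights are illustrative assumptions; real transformer blocks add causal masking, multiple heads, residual connections, and normalization around the attention and feedforward layers:

```python
import numpy as np

# Minimal single-head self-attention over a short context window.
rng = np.random.default_rng(0)
T, d = 5, 8                               # context length, model width
x = rng.normal(size=(T, d))               # embedded input elements
Wq, Wk, Wv = rng.normal(size=(3, d, d))   # projections (random stand-ins)

q, k, v = x @ Wq, x @ Wk, x @ Wv
scores = q @ k.T / np.sqrt(d)             # pairwise associations in window
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax per position
attended = weights @ v                    # fed to post-attention layers
```

Each row of `weights` is a probability distribution over the context, which is how the attention mechanism computes associations between items in the window.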
- Prediction layers can include other machine-learned model architectures in addition to or in lieu of transformer-based architectures. For example, recurrent neural networks (RNNs) and long short-term memory (LSTM) models can also be used, as well as convolutional neural networks (CNNs). In general, prediction layers can leverage various kinds of artificial neural networks that can understand or generate sequences of information.
- Output sequence can include or otherwise represent the same or different data types as input sequence. For instance, input sequence can represent textual data, and output sequence can represent textual data. The input sequence can represent image, audio, or audiovisual data, and output sequence can represent textual data (e.g., describing the image, audio, or audiovisual data). It is to be understood that prediction layers, and any other interstitial model components of sequence processing models, can be configured to receive a variety of data types in input sequences and output a variety of data types in output sequences.
- the output sequence can have various relationships to input sequence.
- Output sequence can be a continuation of input sequence.
- the output sequence can be complementary to input sequence.
- the output sequence can translate, transform, augment, or otherwise modify input sequence.
- the output sequence can answer, evaluate, confirm, or otherwise respond to input sequence.
- the output sequence can implement (or describe instructions for implementing) an instruction provided via an input sequence.
- the output sequence can be generated autoregressively. For instance, for some applications, an output of one or more prediction layers can be passed through one or more output layers (e.g., a softmax layer) to obtain a probability distribution over an output vocabulary (e.g., a textual or symbolic vocabulary) conditioned on a set of input elements in a context window. In this manner, for instance, the output sequence can be autoregressively generated by sampling a likely next output element, adding that element to the context window, re-generating the probability distribution based on the updated context window, sampling a likely next output element, and so forth.
- The output sequence can also be generated non-autoregressively. For instance, multiple output elements of the output sequence can be predicted together without explicit sequential conditioning on each other. See, e.g., Saharia et al., Non-Autoregressive Machine Translation with Latent Alignments, arXiv:2004.07437v3 (Nov. 16, 2020).
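The autoregressive loop can be sketched with a stand-in score table in place of a trained network. The toy vocabulary (echoing the toolbox snippet above) and the hand-set scores are assumptions for illustration; a real model would produce the logits from its prediction layers, and sampling could be stochastic rather than the greedy pick used here:

```python
import numpy as np

# Sketch of autoregressive decoding: turn scores into a probability
# distribution over a toy vocabulary, pick the next element, append it
# to the context window, and repeat until an end-of-sequence token.
vocab = ["the", "toolbox", "was", "full", "of", "nails", "<eos>"]
scores = np.full((7, 7), -1e9)          # stand-in next-token score table
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]:
    scores[i][j] = 0.0                  # favored continuation

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

context = [0]                           # start with "the"
while context[-1] != 6:                 # generate until <eos>
    p = softmax(scores[context[-1]])    # distribution over vocabulary
    context.append(int(np.argmax(p)))   # greedy pick of likely next element

sentence = " ".join(vocab[i] for i in context[:-1])
```

Swapping the `argmax` for sampling from `p` (e.g., with temperature) gives the stochastic decoding variants commonly used in practice.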
- the output sequence can include one or multiple portions or elements.
- the output sequence can include multiple elements corresponding to multiple portions of a generated output sequence (e.g., a textual sentence, values of a discretized waveform, computer code, etc.).
- the output sequence can include a single element associated with a classification output.
- an output "vocabulary" can include a set of classes into which an input sequence is to be classified.
- a vision transformer block can pass latent state information to a multilayer perceptron that outputs a likely class value associated with an input image.
- the network 180 can be any type of communications network, such as a local area network (e.g., intranet), wide area network (e.g., Internet), or some combination thereof and can include any number of wired or wireless links.
- communication over the network 180 can be carried via any type of wired and/or wireless connection, using a wide variety of communication protocols (e.g., TCP/IP, HTTP, SMTP, FTP), encodings or formats (e.g., HTML, XML), and/or protection schemes (e.g., VPN, secure HTTP, SSL).
- the input to the machine-learned model(s) of the present disclosure can be image data.
- the machine-learned model(s) can process the image data to generate an output.
- the machine-learned model(s) can process the image data to generate an image recognition output (e.g., a recognition of the image data, a latent embedding of the image data, an encoded representation of the image data, a hash of the image data, etc.).
- the machine-learned model(s) can process the image data to generate an image segmentation output.
- the machine-learned model(s) can process the image data to generate an image classification output.
- the machine-learned model(s) can process the image data to generate an image data modification output (e.g., an alteration of the image data, etc.).
- the machine-learned model(s) can process the image data to generate an encoded image data output (e.g., an encoded and/or compressed representation of the image data, etc.).
- the machine-learned model(s) can process the image data to generate an upscaled image data output.
- the machine-learned model(s) can process the image data to generate a prediction output.
- the input to the machine-learned model(s) of the present disclosure can be text or natural language data.
- the machine-learned model(s) can process the text or natural language data to generate an output.
- the machine-learned model(s) can process the natural language data to generate a language encoding output.
- the machine-learned model(s) can process the text or natural language data to generate a latent text embedding output.
- the machine-learned model(s) can process the text or natural language data to generate a translation output.
- the machine-learned model(s) can process the text or natural language data to generate a classification output.
- the machine-learned model(s) can process the text or natural language data to generate a textual segmentation output.
- the machine-learned model(s) can process the text or natural language data to generate a semantic intent output.
- the machine-learned model(s) can process the text or natural language data to generate an upscaled text or natural language output (e.g., text or natural language data that is higher quality than the input text or natural language, etc.).
- the machine-learned model(s) can process the text or natural language data to generate a prediction output.
- the input to the machine-learned model(s) of the present disclosure can be speech data.
- the machine-learned model(s) can process the speech data to generate an output.
- the machine-learned model(s) can process the speech data to generate a speech recognition output.
- the machine-learned model(s) can process the speech data to generate a speech translation output.
- the machine-learned model(s) can process the speech data to generate a latent embedding output.
- the machine-learned model(s) can process the speech data to generate an encoded speech output (e.g., an encoded and/or compressed representation of the speech data, etc.).
- the machine-learned model(s) can process the speech data to generate an upscaled speech output (e.g., speech data that is higher quality than the input speech data, etc.).
- the machine-learned model(s) can process the speech data to generate a textual representation output (e.g., a textual representation of the input speech data, etc.).
- the machine-learned model(s) can process the speech data to generate a prediction output.
- the input to the machine-learned model(s) of the present disclosure can be latent encoding data (e.g., a latent space representation of an input, etc.).
- the machine-learned model(s) can process the latent encoding data to generate an output.
- the machine-learned model(s) can process the latent encoding data to generate a recognition output.
- the machine-learned model(s) can process the latent encoding data to generate a reconstruction output.
- the machine-learned model(s) can process the latent encoding data to generate a search output.
- the machine-learned model(s) can process the latent encoding data to generate a reclustering output.
- the machine-learned model(s) can process the latent encoding data to generate a prediction output.
- the input includes visual data and the task is a computer vision task.
- the input includes pixel data for one or more images and the task is an image processing task.
- the image processing task can be image classification, where the output is a set of scores, each score corresponding to a different object class and representing the likelihood that the one or more images depict an object belonging to the object class.
- the image processing task may be object detection, where the image processing output identifies one or more regions in the one or more images and, for each region, a likelihood that region depicts an object of interest.
- the image processing task can be image segmentation, where the image processing output defines, for each pixel in the one or more images, a respective likelihood for each category in a predetermined set of categories.
- the set of categories can be foreground and background.
- the set of categories can be object classes.
- the image processing task can be depth estimation, where the image processing output defines, for each pixel in the one or more images, a respective depth value.
- the image processing task can be motion estimation, where the network input includes multiple images, and the image processing output defines, for each pixel of one of the input images, a motion of the scene depicted at the pixel between the images in the network input.
- the task can be a generative task performed by the one or more machine-learned models (e.g., 120 and/or 140).
- the inputs can be or otherwise represent data of one or more modalities that encodes context for generating additional content.
- the task can be a text completion task.
- the machine-learned models can be configured to process the inputs that represent textual data and to generate the outputs that represent additional textual data that completes a textual sequence that includes the inputs.
- the machine-learned models can be configured to generate the outputs to complete a sentence, paragraph, or portion of text that follows from a portion of text represented by inputs.
- the task can be an instruction following task.
- the machine-learned models can be configured to process the inputs that represent instructions to perform a function and to generate the outputs that advance a goal of satisfying the instruction function (e.g., at least a step of a multi-step procedure to perform the function).
- the outputs can represent data of the same or of a different modality as the inputs.
- the inputs can represent textual data (e.g., natural language instructions for a task to be performed) and the machine-learned models can process the inputs to generate the outputs that represent textual data responsive to the instructions (e.g., natural language responses, programming language responses, machine language responses, etc.).
- the inputs can represent image data (e.g., image-based instructions for a task to be performed, optionally accompanied by textual instructions) and the machine-learned models can process the inputs to generate the outputs that represent textual data responsive to the instructions (e.g., natural language responses, programming language responses, machine language responses, etc.).
- One or more outputs can be iteratively or recursively generated to sequentially process and accomplish steps toward accomplishing the requested functionality. For instance, an initial output can be executed by an external system or be processed by the machine-learned models to complete an initial step of performing a function. Multiple steps can be performed, with a final output being obtained that is responsive to the initial instructions.
- the task can be a question answering task.
- the machine-learned models can be configured to process the inputs that represent a question to answer and to generate the outputs that advance a goal of returning an answer to the question (e.g., at least a step of a multi-step procedure to perform the function).
- the outputs can represent data of the same or of a different modality as the inputs.
- the inputs can represent textual data (e.g., natural language instructions for a task to be performed) and the machine-learned models can process the inputs to generate the outputs that represent textual data responsive to the question (e.g., natural language responses, programming language responses, machine language responses, etc.).
- the inputs can represent image data (e.g., image-based instructions for a task to be performed, optionally accompanied by textual instructions) and the machine-learned models can process the inputs to generate the outputs that represent textual data responsive to the question (e.g., natural language responses, programming language responses, machine language responses, etc.).
- One or more outputs can be iteratively or recursively generated to sequentially process and accomplish steps toward answering the question. For instance, an initial output can be executed by an external system or be processed by the machine-learned models to complete an initial step of obtaining an answer to the question (e.g., querying a database, performing a computation, executing a script, etc.). Multiple steps can be performed, with a final output being obtained that is responsive to the question.
- the task can be an image generation task.
- the machine-learned models can be configured to process the inputs that represent context regarding a desired portion of image content.
- the context can include text data, image data, audio data, etc.
- Machine-learned models can be configured to generate the outputs that represent image data that depicts imagery related to the context.
- the machine-learned models can be configured to generate pixel data of an image. Values for channels associated with the pixels in the pixel data can be selected based on the context (e.g., based on a probability determined based on the context).
- the task can be an audio generation task.
- Machine-learned models can be configured to process the inputs that represent context regarding a desired portion of audio content.
- the context can include text data, image data, audio data, etc.
- the machine-learned models can be configured to generate the outputs that represent audio data related to the context.
- the machine-learned models can be configured to generate waveform data in the form of an image (e.g., a spectrogram). Values for channels associated with pixels of the image can be selected based on the context.
- the machine-learned models can be configured to generate waveform data in the form of a sequence of discrete samples of a continuous waveform. Values of the sequence can be selected based on the context (e.g., based on a probability determined based on the context).
- the task can be a data generation task.
- Machine-learned models can be configured to process the inputs that represent context regarding a desired portion of data (e.g., data from various data domains, such as sensor data, image data, multimodal data, statistical data, etc.).
- the desired data can be, for instance, synthetic data for training other machine-learned models.
- the context can include arbitrary data types.
- the machine-learned models can be configured to generate the outputs that represent data that aligns with the desired data. For instance, the machine-learned models can be configured to generate data values for populating a dataset. Values for the data objects can be selected based on the context (e.g., based on a probability determined based on the context).
- Figure 9A illustrates one example computing system that can be used to implement the present disclosure.
- the user computing device 102 can include the model trainer 160 and the training dataset 162.
- the models 120 can be both trained and used locally at the user computing device 102.
- the user computing device 102 can implement the model trainer 160 to personalize the models 120 based on user-specific data.
- Figure 9B depicts a block diagram of an example computing device 40 that performs according to example embodiments of the present disclosure.
- the computing device 40 can be a user computing device or a server computing device.
- the computing device 40 includes a number of applications (e.g., applications 1 through N). Each application contains its own machine learning library and machine-learned model(s). For example, each application can include a machine-learned model.
- Example applications include a text messaging application, an email application, a dictation application, a virtual keyboard application, a browser application, etc.
- each application can communicate with a number of other components of the computing device, such as, for example, one or more sensors, a context manager, a device state component, and/or additional components.
- each application can communicate with each device component using an API (e.g., a public API).
- the API used by each application is specific to that application.
- Figure 9C depicts a block diagram of an example computing device 50 that performs according to example embodiments of the present disclosure.
- the computing device 50 can be a user computing device or a server computing device.
- the computing device 50 includes a number of applications (e.g., applications 1 through N). Each application is in communication with a central intelligence layer.
- Example applications include a text messaging application, an email application, a dictation application, a virtual keyboard application, a browser application, etc.
- each application can communicate with the central intelligence layer (and model(s) stored therein) using an API (e.g., a common API across all applications).
- the central intelligence layer includes a number of machine-learned models. For example, as illustrated in Figure 9C, a respective machine-learned model (e.g., a model) can be provided for each application and managed by the central intelligence layer. In other implementations, two or more applications can share a single machine-learned model. For example, in some implementations, the central intelligence layer can provide a single model (e.g., a single model) for all of the applications. In some implementations, the central intelligence layer is included within or otherwise implemented by an operating system of the computing device 50.
- the central intelligence layer can communicate with a central device data layer.
- the central device data layer can be a centralized repository of data for the computing device 50.
- the central device data layer can communicate with a number of other components of the computing device, such as, for example, one or more sensors, a context manager, a device state component, and/or additional components.
- the central device data layer can communicate with each device component using an API (e.g., a private API).
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Probability & Statistics with Applications (AREA)
- Algebra (AREA)
- Computational Mathematics (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Machine Translation (AREA)
Abstract
Systems and methods for optimization across a plurality of variables can include conditioning a generative model to generate outputs with a specific level of factuality, creativity, and/or safety. The generative model can be conditioned based on one or more of a variety of techniques, which can include the use of control tokens, filter tokens, controlled decoding, fine-tuning, prompt tuning or parameter-efficient tuning, and/or other techniques, individually and/or in combination. The systems and methods can include training labels associated with different variables and/or can include parameter associations between given variables and respective parameters. The systems and methods can be utilized to condition a generative model for different tasks that may be associated with prioritizing different variables.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363490076P | 2023-03-14 | 2023-03-14 | |
| US63/490,076 | 2023-03-14 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024191658A1 true WO2024191658A1 (fr) | 2024-09-19 |
Family
ID=92755729
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/018508 Pending WO2024191658A1 (fr) | 2024-03-05 | Responsible artificial intelligence controller with tunable parameters for model optimization across multiple variables |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2024191658A1 (fr) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2018017546A1 (fr) * | 2016-07-18 | 2018-01-25 | Google Llc | Training machine learning models on multiple machine learning tasks |
| US20200097766A1 (en) * | 2018-09-26 | 2020-03-26 | Nec Laboratories America, Inc. | Multi-scale text filter conditioned generative adversarial networks |
| US20210150355A1 (en) * | 2017-02-24 | 2021-05-20 | Deepmind Technologies Limited | Training machine learning models using task selection policies to increase learning progress |
| CN114238689A (zh) * | 2021-12-17 | 2022-03-25 | 北京百度网讯科技有限公司 | Video generation method and apparatus, electronic device, storage medium, and program product |
- 2024-03-05 WO PCT/US2024/018508 patent/WO2024191658A1/fr active Pending
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2018017546A1 (fr) * | 2016-07-18 | 2018-01-25 | Google Llc | Training machine learning models on multiple machine learning tasks |
| US20210150355A1 (en) * | 2017-02-24 | 2021-05-20 | Deepmind Technologies Limited | Training machine learning models using task selection policies to increase learning progress |
| US20200097766A1 (en) * | 2018-09-26 | 2020-03-26 | Nec Laboratories America, Inc. | Multi-scale text filter conditioned generative adversarial networks |
| CN114238689A (zh) * | 2021-12-17 | 2022-03-25 | 北京百度网讯科技有限公司 | Video generation method and apparatus, electronic device, storage medium, and program product |
Non-Patent Citations (1)
| Title |
|---|
| RINON GAL; YUVAL ALALUF; YUVAL ATZMON; OR PATASHNIK; AMIT H. BERMANO; GAL CHECHIK; DANIEL COHEN-OR: "An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 2 August 2022 (2022-08-02), US, XP091285859 * |
Similar Documents
| Publication | Title |
|---|---|
| US12482455B2 (en) | Systems and methods for training dual-mode machine-learned speech recognition models |
| Johnsen | Large language models (LLMs) |
| US20230401382A1 (en) | Dynamic Language Models for Continuously Evolving Content |
| US20240135187A1 (en) | Method for Training Large Language Models to Perform Query Intent Classification |
| US20240256964A1 (en) | Pretraining Already-Pretrained Models for Diverse Downstream Tasks |
| US20250252137A1 (en) | Zero-Shot Multi-Modal Data Processing Via Structured Inter-Model Communication |
| WO2024254051A1 (fr) | Autonomous visual information search with machine-learned language models |
| US12494005B2 (en) | Techniques for generating dynamic content |
| US20250124256A1 (en) | Efficient Knowledge Distillation Framework for Training Machine-Learned Models |
| US20250131321A1 (en) | Efficient Training Mixture Calibration for Training Machine-Learned Models |
| CN118468868A (zh) | Tuning a generative model using latent variable inference |
| US20250061312A1 (en) | Knowledge Graphs for Dynamically Generating Content Using a Machine-Learned Content Generation Model |
| US20250086554A1 (en) | Recommendation systems integrating extended reality to generate personalized recommendations |
| WO2024191658A1 (fr) | Responsible artificial intelligence controller with tunable parameters for model optimization across multiple variables |
| US20250356210A1 (en) | Calibrated Distillation |
| WO2025095958A1 (fr) | Downstream adaptations of sequence processing models |
| WO2025101175A1 (fr) | LLM-centric agile image classification |
| US12307198B1 (en) | Multi-speaker speech signal to text signal validation |
| US20250315668A1 (en) | Domain-Specific Generative Model for Generating News Content Items |
| US20250315609A1 (en) | Infrastructure for Interfacing with a Generative Model for Content Evaluation and Customization |
| US20250355958A1 (en) | On-Demand Generative Response Simplification |
| US20250265087A1 (en) | Machine-Learned Model Alignment With Synthetic Data |
| US20250315428A1 (en) | Machine-Learning Collaboration System |
| US20250356223A1 (en) | Machine-Learning Systems and Methods for Conversational Recommendations |
| US20250244960A1 (en) | Generative Model Integration with Code Editing |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24771398; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |