
WO2025226270A1 - Request processing using a sequence of generative neural networks - Google Patents

Request processing using a sequence of generative neural networks

Info

Publication number
WO2025226270A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
generative neural
previous
request
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/026234
Other languages
French (fr)
Inventor
Florian Nils HARTMANN
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google LLC filed Critical Google LLC
Priority to PCT/US2024/026234 priority Critical patent/WO2025226270A1/en
Publication of WO2025226270A1 publication Critical patent/WO2025226270A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Definitions

  • This specification relates to processing a request using generative neural networks.
  • Generative neural networks have demonstrated state of the art performance across a wide range of tasks, such as text generation (e.g., writing, summarization, translation, coding), image generation, and audio generation.
  • Generative neural networks use very large neural network models that are trained on vast amounts of data.
  • LLM: large language model
  • generative neural networks are being deployed in various applications, e.g., as a coding assistant, as an email writing assistant, and for generating images in a presentation.
  • This specification is related to processing a request using a sequence of generative neural networks.
  • a sequence of generative neural networks can be invoked to provide a response to a request.
  • a generative neural network can interact with another generative neural network using natural language as the interface.
  • an LLM in a sequence of LLMs can process a request related to a task and can invoke the next LLM in the sequence to handle the task, until the last LLM in the sequence provides the final output for the task.
  • a generative neural network in the sequence can obtain additional information related to the task. In this chain of interactions, it is important that information related to the task does not degrade at each step of the process.
  • This specification describes systems and techniques for maintaining information in a sequence of generative neural networks to ensure that important information in the initial request is not lost.
  • the systems and techniques maintain and update a summary of the information at each generative neural network in the sequence and use the summary to verify whether a request for the next generative neural network is consistent with the summary.
  • one innovative aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving an input for a task to be processed by a sequence of generative neural networks to generate a final output for the task, the sequence including an initial generative neural network, one or more intermediate generative neural networks, and a final generative neural network; and processing the input by the sequence of the generative neural networks, including: for each intermediate generative neural network, receiving, by the generative neural network, a previous request and a previous summary generated from a previous generative neural network in the sequence; generating, by the generative neural network and based on the previous request and the previous summary, a next request; determining, by the generative neural network, whether the next request is consistent with the previous summary; in response to determining that the next request is consistent with the previous summary, generating, by the generative neural network and based on the previous request and the previous summary, a next summary; and providing, by the generative neural network, the next request and the next summary to the next generative neural network in the sequence.
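The flow described in this aspect can be sketched as follows. The `GenerativeNN` interface and its callables are hypothetical stand-ins for individual prompted invocations of a generative neural network; they are not an API defined by this specification.

```python
# Sketch of the request/summary pipeline over a sequence of generative
# neural networks. Each callable stands in for one prompted model call.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GenerativeNN:
    make_request: Callable[[str, str], str]    # (prev_request, prev_summary) -> next_request
    make_summary: Callable[[str, str], str]    # (prev_request, prev_summary) -> next_summary
    is_consistent: Callable[[str, str], bool]  # (next_request, prev_summary) -> bool
    finalize: Callable[[str, str], str]        # (prev_request, prev_summary) -> final output


def run_sequence(task_input: str, initial: GenerativeNN,
                 intermediates: List[GenerativeNN], final: GenerativeNN) -> str:
    # The initial network turns the raw input into a first request and summary.
    request = initial.make_request(task_input, "")
    summary = initial.make_summary(task_input, "")
    for nn in intermediates:
        next_request = nn.make_request(request, summary)
        if not nn.is_consistent(next_request, summary):
            # Regenerate, with feedback that the first attempt was inconsistent
            # with the previous summary.
            next_request = nn.make_request(
                request, summary + " [previous attempt was inconsistent]")
        # Update the summary before passing both items along the sequence.
        summary = nn.make_summary(request, summary)
        request = next_request
    # The final network produces the output for the task.
    return final.finalize(request, summary)
```

With stateless lambdas in place of real models, the helper threads the request and summary through each step and returns the final network's output.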
  • Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
  • For a system of one or more computers to be configured to perform particular operations or actions means that the system has installed on it software, firmware, hardware, or a combination of them that in operation cause the system to perform the operations or actions.
  • For one or more computer programs to be configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by data processing apparatus, cause the apparatus to perform the operations or actions.
  • the previous summary includes properties of the input and the previous request, and determining whether the next request is consistent with the previous summary includes determining whether properties of the next request are consistent with the properties included in the previous summary.
  • Generating the next summary includes: obtaining, by the generative neural network, new information based on the previous request; and generating, by the generative neural network, the next summary based on the previous request, the previous summary, and the new information.
  • Generating the next summary includes generating, by the generative neural network, the next summary based on the previous request, the previous summary, and the next request.
  • the actions include, for a particular intermediate generative neural network: generating, by the generative neural network and based on the previous request and the previous summary, an initial next request; determining, by the generative neural network, whether the initial next request is consistent with the previous summary; and in response to determining that the initial next request is not consistent with the previous summary, generating, by the generative neural network and based on the previous request and the previous summary, the next request.
  • Generating the next request includes generating, by the generative neural network, the next request based on (i) the previous request, (ii) the previous summary, and (iii) data indicating that the initial next request is not consistent with the previous summary.
  • the actions include receiving, by the initial generative neural network, the input; processing, by the initial generative neural network, the input to generate the next request and the next summary; and providing, by the initial generative neural network, the next request and the next summary to the next generative neural network in the sequence.
  • the actions include receiving, by the final generative neural network, the previous request and the previous summary generated from a previous intermediate generative neural network in the sequence; and processing, by the final generative neural network, the previous request and the previous summary to generate the final output for the task.
  • each generative neural network in the sequence can define its own summary, e.g., adding new information obtained at the generative neural network to the summary, and can use the summary to ensure that both information in the initial request and the new information are preserved in the sequence.
  • Some generative neural networks have a limited context length due to memory constraints and thus it is often impossible to pass along all the information to each generative neural network in a sequence of generative neural networks.
  • the systems and methods described in this specification can generate a summary of the information in the initial request and optionally additional information collected by some intermediate generative neural networks in the sequence. Instead of passing along the full information, the systems and techniques can pass the summary, thus saving memory consumed by the systems, while still ensuring that the important information is maintained in the summary. Furthermore, by passing the summary instead of the full information, the system can send less data over the network that connects the generative models implemented at different devices or in the cloud, thus reducing the network bandwidth consumption by the sequence of generative neural networks.
  • Some information in the initial request may include private data, such as personal identification information, financial information, etc., and should not be provided to other generative neural networks in the sequence.
  • the systems and methods described in this specification can generate a summary of the information that does not include the private data.
  • important information of the initial request can be passed along in the sequence of the generative neural networks without leaking private information, while still ensuring that the important information is maintained in the summary.
  • FIG. 1 is a diagram of an example system.
  • FIG. 2A is a diagram of an example system for maintaining information in a sequence of generative neural networks.
  • FIG. 2B is a diagram of an example system for maintaining information in a sequence of generative neural networks.
  • FIG. 3 is a flow chart of an example process for maintaining information in a sequence of generative neural networks.
  • FIG. 4 is a flow chart of an example process performed by each intermediate generative neural network for maintaining information in a sequence of generative neural networks.
  • Like reference numbers and designations in the various drawings indicate like elements.
  • FIG. 1 is a diagram of an example system 100.
  • the system 100 is an example of a system implemented as computer programs on one or more computers in one or more locations, in which the systems, components, and techniques described in this specification are implemented.
  • the one or more computers can include personal computers, mobile communication devices, servers, and other devices that can send and receive data over a network.
  • the network (not shown), such as a local area network (“LAN”), wide area network (“WAN”), the Internet, or a combination thereof, connects the one or more computers that implement the system.
  • the system can use a single computer or multiple computers operating in conjunction with one another, including, for example, a set of remote computers deployed as a cloud computing service.
  • the system 100 receives an input for a task and generates an output for the task.
  • the system uses a sequence of generative neural networks.
  • Each generative neural network receives an input, e.g., the original input for the task or the output of a preceding generative neural network, and generates an output.
  • at least a portion of information passed through the sequence of generative neural networks sometimes is lost, e.g., due to memory constraints or privacy constraints.
  • the system 100 maintains information in a sequence of generative neural networks 104 to ensure that important information in an input 102 requesting a response for a task is not lost.
  • the system 100 maintains and updates a summary 108 of the important information at each intermediate generative neural network 104 in the sequence.
  • the summary can include a description or properties of the input for the task and other information the system obtains for the task.
  • Some generative neural networks have a limited context length.
  • the context length of a generative neural network, e.g., a language model neural network, is the maximum number of tokens that can be included in an input to the neural network.
  • the context length can be determined, e.g., based on the memory or computation resource available on the hardware device on which the neural network is deployed, or based on the context length used during training of the neural network.
  • the summary 108 can have a shorter length than the data for all the information the system obtains up to a current time point.
  • the summary can fit within the context length of the generative neural networks in the sequence.
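As a toy illustration of the role the summary plays under a context-length constraint, the sketch below decides what to pass to the next model; whitespace splitting is a stand-in for the model's real tokenizer, and the function name is hypothetical.

```python
# Illustrative only: whitespace splitting stands in for real tokenization.
from typing import List


def payload_for_next_model(full_history: List[str], summary: str,
                           context_length: int) -> str:
    # Pass the full history only when it fits within the next model's
    # context; otherwise fall back to the summary, which is maintained
    # precisely so that it stays within the context length.
    full = " ".join(full_history)
    if len(full.split()) <= context_length:
        return full
    if len(summary.split()) > context_length:
        raise ValueError("summary does not fit within the context length")
    return summary
```

When the accumulated information exceeds the context length, only the (shorter) summary is forwarded, which is the memory- and bandwidth-saving behavior described above.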
  • the system 100 verifies whether a request 106 for the next generative neural network is consistent with the summary 108 to ensure that important information in the initial request is not lost. For example, at the second generative neural network 104(2), the system 100 can determine whether the request 106(2) for the next generative neural network 104(3) is consistent with the summary 108(1) received by the second generative neural network 104(2).
  • a generative neural network is a machine learning (ML) model that generates content, including text, images, audio, or other synthetic data, based on an input.
  • each generative neural network 104 can generate an output, e.g., a request 106 to be sent to the next generative neural network 104 or a final output 110, in response to a query, e.g., the input 102 provided by a user device or a request from a previous generative neural network.
  • the input 102 can include text data, e.g., a question or a search for a piece of information.
  • the input 102 can include image, video, or audio.
  • the generative neural network 104 can process a multi-modal input, e.g., a combination of text, image, video, and audio.
  • the input 102 can include an image and a corresponding question related to the image.
  • the generative neural network 104 can be configured to process an input sequence of tokens to generate an output sequence of tokens.
  • the tokens can represent any appropriate type of content, e.g., text, image, video, audio, or some combination of the above.
  • the generative neural network 104 can be a large language model (LLM) and can be configured to process an input sequence of tokens from a vocabulary of text tokens to generate an output sequence of tokens from the vocabulary.
  • the generative neural network 104 can be any appropriate neural network that receives an input sequence that includes text tokens and auto-regressively generates an output sequence that includes text tokens.
  • the generative neural network 104 can be a Transformer-based language model neural network or a recurrent neural network-based language model neural network.
  • the generative neural network 104 can be referred to as an autoregressive neural network when the neural network used to implement the language model auto-regressively generates an output sequence of tokens. More specifically, the auto-regressively generated output is created by generating each particular token in the output sequence conditioned on a current input sequence that includes any tokens that precede the particular token in the output sequence, i.e., the tokens that have already been generated for any previous positions in the output sequence that precede the particular position of the particular token, and a context input that provides context for the output sequence.
  • the current input sequence when generating a token at any given position in the output sequence can include the input sequence and the tokens at any preceding positions that precede the given position in the output sequence.
  • the current input sequence can include the input sequence followed by the tokens at any preceding positions that precede the given position in the output sequence.
  • the input and the current output sequence can be separated by one or more predetermined tokens within the current input sequence.
  • the generative neural network 104 can process the current input sequence to generate a score distribution (e.g., a probability distribution) that assigns a respective score, e.g., a respective probability, to each token in a vocabulary of tokens.
  • the generative neural network 104 can then select, as the particular token, a token from the vocabulary using the score distribution.
  • the neural network of the language model can greedily select the highest-scoring token or can sample, e.g., using nucleus sampling or another sampling technique, a token from the distribution.
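The greedy and nucleus-sampling token choices described above can be sketched as follows; the three-token vocabulary and its probabilities are invented purely for illustration.

```python
# Sketch of selecting a token from a score distribution over a vocabulary:
# greedy selection vs. nucleus (top-p) sampling.
import random
from typing import Dict, Optional


def greedy_select(probs: Dict[str, float]) -> str:
    # Pick the single highest-scoring token.
    return max(probs, key=probs.get)


def nucleus_sample(probs: Dict[str, float], p: float = 0.9,
                   rng: Optional[random.Random] = None) -> str:
    rng = rng or random.Random()
    # Keep the smallest set of highest-probability tokens whose total
    # probability mass reaches p.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, mass = [], 0.0
    for tok, pr in ranked:
        nucleus.append((tok, pr))
        mass += pr
        if mass >= p:
            break
    # Renormalize within the nucleus and sample from it.
    toks, prs = zip(*nucleus)
    total = sum(prs)
    return rng.choices(toks, weights=[pr / total for pr in prs])[0]
```

Greedy selection is deterministic; nucleus sampling trades some determinism for diversity while still excluding low-probability tokens.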
  • the generative neural network 104 can be an auto-regressive Transformer-based neural network that includes (i) a plurality of attention blocks that each apply a self-attention operation and (ii) an output subnetwork that processes an output of the last attention block to generate the score distribution.
  • the generative neural network 104 can have any of a variety of Transformer-based neural network architectures. Examples of such architectures include those described in J. Hoffmann, S. Borgeaud, A. Mensch, E. Buchatskaya, T. Cai, E. Rutherford, D. d. L. Casas, L. A. Hendricks, J. Welbl, A. Clark, et al. Training compute-optimal large language models, arXiv preprint arXiv:2203.15556, 2022; and J. W. Rae, S. Borgeaud, T. Cai, K. Millican, J. Hoffmann, H. F. Song, J. Aslanides, S. Henderson, R. Ring, S. Young, et al. Scaling language models: Methods, analysis & insights from training Gopher, arXiv preprint arXiv:2112.11446, 2021.
  • the generative neural network 104 can use a decoder-only architecture that includes many decoder blocks, and without using an encoder.
  • Each decoder block can include a self-attention layer and a feed forward neural network.
  • the Transformer-based generative neural network is an example of a generative neural network to which the systems and techniques herein can be applied.
  • the system and techniques described herein are applicable to other types of generative models.
  • One example of the generative neural network 104 can be a latent diffusion model.
  • the generative neural network 104 can be a diffusion model that uses a text-to-image diffusion model to generate a first image, and then applies one or more super-resolution diffusion models to generate a final image.
  • the generative neural network 104 can be an auto-regressive generative model that auto- regressively generates tokens representing audio, video, images, or other data.
  • the generative neural network 104 can be a masked token generative model that sequentially unmasks tokens that represent text, video, audio, images, or other data during generation.
  • the system 100 receives an input 102 for a task to be processed by a sequence of generative neural networks 104 to generate a final output 110 for the task.
  • a user could ask a sequence of generative neural networks to perform a travel booking task or to perform a writing task.
  • the input can be for an advertisement negotiation task.
  • the negotiating parties can each have an agent implementing a generative neural network.
  • the agents for publishers, advertisers, and website owners can negotiate, in natural language, regarding where advertisements are shown and how much they are worth in an automatic way. This could involve a lot of back and forth among the multiple agents.
  • the input can be for a task of arranging events in calendars.
  • Resolving conflicts in arranging events in calendars can involve a sequence of generative neural networks.
  • the system can receive input from user A for booking a calendar event with user B, which conflicts with user B’s calendar event with user C.
  • Each user can have an agent implementing a generative neural network.
  • the agents for users A, B, and C can interact with each other to resolve the conflict.
  • the data that Agent A provides to Agent B can be relevant to how Agent B interacts with Agent C.
  • Agent B cannot just forward the raw data to Agent C, due to memory constraints or privacy constraints.
  • the input can be for a task of generating training examples.
  • Generating training examples can involve a sequence of generative neural networks.
  • a first generative neural network can generate initial training examples for a second generative neural network.
  • the second generative neural network can generate, based on the initial training examples, subsequent training examples for a third generative neural network. Because the initial training examples can include private information, the second generative neural network cannot just forward the initial training examples to the third generative neural network.
  • the sequence of generative neural networks 104 can be implemented locally with regard to one another.
  • the sequence of generative neural networks 104 can be implemented on the same user device.
  • the sequence of generative neural networks 104 can be implemented remotely from one another.
  • one generative neural network in the sequence can run on a user device and other generative neural networks can run on other user devices or in the cloud.
  • the sequence of generative neural networks includes an initial generative neural network, one or more intermediate generative neural networks, and a final generative neural network.
  • the sequence of generative neural networks 104 includes an initial generative neural network 104(1), two intermediate generative neural networks 104(2) and 104(3), and a final generative neural network 104(4).
  • the system 100 can invoke the first generative neural network in response to receiving an input 102 for a task, and each generative neural network can invoke the next generative neural network in the sequence.
  • the system 100 can invoke the sequence of generative neural networks in response to receiving the input 102.
  • the sequence of generative neural networks can be models of different sizes, different neural network architectures, different capabilities, or different specializations.
  • the sequence of generative neural networks can include one neural network model with fewer parameters and another neural network model with more parameters.
  • the sequence of generative neural networks can include a first neural network model trained to perform a broad range of tasks, and a second neural network model trained on training data in a task domain to perform a specific task in the task domain.
  • the sequence of generative neural networks can use natural language as a common interface and can communicate with each other.
  • the first neural network can be a smaller general neural network on the user device, and the other neural networks can be larger or more specialized.
  • the system can receive an input asking for suggestions for rewriting text data, e.g., grammar error correction or writing style suggestions.
  • the system can first process the input using a small on-device generative neural network which is capable of handling some rewriting tasks.
  • the small on-device generative neural network may need to delegate the task to a larger generative neural network on a server.
  • when the on-device generative neural network sends the task to the server-side generative neural network, the on-device generative neural network should not leak private data related to the task.
  • all the generative neural networks in the sequence can be implemented in the cloud.
  • each person can have their own agent.
  • All the agents can implement their respective generative models in the cloud, but the models can be from different model providers.
  • the sequence of generative models can include a personal assistant agent, an agent representing a travel agency, an agent representing a hotel, and an agent representing a flight provider. All the agents can be implemented in the cloud.
  • the initial generative neural network 104(1) receives the input 102 for a task.
  • the initial generative neural network 104(1) can determine to ask the next generative neural network 104(2) in the sequence to help respond to the input.
  • a user may request the initial generative neural network to invoke one or more other generative neural networks to generate an output for the task.
  • the initial generative neural network can process the input and generate an initial output for the task. Based on the initial output, the initial generative neural network can determine to invoke one or more other generative neural networks to generate an output for the task.
  • the initial generative neural network 104(1) can generate a request 106(1) based on the input 102 and can provide the request 106(1) to the next generative neural network 104(2).
  • the request can be a query in natural language, e.g., text data, which can be processed by the next generative neural network.
  • the request can include a combination of image, video, audio, and text.
  • before the initial generative neural network 104(1) passes the request 106(1) to the next generative neural network 104(2), the initial generative neural network 104(1) can generate a summary 108(1).
  • the system 100 can provide a prompt to the initial generative neural network 104(1), asking the initial generative neural network 104(1) to summarize the input 102 in natural language.
  • the prompt can be in the form of an instruction that can be processed by the initial generative neural network 104(1).
  • if the initial generative neural network 104(1) has generated an initial output, the system may ask the initial generative neural network to summarize both the input and the initial output.
  • the summary 108(1) can include a description or properties of the input 102 for the task.
  • the summary can be generated based on the input 102 and the request 106(1).
  • the system 100 can generate a prompt saying: “you chose this request to be passed on, can you describe to me very compactly what are the most important properties in this input?”
  • the system 100 can use the summary to ensure that information relevant for the task is not lost in the sequence. Because the summary is smaller than the full information, the system 100 can save memory consumption by not passing along the entire information relevant for the task. Additionally, the system 100 can send less data over the network that connects the generative models implemented at different devices or in the cloud.
  • the system 100 can generate a summary that does not include private data, such as particular names, addresses, numbers, personal identifiable information, financial data, or company proprietary information, thus avoiding leaking private data to other generative neural networks in the sequence.
  • in the prompt for generating the summary, the system can ask the generative neural network to verify whether the summary includes private data.
  • the system can provide that summary to a verifier system and the verifier system can generate an output indicating whether the summary includes private data. If the system determines that the summary includes private data, the system can reject the summary and ask the generative neural network to generate a new summary, e.g., a new summary that does not include private data.
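The reject-and-regenerate loop can be sketched as below. The regular-expression patterns are a toy stand-in for the verifier system, and `generate_summary` stands in for prompting the generative neural network; both names are hypothetical.

```python
# Sketch of rejecting summaries that contain private data and asking the
# model to regenerate. A pattern-based check stands in for the verifier.
import re
from typing import Callable

# Toy patterns for private data (email addresses, phone-style numbers).
PRIVATE_PATTERNS = [
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),        # email addresses
    re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),  # phone-style numbers
]


def contains_private_data(summary: str) -> bool:
    return any(p.search(summary) for p in PRIVATE_PATTERNS)


def summarize_without_private_data(generate_summary: Callable[[str], str],
                                   max_attempts: int = 3) -> str:
    # `generate_summary(feedback)` stands in for one prompted model call;
    # `feedback` tells the model that a prior attempt leaked private data.
    feedback = ""
    for _ in range(max_attempts):
        summary = generate_summary(feedback)
        if not contains_private_data(summary):
            return summary
        feedback = "previous summary contained private data; omit it"
    raise ValueError("could not produce a summary free of private data")
```

A production verifier would be a trained model or policy engine rather than regular expressions; the control flow of reject, feed back, and regenerate is the point here.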
  • the initial generative neural network 104(1) provides the request 106(1) and the summary 108(1) to the next generative neural network 104(2) in the sequence.
  • Each intermediate generative neural network receives a previous request and a previous summary generated from a previous generative neural network in the sequence.
  • the intermediate generative neural network 104(2) receives a previous request 106(1) and a previous summary 108(1) generated from a previous generative neural network 104(1) in the sequence.
  • the intermediate generative neural network generates a next request based on the previous request and the previous summary.
  • the intermediate generative neural network 104(2) generates a next request 106(2) based on the previous request 106(1) and the previous summary 108(1).
  • the intermediate generative neural network can use the previous request and other information it has obtained, e.g., information that is locally available, to generate the next request to the next generative neural network in the sequence.
  • intermediate generative neural network 104(2) generates a next request 106(2) based on the previous request 106(1), the previous summary 108(1) and information obtained by the intermediate generative neural network 104(2) related to the task.
  • the system 100 determines whether the next request is consistent with the previous summary. Before sending the next request, the system 100 can use the previous summary to check whether the next request adheres to the previous summary. In some implementations, the system can provide a prompt to the intermediate generative neural network with the previous summary and the next request, and the prompt can ask the intermediate generative neural network to determine whether the next request is consistent with the previous summary. The system 100 can provide a prompt to the intermediate generative neural network that causes the intermediate generative neural network to determine whether properties of the next request are consistent with properties of the input included in the previous summary. For example, the generative neural network 104(2) can determine whether the next request 106(2) is consistent with the previous summary 108(1).
  • the system can prompt the intermediate generative neural network to reason about the decision through a series of intermediate reasoning steps, thus improving the ability of the intermediate generative neural network to perform complex reasoning.
  • the system can use the intermediate generative neural network to generate a consistency score based on the previous summary and the next request.
  • the system can compare the consistency score with a threshold. If the consistency score is higher than a threshold, the system can determine that the next request is consistent with the previous summary.
  • the threshold can be a predetermined value tuned based on an evaluation data set.
  • the system can generate the consistency score by processing the previous summary and the next request using a string formatting template.
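The threshold check on the consistency score can be sketched as below. The prompt template, the 0-to-1 score range, and the default threshold value are assumptions for illustration; the specification only states that the model is prompted with the previous summary and the next request and that the score is compared against a tuned threshold.

```python
# Sketch of scoring consistency between the previous summary and the next
# request via a string formatting template, then applying a threshold.
from typing import Callable

CONSISTENCY_PROMPT = (
    "Previous summary:\n{summary}\n\n"
    "Next request:\n{request}\n\n"
    "On a scale from 0 to 1, how consistent is the next request "
    "with the previous summary? Answer with a single number."
)


def is_request_consistent(score_fn: Callable[[str], str], summary: str,
                          request: str, threshold: float = 0.7) -> bool:
    # `score_fn` stands in for one call to the intermediate generative
    # neural network; it returns the model's numeric answer to the prompt.
    prompt = CONSISTENCY_PROMPT.format(summary=summary, request=request)
    score = float(score_fn(prompt))
    # The next request is accepted only if the score clears the threshold,
    # which would be tuned on an evaluation data set.
    return score > threshold
```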
  • the intermediate generative neural network can generate a next summary based on the previous request and the previous summary. For example, in response to determining that the next request 106(2) is consistent with the previous summary 108(1), the intermediate generative neural network 104(2) can generate the next summary 108(2) based on the previous request 106(1) and the previous summary 108(1).
  • the intermediate generative neural network can generate the next summary based on the previous request, the previous summary, and new information obtained by the intermediate generative neural network.
  • the new information obtained by the intermediate generative neural network can include new input provided by a user, data generated by the intermediate generative neural network, or any other data obtained by the intermediate generative neural network.
  • the intermediate generative neural network can use an internet search engine to obtain initial search results for the task.
  • the intermediate generative neural network can invoke a subsequent generative neural network to analyze the initial search results.
  • the intermediate generative neural network can generate a summary based on the previous request and the previous summary received by the intermediate generative neural network, and the initial search results for the task.
  • no relevant information is obtained by the intermediate generative neural network, and the intermediate generative neural network can forward the same summary, e.g., the previous summary, to the next generative neural network.
  • the intermediate generative network can determine, instead of processing the request by itself, to invoke another generative neural network to process the request. Because the intermediate generative network has not received or generated any new information, instead of generating a new summary, the intermediate generative network can forward the summary it received to the next generative neural network.
  • the intermediate generative neural network can generate the next summary based on the previous request, the previous summary, and the next request. For example, certain properties of the next request can be relevant for responding to the next request.
  • the intermediate generative neural network can generate the summary not only based on the previous request and the previous summary, but also based on the next request, such that relevant properties of the next request can be included in the next summary.
  • the intermediate generative neural network can generate the next request and the next summary sequentially. In some implementations, the intermediate generative neural network can generate both the next request and the next summary at the same time.
  • the intermediate generative neural network can be specifically trained on training data to generate summaries based on previous requests and previous summaries.
  • the intermediate generative neural network may not have been specifically trained to generate summaries; in that case, the intermediate generative neural network can generate the next summary by conditioning on in-context examples without updating any parameters of the intermediate generative neural network, e.g., through in-context learning.
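The summary-generation step described in the bullets above can be sketched as follows. The sketch is illustrative only: `call_model` is a hypothetical helper standing in for an invocation of the intermediate generative neural network, and the prompt wording is a placeholder rather than anything from the specification.

```python
def generate_next_summary(call_model, previous_request, previous_summary,
                          next_request=None, new_information=None):
    """Generate the next summary from the previous request and summary.

    Optionally incorporates the next request and any newly obtained
    information, as described above.
    """
    if next_request is None and new_information is None:
        # No new information was received or generated, so the previous
        # summary can be forwarded unchanged to the next network.
        return previous_summary
    parts = ["Previous request:\n" + previous_request,
             "Previous summary:\n" + previous_summary]
    if next_request is not None:
        # Include the next request so that its relevant properties can be
        # reflected in the next summary.
        parts.append("Next request:\n" + next_request)
    if new_information is not None:
        parts.append("Newly obtained information:\n" + new_information)
    parts.append("Write a compact summary that preserves every property "
                 "above that is relevant for completing the task.")
    return call_model("\n\n".join(parts))
```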
  • the system 100 can double check whether the next summary is consistent with the previous summary.
  • the system 100 can send a prompt input to the intermediate generative neural network, asking it to check whether the next summary is consistent with the previous summary.
  • the system can ask the intermediate generative neural network to generate a consistency score measuring the consistency between the next summary and the previous summary, and the system can determine whether the consistency score between the next summary and the previous summary is larger than a threshold. If the system 100 determines that the next summary is not consistent with the previous summary, the system 100 can ask the intermediate generative neural network to regenerate a summary that is consistent with the previous summary.
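The double check on the next summary can be sketched as a small loop. As before, `call_model` and `score_fn` are hypothetical stand-ins for the intermediate generative neural network and its consistency scoring; the attempt cap and threshold are illustrative.

```python
def ensure_summary_consistency(call_model, score_fn, next_summary,
                               previous_summary, threshold=0.7,
                               max_attempts=3):
    """Check the next summary against the previous summary and ask the
    model to regenerate it while it is judged inconsistent."""
    for _ in range(max_attempts):
        if score_fn(next_summary, previous_summary) > threshold:
            return next_summary
        # Regenerate a summary that is consistent with the previous one.
        next_summary = call_model(
            "The following summary is not consistent with the previous "
            "summary. Regenerate it so that it is consistent.\n\n"
            "Previous summary:\n" + previous_summary + "\n\n"
            "Inconsistent summary:\n" + next_summary)
    return next_summary
```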
  • the system 100 can send a prompt input to the intermediate generative neural network, asking it to refine the next request, thus ensuring the right information gets passed along the sequence of the generative neural networks.
  • the intermediate generative neural network can regenerate the next request based on the previous request and the previous summary. For example, the system can ask an LLM with a prompt saying, “please reconcile the inconsistency.”
  • the intermediate generative neural network in response to determining that the next request is not consistent with the previous summary, can generate multiple candidate next requests based on the previous request and the previous summary. Then, the intermediate generative neural network can select the next request from the candidate next requests.
  • the intermediate generative neural network can regenerate the next request based on (i) the previous request, (ii) the previous summary, and (iii) data indicating that the initial next request is not consistent with the previous summary.
  • the intermediate generative neural network is more likely to generate a next request that is consistent with the previous summary. For example, the system can ask an LLM with a prompt saying, “you already generated a request, and the request is not consistent with the previous summary, please try to incorporate this information to generate a new request.”
  • the intermediate generative neural network can regenerate the next request multiple times until a satisfactory request is generated.
  • the intermediate generative neural network can select, among the requests generated for the threshold number of times, a request that is most consistent with the previous summary, e.g., the request having the highest consistency score determined by the intermediate generative neural network.
  • the intermediate generative neural network can provide the selected request as an input to the subsequent generative neural network.
  • the intermediate generative neural network can provide the next request and the next summary to the next generative neural network in the sequence.
  • the intermediate generative neural network 104(2) can provide the next request 106(2) and the next summary 108(2) to the next generative neural network 104(3) in the sequence. This continues until the final generative neural network receives a request.
  • the final generative neural network 104(4) receives the previous request 106(3) and the previous summary 108(3) generated from a previous intermediate generative neural network 104(3) in the sequence.
  • the final generative neural network 104(4) processes the previous request 106(3) and the previous summary 108(3) to generate the final output 110 for the task.
  • the system 100 can present the final output 110 on a device to a user.
  • the sequence of generative neural networks can be implemented on the same device or remotely with regard to one another.
  • the final generative neural network can be physically remote from a user device of the user.
  • the system 100 can send the final output 110 from the remote computer that implements the final generative neural network to the user device for presentation to the user over the network.
  • FIG. 2A is a diagram of an example system 200 for maintaining information in a sequence of generative neural networks.
  • a user can provide an input 202 to a local generative neural network 204(1) on a user device, such as a mobile phone.
  • the user may want to book a hotel and the input 202 can be text data such as “book a hotel.”
  • the task is to book a hotel for the user.
  • the system 200 can generate a final output 210 for input 202 for the task through a sequence of generative neural networks interacting with each other.
  • the local generative neural network 204(1) can be a local on-device assistant that has access to user preferences of hotels.
  • the local generative neural network 204(1) can process the input 202 and can decide to ask a larger, more capable, server-side generative neural network 204(2) for help.
  • the local generative neural network 204(1) can generate a request 206(1) to be processed by the server-side generative neural network 204(2).
  • the request 206(1) can include other information related to the task.
  • the request 206(1) can include: “The user wants to book a hotel. Here are five hotels they previously booked and whether they liked them.”
  • the request 206(1) can include a list of the five hotels, with description and rating given by the user.
  • the local generative neural network 204(1) can generate a summary 208(1) to be provided to the server-side generative neural network 204(2).
  • the summary 208(1) can be generated based on the input 202 and the request 206(1).
  • the summary can include: “The user always liked hotels that have a spa. The user preferred non-smoking rooms.”
  • the server-side generative neural network 204(2) receives the request 206(1) and the summary 208(1).
  • the server-side generative neural network 204(2) can use a search engine and can find a hotel booking website.
  • the server-side generative neural network 204(2) can generate a request 206(2) asking the hotel booking website to book a hotel.
  • the request 206(2) can include: “Book a hotel in region A.”
  • the request 206(2) can be provided to a specialized generative neural network 204(3), e.g., a hotel-booking agent for the hotel booking website.
  • the server-side generative neural network 204(2) can generate the request 206(2) by incorporating some new information into the request 206(1).
  • the request 206(1) can list the five hotels that the user liked.
  • the server-side generative neural network 204(2) can generate the request 206(2) by adding two other hotels that the user might like based on properties of the five hotels to a list of candidate hotels.
  • the system 200 can ask the server-side generative neural network 204(2) to check whether the request 206(2) is consistent with the summary 208(1).
  • the request 206(2) can be “Book a non-smoking room in region A.” Because the summary 208(1) indicates that “the user preferred non-smoking rooms,” the server-side generative neural network 204(2) can determine that the request 206(2) is consistent with the summary 208(1).
  • the request 206(2) can include the two other hotels and the server-side generative neural network 204(2) can determine whether properties of the two other hotels are consistent with the summary 208(1).
  • the request 206(2) can be “Book a hotel in region A” and region A is far away from the city center.
  • the summary 208(1) can include: “hotels that are far away from the city center generally receive a low rating.”
  • the server-side generative neural network 204(2) can determine that the request 206(2) is not consistent with the summary 208(1).
  • the server-side generative neural network 204(2) can regenerate an updated request.
  • the server-side generative neural network 204(2) can generate an updated request “Book a hotel in region B” and region B is not far away from the city center.
  • the system 200 can ask the server-side generative neural network 204(2) to check whether the updated request 206(2) is consistent with the summary 208(1).
  • the system 200 can ask the server-side generative neural network 204(2) to regenerate the request 206(2) until the request is consistent with the summary 208(1).
  • the system 200 can ask the server-side generative neural network 204(2) to regenerate the request 206(2) until a maximum number of requests has been generated.
  • the system 200 may select a request from the generated requests that is most consistent with the summary.
  • the server-side generative neural network 204(2) can generate a summary 208(2) based on the request 206(1), the summary 208(1), and the request 206(2). Because the request 206(2) is about booking a hotel in region A, the server-side generative neural network 204(2) can include relevant information about region A in the summary 208(2). For example, the server-side generative neural network 204(2) can generate the summary 208(2), and the summary 208(2) can include: “The user liked hotels that have a spa and non-smoking rooms. Region A is 10 miles from the airport.”
  • the specialized generative neural network 204(3) can process the request 206(2) and the summary 208(2).
  • the specialized generative neural network 204(3) and the server-side generative neural network 204(2) can interact with each other until they agree on a hotel. During the interaction, when a generative neural network generates a request, the generative neural network can perform a consistency check on the request against the summary before sending the request to another generative neural network.
  • the specialized generative neural network 204(3) can generate a final output 210 based on the request 206(2) and the summary 208(2).
  • the final output 210 can include: “Booked a room at Hotel ABC.”
  • the specialized generative neural network 204(3) can select and reserve a room from a list of available rooms. For example, because the summary indicates that the user liked non-smoking rooms, the specialized generative neural network 204(3) can book a non-smoking room at Hotel ABC. As another example, because the summary indicates that region A is 10 miles from the airport, the specialized generative neural network 204(3) can book a hotel that has suitable check-in and check-out times based on the user's travel plan.
  • the specialized generative neural network 204(3) can provide the final output to the user device.
  • FIG. 2B is a diagram of an example system 220 for maintaining information in a sequence of generative neural networks.
  • An initial generative neural network 224(1) can receive an input 222 for a writing task indicating what a user wants to write about.
  • the initial generative neural network 224(1) can be a small-size LLM installed on a mobile device.
  • the initial generative neural network 224(1) can obtain private data related to the writing task, e.g., by accessing private notes or financial data stored on the mobile device.
  • the initial generative neural network 224(1) can determine that the task is quite complex and the small-size LLM on the mobile device is not powerful enough to handle the task.
  • the initial generative neural network 224(1) can send a request 226(1) to a second LLM 224(2) on the server that uses a bigger model.
  • the initial generative neural network 224(1) can generate a summary 228(1) that includes important relevant information of the task, without sharing the private data.
  • the summary 228(1) can describe the user's writing preferences and other properties at a very high level.
  • the second LLM 224(2) can process the request 226(1) and can utilize the internet to search new information about what the user wants to write about.
  • the second LLM 224(2) can obtain new information related to the writing task.
  • the second LLM 224(2) on the server can determine that the user wants to write about content in a particular domain, such as art or biology.
  • the second LLM 224(2) can reach out to a third LLM 224(3) that is specialized in the particular domain.
  • the second LLM 224(2) on the server may have found lots of information on the internet and may not be able to pass all the information to the third LLM 224(3) due to memory limitations.
  • the second LLM 224(2) on the server can generate a summary 228(2) that condenses the information found on the internet and the previous summary 228(1) into a compact message.
  • the third LLM 224(3) can process the request 226(2) and the summary 228(2) received from the second LLM 224(2) and can generate a final writing output 230 for the writing task.
  • the third LLM can send the final writing output to a user device for presentation to a user.
  • FIG. 3 is a flow chart of an example process 300 for maintaining information in a sequence of generative neural networks.
  • FIG. 4 is a flow chart of an example process 400 performed by each intermediate generative neural network for maintaining information in a sequence of generative neural networks.
  • the processes 300 and 400 will be described as being performed by an appropriately programmed computer system, such as the system 100.
  • the system receives an input for a task to be processed by a sequence of generative neural networks to generate a final output for the task (302).
  • the sequence can include an initial generative neural network, one or more intermediate generative neural networks, and a final generative neural network.
  • the system processes the input by the sequence of the generative neural networks.
  • the system processes the input using the initial generative neural network in the sequence of generative neural networks (304).
  • the system can receive, by the initial generative neural network, the input.
  • the system can process, by the initial generative neural network, the input to generate a next request and a next summary.
  • the system can provide, by the initial generative neural network, the next request and the next summary to the next generative neural network in the sequence.
  • the system processes an output from a respective previous generative neural network using each of one or more intermediate generative neural networks in the sequence of generative neural networks (306).
  • the second generative neural network processes an output generated from the initial generative neural network.
  • the third generative neural network processes an output generated from the second generative neural network.
  • the system receives, by the generative neural network, a previous request and a previous summary generated from a previous generative neural network in the sequence (402).
  • the system generates, by the generative neural network and based on the previous request and the previous summary, a next request (404).
  • the system determines, by the generative neural network, whether the next request is consistent with the previous summary (406).
  • the previous summary can include properties of the input and the previous request, and determining whether the next request is consistent with the previous summary can include determining whether properties of the next request are consistent with the properties of the input included in the previous summary.
  • in response to determining that the next request is consistent with the previous summary, the system generates, by the generative neural network and based on the previous request and the previous summary, a next summary (408).
  • the system can obtain, by the generative neural network, new information based on the previous request, and the system can generate, by the generative neural network, the next summary based on the previous request, the previous summary, and the new information.
  • the system can generate, by the generative neural network, the next summary based on the previous request, the previous summary, and the next request. For example, the system can generate a prompt saying: “you chose this message to be passed on, can you describe to me very compactly what are the most important properties in this data?”
  • the system can generate, by the generative neural network and based on the previous request and the previous summary, an initial next request.
  • the system can determine whether the initial next request is consistent with the previous summary.
  • the system can generate, by the generative neural network and based on the previous request and the previous summary, the next request.
  • the system can determine that the next request is consistent with the previous summary.
  • the system, in response to determining that the initial next request is not consistent with the previous summary, can generate the next request based on (i) the previous request, (ii) the previous summary, and (iii) data indicating that the initial next request is not consistent with the previous summary.
  • the system provides, by the generative neural network, the next request and the next summary to the next generative neural network in the sequence (410). The process in FIG. 4 continues until the final generative neural network receives a request.
  • the system processes an output from the second-to-final generative neural network using a final generative neural network in the sequence of generative neural networks (308).
  • the system can receive, by the final generative neural network, the previous request and the previous summary generated from a previous intermediate generative neural network in the sequence.
  • the system can process, by the final generative neural network, the previous request and the previous summary to generate the final output for the task.
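Processes 300 and 400 described above can be sketched end to end in Python. This is an illustrative sketch only: each network object is assumed to expose hypothetical methods mirroring the steps above, and none of the method names come from the specification itself.

```python
def run_sequence(networks, task_input):
    """Process an input through a sequence of generative neural networks."""
    initial, *intermediates, final = networks
    # Step 304: the initial network processes the input into the first
    # request/summary pair.
    request, summary = initial.process_input(task_input)
    # Steps 306 and 402-410: each intermediate network receives the previous
    # request and summary, generates a next request, checks it against the
    # previous summary, and produces a next summary.
    for network in intermediates:
        next_request = network.generate_request(request, summary)
        if not network.is_consistent(next_request, summary):
            # Retry path: regenerate the request when the consistency
            # check fails.
            next_request = network.regenerate_request(request, summary)
        next_summary = network.generate_summary(request, summary,
                                                next_request)
        request, summary = next_request, next_summary
    # Step 308: the final network processes the last request and summary to
    # produce the final output for the task.
    return final.respond(request, summary)
```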
  • Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
  • Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non- transitory storage medium for execution by, or to control the operation of, data processing apparatus.
  • the computer storage medium can be a machine- readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
  • the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
  • data processing apparatus refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.
  • the apparatus can also be, or further include, off-the-shelf or custom-made parallel processing subsystems, e.g., a GPU or another kind of special-purpose processing subsystem.
  • the apparatus can also be, or further include, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
  • the apparatus can optionally include, in addition to hardware, code that creates an execution environment for computer programs, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
  • a computer program (which may also be referred to or described as a program, software, a software application, an app, a module, a software module, a script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • a program may, but need not, correspond to a file in a file system.
  • a program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub-programs, or portions of code.
  • a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a data communication network.
  • an “engine,” or “software engine,” refers to a software implemented input/output system that provides an output that is different from the input.
  • An engine can be an encoded block of functionality, such as a library, a platform, a software development kit (“SDK”), or an object.
  • Each engine can be implemented on any appropriate type of computing device, e.g., servers, mobile phones, tablet computers, notebook computers, music players, e-book readers, laptop or desktop computers, PDAs, smart phones, or other stationary or portable devices, that includes one or more processors and computer readable media. Additionally, two or more of the engines may be implemented on the same computing device, or on different computing devices.
  • the processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output.
  • the processes and logic flows can also be performed by special purpose logic circuitry, e.g., an FPGA or an ASIC, or by a combination of special purpose logic circuitry and one or more programmed computers.
  • Computers suitable for the execution of a computer program can be based on general or special purpose microprocessors or both, or any other kind of central processing unit.
  • a central processing unit will receive instructions and data from a read-only memory or a random access memory, or both.
  • the essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data.
  • the central processing unit and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
  • a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks.
  • a computer need not have such devices.
  • a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.
  • Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
  • embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and pointing device, e.g., a mouse, trackball, or a presence sensitive display or other surface by which the user can provide input to the computer.
  • Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user’s device in response to requests received from the web browser.
  • a computer can interact with a user by sending text messages or other forms of message to a personal device, e.g., a smartphone, running a messaging application, and receiving responsive messages from the user in return.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing a request using a sequence of generative neural networks. One of the methods includes receiving an input for a task; and processing the input by a sequence of generative neural networks, including: for each intermediate generative neural network in the sequence, receiving a previous request and a previous summary generated from a previous generative neural network in the sequence; generating, based on the previous request and the previous summary, a next request; determining whether the next request is consistent with the previous summary; in response to determining that the next request is consistent with the previous summary, generating, based on the previous request and the previous summary, a next summary; and providing the next request and the next summary to the next generative neural network in the sequence.

Description

REQUEST PROCESSING USING A SEQUENCE
OF GENERATIVE NEURAL NETWORKS
BACKGROUND
[1] This specification relates to processing a request using generative neural networks.
[2] Generative neural networks have demonstrated state of the art performance across a wide range of tasks, such as text generation (e.g., writing, summarization, translation, coding), image generation, and audio generation. Generative neural networks use very large neural network models that are trained on vast amounts of data. For example, a large language model (LLM) can include a transformer-based neural network model with self-attention capabilities and can achieve general-purpose language understanding and generation in response to a query. Thus, generative neural networks are being deployed in various applications, e.g., as a coding assistant, as an email writing assistant, and for generating images in a presentation.
SUMMARY
[3] This specification is related to processing a request using a sequence of generative neural networks. A sequence of generative neural networks can be invoked to provide a response to a request. A generative neural network can interact with another generative neural network using natural language as the interface. For example, an LLM in a sequence of LLMs can process a request related to a task and can invoke the next LLM in the sequence to handle the task, until the last LLM in the sequence provides the final output for the task. In some cases, a generative neural network in the sequence can obtain additional information related to the task. In this chain of interactions, it is important that information related to the task does not degrade at each step of the process.
[4] However, it is often impossible to pass along all the information at every step of the process. First, because of the memory constraints of the generative neural network systems, the size of an input to the generative neural network is often limited. Second, for privacy reasons, a generative neural network may not be allowed to forward all the information to the next generative neural network operated by another party. Thus, information can get lost in the process using the sequence of generative neural networks, which is not desired. Furthermore, the loss of information can add up along the sequence of the generative neural networks and can cause severe, e.g., exponential, degradation of information. A final output of the sequence may not properly respond to, or may not be related to, the input for the task.
[5] This specification describes systems and techniques for maintaining information in a sequence of generative neural networks to ensure that important information in the initial request is not lost. In particular, the systems and techniques maintain and update a summary of the information at each generative neural network in the sequence and use the summary to verify whether a request for the next generative neural network is consistent with the summary.
[6] In general, one innovative aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving an input for a task to be processed by a sequence of generative neural networks to generate a final output for the task, the sequence including an initial generative neural network, one or more intermediate generative neural networks, and a final generative neural network; and processing the input by the sequence of the generative neural networks, including: for each intermediate generative neural network, receiving, by the generative neural network, a previous request and a previous summary generated from a previous generative neural network in the sequence; generating, by the generative neural network and based on the previous request and the previous summary, a next request; determining, by the generative neural network, whether the next request is consistent with the previous summary; in response to determining that the next request is consistent with the previous summary, generating, by the generative neural network and based on the previous request and the previous summary, a next summary; and providing, by the generative neural network, the next request and the next summary to the next generative neural network in the sequence. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods. For a system of one or more computers to be configured to perform particular operations or actions means that the system has installed on it software, firmware, hardware, or a combination of them that in operation cause the system to perform the operations or actions.
For one or more computer programs to be configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by data processing apparatus, cause the apparatus to perform the operations or actions.
[7] The foregoing and other embodiments can each optionally include one or more of the following features, alone or in combination. In particular, one embodiment includes all the following features in combination. The previous summary includes properties of the input and the previous request, and determining whether the next request is consistent with the previous summary includes determining whether properties of the next request are consistent with the properties included in the previous summary. Generating the next summary includes: obtaining, by the generative neural network, new information based on the previous request; and generating, by the generative neural network, the next summary based on the previous request, the previous summary, and the new information. Generating the next summary includes generating, by the generative neural network, the next summary based on the previous request, the previous summary, and the next request. The actions include, for a particular intermediate generative neural network: generating, by the generative neural network and based on the previous request and the previous summary, an initial next request; determining, by the generative neural network, whether the initial next request is consistent with the previous summary; and in response to determining that the initial next request is not consistent with the previous summary, generating, by the generative neural network and based on the previous request and the previous summary, the next request. Generating the next request includes generating, by the generative neural network, the next request based on (i) the previous request, (ii) the previous summary, and (iii) data indicating that the initial next request is not consistent with the previous summary.
The actions include receiving, by the initial generative neural network, the input; processing, by the initial generative neural network, the input to generate the next request and the next summary'; and providing, by the initial generative neural netw ork, the next request and the next summary to the next generative neural network in the sequence. The actions include receiving, by the final generative neural network, the previous request and the previous summary generated from a previous intermediate generative neural network in the sequence; and processing, by the final generative neural network, the previous request and the previous summary to generate the final output for the task.
[8] Particular embodiments of the subject matter described in this specification can be implemented so as to realize one or more of the following advantages.
[9] The systems and techniques can use a summary of the information in the initial request as a natural language checksum to ensure that important information in the initial request is not lost. In some implementations, each generative neural network in the sequence can define its own summary, e.g., adding new information obtained at the generative neural network to the summary, and can use the summary to ensure that both the information in the initial request and the new information are preserved in the sequence.
[10] Some generative neural networks have a limited context length due to memory constraints, and thus it is often impossible to pass along all the information to each generative neural network in a sequence of generative neural networks. The systems and methods described in this specification can generate a summary of the information in the initial request and, optionally, additional information collected by some intermediate generative neural networks in the sequence. Instead of passing along the full information, the systems and techniques can pass the summary, thus saving memory consumed by the systems, while still ensuring that the important information is maintained in the summary. Furthermore, by passing the summary instead of the full information, the system can send less data over the network that connects the generative models implemented at different devices or in the cloud, thus reducing the network bandwidth consumed by the sequence of generative neural networks.
[11] Some information in the initial request may include private data, such as personal identification information, financial information, etc., and should not be provided to other generative neural networks in the sequence. The systems and methods described in this specification can generate a summary of the information that does not include the private data. Thus, important information of the initial request can be passed along the sequence of generative neural networks without leaking private information, while still ensuring that the important information is maintained in the summary.
[12] The details of one or more embodiments of the subject matter of this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[13] FIG. 1 is a diagram of an example system.
[14] FIG. 2A is a diagram of an example system for maintaining information in a sequence of generative neural networks.
[15] FIG. 2B is a diagram of an example system for maintaining information in a sequence of generative neural networks.
[16] FIG. 3 is a flow chart of an example process for maintaining information in a sequence of generative neural networks.
[17] FIG. 4 is a flow chart of an example process performed by each intermediate generative neural network for maintaining information in a sequence of generative neural networks.
[18] Like reference numbers and designations in the various drawings indicate like elements.
DETAILED DESCRIPTION
[19] FIG. 1 is a diagram of an example system 100. The system 100 is an example of a system implemented as computer programs on one or more computers in one or more locations, in which the systems, components, and techniques described in this specification are implemented. The one or more computers can include personal computers, mobile communication devices, servers, and other devices that can send and receive data over a network. The network (not shown), such as a local area network (“LAN”), wide area network (“WAN”), the Internet, or a combination thereof, connects the one or more computers that implement the system. The system can use a single computer or multiple computers operating in conjunction with one another, including, for example, a set of remote computers deployed as a cloud computing service.
[20] The system 100 receives an input for a task and generates an output for the task. In order to perform the task, the system uses a sequence of generative neural networks. Each generative neural network receives an input, e.g., the original input for the task or the output of a preceding generative neural network, and generates an output. However, in some systems, at least a portion of the information passed through the sequence of generative neural networks is sometimes lost, e.g., due to memory constraints or privacy constraints.
[21] The system 100 maintains information in a sequence of generative neural networks 104 to ensure that important information in an input 102 requesting a response for a task is not lost. The system 100 maintains and updates a summary 108 of the important information at each intermediate generative neural network 104 in the sequence. The summary can include a description or properties of the input for the task and other information the system obtains for the task.
[22] Some generative neural networks have a limited context length. The context length of a generative neural network, e.g., a language model neural network, is the maximum number of tokens that can be included in an input to the neural network. The context length can be determined, e.g., based on the memory or computation resources available on the hardware device on which the neural network is deployed, or based on the context length used during training of the neural network. Thus, it is often impossible to pass along all the information to each generative neural network in a sequence of generative neural networks.
[23] The summary 108 can have a shorter length than the data for all the information the system obtains up to a current time point. Thus, the summary can fit within the context length of the generative neural networks in the sequence.
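The context-length constraint can be made concrete with a small sketch. The whitespace tokenizer and the limit of 512 tokens below are assumptions chosen for illustration; a real system would use the model's own tokenizer and limit.

```python
CONTEXT_LENGTH = 512  # example limit; model-dependent in practice


def fits_in_context(summary: str, request: str,
                    limit: int = CONTEXT_LENGTH) -> bool:
    """Return True if the combined request and summary stay within the
    token budget of the next generative neural network in the sequence.
    Uses whitespace splitting as a crude stand-in for real tokenization."""
    token_count = len(summary.split()) + len(request.split())
    return token_count <= limit
```

Because the summary is much shorter than the full accumulated information, this check is expected to pass at each step even when the raw data would not fit.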
[24] The system 100 verifies whether a request 106 for the next generative neural network is consistent with the summary 108 to ensure that important information in the initial request is not lost. For example, at the second generative neural network 104(2), the system 100 can determine whether the request 106(2) for the next generative neural network 104(3) is consistent with the summary 108(1) received by the second generative neural network 104(2).
[25] A generative neural network is a machine learning (ML) model that generates content, including text, images, audio, or other synthetic data, based on an input. During inference, each generative neural network 104 can generate an output, e.g., a request 106 to be sent to the next generative neural network 104 or a final output 110, in response to a query, e.g., the input 102 provided by a user device or a request from a previous generative neural network. The input 102 can include text data, e.g., a question or a search for a piece of information. In some implementations, the input 102 can include images, video, or audio. In some implementations, the generative neural network 104 can process a multi-modal input, e.g., a combination of text, image, video, and audio. For example, the input 102 can include an image and a corresponding question related to the image.
[26] In some implementations, the generative neural network 104 can be configured to process an input sequence of tokens to generate an output sequence of tokens. The tokens can represent any appropriate type of content, e.g., text, image, video, audio, or some combination of the above. For example, the generative neural network 104 can be a large language model (LLM) and can be configured to process an input sequence of tokens from a vocabulary of text tokens to generate an output sequence of tokens from the vocabulary.
[27] More generally, the generative neural network 104 can be any appropriate neural network that receives an input sequence that includes text tokens and auto-regressively generates an output sequence that includes text tokens. For example, the generative neural network 104 can be a Transformer-based language model neural network or a recurrent neural network-based language model neural network.
[28] In some situations, the generative neural network 104 can be referred to as an auto-regressive neural network when the neural network used to implement the language model auto-regressively generates an output sequence of tokens. More specifically, the auto-regressively generated output is created by generating each particular token in the output sequence conditioned on a current input sequence that includes any tokens that precede the particular token in the output sequence, i.e., the tokens that have already been generated for any previous positions in the output sequence that precede the particular position of the particular token, and a context input that provides context for the output sequence.
[29] For example, the current input sequence when generating a token at any given position in the output sequence can include the input sequence and the tokens at any preceding positions that precede the given position in the output sequence. As a particular example, the current input sequence can include the input sequence followed by the tokens at any preceding positions that precede the given position in the output sequence. Optionally, the input and the current output sequence can be separated by one or more predetermined tokens within the current input sequence.
[30] More specifically, to generate a particular token at a particular position within an output sequence, the generative neural network 104 can process the current input sequence to generate a score distribution (e.g., a probability distribution) that assigns a respective score, e.g., a respective probability, to each token in a vocabulary of tokens. The generative neural network 104 can then select, as the particular token, a token from the vocabulary using the score distribution. For example, the neural network of the language model can greedily select the highest-scoring token or can sample, e.g., using nucleus sampling or another sampling technique, a token from the distribution.
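The token-selection step described above can be illustrated with a short sketch. The toy probability dictionary, greedy selection, and weighted sampling here are simplified stand-ins for what an actual model implements; nucleus sampling is omitted for brevity.

```python
import random


def select_token(probs: dict, greedy: bool = True, rng=None):
    """Select the next token from a score distribution.

    probs maps each token in the vocabulary to its probability.
    With greedy=True, pick the highest-scoring token; otherwise
    sample a token in proportion to its probability."""
    if greedy:
        return max(probs, key=probs.get)
    rng = rng or random.Random()
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

In practice the distribution comes from the output subnetwork of the model, and sampling strategies such as nucleus sampling restrict the candidate set before drawing.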
[31] As a particular example, the generative neural network 104 can be an auto-regressive Transformer-based neural network that includes (i) a plurality of attention blocks that each apply a self-attention operation and (ii) an output subnetwork that processes an output of the last attention block to generate the score distribution.
[32] The generative neural network 104 can have any of a variety of Transformer-based neural network architectures. Examples of such architectures include those described in J. Hoffmann, S. Borgeaud, A. Mensch, E. Buchatskaya, T. Cai, E. Rutherford, D. d. L. Casas, L. A. Hendricks, J. Welbl, A. Clark, et al. Training compute-optimal large language models. arXiv preprint arXiv:2203.15556, 2022; J. W. Rae, S. Borgeaud, T. Cai, K. Millican, J. Hoffmann, H. F. Song, J. Aslanides, S. Henderson, R. Ring, S. Young, E. Rutherford, T. Hennigan, J. Menick, A. Cassirer, R. Powell, G. van den Driessche, L. A. Hendricks, M. Rauh, P. Huang, A. Glaese, J. Welbl, S. Dathathri, S. Huang, J. Uesato, J. Mellor, I. Higgins, A. Creswell, N. McAleese, A. Wu, E. Elsen, S. M. Jayakumar, E. Buchatskaya, D. Budden, E. Sutherland, K. Simonyan, M. Paganini, L. Sifre, L. Martens, X. L. Li, A. Kuncoro, A. Nematzadeh, E. Gribovskaya, D. Donato, A. Lazaridou, A. Mensch, J. Lespiau, M. Tsimpoukelli, N. Grigorev, D. Fritz, T. Sottiaux, M. Pajarskas, T. Pohlen, Z. Gong, D. Toyama, C. de Masson d'Autume, Y. Li, T. Terzi, V. Mikulik, I. Babuschkin, A. Clark, D. de Las Casas, A. Guy, C. Jones, J. Bradbury, M. Johnson, B. A. Hechtman, L. Weidinger, I. Gabriel, W. S. Isaac, E. Lockhart, S. Osindero, L. Rimell, C. Dyer, O. Vinyals, K. Ayoub, J. Stanway, L. Bennett, D. Hassabis, K. Kavukcuoglu, and G. Irving. Scaling language models: Methods, analysis & insights from training gopher. CoRR, abs/2112.11446, 2021; Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683, 2019; Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, and Quoc V. Le. Towards a human-like open-domain chatbot. CoRR, abs/2001.09977, 2020; and Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. arXiv preprint arXiv:2005.14165, 2020.
[33] In some implementations, the generative neural network 104 can use a decoder-only architecture that includes many decoder blocks, without using an encoder. Each decoder block can include a self-attention layer and a feed-forward neural network. The Transformer-based generative neural network is one example of a generative neural network to which the systems and techniques described herein are applicable.
[34] More generally, the systems and techniques described herein are applicable to other types of generative models. One example of the generative neural network 104 can be a latent diffusion model. As another example, the generative neural network 104 can be a diffusion model that uses a text-to-image diffusion model to generate a first image, and then applies one or more super-resolution diffusion models to generate a final image. As another example, the generative neural network 104 can be an auto-regressive generative model that auto-regressively generates tokens representing audio, video, images, or other data. As yet another example, the generative neural network 104 can be a masked token generative model that sequentially unmasks tokens that represent text, video, audio, images, or other data during generation.
[35] The system 100 receives an input 102 for a task to be processed by a sequence of generative neural networks 104 to generate a final output 110 for the task. For example, a user could ask a sequence of generative neural networks to perform a travel booking task or to perform a writing task.
[36] As another example, the input can be for an advertisement negotiation task. The negotiating parties can each have an agent implementing a generative neural network. The agents for publishers, advertisers, and website owners can automatically negotiate, in natural language, where advertisements are shown and how much they are worth. This could involve a lot of back and forth among the multiple agents.
[37] As another example, the input can be for a task of arranging events in calendars. Resolving conflicts in arranging events in calendars can involve a sequence of generative neural networks. The system can receive input from user A for booking a calendar event with user B, which conflicts with user B’s calendar event with user C. Each user can have an agent implementing a generative neural network. The agents for users A, B, and C can interact with each other to resolve the conflict. In some cases, during the interaction, the data that Agent A provides to Agent B can be relevant to how Agent B interacts with Agent C. However, Agent B cannot just forward the raw data to Agent C, due to memory constraints or privacy constraints.
[38] As another example, the input can be for a task of generating training examples. Generating training examples can involve a sequence of generative neural networks. A first generative neural network can generate initial training examples for a second generative neural network. The second generative neural network can generate, based on the initial training examples, subsequent training examples for a third generative neural network. Because the initial training examples can include private information, the second generative neural network cannot just forward the initial training examples to the third generative neural network.
[39] In some implementations, the sequence of generative neural networks 104 can be implemented locally with regard to one another. For example, the sequence of generative neural networks 104 can be implemented on the same user device.
[40] In some implementations, the sequence of generative neural networks 104 can be implemented remotely from one another. For example, one generative neural network in the sequence can run on a user device and other generative neural networks can run on other user devices or in the cloud.
[41] The sequence of generative neural networks includes an initial generative neural network, one or more intermediate generative neural networks, and a final generative neural network. For example, the sequence of generative neural networks 104 includes an initial generative neural network 104(1), two intermediate generative neural networks 104(2) and 104(3), and a final generative neural network 104(4).
[42] In some implementations, the system 100 can invoke the first generative neural network in response to receiving an input 102 for a task, and each generative neural network can invoke the next generative neural network in the sequence. In some implementations, the system 100 can invoke the sequence of generative neural networks in response to receiving the input 102. The sequence of generative neural networks can include models of different sizes, different neural network architectures, different capabilities, or different specializations.
[43] For example, the sequence of generative neural networks can include one neural network model with fewer parameters and another neural network model with more parameters.
[44] As another example, the sequence of generative neural networks can include a first neural network model trained to perform a broad range of tasks, and a second neural network model trained on training data in a task domain to perform a specific task in the task domain. The generative neural networks in the sequence can use natural language as a common interface and can communicate with each other.
[45] In some implementations, the first neural network can be a smaller general neural network on the user device, and the other neural networks can be larger or more specialized. For example, the system can receive an input asking for suggestions for rewriting text data, e.g., grammar error correction or writing style suggestions. The system can first process the input using a small on-device generative neural network which is capable of handling some rewriting tasks. However, the small on-device generative neural network may need to delegate the task to a larger generative neural network on a server. When the on-device generative neural network sends the task to the server-side generative neural network, the on-device generative neural network should not leak private data related to the task.
[46] In some implementations, all the generative neural networks in the sequence can be implemented in the cloud. For example, for a calendar scheduling task among multiple people, each person can have their own agent. All the agents can implement their respective generative models in the cloud, but the models can be from different model providers. As another example, for a travel booking task, the sequence of generative models can include a personal assistant agent, an agent representing a travel agency, an agent representing a hotel, and an agent representing a flight provider. All the agents can be implemented in the cloud.
[47] The initial generative neural network 104(1) receives the input 102 for a task. The initial generative neural network 104(1) can determine to ask the next generative neural network 104(2) in the sequence to help respond to the input.
[48] In some implementations, a user may request the initial generative neural network to invoke one or more other generative neural networks to generate an output for the task.
[49] In some implementations, the initial generative neural network can process the input and generate an initial output for the task. Based on the initial output, the initial generative neural network can determine to invoke one or more other generative neural networks to generate an output for the task.
[50] The initial generative neural network 104(1) can generate a request 106(1) based on the input 102 and can provide the request 106(1) to the next generative neural network 104(2). The request can be a query in natural language, e.g., text data, which can be processed by the next generative neural network. In some implementations, the request can include a combination of image, video, audio, and text.
[51] Before the initial generative neural network 104(1) passes the request 106(1) to the next generative neural network 104(2), the initial generative neural network 104(1) can generate a summary 108(1). For example, the system 100 can provide a prompt to the initial generative neural network 104(1), asking the initial generative neural network 104(1) to summarize the input 102 in natural language. The prompt can be in the form of an instruction that can be processed by the initial generative neural network 104(1).
[52] In some implementations, if the initial generative neural network 104 has generated an initial output, the system may ask the initial generative neural network to summarize both the input and the initial output.
[53] The summary 108(1) can include a description or properties of the input 102 for the task. In some implementations, the summary can be generated based on the input 102 and the request 106(1). For example, the system 100 can generate a prompt saying: “You chose this request to be passed on; can you describe to me very compactly what are the most important properties of this input?”
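A summarization prompt of the kind just described could be assembled as follows. The template wording is illustrative and would be tuned in practice; `build_summary_prompt` is a hypothetical helper, not part of any actual implementation.

```python
def build_summary_prompt(task_input: str, request: str) -> str:
    """Build a prompt asking a generative neural network to compactly
    summarize the important properties of the input, given the request
    it chose to pass on."""
    return (
        f"Task input: {task_input}\n"
        f"Request chosen to be passed on: {request}\n"
        "You chose this request to be passed on. Describe to me very "
        "compactly the most important properties of this input."
    )
```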
[54] The system 100 can use the summary to ensure that information relevant for the task is not lost in the sequence. Because the summary is smaller than the full information, the system 100 can save memory consumption by not passing along the entire information relevant for the task. Additionally, the system 100 can send less data over the network that connects the generative models implemented at different devices or in the cloud.
[55] In some implementations, the system 100 can generate a summary that does not include private data, such as particular names, addresses, numbers, personally identifiable information, financial data, or company proprietary information, thus avoiding leaking private data to other generative neural networks in the sequence. In some implementations, in the prompt for generating the summary, the system can ask the generative neural network to verify whether the summary includes private data. In some implementations, the system can provide that summary to a verifier system and the verifier system can generate an output indicating whether the summary includes private data. If the system determines that the summary includes private data, the system can reject the summary and ask the generative neural network to generate a new summary, e.g., a new summary that does not include private data.
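The reject-and-regenerate loop described above can be sketched as follows. Both `generate` and `contains_private_data` are placeholder callables standing in for the generative neural network and the verifier system; the retry limit of three attempts is an assumption for the sketch.

```python
def generate_safe_summary(generate, contains_private_data, prompt,
                          max_attempts=3):
    """Ask a model for a summary, rejecting any summary that the
    verifier flags as containing private data, and retrying with an
    explicit instruction not to include private data."""
    for attempt in range(max_attempts):
        summary = generate(
            prompt if attempt == 0
            else prompt + "\nDo not include any private data.")
        if not contains_private_data(summary):
            return summary
    raise ValueError("could not produce a summary without private data")
```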
[56] The initial generative neural network 104(1) provides the request 106(1) and the summary 108(1) to the next generative neural network 104(2) in the sequence.
[57] Each intermediate generative neural network receives a previous request and a previous summary generated from a previous generative neural network in the sequence. For example, the intermediate generative neural network 104(2) receives a previous request 106(1) and a previous summary 108(1) generated from a previous generative neural network 104(1) in the sequence.
[58] The intermediate generative neural network generates a next request based on the previous request and the previous summary. For example, the intermediate generative neural network 104(2) generates a next request 106(2) based on the previous request 106(1) and the previous summary 108(1).
[59] In some implementations, the intermediate generative neural network can use the previous request and other information it has obtained, e.g., information that is locally available, to generate the next request to the next generative neural network in the sequence. For example, the intermediate generative neural network 104(2) generates a next request 106(2) based on the previous request 106(1), the previous summary 108(1), and information obtained by the intermediate generative neural network 104(2) related to the task.
[60] The system 100 determines whether the next request is consistent with the previous summary. Before sending the next request, the system 100 can use the previous summary to check whether the next request adheres to the previous summary. In some implementations, the system can provide a prompt to the intermediate generative neural network with the previous summary and the next request, and the prompt can ask the intermediate generative neural network to determine whether the next request is consistent with the previous summary. The system 100 can provide a prompt to the intermediate generative neural network that causes the intermediate generative neural network to determine whether properties of the next request are consistent with properties of the input included in the previous summary. For example, the generative neural network 104(2) can determine whether the next request 106(2) is consistent with the previous summary 108(1).
[61] In some implementations, instead of directly prompting the intermediate generative neural network to determine whether the next request adheres to the previous summary, the system can prompt the intermediate generative neural network to reason about the decision through a series of intermediate reasoning steps, thus improving the ability of the intermediate generative neural network to perform complex reasoning.
[62] In some implementations, the system can use the intermediate generative neural network to generate a consistency score based on the previous summary and the next request. The system can compare the consistency score with a threshold. If the consistency score is higher than a threshold, the system can determine that the next request is consistent with the previous summary. The threshold can be a predetermined value tuned based on an evaluation data set. In some implementations, the system can generate the consistency score by processing the previous summary and the next request using a string formatting template.
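The threshold comparison just described can be sketched as follows. The `score_fn` callable is a placeholder that would wrap the generative neural network and return its consistency score; the 0-to-1 scale and the threshold value of 0.8 are assumptions chosen for illustration, since in practice the threshold is tuned on an evaluation data set.

```python
CONSISTENCY_THRESHOLD = 0.8  # example value; tuned on evaluation data


def is_consistent(score_fn, next_request: str, previous_summary: str,
                  threshold: float = CONSISTENCY_THRESHOLD) -> bool:
    """Return True if the model-produced consistency score for the
    next request, given the previous summary, meets the threshold."""
    prompt = (
        f"Summary: {previous_summary}\n"
        f"Request: {next_request}\n"
        "On a scale from 0 to 1, how consistent is the request "
        "with the summary?"
    )
    return score_fn(prompt) >= threshold
```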
[63] If the system 100 determines that the next request is consistent with the previous summary, the intermediate generative neural network can generate a next summary based on the previous request and the previous summary. For example, in response to determining that the next request 106(2) is consistent with the previous summary 108(1), the intermediate generative neural network 104(2) can generate the next summary 108(2) based on the previous request 106(1) and the previous summary 108(1).
[64] In some implementations, the intermediate generative neural network can generate the next summary based on the previous request, the previous summary, and new information obtained by the intermediate generative neural network. The new information obtained by the intermediate generative neural network can include new input provided by a user, data generated by the intermediate generative neural network, or any other data obtained by the intermediate generative neural network. For example, the intermediate generative neural network can use an internet search engine to obtain initial search results for the task. The intermediate generative neural network can invoke a subsequent generative neural network to analyze the initial search results. The intermediate generative neural network can generate a summary based on the previous request and the previous summary received by the intermediate generative neural network, and the initial search results for the task.
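A prompt that folds newly obtained information, such as search results, into the next summary could be built as sketched below. `build_update_prompt` is a hypothetical helper and the wording is illustrative only.

```python
def build_update_prompt(previous_request: str, previous_summary: str,
                        new_information: str) -> str:
    """Build a prompt asking a generative neural network to update the
    summary with newly obtained information while preserving the
    properties already recorded."""
    return (
        f"Previous request: {previous_request}\n"
        f"Current summary: {previous_summary}\n"
        f"New information: {new_information}\n"
        "Write an updated compact summary that preserves the existing "
        "properties and adds the important parts of the new information."
    )
```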
[65] In some implementations, no relevant information is obtained by the intermediate generative neural network, and the intermediate generative neural network can forward the same summary, e.g., the previous summary, to the next generative neural network. For example, the intermediate generative neural network can determine, instead of processing the request by itself, to invoke another generative neural network to process the request. Because the intermediate generative neural network has not received or generated any new information, instead of generating a new summary, the intermediate generative neural network can forward the summary it received to the next generative neural network.
[66] In some implementations, the intermediate generative neural network can generate the next summary based on the previous request, the previous summary, and the next request. For example, certain properties of the next request can be relevant for responding to the next request. The intermediate generative neural network can generate the summary not only based on the previous request and the previous summary, but also based on the next request, such that relevant properties of the next request can be included in the next summary.
[67] In some implementations, the intermediate generative neural network can generate the next request and the next summary sequentially. In some implementations, the intermediate generative neural network can generate both the next request and the next summary at the same time.
[68] In some implementations, the intermediate generative neural network can be specifically trained on training data to generate summaries based on previous requests and previous summaries. In some implementations, the intermediate generative neural network has not been specifically trained to generate summaries, and the intermediate generative neural network can generate the next summary by conditioning on in-context examples without updating any parameters of the intermediate generative neural network, e.g., through in-context learning.
[69] In some implementations, after generating the next summary, the system 100 can double-check whether the next summary is consistent with the previous summary. The system 100 can send a prompt input to the intermediate generative neural network, asking it to check whether the next summary is consistent with the previous summary. For example, the system can ask the intermediate generative neural network to generate a consistency score measuring the consistency between the next summary and the previous summary, and the system can determine whether the consistency score between the next summary and the previous summary is larger than a threshold. If the system 100 determines that the next summary is not consistent with the previous summary, the system 100 can ask the intermediate generative neural network to regenerate a summary that is consistent with the previous summary.
[70] If the system 100 determines that the next request is not consistent with the previous summary, the system 100 can send a prompt input to the intermediate generative neural network, asking it to refine the next request, thus ensuring the right information gets passed along the sequence of generative neural networks. The intermediate generative neural network can regenerate the next request based on the previous request and the previous summary. For example, the system can prompt an LLM with: “please reconcile the inconsistency.”
[71] In some implementations, in response to determining that the next request is not consistent with the previous summary, the intermediate generative neural network can generate multiple candidate next requests based on the previous request and the previous summary. Then, the intermediate generative neural network can select the next request from the candidate next requests.
[72] In some implementations, the intermediate generative neural network can regenerate the next request based on (i) the previous request, (ii) the previous summary, and (iii) data indicating that the initial next request is not consistent with the previous summary. Thus, the intermediate generative neural network is more likely to generate a next request that is consistent with the previous summary. For example, the system can prompt an LLM with: “you already generated a request, and the request is not consistent with the previous summary; please try to incorporate this information to generate a new request.”
[73] In some implementations, the intermediate generative neural network can regenerate the next request multiple times until a satisfactory request is generated. In some implementations, after generating the next request for a threshold number of times, the intermediate generative neural network can select, among the requests generated for the threshold number of times, a request that is most consistent with the previous summary, e.g., the request having the highest consistency score determined by the intermediate generative neural network. The intermediate generative neural network can provide the selected request as an input to the subsequent generative neural network.
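The regenerate-and-select behavior of paragraphs [70]–[73] can be sketched as a bounded retry loop. This is a hypothetical illustration: the `generate` and `score` callables stand in for the intermediate generative neural network's request generation and consistency scoring, and the function name, threshold, and attempt limit are assumptions rather than values from the specification.

```python
def next_consistent_request(generate, score, previous_request, previous_summary,
                            threshold=0.7, max_attempts=3):
    """Regenerate the next request up to max_attempts times; if no attempt
    clears the consistency threshold, fall back to the most consistent one."""
    attempts = []
    for _ in range(max_attempts):
        # Generate a candidate next request from the previous request/summary.
        candidate = generate(previous_request, previous_summary)
        candidate_score = score(candidate, previous_summary)
        if candidate_score > threshold:
            # Candidate is consistent with the previous summary; use it.
            return candidate
        attempts.append((candidate_score, candidate))
    # After the threshold number of attempts, select the request with the
    # highest consistency score among those generated.
    return max(attempts, key=lambda pair: pair[0])[1]
```

The fallback to the best-scoring attempt guarantees the loop always forwards some request to the subsequent generative neural network, even when no attempt is fully consistent.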
[74] The intermediate generative neural network can provide the next request and the next summary to the next generative neural network in the sequence. For example, the intermediate generative neural network 104(2) can provide the next request 106(2) and the next summary 108(2) to the next generative neural network 104(3) in the sequence. This continues until the final generative neural network receives a request.
[75] The final generative neural network 104(4) receives the previous request 106(3) and the previous summary 108(3) generated from a previous intermediate generative neural network 104(3) in the sequence. The final generative neural network 104(4) processes the previous request 106(3) and the previous summary 108(3) to generate the final output 110 for the task. The system 100 can present the final output 110 on a device to a user. In some implementations, the sequence of generative neural networks can be implemented on the same device or remotely with regard to one another. In some implementations, the final generative neural network can be physically remote from a user device of the user. The system 100 can send the final output 110 from the remote computer that implements the final generative neural network to the user device for presentation to the user over the network.
[76] FIG. 2A is a diagram of an example system 200 for maintaining information in a sequence of generative neural networks.
[77] A user can provide an input 202 to a local generative neural network 204(1) on a user device, such as a mobile phone. For example, the user may want to book a hotel and the input 202 can be text data such as “book a hotel.” Here, the task is to book a hotel for the user.
[78] The system 200 can generate a final output 210 for input 202 for the task through a sequence of generative neural networks interacting with each other.
[79] The local generative neural network 204(1) can be a local on-device assistant that has access to user preferences of hotels. The local generative neural network 204(1) can process the input 202 and can decide to ask a larger, more capable, server-side generative neural network 204(2) for help. The local generative neural network 204(1) can generate a request 206(1) to be processed by the server-side generative neural network 204(2). The request 206(1) can include other information related to the task. For example, the request 206(1) can include: “The user wants to book a hotel. Here are five hotels they previously booked and whether they liked them.” The request 206(1) can include a list of the five hotels, with description and rating given by the user.
[80] The local generative neural network 204(1) can generate a summary 208(1) to be provided to the server-side generative neural network 204(2). The summary 208(1) can be generated based on the input 202 and the request 206(1). For example, the summary can include: “The user always liked hotels that have a spa. The user preferred non-smoking rooms.”
[81] The server-side generative neural network 204(2) receives the request 206(1) and the summary 208(1). The server-side generative neural network 204(2) can use a search engine and can find a hotel booking website. The server-side generative neural network 204(2) can generate a request 206(2) asking the hotel booking website to book a hotel. For example, the request 206(2) can include: “Book a hotel in region A.” The request 206(2) can be provided to a specialized generative neural network 204(3), e.g., a hotel-booking agent for the hotel booking website.
[82] In some examples, the server-side generative neural network 204(2) can generate the request 206(2) by incorporating some new information into the request 206(1). For example, the request 206(1) can list the five hotels that the user liked. Based on properties of the five hotels, the server-side generative neural network 204(2) can generate the request 206(2) by adding, to a list of candidate hotels, two other hotels that the user might like.
[83] Before sending the request 206(2) to the specialized generative neural network 204(3), the system 200 can ask the server-side generative neural network 204(2) to check whether the request 206(2) is consistent with the summary 208(1). For example, the request 206(2) can be “Book a non-smoking room in region A.” Because the summary 208(1) indicates that “the user preferred non-smoking rooms,” the server-side generative neural network 204(2) can determine that the request 206(2) is consistent with the summary 208(1).
[84] As another example, the request 206(2) can include the two other hotels and the server-side generative neural network 204(2) can determine whether properties of the two other hotels are consistent with the summary 208(1).
[85] As another example, the request 206(2) can be “Book a hotel in region A,” where region A is far away from the city center. The summary 208(1) can include: “hotels that are far away from the city center generally receive a low rating.” Thus, the server-side generative neural network 204(2) can determine that the request 206(2) is not consistent with the summary 208(1). The server-side generative neural network 204(2) can regenerate an updated request. For example, the server-side generative neural network 204(2) can generate an updated request “Book a hotel in region B,” where region B is not far away from the city center.
[86] The system 200 can ask the server-side generative neural network 204(2) to check whether the updated request 206(2) is consistent with the summary 208(1). The system 200 can ask the server-side generative neural network 204(2) to regenerate the request 206(2) until the request is consistent with the summary 208(1). In some implementations, the system 200 can ask the server-side generative neural network 204(2) to regenerate the request 206(2) until a maximum number of requests has been generated. The system 200 may select a request from the generated requests that is most consistent with the summary.
[87] The server-side generative neural network 204(2) can generate a summary 208(2) based on the request 206(1), the summary 208(1), and the request 206(2). Because the request 206(2) is about booking a hotel in region A, the server-side generative neural network 204(2) can include relevant information about region A in the summary 208(2). For example, the server-side generative neural network 204(2) can generate the summary 208(2), and the summary 208(2) can include: “The user liked hotels that have a spa and non-smoking rooms. Region A is 10 miles from the airport.”

[88] The specialized generative neural network 204(3) can process the request 206(2) and the summary 208(2). In some implementations, the specialized generative neural network 204(3) and the server-side generative neural network 204(2) can interact with each other until they agree on a hotel. During the interaction, when a generative neural network generates a request, the generative neural network can perform a consistency check on the request against the summary before sending the request to another generative neural network.
[89] The specialized generative neural network 204(3) can generate a final output 210 based on the request 206(2) and the summary 208(2). The final output 210 can include: “Booked a room at Hotel ABC.” In some implementations, the specialized generative neural network 204(3) can select and reserve a room from a list of available rooms. For example, because the summary indicates that the user liked non-smoking rooms, the specialized generative neural network 204(3) can book a non-smoking room at Hotel ABC. As another example, because the summary indicates that region A is 10 miles from the airport, the specialized generative neural network 204(3) can book a hotel that has suitable check-in and check-out times based on the user's travel plan. The specialized generative neural network 204(3) can provide the final output to the user device.
[90] FIG. 2B is a diagram of an example system 220 for maintaining information in a sequence of generative neural networks.
[91] An initial generative neural network 224(1) can receive an input 222 for a writing task indicating what a user wants to write about. The initial generative neural network 224(1) can be a small size LLM installed on a mobile device. The initial generative neural network 224(1) can obtain private data related to the writing task, e.g., by accessing private notes or financial data stored on the mobile device. The initial generative neural network 224(1) can determine that the task is quite complex and the small size LLM on the mobile device is not powerful enough to handle the task. The initial generative neural network 224(1) can send a request 226(1) to a second LLM 224(2) on the server that uses a bigger model.
[92] Because private data cannot be shared with the second LLM 224(2), the initial generative neural network 224(1) can generate a summary 228(1) that includes important information relevant to the task, without sharing the private data. For example, the summary 228(1) can describe the user’s writing preferences and other properties at a very high level.
[93] The second LLM 224(2) can process the request 226(1) and can utilize the internet to search for new information about what the user wants to write about. The second LLM 224(2) can obtain new information related to the writing task. In some cases, the second LLM 224(2) on the server can determine that the user wants to write about content in a particular domain, such as art or biology. The second LLM 224(2) can reach out to a third LLM 224(3) that is specialized in the particular domain.
[94] The second LLM 224(2) on the server may have found lots of information on the internet and may not be able to pass all the information to the third LLM 224(3) due to memory limitations. The second LLM 224(2) on the server can generate a summary 228(2) that condenses the information found on the internet and the previous summary 228(1) into a compact message. The third LLM 224(3) can process the request 226(2) and the summary 228(2) received from the second LLM 224(2) and can generate a final writing output 230 for the writing task. The third LLM can send the final writing output to a user device for presentation to a user.
[95] FIG. 3 is a flow chart of an example process 300 for maintaining information in a sequence of generative neural networks. FIG. 4 is a flow chart of an example process 400 performed by each intermediate generative neural network for maintaining information in a sequence of generative neural networks. The processes 300 and 400 will be described as being performed by an appropriately programmed computer system, such as the system 100.
[96] The system receives an input for a task to be processed by a sequence of generative neural networks to generate a final output for the task (302). The sequence can include an initial generative neural network, one or more intermediate generative neural networks, and a final generative neural network.
[97] The system processes the input by the sequence of the generative neural networks. The system processes the input using the initial generative neural network in the sequence of generative neural networks (304). In some implementations, the system can receive, by the initial generative neural network, the input. The system can process, by the initial generative neural network, the input to generate a next request and a next summary. The system can provide, by the initial generative neural network, the next request and the next summary to the next generative neural network in the sequence.
[98] The system processes an output from a respective previous generative neural network using each of one or more intermediate generative neural networks in the sequence of generative neural networks (306). For example, the second generative neural network processes an output generated from the initial generative neural network, and the third generative neural network processes an output generated from the second generative neural network.
[99] Referring to FIG. 4, for each intermediate generative neural network, the system receives, by the generative neural network, a previous request and a previous summary generated from a previous generative neural network in the sequence (402). The system generates, by the generative neural network and based on the previous request and the previous summary, a next request (404).
[100] The system determines, by the generative neural network, whether the next request is consistent with the previous summary (406). In some implementations, the previous summary can include properties of the input and the previous request, and determining whether the next request is consistent with the previous summary can include determining whether properties of the next request are consistent with the properties of the input included in the previous summary.
[101] In response to determining that the next request is consistent with the previous summary, the system generates, by the generative neural network and based on the previous request and the previous summary, a next summary (408). In some implementations, the system can obtain, by the generative neural network, new information based on the previous request, and the system can generate, by the generative neural network, the next summary based on the previous request, the previous summary, and the new information.
[102] In some implementations, the system can generate, by the generative neural network, the next summary based on the previous request, the previous summary, and the next request. For example, the system can generate a prompt saying: “you chose this message to be passed on, can you describe to me very compactly what are the most important properties in this data?”
[103] In some implementations, for a particular intermediate generative neural network, the system can generate, by the generative neural network and based on the previous request and the previous summary, an initial next request. The system can determine whether the initial next request is consistent with the previous summary. In response to determining that the initial next request is not consistent with the previous summary, the system can generate, by the generative neural network and based on the previous request and the previous summary, the next request. The system can determine that the next request is consistent with the previous summary.
[104] In some implementations, in response to determining that the initial next request is not consistent with the previous summary, the system can generate the next request based on (i) the previous request, (ii) the previous summary, and (iii) data indicating that the initial next request is not consistent with the previous summary. [105] The system provides, by the generative neural network, the next request and the next summary to the next generative neural network in the sequence (410). The process in FIG. 4 continues until the final generative neural network receives a request.
[106] Referring to FIG. 3, the system processes an output from the second-to-final generative neural network using a final generative neural network in the sequence of generative neural networks (308). In some implementations, the system can receive, by the final generative neural network, the previous request and the previous summary generated from a previous intermediate generative neural network in the sequence. The system can process, by the final generative neural network, the previous request and the previous summary to generate the final output for the task.
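Processes 300 and 400 taken together can be sketched as a single pipeline that threads (request, summary) pairs along the sequence. This is a simplified illustrative sketch: each network is modeled as a small set of callables, and the function names and the single-retry regeneration are assumptions for clarity rather than the method of the specification.

```python
def run_sequence(initial, intermediates, final, task_input):
    """Thread (request, summary) pairs from the initial network through
    each intermediate network to the final network (steps 302-308)."""
    # Step 304: the initial network produces the first request and summary.
    request, summary = initial(task_input)
    # Step 306: each intermediate network processes its predecessor's output.
    for net in intermediates:
        # Step 404: generate a next request from the previous pair.
        next_request = net["make_request"](request, summary)
        # Step 406: check consistency; regenerate once if inconsistent.
        if not net["consistent"](next_request, summary):
            next_request = net["make_request"](request, summary)
        # Step 408: generate the next summary from the previous pair.
        next_summary = net["make_summary"](request, summary)
        # Step 410: forward both to the next network in the sequence.
        request, summary = next_request, next_summary
    # Step 308: the final network produces the final output for the task.
    return final(request, summary)
```

A production system would replace each dictionary of callables with an actual generative neural network and a bounded retry loop for the consistency check.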
[107] This specification uses the term “configured” in connection with systems and computer program components. For a system of one or more computers to be configured to perform particular operations or actions means that the system has installed on it software, firmware, hardware, or a combination of them that in operation cause the system to perform the operations or actions. For one or more computer programs to be configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by data processing apparatus, cause the apparatus to perform the operations or actions.
[108] Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory storage medium for execution by, or to control the operation of, data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them. Alternatively or in addition, the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
[109] The term “data processing apparatus” refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can also be, or further include, off-the-shelf or custom-made parallel processing subsystems, e.g., a GPU or another kind of special-purpose processing subsystem. The apparatus can also be, or further include, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can optionally include, in addition to hardware, code that creates an execution environment for computer programs, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
[110] A computer program (which may also be referred to or described as a program, software, a software application, an app, a module, a software module, a script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub-programs, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a data communication network.
[111] As used in this specification, an “engine,” or “software engine,” refers to a software implemented input/output system that provides an output that is different from the input. An engine can be an encoded block of functionality, such as a library, a platform, a software development kit (“SDK”), or an object. Each engine can be implemented on any appropriate type of computing device, e.g., servers, mobile phones, tablet computers, notebook computers, music players, e-book readers, laptop or desktop computers, PDAs, smart phones, or other stationary or portable devices, that includes one or more processors and computer readable media. Additionally, two or more of the engines may be implemented on the same computing device, or on different computing devices.
[112] The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by special purpose logic circuitry, e.g., an FPGA or an ASIC, or by a combination of special purpose logic circuitry and one or more programmed computers.
[113] Computers suitable for the execution of a computer program can be based on general or special purpose microprocessors or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. The central processing unit and the memory can be supplemented by, or incorporated in, special purpose logic circuitry. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.
[114] Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
[115] To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and pointing device, e.g., a mouse, trackball, or a presence sensitive display or other surface by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user’s device in response to requests received from the web browser. Also, a computer can interact with a user by sending text messages or other forms of message to a personal device, e.g., a smartphone, running a messaging application, and receiving responsive messages from the user in return.

[116] While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or on the scope of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Moreover, although features may be described above as acting in certain combinations and even initially be claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
[117] Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
[118] Particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous.
[119] What is claimed is:

Claims

1. A method performed by one or more computers, the method comprising: receiving an input for a task to be processed by a sequence of generative neural networks to generate a final output for the task, the sequence comprising an initial generative neural network, one or more intermediate generative neural networks, and a final generative neural network; and processing the input by the sequence of the generative neural networks, comprising: for each intermediate generative neural network, receiving, by the generative neural network, a previous request and a previous summary generated from a previous generative neural network in the sequence; generating, by the generative neural network and based on the previous request and the previous summary, a next request; determining, by the generative neural network, whether the next request is consistent with the previous summary; in response to determining that the next request is consistent with the previous summary, generating, by the generative neural network and based on the previous request and the previous summary, a next summary; and providing, by the generative neural network, the next request and the next summary to the next generative neural network in the sequence.
2. The method of claim 1, wherein the previous summary comprises properties of the input and the previous request, and determining whether the next request is consistent with the previous summary comprises determining whether properties of the next request are consistent with the properties comprised in the previous summary.
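Under one illustrative reading of claim 2 (not the only one the claim admits), the summary carries a set of named properties and a candidate request is consistent when its own properties do not contradict them. The property names and values below are invented for the example.

```python
def properties_consistent(request_props: dict, summary_props: dict) -> bool:
    """A candidate request is consistent with the running summary when
    every property it shares with the summary has the same value;
    properties the summary does not mention are unconstrained."""
    return all(summary_props[key] == value
               for key, value in request_props.items()
               if key in summary_props)
```

For instance, a request tagged `{"language": "en"}` would pass against a summary recording `{"language": "en", "topic": "travel"}`, while `{"language": "fr"}` would fail.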
3. The method of claim 1, wherein generating the next summary comprises: obtaining, by the generative neural network, new information based on the previous request; and generating, by the generative neural network, the next summary based on the previous request, the previous summary, and the new information.
4. The method of claim 1, wherein generating the next summary comprises generating, by the generative neural network, the next summary based on the previous request, the previous summary, and the next request.
5. The method of any of claims 1-4, comprising, for a particular intermediate generative neural network: generating, by the generative neural network and based on the previous request and the previous summary, an initial next request; determining, by the generative neural network, whether the initial next request is consistent with the previous summary; and in response to determining that the initial next request is not consistent with the previous summary, generating, by the generative neural network and based on the previous request and the previous summary, the next request.
6. The method of claim 5, wherein generating the next request comprises generating, by the generative neural network, the next request based on (i) the previous request, (ii) the previous summary, and (iii) data indicating that the initial next request is not consistent with the previous summary.
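The regeneration path of claims 5 and 6 can be sketched as a retry loop: when an initial next request is not consistent with the previous summary, the network generates again, this time also receiving data indicating the inconsistency. The prompt format and the `max_attempts` bound are illustrative additions, not part of the claims.

```python
def generate_with_retry(model, is_consistent, prev_request, prev_summary,
                        max_attempts=3):
    """Sketch of claims 5-6: regenerate an inconsistent next request,
    feeding back data indicating the inconsistency (claim 6, item (iii)).
    `model` is any callable from prompt string to generated string."""
    prompt = f"REQUEST: {prev_request}\nSUMMARY: {prev_summary}"
    candidate = model(prompt)  # the "initial next request" of claim 5
    for _ in range(max_attempts - 1):
        if is_consistent(candidate, prev_summary):
            return candidate
        # Include the inconsistent draft so the network can avoid
        # repeating the contradiction.
        candidate = model(prompt + f"\nINCONSISTENT_DRAFT: {candidate}")
    return candidate
```

The claims place no bound on the number of attempts; `max_attempts` is an added safeguard against a network that never produces a consistent request.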
7. The method of any of claims 1-4, comprising: receiving, by the initial generative neural network, the input; processing, by the initial generative neural network, the input to generate the next request and the next summary; and providing, by the initial generative neural network, the next request and the next summary to the next generative neural network in the sequence.
8. The method of any of claims 1-4, comprising: receiving, by the final generative neural network, the previous request and the previous summary generated from a previous intermediate generative neural network in the sequence; and processing, by the final generative neural network, the previous request and the previous summary to generate the final output for the task.
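Claims 1, 7, and 8 together describe an end-to-end pipeline: the initial network turns the task input into a first (request, summary) pair, each intermediate network refines that pair, and the final network produces the task output. The sketch below uses plain callables for all three roles; the calling conventions are invented for illustration.

```python
def run_pipeline(initial, intermediates, final, task_input, is_consistent):
    """End-to-end sketch of the claimed sequence. `initial` maps the task
    input to a (request, summary) pair (claim 7); each function in
    `intermediates` is called with a mode flag plus the previous request
    and summary (claim 1); `final` maps the last pair to the task output
    (claim 8)."""
    request, summary = initial(task_input)
    for net in intermediates:
        next_request = net("request", request, summary)
        if not is_consistent(next_request, summary):
            # Minimal sketch; claims 5-6 describe regeneration instead.
            raise ValueError("next request is not consistent with the summary")
        # The next summary is based on the previous request and summary.
        next_summary = net("summary", request, summary)
        request, summary = next_request, next_summary
    return final(request, summary)
```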
9. A system comprising one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
    receiving an input for a task to be processed by a sequence of generative neural networks to generate a final output for the task, the sequence comprising an initial generative neural network, one or more intermediate generative neural networks, and a final generative neural network; and
    processing the input by the sequence of the generative neural networks, comprising, for each intermediate generative neural network:
        receiving, by the generative neural network, a previous request and a previous summary generated from a previous generative neural network in the sequence;
        generating, by the generative neural network and based on the previous request and the previous summary, a next request;
        determining, by the generative neural network, whether the next request is consistent with the previous summary;
        in response to determining that the next request is consistent with the previous summary, generating, by the generative neural network and based on the previous request and the previous summary, a next summary; and
        providing, by the generative neural network, the next request and the next summary to the next generative neural network in the sequence.
10. The system of claim 9, wherein the previous summary comprises properties of the input and the previous request, and determining whether the next request is consistent with the previous summary comprises determining whether properties of the next request are consistent with the properties comprised in the previous summary.
11. The system of claim 9, wherein generating the next summary comprises: obtaining, by the generative neural network, new information based on the previous request; and generating, by the generative neural network, the next summary based on the previous request, the previous summary, and the new information.
12. The system of claim 9, wherein generating the next summary comprises generating, by the generative neural network, the next summary based on the previous request, the previous summary, and the next request.
13. The system of any of claims 9-12, wherein the operations comprise, for a particular intermediate generative neural network: generating, by the generative neural network and based on the previous request and the previous summary, an initial next request; determining, by the generative neural network, whether the initial next request is consistent with the previous summary; and in response to determining that the initial next request is not consistent with the previous summary, generating, by the generative neural network and based on the previous request and the previous summary, the next request.
14. The system of claim 13, wherein generating the next request comprises generating, by the generative neural network, the next request based on (i) the previous request, (ii) the previous summary, and (iii) data indicating that the initial next request is not consistent with the previous summary.
15. The system of any of claims 9-12, wherein the operations comprise: receiving, by the initial generative neural network, the input; processing, by the initial generative neural network, the input to generate the next request and the next summary; and providing, by the initial generative neural network, the next request and the next summary to the next generative neural network in the sequence.
16. The system of any of claims 9-12, wherein the operations comprise: receiving, by the final generative neural network, the previous request and the previous summary generated from a previous intermediate generative neural network in the sequence; and processing, by the final generative neural network, the previous request and the previous summary to generate the final output for the task.
17. One or more non-transitory storage media encoded with instructions that when executed by a computing device cause the computing device to perform operations comprising:
    receiving an input for a task to be processed by a sequence of generative neural networks to generate a final output for the task, the sequence comprising an initial generative neural network, one or more intermediate generative neural networks, and a final generative neural network; and
    processing the input by the sequence of the generative neural networks, comprising, for each intermediate generative neural network:
        receiving, by the generative neural network, a previous request and a previous summary generated from a previous generative neural network in the sequence;
        generating, by the generative neural network and based on the previous request and the previous summary, a next request;
        determining, by the generative neural network, whether the next request is consistent with the previous summary;
        in response to determining that the next request is consistent with the previous summary, generating, by the generative neural network and based on the previous request and the previous summary, a next summary; and
        providing, by the generative neural network, the next request and the next summary to the next generative neural network in the sequence.
18. The non-transitory storage media of claim 17, wherein the previous summary comprises properties of the input and the previous request, and determining whether the next request is consistent with the previous summary comprises determining whether properties of the next request are consistent with the properties comprised in the previous summary.
19. The non-transitory storage media of claim 17, wherein generating the next summary comprises: obtaining, by the generative neural network, new information based on the previous request; and generating, by the generative neural network, the next summary based on the previous request, the previous summary, and the new information.
20. The non-transitory storage media of claim 17, wherein generating the next summary comprises generating, by the generative neural network, the next summary based on the previous request, the previous summary, and the next request.
PCT/US2024/026234 (priority and filing date 2024-04-25): Request processing using a sequence of generative neural networks. Published as WO2025226270A1 (en). Legal status: pending.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2024/026234 WO2025226270A1 (en) 2024-04-25 2024-04-25 Request processing using a sequence of generative neural networks


Publications (1)

Publication Number Publication Date
WO2025226270A1 true WO2025226270A1 (en) 2025-10-30

Family

ID=91129865

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/026234 Pending WO2025226270A1 (en) 2024-04-25 2024-04-25 Request processing using a sequence of generative neural networks

Country Status (1)

Country Link
WO (1) WO2025226270A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160139977A1 (en) * 2013-07-01 2016-05-19 Agent Video Intelligence Ltd. System and method for abnormality detection
US20230221585A1 (en) * 2020-07-31 2023-07-13 Tribe Gmbh Method and device for automatically determining production parameters for a pair of spectacles


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
COLIN RAFFEL; NOAM SHAZEER; ADAM ROBERTS; KATHERINE LEE; SHARAN NARANG; MICHAEL MATENA; YANQI ZHOU; WEI LI; PETER J. LIU: "Exploring the limits of transfer learning with a unified text-to-text transformer", ARXIV:1910.10683, 2019
DANIEL ADIWARDANA; MINH-THANG LUONG; DAVID R. SO; JAMIE HALL; NOAH FIEDEL; ROMAL THOPPILAN; ZI YANG; APOORV KULSHRESHTHA; GAURAV NEMADE; YIFENG LU: "Towards a human-like open-domain chatbot", CORR, 2020
J. HOFFMANN; S. BORGEAUD; A. MENSCH; E. BUCHATSKAYA; T. CAI; E. RUTHERFORD; D. D. L. CASAS; L. A. HENDRICKS; J. WELBL; A. CLARK ET AL.: "Training compute-optimal large language models", ARXIV:2203.15556, 2022
J. W. RAE; S. BORGEAUD; T. CAI; K. MILLICAN; J. HOFFMANN; H. F. SONG; J. ASLANIDES; S. HENDERSON; R. RING; S. YOUNG: "Scaling language models: Methods, analysis & insights from training gopher", CORR, 2021
TOM B. BROWN; BENJAMIN MANN; NICK RYDER; MELANIE SUBBIAH; JARED KAPLAN; PRAFULLA DHARIWAL; ARVIND NEELAKANTAN; PRANAV SHYAM; GIRISH SASTRY; AMANDA ASKELL ET AL.: "Language models are few-shot learners", ARXIV:2005.14165, 2020


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24726891

Country of ref document: EP

Kind code of ref document: A1