
WO2025031095A1 - Method for updating parameters of pre-trained model and data processing method for pre-trained model - Google Patents

Info

Publication number
WO2025031095A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
feature data
initial
backbone network
tuning module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2024/104765
Other languages
French (fr)
Chinese (zh)
Inventor
江泽胤子
黄子渊
马傲
毛超杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd filed Critical Alibaba China Co Ltd
Publication of WO2025031095A1 publication Critical patent/WO2025031095A1/en
Pending legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Definitions

  • the present disclosure relates to the field of large models and data processing, and in particular, to a method for updating parameters in a pre-trained model and a data processing method for the pre-trained model.
  • the disclosed embodiments provide a method for updating parameters in a pre-trained model and a data processing method for the pre-trained model, so as to at least solve the technical problems of high resource consumption and low computational efficiency in the training process of the model.
  • a method for updating parameters in a pre-trained model may include: obtaining feature data output by a backbone network of the pre-trained model, wherein the backbone network comes from an initial backbone network; calling an initial bypass network to convert the feature data, wherein the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; updating the parameters of the initial bypass network based on the converted feature data to obtain a target bypass network, wherein, in the process of updating the parameters of the initial bypass network, the data flow of the initial bypass network is independent of the data flow of the backbone network, and the parameters of the initial bypass network are used to characterize the influence of the tuning module on the feature data.
  • a data processing method for a pre-trained model may include: responding to an inquiry message received in a generative interactive interface, wherein the inquiry message includes at least: keywords of text generation information; calling the backbone network of the pre-trained model to at least analyze the text generation information and output text feature data, wherein the backbone network comes from the initial backbone network; calling the target bypass network to convert the text feature data, wherein the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; based on the converted text feature data, generating at least one reply result that matches the inquiry message; and displaying the reply result in the generative interactive interface.
  • another data processing method for a pre-trained model may include: obtaining feature data obtained by processing conditional data by a backbone network in a pre-trained model, wherein the backbone network comes from an initial backbone network, and the conditional data is used to determine the generation conditions of target data; calling a target bypass network to convert the feature data, wherein the target bypass network is obtained by updating the parameters of the initial bypass network, and the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; based on the converted feature data, generating target data corresponding to the conditional data, wherein the type of the target data includes at least one of the following: text information, image information, video information, and voice information.
  • another data processing method for a pre-trained model may include: inputting multimodal information in a dialogue interface, wherein the type of the multimodal information includes at least one of the following: text information containing character information, video frame information containing frame image information, and audio information; calling the backbone network in the pre-trained model to at least analyze and process the multimodal information to obtain feature data, wherein the backbone network comes from the initial backbone network; calling the target bypass network to convert the feature data, wherein the target bypass network is obtained by updating the parameters of the initial bypass network, and the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; based on the converted feature data, generating reply information corresponding to the multimodal information, wherein the type of the reply information includes at least one of the following: text information, image information, video information, and voice information.
  • a data processing system of a pre-trained model may include: a client, configured to display a dialogue interface and capture multimodal information input in the dialogue interface, wherein the type of multimodal information includes at least one of the following: inquiry information containing character information, video frame information containing frame image information, and audio information; a server, configured to call the backbone network in the pre-trained model to at least analyze and process the multimodal information to obtain feature data, and call the target bypass network to convert the feature data, and based on the converted feature data, generate reply information corresponding to the multimodal information, wherein the backbone network comes from the initial backbone network, the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network, and the type of reply information includes at least one of the following: text information, image information, video information, and voice information.
  • another data processing method of a pre-trained model may include: inputting speech to be converted on a virtual reality (VR) device or an augmented reality (AR) device; extracting feature data from the speech to be converted using the backbone network in the pre-trained model, wherein the backbone network comes from an initial backbone network; calling a target bypass network in the pre-trained model to convert the feature data, wherein the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; based on the converted feature data, determining the image information corresponding to the speech to be converted; using the image information to activate the VR device or AR device, and displaying the image information in the VR device or AR device.
  • an electronic device which may include a memory and a processor; the memory is used to store computer-executable instructions, and the processor is used to execute the computer-executable instructions.
  • when the above-mentioned computer-executable instructions are executed by the processor, any one of the above-mentioned methods is implemented.
  • a processor is further provided, the processor being used to run a program, wherein any one of the above methods is executed when the program is running.
  • a computer-readable storage medium including a stored program, wherein when the program is executed, the device where the storage medium is located is controlled to execute any one of the above methods.
  • FIG5 is a flow chart of another method for processing data of a pre-training model according to an embodiment of the present disclosure
  • FIG6 is a schematic diagram of a data processing result of a pre-training model according to an embodiment of the present disclosure
  • FIG7 is a flow chart of another method for processing data of a pre-training model according to an embodiment of the present disclosure.
  • FIG8( b ) is a schematic diagram of a tuning module according to an embodiment of the present disclosure.
  • FIG8( c ) is a schematic diagram of another tuning module according to an embodiment of the present disclosure.
  • FIG9 is a schematic diagram of a pre-training model according to an embodiment of the present disclosure.
  • FIG10 is a schematic diagram of another pre-training model according to an embodiment of the present disclosure.
  • FIG11 is a schematic diagram of a device for updating parameters in a pre-training model according to an embodiment of the present disclosure
  • FIG12 is a schematic diagram of a data processing device for a pre-training model according to an embodiment of the present disclosure
  • FIG13 is a schematic diagram of another data processing device for a pre-training model according to an embodiment of the present disclosure.
  • FIG14 is a schematic diagram of another data processing device for a pre-training model according to an embodiment of the present disclosure.
  • FIG15 is a structural block diagram of a computer terminal according to an embodiment of the present disclosure.
  • FIG16 is a block diagram of an electronic device according to a method for updating parameters in a pre-training model according to an embodiment of the present disclosure.
  • the technical solution provided by the present disclosure is mainly implemented by large-scale model technology.
  • the large model here refers to a deep learning model with large-scale model parameters, which can usually contain hundreds of millions, tens of billions, hundreds of billions, trillions or even more than ten trillion model parameters.
  • the large model can also be called a cornerstone model or foundation model (Foundation Model).
  • the large model is pre-trained on a large-scale unlabeled corpus to produce a pre-trained model with more than 100 million parameters.
  • This model can adapt to a wide range of downstream tasks, and the model has good generalization ability, such as large-scale language model (Large Language Model, LLM), multi-modal pre-training model (multi-modal pre-training model), etc.
  • the pre-trained model can be fine-tuned through a small number of samples so that the large model can be applied to different tasks.
  • the large model can be widely used in natural language processing (NLP), computer vision, and other fields, and can be specifically applied to computer vision tasks such as visual question answering (VQA), image captioning (IC), and image generation, as well as natural language processing tasks such as text-based sentiment classification, text summary generation, and machine translation. Therefore, the main application scenarios of the large model include but are not limited to digital assistants, intelligent robots, search, online education, office software, e-commerce, and intelligent design.
  • data processing through pre-trained models in scenarios such as machine intelligence technology, basic visual intelligence, and video artificial intelligence (AI) is used as an example for explanation.
  • Pre-trained models can be produced after large-scale training and tuning on representative data sets. They can be used as initialization models for training other downstream tasks, and can be used to speed up training or achieve better results.
  • a foundation model (Foundation Model) is a model trained in different fields using large amounts of data, powerful computing power, and a well-designed structure, and can be used to process downstream tasks.
  • transfer learning (Transfer Learning) can improve the learning of new tasks by transferring knowledge from related tasks that have already been learned.
  • Parameter-efficient transfer learning refers to a tuning training method that modifies a small number of parameters or adds a small number of additional parameters based on a pre-trained model.
  • memory-efficient transfer learning (Memory-efficient Transfer Learning) is a method for fine-tuning pre-trained models using a relatively small amount of memory.
  • the deep learning model can be a network architecture consisting of several layers of encoders and decoders;
  • the Vision Transformer can be a network architecture that transfers the Transformer architecture to the vision field;
  • the submodule, which can be a submodule of the Transformer, can be composed of multi-head attention and a feedforward network;
  • Multi-head Attention can be a submodule in a Transformer, which can be a module that calculates the related vectors of query, key, and value;
  • Feed-forward Network can be a submodule in a Transformer and can be composed of multiple fully connected layers and activation functions;
  • Multi-Layer Perceptron can be a network module. It can be composed of one or more hidden layers and activation functions;
  • the Adapter, which can be a tuning training method, can act on a feedforward network and can contain a small module consisting of two fully connected layers and an activation function (see the sketch after this list);
  • the prompt tuning method can be a tuning training method, which can refer to tuning training with the help of a learnable parameter concatenated with the input;
  • Prefix tuning can be a tuning method that uses two learnable parameters concatenated with the key and value in the multi-head attention layer to perform tuning training;
  • Discriminative tasks can refer to tasks that discriminate input data, such as discriminating Y based on X, for example, image classification tasks;
  • Generative tasks can be tasks that generate operations on input data, such as generating Y based on X, for example, image generation tasks;
  • U-Net can be a network architecture that can be composed of a downsampling encoder, an upsampling decoder, and skip connections;
  • the deep learning text-to-image generation model can be a network structure for generation tasks, which can be composed of an autoencoder, a text encoder, and a U-Net structure.
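  • To make the Adapter entry above concrete, the following is a minimal sketch of an adapter-style tuning module, assuming PyTorch; the class name, bottleneck size, and choice of GELU activation are illustrative assumptions, not details taken from the disclosure.

```python
import torch
import torch.nn as nn

class AdapterTuner(nn.Module):
    """Adapter-style tuner: two fully connected layers with an
    activation in between, wrapped in a residual connection.
    Hypothetical sketch; names and sizes are illustrative."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # first fully connected layer
        self.act = nn.GELU()                    # activation function
        self.up = nn.Linear(bottleneck, dim)    # second fully connected layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # residual connection around the two-layer bottleneck
        return x + self.up(self.act(self.down(x)))
```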
  • a method for updating parameters in a pre-trained model is provided. It should be noted that the steps shown in the flowchart of the accompanying drawings can be executed in a computer system, such as one running a set of computer-executable instructions, and although a logical order is shown in the flowchart, in some cases, the steps shown or described can be executed in an order different from that shown here.
  • FIG. 1 is a schematic diagram of an application scenario of a method for updating parameters in a pre-trained model according to an embodiment of the present disclosure.
  • the large model is deployed in a server 10, and the server 10 can be connected to one or more client devices 20 through a local area network connection, a wide area network connection, an Internet connection, or other types of data networks.
  • the client device 20 here may include but is not limited to: a smart phone, a tablet computer, a laptop computer, a PDA, a personal computer, a smart home device, a vehicle-mounted device, etc.
  • the client device 20 can interact with the user through a graphical user interface, for example, by inputting classification conditions in the graphical user interface, so as to realize the call of the large model, and then realize the method provided in the embodiment of the present disclosure.
  • the system composed of the client device and the server can execute the following steps: the client device obtains the image to be classified.
  • the server executes step S101 to obtain the feature data obtained by processing the image to be classified by the backbone network in the pre-trained model; step S102, calling the target bypass network to convert the feature data, wherein the target bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the backbone network; step S103, based on the converted feature data, determining the category to which the image to be classified belongs.
  • the server 10 can transmit the category to which the image to be classified belongs to the client and display it in the client's graphical user interface. It should be noted that, when the operating resources of the client device can meet the deployment and operating conditions of the large model, the embodiments of the present disclosure can be performed on the client device.
  • Figure 2 is a flow chart of a method for updating parameters in a pre-trained model according to an embodiment of the present disclosure. As shown in Figure 2, the method may include the following steps.
  • Step S202 obtaining feature data output by a backbone network of the pre-trained model, wherein the backbone network comes from an initial backbone network.
  • the backbone network can be obtained from the initial backbone network.
  • Feature data output by the backbone network can be obtained.
  • the pre-trained model can also be called a pre-trained network, which can be a model produced after large-scale training and tuning on a representative data set, which can be used as an initialization model when training other downstream tasks, and can be used to speed up the training speed or obtain better results.
  • Feature data can also be called output features, which can be data obtained by the backbone network processing voice, image, and other information, and can be represented by x_0 and x_1. This is only an example, and no specific restrictions are made on the representation of feature data.
  • the backbone network can be a convolutional neural network
  • the input of the backbone network can be an image.
  • a set of abstract feature representations is obtained.
  • Step S204 calling the initial bypass network to convert the feature data, wherein the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network.
  • At least one tuning module can be extracted from the backbone network, and the extracted at least one tuning module can be constructed to obtain an initial bypass network.
  • the constructed initial bypass network can be called to convert the feature data.
  • the tuning module can also be called a sub-module, which can be extracted from the initial backbone network and can be used to improve the accuracy of the pre-trained model prediction.
  • the initial bypass network can also be called a tuning framework (Residual Tuning, abbreviated as Res-Tuning), which can be represented by Bypass.
  • a tuning module can be extracted from the initial backbone network, and a bypass network can be constructed using the tuning module to obtain an independent bypass network and backbone network.
  • the feature data output by the backbone network can be used as the input of the initial bypass network, and the initial bypass network can be called to convert the feature data.
  • Step S206 updating the parameters of the initial bypass network based on the converted feature data to obtain a target bypass network.
  • the data flow of the initial bypass network is independent of the data flow of the backbone network, and the parameters of the initial bypass network are used to characterize the influence of the tuning module on the feature data.
  • the converted feature data can be output to the pre-trained model to output the corresponding results, and based on the output results, the parameters of the initial bypass network are adjusted to obtain the target bypass network.
  • the target bypass network can be a new tuning network independent of the backbone network.
  • the parameters of the initial bypass network can be used to characterize the influence of different tuning modules on the feature data, and can be weights, biases, regularization parameters, etc., which are only examples here, and no specific restrictions are made on the types of parameters.
  • the output results can be images, text, voice and other data recognized by the pre-trained model. This is only an example, and no specific restrictions are made on the types of output results.
  • the data flow of the initial bypass network is independent of the data flow of the backbone network. Therefore, the data flow of the initial bypass network cannot be transmitted to the backbone network.
  • the tuning module is extracted from the initial backbone network, and the initial bypass model is constructed using the tuning module to obtain an independent initial bypass network and backbone network.
  • the data stream of the initial bypass network is prohibited from being transmitted to the backbone network, so as to independently train the initial bypass network and obtain the target bypass network, thereby achieving the technical effect of reducing resource consumption in the training process of the model and improving computing efficiency, and solving the technical problems of high resource consumption and low computing efficiency in the training process of the model.
  • the data flow of the initial bypass network can be prohibited from being transmitted to the backbone network, and the parameters of the initial bypass network can be adjusted based on the converted feature data through back propagation or stochastic gradient descent to obtain the target bypass network.
  • the tuning modules can be extracted from the initial backbone network, and the extracted tuning modules can be combined and constructed to obtain the initial bypass network.
  • the initial backbone network from which the tuning modules have been extracted can be used as the backbone network, so as to obtain the feature data output by the backbone network.
  • the initial bypass network can be called to convert the feature data, and the converted feature data can be output to the pre-trained model to output the corresponding results. Based on the output results, the parameters of the initial bypass network can be adjusted to obtain the target bypass network.
  • a tuning module extracted from a backbone network is used to construct an initial bypass network.
  • the data flow from the initial bypass network to the backbone network is cut off to obtain a target bypass network (i.e., a new tuning module) that is relatively independent of the backbone network.
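  • As a minimal sketch of this independence, assuming PyTorch (all names illustrative): the backbone parameters are locked, and the intermediate features handed to the bypass are detached, so back propagation can never traverse the backbone.

```python
import torch
import torch.nn as nn

def freeze_backbone(backbone: nn.Module) -> None:
    """Lock the backbone parameters so they are never updated."""
    for p in backbone.parameters():
        p.requires_grad_(False)

def bypass_inputs(backbone_features: list) -> list:
    """Cut the data flow from the bypass back to the backbone:
    detached tensors carry values but no gradient path."""
    return [x.detach() for x in backbone_features]
```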
  • step S204 calling the initial bypass network to convert the feature data, includes: calling the tuning module in the initial bypass network to adjust the feature data; and performing weighted summation on the adjusted feature data based on the weight of the initial bypass network.
  • the tuning module in the initial bypass network can be called to adjust the feature data, and the adjusted feature data can be weighted and summed based on the weight of the initial bypass network.
  • the weight in the initial bypass network is continuously learned; it can be used to characterize the proportions of different tuning modules and can also be called the γ coefficient.
  • the intermediate output of the backbone network (i.e., the feature data) can be obtained, and the tuning module in the initial bypass network can be called to adjust the feature data;
  • the adjusted feature data can be weighted and summed based on the parameters of the initial bypass network.
  • a framework for synchronous combined training of a backbone network and an initial bypass network is used, and the initial bypass network is constructed by combining tuning modules extracted from the initial backbone network.
  • the initial bypass network is disconnected from the backbone network, and back propagation passes only through the initial bypass network, so there is no need to further calculate the gradient of the backbone network; this saves memory occupied during data processing, improves the training speed, and yields a new target bypass network independent of the backbone network.
  • the tuning module in the initial bypass network is called to adjust the feature data, including: calling the first vertical tuning module corresponding to the first layer of the backbone network in the backbone network to adjust the feature data output by the zeroth layer of the backbone network, and calling the first horizontal tuning module associated with the first vertical tuning module to adjust the feature data output by the first layer of the backbone network.
  • the first vertical tuning module corresponding to the first layer of the backbone network in the backbone network can be called to adjust the feature data output by the zeroth layer of the backbone network in the backbone network
  • the first horizontal tuning module associated with the first vertical tuning module is called to adjust the feature data output by the first layer of the backbone network.
  • the first layer of the backbone network can be a decoding layer (Decoder), a convolutional layer (block), etc.; this is only an example here, and there is no specific restriction on the category of the first layer of the backbone network.
  • the zeroth layer of the backbone network can be an intermediate layer network (Middle), a convolutional layer, etc., which is only an example here, and there is no specific restriction on the category of the zeroth layer of the backbone network.
  • the first vertical tuning module corresponds to the first layer of the backbone network, and can be a vertical tuning module, which can also be called a vertical Res-Tuner.
  • the first horizontal tuning module is associated with the first vertical tuning module, and can be a horizontal tuning module, which can also be called a horizontal Res-Tuner.
  • a tuning module can be extracted from the initial bypass network, and a complete initial bypass network can be constructed by combining different tuning modules.
  • the first vertical tuning module corresponding to the first layer of the backbone network in the initial bypass network can be called to adjust the feature data output by the zeroth layer of the backbone network;
  • the first horizontal tuning module associated with the first vertical tuning module is called to adjust the feature data output by the first layer of the backbone network.
  • the feature data output by the zeroth layer of the backbone network can be represented by x_0, and the feature data output by the first layer of the backbone network can be represented by x_1.
  • the adjusted feature data is weighted and summed, including: based on the weight of the initial bypass network, performing a weighted summation of the adjusted feature data corresponding to the first vertical tuning module and the adjusted feature data corresponding to the first horizontal tuning module to obtain the first feature data.
  • the weight of the initial bypass network can be determined.
  • the adjusted feature data corresponding to the first vertical tuning module and the adjusted feature data corresponding to the first horizontal tuning module are obtained.
  • the adjusted feature data corresponding to the first vertical tuning module and the adjusted feature data corresponding to the first horizontal tuning module are weighted and summed to obtain the first feature data.
  • updating the parameters of the initial bypass network based on the converted feature data to obtain the target bypass network includes: in response to the first-layer backbone network being the last layer of the backbone network, updating the parameters of the initial bypass network based on the first feature data to obtain the target bypass network.
  • it is determined whether the first-layer backbone network is the last layer of the backbone network. If the first-layer backbone network is the last layer of the backbone network, the first feature data can be used as the output of the initial bypass network, and the first feature data can be used as the training data for back propagation to adjust the parameters of the initial bypass network to obtain the target bypass network.
  • the overall structure of the pre-trained model can be divided into two parts, one part is the backbone model, the parameters of which are frozen during the training process; the other part is the trainable initial bypass network (also referred to as the bypass structure), which can be composed of several tuning modules.
  • Each layer of the backbone network has a horizontal tuning module and a vertical tuning module, which respectively receive the intermediate feature data from that layer of the backbone network and the output data of the previous layer of the bypass network, perform weighted summation, and output the result to the next layer, until the features of the last layer are output to the pre-trained model (for example, a discriminative task network) to produce the corresponding output results, as sketched below.
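  • A compact sketch of the per-layer combination just described, assuming PyTorch; the class ResTuningBypass, the tuner_factory argument, and the use of one scalar weight per layer are illustrative assumptions. Each layer's horizontal tuner processes that layer's backbone features, the vertical tuner processes the previous bypass output, and a learnable weight combines the two before the result moves on to the next layer and finally to the head.

```python
import torch
import torch.nn as nn

class ResTuningBypass(nn.Module):
    """Bypass built from one horizontal and one vertical tuner per
    backbone layer; illustrative sketch, not the exact disclosure."""
    def __init__(self, tuner_factory, num_layers: int):
        super().__init__()
        self.horizontal = nn.ModuleList([tuner_factory() for _ in range(num_layers)])
        self.vertical = nn.ModuleList([tuner_factory() for _ in range(num_layers)])
        # learnable weight of the bypass (one coefficient per layer)
        self.gamma = nn.Parameter(torch.ones(num_layers))

    def forward(self, feats):
        # feats[0] is the zeroth-layer output; feats[l] is the l-th layer output
        x_bypass = feats[0]
        for l in range(len(self.horizontal)):
            x_bypass = (self.horizontal[l](feats[l + 1])
                        + self.gamma[l] * self.vertical[l](x_bypass))
        return x_bypass  # fed to the output layer (head) of the pre-trained model
```

  • With the AdapterTuner sketch above, tuner_factory could be, for example, lambda: AdapterTuner(dim=768); the gamma coefficients play the role of the learnable bypass weight described in this section.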
  • the parameters of the initial bypass network can be updated to obtain the target bypass network.
  • training data (e.g., labeled images, speech, and other data) can be processed by the backbone network to output feature data.
  • the first vertical tuning module corresponding to the first-layer backbone network in the initial bypass network can be called to adjust the feature data output by the zero-th layer backbone network
  • the first horizontal tuning module corresponding to the first vertical tuning module can be called to adjust the feature data output by the first-layer backbone network.
  • a weighted sum is taken between the adjusted feature data corresponding to the first vertical tuning module and the adjusted feature data corresponding to the first horizontal tuning module to obtain the first feature data after the weighted sum, and it is determined whether the first-layer backbone network is the last layer of the backbone network. If the first-layer backbone network is the last layer of the backbone network, the first feature data can be used as the output of the initial bypass network.
  • the first feature data can be input into the output layer (e.g., Head) of the pre-trained model, and the output layer converts the first feature data to obtain the final training result; back propagation calculation can then be performed based on the training result to update the parameters of the initial bypass network to obtain the target bypass network.
  • updating the parameters of the initial bypass network based on the converted feature data to obtain the target bypass network includes: in response to the first-layer backbone network not being the last layer of the backbone network, calling the second vertical tuning module corresponding to the second-layer backbone network in the backbone network to adjust the first feature data, and calling the second horizontal tuning module associated with the second vertical tuning module to adjust the feature data of the second-layer backbone network in the backbone network; based on the weight of the initial bypass network, performing weighted summation between the adjusted first feature data corresponding to the second vertical tuning module and the adjusted feature data corresponding to the second horizontal tuning module to obtain the second feature data; and, in response to the second-layer backbone network not being the last layer of the backbone network, executing the following steps:
  • determining the second-layer backbone network as the first-layer backbone network, and determining the third layer of the backbone network as the second-layer backbone network; calling the second vertical tuning module to adjust the first feature data, and calling the second horizontal tuning module to adjust the feature data of the second-layer backbone network.
  • a weighted sum is performed between the adjusted feature data corresponding to the first vertical tuning module and the adjusted feature data corresponding to the first horizontal tuning module to obtain the first feature data after the weighted sum, and it is determined whether the first-layer backbone network is the last layer of the backbone network. If the first-layer backbone network is not the last layer of the backbone network, the first feature data can be used as input data of the second vertical tuning module corresponding to the second-layer backbone network, and the second vertical tuning module corresponding to the second-layer backbone network in the backbone network can be called to adjust the first feature data, and the second horizontal tuning module associated with the second vertical tuning module can be called to adjust the feature data of the second-layer backbone network in the backbone network. Based on the weight of the initial bypass network, a weighted sum can be performed between the adjusted first feature data corresponding to the second vertical tuning module and the adjusted feature data corresponding to the second horizontal tuning module to obtain the second feature data.
  • this embodiment determines whether the second-layer backbone network is the last layer of the backbone network. If the second-layer backbone network is the last layer of the backbone network, the second feature data can be output to the output layer of the pre-trained model. The output layer converts the second feature data to obtain the final output result. Back-propagation calculation can be performed based on the output result to update the parameters of the initial bypass network to obtain the target bypass network.
  • the second-layer backbone network can be determined as the first-layer backbone network, and the third-layer backbone network in the backbone network can be determined as the second-layer backbone network; the second vertical tuning module is called to adjust the first feature data, and the second horizontal tuning module is called to adjust the feature data of the second-layer backbone network; based on the weight of the initial bypass network, a weighted sum is performed between the adjusted first feature data corresponding to the second vertical tuning module and the adjusted feature data corresponding to the second horizontal tuning module to obtain the second feature data, until the second-layer backbone network is the last layer of the backbone network, and the obtained second feature data is output to the output layer of the pre-trained model.
  • the result of the output layer is back-propagated to update the parameters of the initial bypass network to obtain the target bypass network.
  • the second feature data output by the bypass network may be determined based on the following formula: $x'_l = \text{Res-Tuner}_{\text{hor}}(x_l) + \gamma \cdot \text{Res-Tuner}_{\text{ver}}(x'_{l-1})$, with $x'_0 = x_0$.
  • $\text{Res-Tuner}_{\text{hor}}(x_l)$ can be used to characterize the output of the horizontal tuning module at layer $l$; $\text{Res-Tuner}_{\text{ver}}(x'_{l-1})$ can be used to characterize the output of the vertical tuning module applied to the bypass output of the $(l-1)$-th layer; $x_0$ can be used to characterize the feature data output by the zeroth layer of the backbone network; $x_l$ can be used to characterize the output features of the $l$-th layer; and $\gamma$ can be used to characterize the weight of the initial bypass network.
  • the weight of the initial bypass network may be a learnable weight coefficient, which may be continuously changed during the training of the initial bypass network.
  • the initial bypass network uses a horizontal tuning module and a vertical tuning module to process, respectively, the feature data output from the l-th layer of the backbone network and the first feature data of the (l-1)-th layer of the bypass network, and the two can be combined by weighted summation.
  • the initial input into the entire module is called layer 0.
  • step S206 updating the parameters of the initial bypass network based on the converted feature data to obtain the target bypass network, includes: determining the processing result obtained by the output layer in the pre-trained model processing the converted feature data; determining the difference between the processing result and the actual processing result corresponding to the processing result; adjusting the parameters of the initial bypass network based on the difference value to obtain the target bypass network.
  • the feature data converted by the initial bypass network is obtained, and the feature data can be processed through the output layer in the pre-trained model to obtain a processing result.
  • the difference value between the processing result and the real processing result corresponding to the processing result can be determined.
  • the parameters of the initial bypass network can be adjusted to obtain the target bypass network.
  • the processing result can be an image, text, sequence data, classification situation, etc.
  • the category of the processing result can be flexibly changed according to the use scenario of the pre-trained model. This is only an example, and no specific limitation is made to the category of the processing result.
  • the difference value can be used to characterize the difference between the processing result and the real processing result corresponding to the processing result, and can be used to determine the loss function.
  • this embodiment obtains feature data after conversion of the initial bypass network, and can process the feature data through the output layer in the pre-trained network to obtain a processing result.
  • the loss function between the processing result and the actual processing result corresponding to the processing result can be calculated, and the gradient of the parameters of the initial bypass network is calculated based on the loss function, thereby determining the adjustment direction and size of the parameters to obtain an updated target bypass network.
  • back propagation can be performed based on the processing results of the output layer, and the change of the loss function can be determined according to the processing results and the actual processing results.
  • the parameters of the initial bypass network can be adjusted based on the change of the loss function, so that the processing results of the pre-trained model are closer to the actual processing results.
  • the gradient can be efficiently calculated through back propagation, which speeds up the training of the initial bypass network.
  • back propagation also enables the initial bypass network to learn the characteristics and laws of the data, and continuously optimizes the performance of the pre-trained model by updating the parameters of the initial bypass network, thereby improving the accuracy and generalization ability of the pre-trained model.
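  • A sketch of the update loop described above, assuming PyTorch and the hypothetical components from the earlier sketches (a backbone returning per-layer features, the bypass, an output head, and a data loader): only the bypass parameters receive gradients, and the difference between the head's processing result and the real result drives back propagation.

```python
import torch

def train_bypass(backbone, bypass, head, train_loader, lr: float = 1e-3):
    """Update only the bypass parameters; the backbone and head are
    treated here as fixed parts of the pre-trained model (an assumption)."""
    optimizer = torch.optim.AdamW(bypass.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()   # stands in for the difference value
    for inputs, targets in train_loader:
        with torch.no_grad():               # backbone parameters stay locked
            feats = backbone(inputs)        # per-layer feature data
        result = head(bypass(feats))        # processing result of the output layer
        loss = loss_fn(result, targets)     # difference vs. the real result
        optimizer.zero_grad()
        loss.backward()                     # gradients never traverse the backbone
        optimizer.step()
```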
  • the backbone network and at least one tuning module are connected based on residuals.
  • the initial bypass network and the backbone network can be connected by residual connection, and the initial bypass network can include at least one horizontal tuning module or a vertical tuning module, so the backbone network and at least one tuning module can be connected based on residual. It should be noted that there is no specific restriction on the number of tuning modules in the initial bypass network.
  • the tuning modules in the initial bypass network can be connected based on residual connections and act in parallel in the backbone network.
  • the tuning module is extracted from the initial backbone network, and the tuning modules are connected through residual connections to construct an initial bypass network, and the initial bypass network can be connected to the backbone network in parallel. During the back propagation process, it is prohibited to transmit the data stream from the initial bypass network to the backbone network, thereby achieving the effect of reducing memory usage.
  • the above-mentioned residual connection can be changed according to actual conditions. For example, it can also be a full connection, a skip connection, etc. Different units can also be combined or replaced inside a single tuning module. No specific restrictions are made here on the connection method between the tuning modules and the components of the tuning modules.
  • the state of the parameters of the backbone network is a locked state.
  • the state of the parameters of the backbone network is a locked state, that is, the parameters of the backbone network are not updated.
  • the locked state can also be called a frozen state, which can be used to characterize that the parameters of the backbone network do not change.
  • the backbone network keeps parameters frozen, and at the same time, data no longer flows to the backbone network during the back propagation process, thereby achieving the purpose of saving memory.
  • a tuning module extracted from the backbone network is used to construct an initial bypass network.
  • the data flow from the initial bypass network to the backbone network is cut off to obtain a target bypass network (a new tuning module) relatively independent of the backbone network, so that the initial bypass network can be trained independently.
  • the disclosed embodiment is directed to an artificial intelligence generated content (AIGC) system and also provides a data processing method for a pre-trained model.
  • FIG3 is a flow chart of a data processing method for a pre-trained model according to an embodiment of the disclosed embodiment. As shown in FIG3, the method may include the following steps.
  • Step S302 responding to the query information received in the generative interactive interface, wherein the query information at least includes: keywords of the text generation information.
  • when the generative interaction interface receives inquiry information, the inquiry information received in the generative interaction interface can be responded to.
  • the generative interaction interface can be an interface that generates a dialogue using a natural language generation model, and can be an operation interface of a mobile terminal, a client, etc. This is only an example, and there is no specific restriction on the type of generative interaction interface.
  • the inquiry information may at least include: keywords of the text generation information.
  • the inquiry information may be an instruction issued by the client, and may be an instruction to ask questions in a dialogue or interaction scenario to obtain guidance behavior, etc.
  • for example, the user may enter a question or request in the generative interactive interface, such as "Hello, I would like to know how to unsubscribe from my service?", and the query information may include the keyword "how" of the text generation information.
  • the query information received in the generated dialogue interface may be responded to.
  • Step S304 calling the backbone network of the pre-trained model to at least analyze the text generation information and output text feature data, wherein the backbone network comes from the initial backbone network.
  • the backbone network of the pre-trained model in response to the query information received in the generative interactive interface, can be called to at least analyze the text generation information and output text feature data.
  • the backbone network can come from the initial backbone network.
  • the text generation information can be used to determine the question raised by the query information.
  • the text feature data can be used to characterize the text content, for example, it can be a vector, texture feature, etc. This is only for example, and no specific limitation is made on the type of feature data.
  • Step S306 calling a target bypass network to convert the text feature data, wherein the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network.
  • the target bypass network can be called to convert the text feature data.
  • Step S308 generating at least one response result matching the query information based on the converted text feature data.
  • the pre-trained model can process the converted text feature data to generate a response result that matches the inquiry information.
  • Step S310 displaying the reply result in the generative interactive interface.
  • the reply result can be displayed in the generative interactive interface.
  • the reply result can be an image, text, voice, etc., which is only an example and does not specifically limit the content of the reply result.
  • a generative interactive interface receives a query message such as “What is the recipe for pasta?”
  • the backbone network in the pre-trained model can be called to at least analyze the text generation information “recipe for pasta” in the query message, output text feature data, and call the target bypass network to convert the text feature data.
  • based on the converted text feature data, at least one reply result matching the query message is generated, and the reply result can be displayed in the generative interactive interface.
  • the method also includes: in the process of updating the parameters of the initial bypass network, the data flow of the initial bypass network is independent of the data flow of the backbone network, and the parameters of the initial bypass network are used to characterize the impact of the tuning module on the text feature data.
  • the data flow of the initial bypass network is independent of the data flow of the backbone network, and the parameters in the initial bypass network can be used to characterize the influence of the tuning module on the text feature data.
  • the backbone network keeps parameters frozen, and the initial bypass network uses a tuning module extracted from the backbone network, while cutting off the data flow from the tuning module to the backbone network to form an initial bypass network that is relatively independent of the backbone network.
  • query information received in a generative interactive interface is responded to, wherein the query information includes at least: keywords of text generation information; a backbone network of a pre-trained model is called to at least analyze the text generation information and output text feature data, wherein the backbone network comes from an initial backbone network; a target bypass network is called to convert the text feature data, wherein the target bypass network is obtained by updating the parameters of an initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; based on the converted text feature data, at least one reply result matching the query information is generated; and the reply result is displayed in the generative interactive interface, thereby achieving the technical effect of improving the efficiency of obtaining feedback results in a generative dialogue product and solving the technical problems of high resource consumption and low computational efficiency in the training process of the model.
  • this embodiment can extract the tuning module from the initial backbone network to obtain the tuning module and the backbone network.
  • the extracted tuning modules can be combined to obtain the initial bypass network, and the parameters in the initial bypass network can be updated to obtain the target bypass network.
  • the backbone network in the pre-trained model can be called to process the image to be classified and obtain feature data.
  • the trained target bypass network is called to convert the feature data.
  • the feature data converted by the target bypass network can be input into the output layer, and the output layer processes the feature data to obtain the category to which the image to be classified belongs.
  • the overall structure of the pre-trained model (e.g., discriminative task network) can be divided into two parts.
  • One part is the backbone network, which is frozen during the training process.
  • the other part is the trainable initial bypass network (also called the bypass structure), which can be composed of several tuning modules, where each network layer of the backbone network has a corresponding horizontal tuning module and vertical tuning module, which respectively receive the feature data output from that layer of the backbone network and the output of the previous layer of the initial bypass network, and the two are weighted and summed and output to the next layer.
  • a back propagation process can be performed based on the output of the corresponding result, and during the back propagation process, the back propagation path from the bypass network to the backbone network is disconnected to obtain the trained target bypass network.
  • the backbone network can be called to process the image to be classified and obtain the feature data.
  • the target bypass network can be called to convert the feature data, and the converted feature data can be output to the output layer in the pre-trained model to determine the category to which the image to be classified belongs.
  • the feature data of the image to be classified can be determined through the backbone network in the pre-trained model, the target bypass network can be called to convert the feature data, and based on the converted feature data, the category of the image to be classified is determined to be an image containing kittens.
  • step S404 calling the target bypass network to convert the text feature data, includes: calling the first vertical tuning module corresponding to the first decoding layer in the backbone network to convert the text feature data output by the middle layer of the backbone network, and calling the first horizontal tuning module associated with the first vertical tuning module to convert the text feature data output by the first decoding layer.
  • the first vertical tuning module corresponding to the first decoding layer in the backbone network can be called to convert the text feature data output by the middle layer of the backbone network
  • the first horizontal tuning module associated with the first vertical tuning module can be called to convert the text feature data output by the first decoding layer.
  • the overall structure of the pre-trained model may include a backbone network whose parameters are frozen during training and a target bypass network whose parameters are updated, wherein the backbone network may be a deep learning model (U-Net) structure based on stable diffusion (StableDiffusion).
  • the target bypass network may be a trainable bypass structure, which may be composed of several tuning modules.
  • the backbone network may include three encoding layers, one middle layer and four decoding layers.
  • Each decoding layer corresponds to a horizontal tuning module and a vertical tuning module, and the horizontal tuning module and the vertical tuning module can be used to receive text feature data output from the middle layer or the decoding layer and the text feature data converted from the previous layer in the target bypass network.
  • the first vertical tuning module corresponding to the first decoding layer can be used to receive the text feature data of the middle layer of the backbone network and convert the text feature data of the middle layer.
  • the first horizontal tuning module associated with the first vertical tuning module can obtain the text feature data output by the first decoding layer and convert the text feature data output by the first decoding layer.
  • generating at least one reply result matching the query instruction includes: based on the weight of the target bypass network, performing a weighted summation of the text feature data adjusted by the first vertical tuning module and the text feature data adjusted by the first horizontal tuning module to obtain the converted text feature data; and, based on the converted text feature data, determining a reply result.
  • a weighted sum can be performed on the text feature data adjusted by the first vertical tuning module and the text feature data adjusted by the first horizontal tuning module to obtain converted text feature data, and a response result can be determined based on the converted text feature data.
  • each decoding layer of the backbone network corresponds to a horizontal tuning module and a vertical tuning module.
  • the horizontal tuning module and the vertical tuning module can respectively receive the text feature data output from the middle layer or the decoding layer in the backbone network and the text feature data output from the previous layer in the target bypass network. The two can be weighted, summed, and output to the vertical tuning module corresponding to the next layer of the backbone network, until the weighted summation result of the horizontal and vertical tuning modules corresponding to the last decoding layer of the backbone network is output to the generative task network, which outputs the corresponding result so as to obtain the reply result.
  • the backbone network may include three encoding layers, one intermediate layer and three decoding layers.
  • Each decoding layer corresponds to a horizontal tuning module and a vertical tuning module, which can be used to receive text feature data output from the intermediate layer or the decoding layer and text feature data converted from the previous layer in the target bypass network.
  • the first vertical tuning module corresponding to the first decoding layer can be used to receive the text feature data of the intermediate layer of the backbone network and convert the text feature data of the intermediate layer.
  • the first horizontal tuning module associated with the first vertical tuning module can obtain the text feature data output by the first decoding layer and convert the text feature data output by the first decoding layer.
  • a weighted sum can be performed between the text feature data adjusted by the first vertical tuning module and the text feature data adjusted by the first horizontal tuning module.
  • the text feature data obtained by weighted summation of the data adjusted by the first vertical tuning module and by the first horizontal tuning module can be input into the second vertical tuning module, corresponding to the second decoding layer, to obtain the text feature data converted by the second vertical tuning module. The text feature data output by the second decoding layer is passed to the second horizontal tuning module to obtain the corresponding adjusted text feature data, and the outputs of the second horizontal and second vertical tuning modules can then be weighted and summed.
  • the text feature data obtained by that weighted summation is then transmitted to the third vertical tuning module, corresponding to the third decoding layer, while the text feature data output by the third decoding layer is transmitted to the third horizontal tuning module. The outputs of the third horizontal and third vertical tuning modules can be weighted and summed to obtain the text feature data finally converted by the target bypass network.
  • the text feature data can be processed based on the output layer in the pre-trained model to obtain the response result.
  • step S406, determining the reply result, includes: using the converted text feature data as the output data of the last decoding layer in the pre-trained model, and converting the output data into the reply result.
  • the target bypass network can be used as the last encoding layer in the pre-trained model.
  • the converted text feature data is used as the output data of the last decoding layer in the pre-trained model.
  • the output layer can convert the output data to obtain the response result.
  • the target bypass network in this embodiment can also be used as a branch in a pre-trained model (e.g., for a controllable generative task), that is, as an encoding module for conditional input.
  • a controllable generative task can be a task generated based on controllable conditional data.
  • the controllable encoding part can also use the target bypass network as a branch to reduce memory consumption.
  • feature data obtained by processing an image to be classified by a backbone network in a pre-trained model is obtained, wherein the backbone network comes from an initial backbone network; a target bypass network is called to convert the feature data, wherein the target bypass network is obtained by updating parameters of an initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; based on the converted feature data, the category to which the image to be classified belongs is determined, thereby achieving the technical effect of reducing resource consumption in the training process of the model and solving the technical problems of high resource consumption and low computational efficiency in that training process.
  • FIG. 4 is a flow chart of another method for processing data of a pre-trained model according to an embodiment of the present disclosure. As shown in FIG. 4, the method may include the following steps:
  • Step S402 obtaining feature data obtained by processing the conditional data by the backbone network in the pre-trained model, wherein the backbone network comes from the initial backbone network, and the conditional data is used to determine the generation condition of the target data.
  • conditional data can be obtained.
  • the backbone network processes the conditional data to obtain feature data.
  • the conditional data can be used to determine the generation conditions of the target data; for example, it can be "drawing a blue dwarf cat", a line drawing, a depth map, a posture, or other content, and can include text conditions, image conditions, etc. This is only an example, and the content of the conditional data is not specifically restricted.
  • the backbone network can be a conditional two-dimensional deep learning model (Conditional 2D U-Net), which can include an encoding layer (Encoder), a decoding layer (Decoder) and a middle layer (Middle).
  • conditional data "write a set of codes for filtering data" can be obtained. Keywords in the conditional data can be extracted and converted, and feature data corresponding to the keywords can be obtained. It should be noted that the content of the above conditional data and the determination of the feature data are only for illustration and are not specifically limited here.
  • Step S404 calling a target bypass network to convert the characteristic data, wherein the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network.
  • a tuning module can be extracted from the backbone network, an initial bypass network can be constructed based on at least one tuning module, and the parameters of the initial bypass network can be updated based on the training data to obtain a target bypass network.
  • the trained target bypass network can be called to convert the feature data.
  • the training data can be data with an identifier (label), which can be pre-acquired and used to train the initial bypass network. The data for network training can be, for example, image data, text data, voice data, etc.; this is only an example and does not limit the type of training data.
  • the parameters of the initial bypass network can be weights, biases, etc.; this is only an example and does not limit the type of parameters.
  • this embodiment extracts the tuning module from the initial backbone network, yielding a separate tuning module and a backbone network.
  • the extracted tuning modules can be combined to obtain an initial bypass network, and the parameters in the initial bypass network can be updated to obtain a target bypass network.
  • the feature data obtained by the backbone network in the pre-trained model processing the conditional data can be obtained.
  • the trained target bypass network is called to convert the feature data.
  • Step S406 generating target data corresponding to the conditional data based on the converted feature data, wherein the type of the target data includes at least one of the following: text information, image information, video information and voice information.
  • the feature data converted by the target bypass network can be input into the output layer, and the output layer processes the feature data to generate target data corresponding to the conditional data.
  • the target data can be an animation, story, landscape, face image, dialogue, etc. generated based on the conditional data, and the type of the target data can include at least one of the following: text information, image information, video information, and voice information. This is only for example, and the category of the target data is not specifically limited.
  • the pre-trained model can be a conditional pre-trained transformer model (Chat Conditional Pretrained Transformer, referred to as ChatCPT), which can be used to generate dialogue responses and conduct dialogue interactions.
  • the conditional data is the dialogue context
  • the inquiry information can be determined based on the dialogue context.
  • the backbone network in the pre-trained model can be called to process the conditional data, and the feature data can be obtained.
  • the target bypass network is called to transform the feature data; based on the converted feature data, a dialogue response corresponding to the conditional data (i.e., target data) can be generated.
  • the conditional data is a question raised by the user
  • the pre-trained model can be used to generate an answer corresponding to the user's question information.
  • conditional data is a long text message and the user wants to simplify the text message
  • a concise and general summary content can be generated through the pre-trained model.
  • the pre-trained model can assist in writing, and the conditional data can be processed to obtain articles, storylines, dialogue scripts, etc. corresponding to the conditional data.
  • feature data is obtained by processing conditional data by a backbone network in a pre-trained model, wherein the backbone network comes from an initial backbone network, and the conditional data is used to determine the generation condition of target data; a target bypass network is called to convert the feature data, wherein the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; based on the converted feature data, target data corresponding to the conditional data is generated, wherein the type of the target data includes at least one of the following: text information, image information, video information, and voice information, thereby solving the technical problems of high resource consumption and low computational efficiency in the training process of the model and achieving the technical effect of reducing resource consumption in that training process.
  • a data processing method for a pre-trained model in virtual reality scenarios, such as on a virtual reality (VR) device or an augmented reality (AR) device, is also provided.
  • FIG. 5 is a flowchart of another method for processing data of a pre-training model according to an embodiment of the present disclosure. As shown in FIG. 5, the method may include the following steps.
  • Step S502 input the speech to be converted on a virtual reality VR device or an augmented reality AR device.
  • Step S504 using the backbone model in the pre-trained model to extract feature data from the speech to be converted, wherein the backbone network comes from the initial backbone network.
  • Step S506 calling the target bypass network in the pre-trained model to convert the feature data, wherein the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network.
  • Step S508 determining the image information corresponding to the speech to be converted based on the converted feature data.
  • Step S510 activating the VR device or AR device using the image information, and displaying the image information in the VR device or AR device.
  • the data processing method of the pre-trained model can be applied to a hardware environment composed of a server and a virtual reality device.
  • the VR device or AR device is controlled to perform a human-computer interaction operation corresponding to the positioning information.
  • the server can be a server corresponding to the media file operator.
  • the network includes but is not limited to: a wide area network, a metropolitan area network or a local area network.
  • the virtual reality device is not limited to: a virtual reality helmet, virtual reality glasses, a virtual reality all-in-one machine, etc.
  • the virtual reality device may include: a memory, a processor, and a transmission device.
  • the memory is used to store an application, which can be used to execute: inputting a speech to be converted on a virtual reality VR device or an augmented reality AR device; extracting feature data from the speech to be converted using a backbone model in a pre-trained model, wherein the backbone network comes from an initial backbone network; calling a target bypass network in the pre-trained model to convert the feature data, wherein the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; determining image information corresponding to the speech to be converted based on the converted feature data; using the image information to activate the VR device or AR device, and displaying the image information in the VR device or AR device.
  • the above-mentioned processing method of the pre-trained model applied in the VR device or AR device in this embodiment may include the method of the embodiment shown in Figure 5, so as to control the VR device or AR device to perform human-computer interaction operations corresponding to the positioning information.
  • the processor of this embodiment can call the application stored in the memory to execute the above steps through a transmission device.
  • the transmission device can receive media files sent by the server through the network, and can also be used for data transmission between the processor and the memory.
  • a head-mounted display (HMD) with eye tracking is provided. The screen in the HMD is used to display video images; the eye tracking module in the HMD is used to obtain the real-time movement path of the user's eyes; the tracking system is used to track the user's position information and motion information in the real three-dimensional space; and the computing processing unit is used to obtain the user's real-time position and motion information from the tracking system and to calculate the three-dimensional coordinates of the user's head in the virtual three-dimensional space, the user's field-of-view direction in the virtual three-dimensional space, etc.
  • a virtual reality device may be connected to a terminal, and the terminal and the server are connected via a network.
  • the virtual reality device is not limited to: a virtual reality helmet, virtual reality glasses, a virtual reality all-in-one machine, etc.
  • the terminal is not limited to a PC, a mobile phone, a tablet computer, etc.
  • the server may be a server corresponding to a media file operator, and the network includes but is not limited to: a wide area network, a metropolitan area network or a local area network.
  • FIG. 6 is a schematic diagram of a data processing result of a pre-trained model according to an embodiment of the present disclosure.
  • the speech to be converted "Draw a deer” is displayed on the presentation screen of the VR device or AR device, and the backbone model in the pre-trained model can be used to extract feature data from the speech to be converted, and the target bypass network in the pre-trained model is called to convert the feature data.
  • the image information corresponding to the speech to be converted can be determined; the image information can be used to activate the VR device or AR device, and the image information "deer" can be displayed in the VR device or AR device.
  • the speech to be converted is input on the virtual reality VR device or the augmented reality AR device; the backbone model in the pre-trained model is used to extract feature data from the speech to be converted, wherein the backbone network comes from the initial backbone network; the target bypass network in the pre-trained model is called to convert the feature data, wherein the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; based on the converted feature data, the image information corresponding to the speech to be converted is determined; and the image information is used to activate the VR device or AR device and is displayed in the VR device or AR device, thereby solving the technical problems of high resource consumption and low computational efficiency in the training process of the model and achieving the technical effect of reducing resource consumption in that training process.
  • a data processing method for a pre-trained model that can be applied to an artificial intelligence generated content (AIGC) system.
  • FIG. 7 is a flowchart of another method for processing data of a pre-training model according to an embodiment of the present disclosure. As shown in FIG. 7, the method may include the following steps.
  • Step S702 input multimodal information in the dialogue interface, wherein the type of the multimodal information includes at least one of the following: inquiry information including character information, video frame information including frame image information, and audio information.
  • multimodal information input in the dialogue interface can be obtained, wherein the type of multimodal information can include at least one of the following: query information containing character information, video frame information containing frame image information, and audio information. It should be noted that this is only an example and does not specifically limit the type of multimodal information.
  • the dialogue interface can be the display interface of the terminal where the artificial intelligence content generation system is located. This is only an example.
  • the dialogue interface that can be used for the dialogue system should be within the protection scope of the present disclosure and is not specifically limited here.
  • Step S704 calling the backbone network in the pre-trained model to at least analyze and process the multimodal information to obtain feature data, wherein the backbone network comes from the initial backbone network.
  • the AIGC system can obtain multimodal information input in the dialogue interface, and call the backbone network in the pre-trained model to at least analyze and process the multimodal information to obtain feature data.
  • Step S706 calling the target bypass network to convert the characteristic data, wherein the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network.
  • Step S708 generating reply information corresponding to the multimodal information based on the converted feature data, wherein the type of the reply information includes at least one of the following: text information, image information, video information and voice information.
  • the pre-trained model processes the multimodal information to obtain converted feature data.
  • reply information corresponding to the multimodal information can be generated, wherein the type of the reply information includes at least one of the following: text information, image information, video information and voice information. It should be noted that there is no specific restriction on the type of reply information here.
  • the trained pre-trained model can be called to process the multimodal information to obtain response information corresponding to the multimodal information: "The usage scenarios of neural networks can be image usage scenarios, voice usage scenarios, and code-direction usage scenarios."
  • step S706 calling the target bypass network to convert the characteristic data, includes: calling the tuning module in the target bypass network to adjust the characteristic data; and performing weighted summation on the adjusted characteristic data based on the weight of the target bypass network.
  • the multimodal information input by the user in the dialogue interface can be obtained, and the backbone network in the pre-trained model can be called to analyze and process the multimodal information to obtain feature data.
  • the feature data can be input into the trained target bypass network, where the tuning module in the target bypass network adjusts the feature data; based on the weight of the target bypass network, the adjusted feature data is weighted and summed to obtain the final converted feature data.
  • step S708 based on the converted feature data, generates reply information corresponding to the multimodal information, including: processing the converted feature data based on the output layer in the pre-trained model to obtain reply information corresponding to the multimodal information.
  • the converted feature data may be processed based on the output layer in the pre-trained model to convert the feature data into reply information corresponding to the multimodal information.
  • the reply information may be displayed in the dialogue interface.
  • multimodal information is input in the dialogue interface, wherein the type of the multimodal information includes at least one of the following: inquiry information containing character information, video frame information containing frame image information, and audio information; the backbone network in the pre-trained model is called to at least analyze and process the multimodal information to obtain feature data, wherein the backbone network comes from the initial backbone network; the target bypass network is called to convert the feature data, wherein the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; based on the converted feature data, reply information corresponding to the multimodal information is generated, wherein the type of the reply information includes at least one of the following: text information, image information, video information, and voice information, thereby solving the technical problems of high resource consumption and low computational efficiency in the training process of the model and achieving the technical effect of reducing resource consumption in that training process.
  • user information including but not limited to user device information, user personal information, etc.
  • data including but not limited to data used for analysis, stored data, displayed data, etc.
  • the method according to the above embodiment can be implemented by means of software plus a necessary general hardware platform, and of course, it can also be implemented by hardware.
  • the technical solution of the present disclosure, or the part that contributes to the prior art can be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, a disk, or an optical disk), and includes a number of instructions for a terminal device (which can be a mobile phone, a computer, a server, or a network device, etc.) to execute the methods of each embodiment of the present disclosure.
  • FIG. 8(a) is a schematic diagram of a data processing system for a pre-trained model according to an embodiment of the present disclosure.
  • the data processing system 800 of the pre-trained model may include: a client 802 and a server 804.
  • the client 802 is configured to display a dialogue interface and capture multimodal information input in the dialogue interface, wherein the type of the multimodal information includes at least one of the following: query information containing character information, video frame information containing frame image information, and audio information.
  • the multimodal information is input in the dialogue interface of the client 802 .
  • the server 804 is configured to call the backbone network in the pre-trained model to at least analyze and process the multimodal information to obtain feature data, and call the target bypass network to convert the feature data, and generate reply information corresponding to the multimodal information based on the converted feature data, wherein the backbone network comes from the initial backbone network, the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network, and the type of the reply information includes at least one of the following: text information, image information, video information and voice information.
  • the client 802 may transmit the acquired multimodal information to the server 804.
  • the server 804 may call the backbone network in the pre-trained model to at least analyze and process the multimodal information to obtain feature data, and call the target bypass network to convert the feature data, and generate reply information corresponding to the multimodal information based on the converted feature data.
  • the server 804 may transmit the reply information to the dialogue interface of the client 802 for display.
  • a dialogue interface is displayed through a client, and multimodal information input in the dialogue interface is captured, wherein the type of multimodal information includes at least one of the following: query information including character information, video frame information including frame image information, and audio information; through a server, a backbone network in a pre-trained model is called to at least analyze and process the multimodal information to obtain feature data, a target bypass network is called to convert the feature data, and, based on the converted feature data, reply information corresponding to the multimodal information is generated, wherein the backbone network comes from an initial backbone network, the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, the tuning module is extracted from the initial backbone network, and the type of reply information includes at least one of the following: text information, image information, video information, and voice information, thereby achieving the technical effect of reducing resource consumption in the training process of the model and solving the technical problems of high resource consumption and low computational efficiency in that training process.
  • parameter-efficient transfer learning methods based on large-scale pre-trained basic models have achieved great success in various downstream applications.
  • Existing tuning methods usually introduce lightweight and trainable structures to perform fine-tuning for specific tasks while freezing the basic model.
  • most existing parameter-efficient transfer learning methods change the intermediate state of the pre-trained model, so more memory and time are required for training, and redundant calculations for different tasks must pass through the backbone during multi-task inference, resulting in high resource consumption and low computational efficiency in the training process of the pre-trained model.
  • FIG. 8(b) is a schematic diagram of a tuning module according to an embodiment of the present disclosure.
  • the original tuning method adjusts different optimization structures in the original training architecture according to the specific task; some parts are lightweight (such as prompt, prefix, adapter, etc.), but this method embeds the parameter-efficient tuning module deeply into the original backbone network, where the original backbone network (initial backbone network) can include a feedforward network and multi-head attention.
  • different tasks need to be calculated through the initial backbone network, resulting in a waste of memory and inference overhead.
  • this embodiment proposes a parameter- and memory-efficient tuning method based on a pre-trained model.
  • This method constructs a bypass network (also called a tuning framework) that is efficient in both parameters and memory, extracts the tuning module from the backbone network, and constructs forward and backward propagation links independent of the backbone network. While maintaining the advantage of parameter efficiency, it reduces the memory overhead during training and the redundant calculations during multi-task inference, thereby reducing resource consumption during model training and improving computing efficiency.
  • FIG. 8(c) is a schematic diagram of another tuning module according to an embodiment of the present disclosure. As shown in FIG. 8(c), this embodiment separates the tuning module from the initial backbone network to form an independent sub-module.
  • the separated sub-modules can act in parallel on the backbone network in the form of residuals and can be freely combined to form an independent initial bypass network.
  • the separated sub-modules can act on the backbone network in the form of residuals, or in other connection modes such as skip connections. No specific restrictions are made on the connection mode here.
  • the initial bypass network can use the intermediate output (feature data) of the backbone network during forward propagation, but during backward propagation it is disconnected from the backbone network, and the data flow passes only through the bypass network. Therefore, during backward propagation, computing gradients for the backbone network can be avoided, saving memory and improving training speed. This reduces resource consumption and improves computing efficiency during model training, solving the problem of high resource consumption and low computing efficiency in that process.
  • the pre-trained model can be composed of a backbone network and an initial bypass network with residual connections, which is both parameter efficient and memory efficient.
  • the backbone network can include multiple convolutional layers.
  • the overall structure of the pre-trained model can be divided into two parts, one part is the backbone model, the parameters of the backbone model are frozen during the training process; the other part is the trainable initial bypass network (also called bypass structure), which can be composed of several tuning modules.
  • the data flow from the tuning module to the backbone network can be cut off to form new tuning modules (Res-Tuners) that are relatively independent of the backbone network.
  • the parameters of the initial bypass network can be updated to obtain the target bypass network.
  • since the data flow between the initial bypass network and the backbone network is cut off during backward propagation, the data flow of the tuning modules in the initial bypass network does not flow to the backbone network, thereby saving memory and reducing memory overhead.
  • the output of the initial bypass network can be determined by the following formula:

  $x'_l = \lambda \cdot \text{Res-Tuner}_{hor}(x_l) + (1 - \lambda) \cdot \text{Res-Tuner}_{ver}(x'_{l-1}), \qquad x'_0 = x_0$

  • where $\text{Res-Tuner}_{hor}(x_l)$ can be used to characterize the output of the horizontal tuning module at layer $l$; $\text{Res-Tuner}_{ver}(x'_{l-1})$ can be used to characterize the output of the vertical tuning module at layer $l-1$; $x_0$ can be used to characterize the feature data output by the zeroth layer of the backbone network; $x_l$ can be used to characterize the output features of the $l$-th layer; and $\lambda$ can be used to characterize the weight of the initial bypass network.
  • each network layer corresponding to the backbone network has a corresponding horizontal tuning module and a vertical tuning module in the initial bypass network.
  • the horizontal tuning module and the vertical tuning module can be used to process the output features from the $l$-th layer of the backbone network and the feature outputs from the $(l-1)$-th layer of the initial bypass network, respectively, and the outputs of the two can be weighted and summed using a learnable weight (the $\lambda$ coefficient).
  • the weight coefficients may be continuously adjusted through a back-propagation algorithm.
  • the feature data can also be input into the initial bypass network after the input data has been converted by all 12 network layers of the backbone.
  • alternatively, the feature data can be transmitted from the first network layer of the backbone network to the first stage of the initial bypass network, then from the second network layer of the backbone network to the second stage of the initial bypass network, and so on.
  • the pre-trained model can be applied to the discriminative task.
  • FIG. 9 is a schematic diagram of a pre-training model according to an embodiment of the present disclosure.
  • the overall structure of the pre-training model can be divided into two parts.
  • One part is a backbone network, which is frozen during the training process; the backbone network may include multiple training layers (e.g., training layer_1, training layer_2, ..., training layer_N), and these training layers may be convolutional layers.
  • the other part is a trainable initial bypass network (also referred to as a bypass structure), which may be composed of several tuning modules. Each network layer of the backbone network corresponds to a horizontal tuning module and a vertical tuning module, which respectively receive the feature data output by that layer of the backbone network and the output of the previous layer of the initial bypass network; the two are weighted and summed, and the result is output to the next layer, until the converted feature data of the last layer is output to the output layer of the pre-training model to produce the corresponding result. A backward propagation process can then be performed based on that result, and during backward propagation the path from the bypass network to the backbone network is disconnected, so as to obtain the trained target bypass network.
  • the backbone network can be used to process the image to be classified to obtain feature data.
  • the target bypass network can be called to convert the feature data, and the converted feature data can be output to the output layer in the pre-trained model to determine the category to which the image to be classified belongs.
  • the pre-trained model can be applied to the generative task.
  • FIG. 10 is a schematic diagram of another pre-training model according to an embodiment of the present disclosure.
  • the overall structure of the pre-trained model may include a backbone network whose parameters are frozen during training and a target bypass network whose parameters are updated, wherein the backbone network may be a deep learning model (U-Net) structure based on stable diffusion, and may include a 64×64 encoding layer, a 32×32 encoding layer, a 16×16 encoding layer, an 8×8 encoding layer, an 8×8 intermediate layer, an 8×8 decoding layer, a 16×16 decoding layer, a 32×32 decoding layer, and a 64×64 decoding layer.
  • the target bypass network may be a trainable bypass structure, and may be composed of several tuning modules.
  • Each decoding layer of the backbone network corresponds to a horizontal tuning module and a vertical tuning module.
  • the horizontal tuning module and the vertical tuning module can respectively receive feature data output from the intermediate layer or decoding layer in the backbone network and feature data output from the previous layer in the target bypass network. The two can be weighted, summed, and output to the vertical tuning module corresponding to the next layer of the backbone network, until the weighted summation result of the horizontal and vertical tuning modules corresponding to the last decoding layer of the backbone network is output to the generative task network, which outputs the corresponding result to obtain the target data.
  • the target bypass network can be used as the last encoding layer in the pre-trained model, and the converted feature data can be used as the output data of the last decoding layer in the pre-trained model; based on the output data, the output layer can convert the output data to obtain the target data.
  • the target bypass network in this embodiment can also be used as a branch in a controllable generative task (pre-trained model), that is, as an encoding module for conditional input.
  • Controllable generative tasks are tasks that are generated based on controllable conditional data.
  • the controllable encoding part can also use the target bypass network as a branch to achieve the purpose of reducing memory consumption.
  • the backbone network may include three encoding layers, one intermediate layer, and three decoding layers.
  • Each decoding layer corresponds to a horizontal tuning module and a vertical tuning module, and the horizontal tuning module and the vertical tuning module can be used to receive feature data output from the intermediate layer or the decoding layer and feature data converted from the previous layer in the target bypass network.
  • the first vertical tuning module corresponding to the first decoding layer can be used to receive feature data of the intermediate layer of the backbone network and convert the feature data of the intermediate layer.
  • the first horizontal tuning module associated with the first vertical tuning module can obtain feature data output by the first decoding layer and convert feature data output by the first decoding layer. Based on the weight of the target bypass network, a weighted sum can be performed between the feature data adjusted by the first vertical tuning module and the feature data adjusted by the first horizontal tuning module.
  • the feature data obtained by weighted summation of the data adjusted by the first vertical tuning module and by the first horizontal tuning module can be input into the second vertical tuning module, corresponding to the second decoding layer, to obtain the feature data converted by the second vertical tuning module. The feature data output by the second decoding layer is passed to the second horizontal tuning module to obtain the corresponding adjusted feature data, and the outputs of the second horizontal and second vertical tuning modules can then be weighted and summed.
  • the feature data obtained by that weighted summation is then transmitted to the third vertical tuning module, corresponding to the third decoding layer, while the feature data output by the third decoding layer is transmitted to the third horizontal tuning module. The outputs of the third horizontal and third vertical tuning modules can be weighted and summed to obtain the feature data finally converted by the target bypass network.
  • the feature data can be processed based on the output layer in the pre-trained model to obtain the target data.
  • a parameter- and memory-efficient initial bypass network is constructed with the help of tuning modules separated from the initial backbone network, and the weights in the initial bypass network are updated to obtain a target bypass network.
  • this method can greatly reduce memory consumption and the cost of multi-task reasoning, and is applicable to both discriminative and generative tasks.
  • a tuning module extracted from a backbone network is used to construct an initial bypass network. Meanwhile, during the process of adjusting the parameters of the initial bypass network, the data flow from the initial bypass network to the backbone network is cut off to obtain a target bypass network (new tuning module) independent of the backbone network.
  • the user information including but not limited to user device information, user personal information, etc.
  • data including but not limited to data used for analysis, stored data, displayed data, etc.
  • weather forecast results and other data are all information and data authorized by the user or fully authorized by all parties; the collection, use, and processing of relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, and corresponding operation entrances are provided for users to choose to authorize or refuse.
  • the method according to the above embodiment can be implemented by means of software plus a necessary general hardware platform, and of course, it can also be implemented by hardware.
  • the technical solution of the present disclosure, or the part that contributes to the prior art can be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, a disk, or an optical disk), and includes a number of instructions for a terminal device (which can be a mobile phone, a computer, a server, or a network device, etc.) to execute the methods of each embodiment of the present disclosure.
  • a device for updating parameters in a pre-trained model for implementing the method for updating parameters in a pre-trained model shown in FIG. 2 .
  • FIG. 11 is a schematic diagram of a device for updating parameters in a pre-training model according to an embodiment of the present disclosure.
  • the device 1100 for updating parameters in the pre-training model may include: a first acquisition component 1102, a first calling component 1104, and an update component 1106.
  • the first acquisition component 1102 is configured to acquire feature data output by a backbone network of a pre-trained model, wherein the backbone network comes from an initial backbone network.
  • a first calling component 1104 is configured to call an initial bypass network to convert feature data, wherein the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network;
  • the update component 1106 is configured to update the parameters of the initial bypass network based on the converted characteristic data to obtain a target bypass network, wherein in the process of updating the parameters of the initial bypass network, the data flow of the initial bypass network is independent of the data flow of the backbone network, and the parameters of the initial bypass network are used to characterize the influence of the tuning module on the characteristic data.
  • the first acquisition component 1102, the first calling component 1104 and the update component 1106 correspond to steps S202 to S206 in Example 1, and the three components and the corresponding steps implement the same instances and application scenarios, but are not limited to the contents disclosed in the above-mentioned Example 1.
  • the above-mentioned components can be hardware components or software components stored in a memory and processed by one or more processors, and the above-mentioned components can also be run in the computer terminal provided in Example 1 as part of the device.
  • another data processing device for a pre-trained model for implementing the data processing method for a pre-trained model shown in FIG. 3 is also provided.
  • FIG. 12 is a schematic diagram of a data processing device for a pre-trained model according to an embodiment of the present disclosure.
  • the data processing device 1200 for the pre-trained model may include: a first processing component 1202, a second processing component 1204, a first conversion component 1206, a first generation component 1208 and a display component 1210.
  • the first processing component 1202 is configured to respond to query information received in the generative interactive interface, wherein the query information at least includes: keywords of the text generation information.
  • the second processing component 1204 is configured to call a backbone network of a pre-trained model to at least analyze text generation information and output text feature data, wherein the backbone network comes from an initial backbone network.
  • the first conversion component 1206 is configured to call a target bypass network to convert text feature data, wherein the target bypass network is obtained by updating parameters of an initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from an initial backbone network.
  • a first generating component 1208 is configured to generate at least one response result matching the query instruction based on the converted text feature data
  • the display component 1210 is configured to display the reply result in the generative interactive interface.
  • first processing component 1202, the second processing component 1204, the first conversion component 1206, the first generation component 1208 and the display component 1210 correspond to steps S302 to S310 in Example 1.
  • the five components and the corresponding steps implement the same instances and application scenarios, but are not limited to the contents disclosed in Example 1.
  • the above components can be hardware components or software components stored in a memory and processed by one or more processors; the above components can also be run in the computer terminal provided in Example 1 as part of the device.
  • another data processing device for a pre-trained model for implementing the data processing method for a pre-trained model shown in FIG. 4 is also provided.
  • FIG. 13 is a schematic diagram of another data processing device for a pre-trained model according to an embodiment of the present disclosure.
  • the data processing device 1300 for the pre-trained model may include: a second acquisition component 1302, a second conversion component 1304 and a second generation component 1306.
  • the second acquisition component 1302 is configured to obtain feature data obtained by processing the conditional data by the backbone network in the pre-trained model, wherein the backbone network comes from the initial backbone network, and the conditional data is used to determine the generation conditions of the target data.
  • a second conversion component 1304 is configured to call a target bypass network to convert the feature data, wherein the target bypass network is obtained by updating parameters of an initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network;
  • the second generating component 1306 is configured to generate target data corresponding to the conditional data based on the converted feature data, wherein the type of the target data includes at least one of the following: text information, image information, video information and voice information.
  • the second acquisition component 1302, the second conversion component 1304 and the second generation component 1306 correspond to steps S402 to S406 in Example 1, and the three components and the corresponding steps implement the same examples and application scenarios, but are not limited to the contents disclosed in the above-mentioned Example 1.
  • the above-mentioned components can be hardware components or software components stored in a memory and processed by one or more processors, and the above-mentioned components can also be run in the computer terminal provided in Example 1 as part of the device.
  • another data processing device for a pre-trained model for implementing the data processing method for a pre-trained model shown in FIG. 5 is also provided.
  • FIG. 14 is a schematic diagram of another data processing device of a pre-trained model according to an embodiment of the present disclosure.
  • the data processing device 1400 of the pre-trained model may include: an input component 1402, a third processing component 1404, a third conversion component 1406 and a third generation component 1408.
  • the input component 1402 is configured to input multimodal information in the dialogue interface, wherein the type of the multimodal information includes at least one of the following: query information including character information, video frame information including frame image information, and audio information.
  • the third processing component 1404 is configured to call the backbone network in the pre-trained model to at least analyze and process the multimodal information to obtain feature data, wherein the backbone network comes from the initial backbone network.
  • the third conversion component 1406 is configured to call a target bypass network to convert the feature data, wherein the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network.
  • the third generating component 1408 is configured to generate reply information corresponding to the multimodal information based on the converted feature data.
  • the type of the reply information includes at least one of the following: text information, image information, video information and voice information.
  • the above-mentioned input component 1402, the third processing component 1404, the third conversion component 1406 and the third generation component 1408 correspond to steps S702 to S708 in Example 1, and the four components and the corresponding steps implement the same examples and application scenarios, but are not limited to the contents disclosed in the above-mentioned Example 1.
  • the above-mentioned components can be hardware components or software components stored in a memory and processed by one or more processors, and the above-mentioned components can also be run in the computer terminal provided in Example 1 as part of the device.
  • a tuning module extracted from the backbone network is used to construct an initial bypass network.
  • the data flow from the initial bypass network to the backbone network is cut off to obtain a target bypass network (new tuning module) that is relatively independent of the backbone network.
  • the embodiment of the present disclosure may provide a computer terminal, which may be any computer terminal device in a computer terminal group.
  • the computer terminal may also be replaced by a terminal device such as a mobile terminal.
  • the computer terminal may be located in at least one network device among a plurality of network devices of the computer network.
  • the above-mentioned computer terminal can execute the program code of the following steps in the method for updating parameters in the pre-trained model: obtaining feature data output by the backbone network of the pre-trained model, wherein the backbone network comes from the initial backbone network; calling the initial bypass network to convert the feature data, wherein the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; updating the parameters of the initial bypass network based on the converted feature data to obtain a target bypass network, wherein, in the process of updating the parameters of the initial bypass network, the data flow of the initial bypass network is independent of the data flow of the backbone network, and the parameters of the initial bypass network are used to characterize the influence of the tuning module on the feature data.
  • FIG. 15 is a structural block diagram of a computer terminal according to an embodiment of the present disclosure.
  • the computer terminal A may include: one or more (only one is shown in the figure) processors 1502 , a memory 1504 , and a transmission device 1506 .
  • the memory can be used to store software programs and modules, such as the program instructions/modules corresponding to the method and device for updating parameters in the pre-trained model in the embodiment of the present disclosure.
  • the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, that is, realizing the method for updating parameters in the pre-trained model mentioned above.
  • the memory may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory may further include a memory remotely located relative to the processor, and these remote memories may be connected via a network.
  • Examples of the above network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • the processor can call the information and application stored in the memory through the transmission device to perform the following steps: obtain the feature data output by the backbone network of the pre-trained model, wherein the backbone network comes from the initial backbone network; call the initial bypass network to convert the feature data, wherein the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; update the parameters of the initial bypass network based on the converted feature data to obtain the target bypass network, wherein in the process of updating the parameters of the initial bypass network, the data flow of the initial bypass network is independent of the data flow of the backbone network, and the parameters of the initial bypass network are used to characterize the influence of the tuning module on the feature data.
  • the processor may also execute program codes of the following steps: calling a tuning module in the initial bypass network to adjust the characteristic data; and performing weighted summation on the adjusted characteristic data based on the weight of the initial bypass network.
  • the processor may also execute program codes of the following steps: calling the first vertical tuning module corresponding to the first-layer backbone network in the backbone network to adjust the feature data, and calling the first horizontal tuning module associated with the first vertical tuning module to adjust the feature data of the first-layer backbone network in the backbone network.
  • the processor may also execute the following program code: based on the weight of the initial bypass network, perform weighted summation of the adjusted feature data corresponding to the first vertical tuning module and the adjusted feature data corresponding to the first horizontal tuning module to obtain the first feature data.
  • the processor may also execute program code of the following steps: in response to the first-layer backbone network being the last layer of network in the backbone network, updating parameters of the initial bypass network based on the first characteristic data to obtain a target bypass network.
  • the processor may also execute the program code of the following steps: in response to the first-layer backbone network not being the last layer of the backbone network, calling the second vertical tuning module corresponding to the second-layer backbone network in the backbone network to adjust the first feature data, and calling the second horizontal tuning module associated with the second vertical tuning module to adjust the feature data of the second-layer backbone network in the backbone network; based on the weight of the initial bypass network, performing weighted summation on the adjusted first feature data corresponding to the second vertical tuning module and the adjusted feature data corresponding to the second horizontal tuning module to obtain the second feature data; and, in response to the second-layer backbone network not being the last layer of the backbone network, repeating the following steps until the last layer is reached: determining the second-layer backbone network as the first-layer backbone network and the third-layer backbone network in the backbone network as the second-layer backbone network, calling the second vertical tuning module to adjust the first feature data, and calling the second horizontal tuning module to adjust the feature data of the second-layer backbone network. A code sketch of this recursion is given below.
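The sketch below is one illustrative reading of the layer-by-layer recursion, not the patent's reference implementation. The linear vertical/horizontal modules and the sigmoid-gated mixing weight are assumptions; the sketch only shows the shape of the recursion: at each layer, a vertical module adjusts the running bypass feature, a horizontal module adjusts that backbone layer's (detached) feature, and a learned per-layer weight performs the weighted summation.

```python
import torch
import torch.nn as nn

class BypassNetwork(nn.Module):
    def __init__(self, num_layers, dim):
        super().__init__()
        self.vertical = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))
        self.horizontal = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))
        self.gate = nn.Parameter(torch.zeros(num_layers))  # per-layer mixing weights

    def forward(self, x, layer_feats):
        # x: feature data entering the bypass (e.g., the backbone's input embedding)
        # layer_feats: the feature data emitted by each backbone layer
        for i, feat in enumerate(layer_feats):
            v = self.vertical[i](x)                # vertical module adjusts the bypass feature
            h = self.horizontal[i](feat.detach())  # horizontal module adjusts this layer's feature
            w = torch.sigmoid(self.gate[i])        # learned weight of the weighted summation
            x = w * v + (1.0 - w) * h              # weighted sum becomes the next layer's input
        return x

# Illustrative usage with four backbone layers of width 64:
feats = [torch.randn(2, 64) for _ in range(4)]
net = BypassNetwork(num_layers=4, dim=64)
out = net(torch.randn(2, 64), feats)               # converted feature data
```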
  • the processor may also execute the program code of the following steps: determining a processing result obtained by processing the converted feature data by the output layer in the pre-trained model; determining a difference value between the processing result and a true processing result corresponding to the processing result; and adjusting the parameters of the initial bypass network based on the difference value to obtain a target bypass network.
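A hedged sketch of this update step, again in PyTorch with illustrative stand-in modules: the frozen output layer of the pre-trained model turns the converted features into a processing result, a loss measures the difference from the true processing result, and the optimizer adjusts only the bypass parameters.

```python
import torch
import torch.nn as nn

bypass = nn.Linear(128, 128)                              # stand-in initial bypass network
output_layer = nn.Linear(128, 10).requires_grad_(False)   # frozen output layer of the model
optimizer = torch.optim.SGD(bypass.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()                         # measures the difference value

features = torch.randn(4, 128)             # feature data entering the bypass
logits = output_layer(bypass(features))    # processing result from the output layer
labels = torch.randint(0, 10, (4,))        # stand-in for the true processing result
loss = criterion(logits, labels)           # difference value between the two
loss.backward()
optimizer.step()                           # only the bypass parameters are adjusted
```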
  • the processor can call the information and application program stored in the memory through the transmission device to perform the following steps: responding to query information received in the generative interactive interface, wherein the query information at least includes: keywords of the text generation information; calling the backbone network of the pre-trained model to at least analyze the text generation information and output text feature data, wherein the backbone network comes from the initial backbone network; calling the target bypass network to convert the text feature data, wherein the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; generating, based on the converted text feature data, at least one reply result that matches the query information; and displaying the reply result in the generative interactive interface.
  • wherein, in the process of updating the parameters of the initial bypass network, the data flow of the initial bypass network is independent of the data flow of the backbone network, and the parameters of the initial bypass network are used to characterize the influence of the tuning module on the feature data.
  • the processor may call the information and application stored in the memory through the transmission device to perform the following steps: obtaining feature data obtained by processing the conditional data by the backbone network in the pre-trained model, wherein the backbone network comes from the initial backbone network, and the conditional data is used to determine the generation conditions of the target data; calling the target bypass network to convert the feature data, wherein the target bypass network is obtained by updating the parameters of the initial bypass network, and the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; based on the converted feature data, generating target data corresponding to the conditional data, wherein the type of the target data includes at least one of the following: text information, image information, video information and voice information.
  • the processor may also execute the program code of the following steps: calling the first vertical tuning module corresponding to the first decoding layer in the backbone network to convert the text feature data output by the middle layer of the backbone network, and calling the first horizontal tuning module associated with the first vertical tuning module to convert the text feature data output by the first decoding layer.
  • the processor may also execute the program code of the following steps: based on the weight of the target bypass network, performing weighted summation on the text feature data adjusted by the first vertical tuning module and the text feature data adjusted by the first horizontal tuning module to obtain converted text feature data; and determining a response result based on the converted text feature data.
  • the processor may also execute the following program code: using the converted text feature data as output data of the last decoding layer in the pre-trained model; and converting the output data into a response result.
  • the processor can call the information and application program stored in the memory through the transmission device to perform the following steps: inputting multimodal information in a dialogue interface, wherein the type of the multimodal information includes at least one of the following: text information containing character information, video frame information containing frame image information, and audio information; calling the backbone network in the pre-trained model to at least analyze and process the multimodal information to obtain feature data, wherein the backbone network comes from the initial backbone network; calling the target bypass network to convert the feature data, wherein the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; and generating, based on the converted feature data, reply information corresponding to the multimodal information, wherein the type of the reply information includes at least one of the following: text information, image information, video information, and voice information.
  • the processor may also execute program codes of the following steps: calling a tuning module in the target bypass network to adjust the characteristic data; and performing weighted summation on the adjusted characteristic data based on the weight of the target bypass network.
  • the processor may also execute the following program code: processing the converted feature data based on the output layer in the pre-trained model to obtain response information corresponding to the multimodal information.
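At inference time the flow is the same but no parameters are updated. A minimal sketch, with all module shapes assumed for illustration, where the output layer turns the converted features into reply information (here reduced to a class index):

```python
import torch
import torch.nn as nn

backbone = nn.Linear(128, 128)       # stand-in for the pre-trained backbone
target_bypass = nn.Linear(128, 128)  # stand-in for the trained target bypass network
output_layer = nn.Linear(128, 10)    # output layer of the pre-trained model

with torch.no_grad():                # no parameters are updated at inference
    features = backbone(torch.randn(1, 128))       # feature data
    converted = target_bypass(features)             # convert the feature data
    reply = output_layer(converted).argmax(dim=-1)  # decoded reply index
print(reply)
```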
  • a tuning module extracted from the backbone network is used to construct an initial bypass network.
  • the data flow from the initial bypass network to the backbone network is truncated to obtain a target bypass network (new tuning module) independent of the backbone network.
  • the structure shown in FIG. 15 is for illustration only, and the computer terminal A may also be a smart phone (such as an Android phone, an iOS phone, etc.), a tablet computer, a PDA, a mobile Internet device (MID), a PAD, and other terminal devices.
  • FIG. 15 does not limit the structure of the above-mentioned computer terminal A.
  • the computer terminal A may also include more or fewer components (such as a network interface, a display device, etc.) than those shown in FIG. 15 , or may have a configuration different from that shown in FIG. 15 .
  • a person of ordinary skill in the art may understand that all or part of the steps in the various methods of the above embodiments may be completed by instructing the hardware related to the terminal device through a program, and the program may be stored in a computer-readable storage medium, and the storage medium may include: a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, etc.
  • the embodiment of the present disclosure further provides a computer-readable storage medium.
  • the computer-readable storage medium can be used to store the program code for executing the method for updating parameters in the pre-trained model provided in Embodiment 1.
  • the computer-readable storage medium may be located in any one of the computer terminals in a computer terminal group in a computer network, or in any one of the mobile terminals in a mobile terminal group.
  • the computer-readable storage medium is configured to store the program code to be executed by the above-mentioned processor; the processor calls the information and application program stored in the memory through the transmission device and executes the program code.
  • a tuning module extracted from a backbone network is used to construct an initial bypass network.
  • the data flow from the initial bypass network to the backbone network is cut off to obtain a target bypass network (new tuning module) independent of the backbone network.
  • An embodiment of the present disclosure may provide an electronic device, which may include a memory and a processor.
  • FIG. 16 is a block diagram of an electronic device for a method for updating parameters in a pre-trained model according to an embodiment of the present disclosure.
  • the electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • the electronic device can also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementation of the present disclosure described and/or required herein.
  • the device 1600 includes a computing unit 1601, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1602 or a computer program loaded from a storage unit 1608 into a random access memory (RAM) 1603; the RAM 1603 can also store various programs and data required for the operation of the device 1600.
  • the computing unit 1601, the ROM 1602, and the RAM 1603 are connected to each other via a bus 1604.
  • An input/output (I/O) interface 1605 is also connected to the bus 1604.
  • a number of components in the device 1600 are connected to the I/O interface 1605, including: an input unit 1606, such as a keyboard, a mouse, etc.; an output unit 1607, such as various types of displays, speakers, etc.; a storage unit 1608, such as a magnetic disk, an optical disk, etc.; and a communication unit 1609, such as a network card, a modem, a wireless communication transceiver, etc.
  • the communication unit 1609 allows the device 1600 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
  • the computing unit 1601 may be a variety of general and/or special processing components with processing and computing capabilities. Some examples of the computing unit 1601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, etc.
  • the computing unit 1601 performs the various methods and processes described above, such as a method for updating parameters in a pre-trained model.
  • the method for updating parameters in a pre-trained model may be implemented as a computer software program, which is tangibly contained in a machine-readable medium, such as a storage unit 1608.
  • part or all of the computer program may be loaded and/or installed onto the device 1600 via the ROM 1602 and/or the communication unit 1609; when the computer program is loaded into the RAM 1603 and executed by the computing unit 1601, one or more steps of the method described above can be performed.
  • the computing unit 1601 may be configured to execute the method for updating the parameters in the pre-trained model by any other appropriate means (e.g., by means of firmware).
  • Various embodiments of the systems and techniques described above herein may be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard parts (ASSPs), system on chip systems (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
  • these various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • the program code for implementing the method of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing device, so that the program code, when executed by the processor or controller, implements the functions/operations specified in the flow chart and/or block diagram.
  • the program code may be executed entirely on the machine, partly on the machine, partly on the machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, device, or equipment.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, devices, or equipment, or any suitable combination of the foregoing.
  • a more specific example of a machine-readable storage medium may include an electrical connection based on one or more lines, a portable computer disk, a hard disk, a random access memory, a read-only memory, an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • the systems and techniques described herein may be implemented on a computer having: a display device (e.g., a cathode ray tube (CRT) or liquid crystal display (LCD) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer.
  • other types of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and the input from the user may be received in any form (including acoustic input, voice input, or tactile input).
  • the systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer with a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end components, middleware components, or front-end components.
  • the components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks, wide area networks, and the Internet.
  • a computer system may include a client and a server.
  • the client and the server are generally remote from each other and usually interact through a communication network.
  • the relationship of client and server is generated by computer programs running on respective computers and having a client-server relationship with each other.
  • the server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
  • the disclosed technical content can be implemented in other ways.
  • the device embodiments described above are merely illustrative; for example, the division of units is only a logical function division, and there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • in addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be realized through some interfaces, and the indirect coupling or communication connection between units or modules may be electrical or in other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware or in the form of software functional units.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present disclosure, or the part that contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for a computer device (which can be a personal computer, server or network device, etc.) to execute the methods of each embodiment of the present disclosure.
  • the aforementioned storage medium includes: a USB flash drive, a read-only memory, a random access memory, a removable hard disk, a magnetic disk, an optical disk, and other media that can store program code.
  • the solution provided by the embodiments of the present disclosure can be applied in the process of model training: feature data output by a backbone network of a pre-trained model is obtained, wherein the backbone network comes from an initial backbone network; an initial bypass network is called to convert the feature data, wherein the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; and the parameters of the initial bypass network are updated based on the converted feature data to obtain a target bypass network, wherein, in the process of updating the parameters of the initial bypass network, the data flow of the initial bypass network is independent of the data flow of the backbone network, and the parameters of the initial bypass network are used to characterize the influence of the tuning module on the feature data, thereby solving the technical problems of high resource consumption and low computational efficiency in the model training process.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to the fields of large model technology and data processing. Disclosed are a method for updating parameters of a pre-trained model and a data processing method for a pre-trained model. The method for updating the parameters of the pre-trained model comprises: acquiring feature data output by a backbone network of the pre-trained model, wherein the backbone network comes from an initial backbone network; calling an initial bypass network to convert the feature data, wherein the initial bypass network is constructed on the basis of at least one tuning module, and the tuning module is extracted from the initial backbone network; updating parameters of the initial bypass network on the basis of the converted feature data to obtain a target bypass network, wherein in the process of updating the parameters of the initial bypass network, data streams of the initial bypass network are independent of data streams of the backbone network, and the parameters of the initial bypass network are used for representing the effect of the tuning module on the feature data.

Description

Updating parameters in a pre-trained model and data processing method for a pre-trained model

Cross-references

This disclosure claims priority to the Chinese patent application filed with the China Patent Office on August 9, 2023, with application number 202311003358.5 and entitled "Updating parameters in a pre-trained model and data processing method for a pre-trained model", the entire contents of which are incorporated herein by reference.

Technical Field

The present disclosure relates to the field of large models and data processing, and in particular, to a method for updating parameters in a pre-trained model and a data processing method for a pre-trained model.

Background Art

At present, for parameter-efficient transfer of large-scale pre-trained models, different parts of the original training architecture of the pre-trained model are usually adjusted in a lightweight manner according to the different tasks the pre-trained model performs. However, when training the pre-trained model, redundant calculations for different tasks still need to be performed through the initial backbone network, which makes the model training process consume many resources and leads to low efficiency in obtaining feedback results in generative dialogue products. As a result, there are technical problems of high resource consumption and low computational efficiency in the model training process.

To address the above-mentioned problems, no effective solution has been proposed yet.

Summary of the Invention

The disclosed embodiments provide a method for updating parameters in a pre-trained model and a data processing method for a pre-trained model, so as to at least solve the technical problems of high resource consumption and low computational efficiency in the training process of the model.

According to one aspect of the embodiments of the present disclosure, a method for updating parameters in a pre-trained model is provided. The method may include: obtaining feature data output by a backbone network of the pre-trained model, wherein the backbone network comes from an initial backbone network; calling an initial bypass network to convert the feature data, wherein the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; and updating the parameters of the initial bypass network based on the converted feature data to obtain a target bypass network, wherein, in the process of updating the parameters of the initial bypass network, the data flow of the initial bypass network is independent of the data flow of the backbone network, and the parameters of the initial bypass network are used to characterize the influence of the tuning module on the feature data.

According to another aspect of the embodiments of the present disclosure, a data processing method for a pre-trained model is also provided. The method may include: responding to inquiry information received in a generative interactive interface, wherein the inquiry information at least includes: keywords of text generation information; calling the backbone network of the pre-trained model to at least analyze the text generation information and output text feature data, wherein the backbone network comes from an initial backbone network; calling a target bypass network to convert the text feature data, wherein the target bypass network is obtained by updating the parameters of an initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; generating, based on the converted text feature data, at least one reply result that matches the inquiry information; and displaying the reply result in the generative interactive interface.

According to another aspect of the embodiments of the present disclosure, another data processing method for a pre-trained model is also provided. The method may include: obtaining feature data obtained by processing conditional data by a backbone network in the pre-trained model, wherein the backbone network comes from an initial backbone network, and the conditional data is used to determine the generation conditions of target data; calling a target bypass network to convert the feature data, wherein the target bypass network is obtained by updating the parameters of an initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; and generating, based on the converted feature data, target data corresponding to the conditional data, wherein the type of the target data includes at least one of the following: text information, image information, video information, and voice information.

According to another aspect of the embodiments of the present disclosure, another data processing method for a pre-trained model is also provided. The method may include: inputting multimodal information in a dialogue interface, wherein the type of the multimodal information includes at least one of the following: text information containing character information, video frame information containing frame image information, and audio information; calling the backbone network in the pre-trained model to at least analyze and process the multimodal information to obtain feature data, wherein the backbone network comes from an initial backbone network; calling a target bypass network to convert the feature data, wherein the target bypass network is obtained by updating the parameters of an initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; and generating, based on the converted feature data, reply information corresponding to the multimodal information, wherein the type of the reply information includes at least one of the following: text information, image information, video information, and voice information.

According to another aspect of the embodiments of the present disclosure, a data processing system for a pre-trained model is also provided. The system may include: a client, configured to display a dialogue interface and capture multimodal information input in the dialogue interface, wherein the type of the multimodal information includes at least one of the following: inquiry information containing character information, video frame information containing frame image information, and audio information; and a server, configured to call the backbone network in the pre-trained model to at least analyze and process the multimodal information to obtain feature data, call a target bypass network to convert the feature data, and generate, based on the converted feature data, reply information corresponding to the multimodal information, wherein the backbone network comes from an initial backbone network, the target bypass network is obtained by updating the parameters of an initial bypass network, the initial bypass network is constructed based on at least one tuning module, the tuning module is extracted from the initial backbone network, and the type of the reply information includes at least one of the following: text information, image information, video information, and voice information.

According to another aspect of the embodiments of the present disclosure, another data processing method for a pre-trained model is also provided. The method may include: inputting speech to be converted on a virtual reality (VR) device or an augmented reality (AR) device; using the backbone network in the pre-trained model to extract feature data from the speech to be converted, wherein the backbone network comes from an initial backbone network; calling a target bypass network in the pre-trained model to convert the feature data, wherein the target bypass network is obtained by updating the parameters of an initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; determining, based on the converted feature data, the image information corresponding to the speech to be converted; and activating the VR device or the AR device with the image information, and displaying the image information in the VR device or the AR device.

According to another aspect of the embodiments of the present disclosure, an electronic device is also provided, which may include a memory and a processor; the memory is used to store computer-executable instructions, and the processor is used to execute the computer-executable instructions; when the computer-executable instructions are executed by the processor, any one of the above-mentioned methods is implemented.

According to another aspect of the embodiments of the present disclosure, a processor is also provided, the processor being used to run a program, wherein any one of the above-mentioned methods is executed when the program is running.

According to another aspect of the embodiments of the present disclosure, a computer-readable storage medium is also provided; the computer-readable storage medium includes a stored program, wherein, when the program runs, the device where the storage medium is located is controlled to execute any one of the above-mentioned methods.

According to another aspect of the embodiments of the present disclosure, a computer program product is also provided, including a non-volatile computer-readable storage medium that stores a computer program; when the computer program is executed by a processor, the steps of any one of the methods described above are implemented.

In the embodiments of the present disclosure, feature data output by a backbone network of a pre-trained model is obtained, wherein the backbone network comes from an initial backbone network; an initial bypass network is called to convert the feature data, wherein the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; and the parameters of the initial bypass network are updated based on the converted feature data to obtain a target bypass network, wherein, in the process of updating the parameters of the initial bypass network, the data flow of the initial bypass network is independent of the data flow of the backbone network, and the parameters of the initial bypass network are used to characterize the influence of the tuning module on the feature data. That is, in this embodiment, a tuning module extracted from the backbone network is used to construct the initial bypass network; at the same time, during the adjustment of the parameters of the initial bypass network, the data flow from the initial bypass network to the backbone network is cut off, yielding a target bypass network (for example, a new tuning module) independent of the backbone network. As a result, during the training of the initial bypass network, there is no need to further compute parameter gradients for the backbone network, which saves memory and improves training speed, thereby achieving the technical effect of reducing resource consumption during model training and solving the technical problems of high resource consumption and low computational efficiency in the model training process.

It is easy to note that the above general description and the following detailed description are merely intended to exemplify and explain the present disclosure, and do not constitute a limitation of the present disclosure.

Brief Description of the Drawings

The drawings described herein are used to provide a further understanding of the present disclosure and constitute a part of the present disclosure. The illustrative embodiments of the present disclosure and their descriptions are used to explain the present disclosure and do not constitute an improper limitation on the present disclosure. In the drawings:

FIG. 1 is a schematic diagram of an application scenario of a method for updating parameters in a pre-trained model according to an embodiment of the present disclosure;

FIG. 2 is a flow chart of a method for updating parameters in a pre-trained model according to an embodiment of the present disclosure;

FIG. 3 is a flow chart of a data processing method for a pre-trained model according to an embodiment of the present disclosure;

FIG. 4 is a flow chart of another data processing method for a pre-trained model according to an embodiment of the present disclosure;

FIG. 5 is a flow chart of another data processing method for a pre-trained model according to an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of a data processing result of a pre-trained model according to an embodiment of the present disclosure;

FIG. 7 is a flow chart of another data processing method for a pre-trained model according to an embodiment of the present disclosure;

FIG. 8(a) is a schematic diagram of a data processing system for a pre-trained model according to an embodiment of the present disclosure;

FIG. 8(b) is a schematic diagram of a tuning module according to an embodiment of the present disclosure;

FIG. 8(c) is a schematic diagram of another tuning module according to an embodiment of the present disclosure;

FIG. 9 is a schematic diagram of a pre-trained model according to an embodiment of the present disclosure;

FIG. 10 is a schematic diagram of another pre-trained model according to an embodiment of the present disclosure;

FIG. 11 is a schematic diagram of a device for updating parameters in a pre-trained model according to an embodiment of the present disclosure;

FIG. 12 is a schematic diagram of a data processing device for a pre-trained model according to an embodiment of the present disclosure;

FIG. 13 is a schematic diagram of another data processing device for a pre-trained model according to an embodiment of the present disclosure;

FIG. 14 is a schematic diagram of another data processing device for a pre-trained model according to an embodiment of the present disclosure;

FIG. 15 is a structural block diagram of a computer terminal according to an embodiment of the present disclosure;

FIG. 16 is a block diagram of an electronic device for a method for updating parameters in a pre-trained model according to an embodiment of the present disclosure.

Detailed Description

In order to enable those skilled in the art to better understand the solution of the present disclosure, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below in conjunction with the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. Based on the embodiments in the present disclosure, all other embodiments obtained by a person of ordinary skill in the art without creative work shall fall within the scope of protection of the present disclosure.

It should be noted that the terms "first", "second", etc. in the specification and claims of the present disclosure and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way can be interchanged where appropriate, so that the embodiments of the present disclosure described herein can be implemented in an order other than those illustrated or described herein. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to those steps or units that are clearly listed, but may include other steps or units that are not clearly listed or that are inherent to such a process, method, product, or device.

The technical solution provided by the present disclosure is mainly implemented by large model technology. A large model here refers to a deep learning model with large-scale model parameters, which can usually contain hundreds of millions, tens of billions, hundreds of billions, trillions, or even more than ten trillion model parameters. A large model can also be called a foundation model: the large model is pre-trained on a large-scale unlabeled corpus to produce a pre-trained model with more than a hundred million parameters. Such a model can adapt to a wide range of downstream tasks and has good generalization ability, for example, a large language model (Large Language Model, LLM) or a multi-modal pre-training model.

It should be noted that when a large model is actually applied, the pre-trained model can be fine-tuned with a small number of samples so that the large model can be applied to different tasks. For example, large models can be widely used in natural language processing (Natural Language Processing, NLP), computer vision, and other fields; specifically, they can be applied to computer vision tasks such as visual question answering (Visual Question Answering, VQA), image captioning (Image Caption, IC), and image generation, and can also be widely used in natural language processing tasks such as text-based sentiment classification, text summary generation, and machine translation. Therefore, the main application scenarios of large models include, but are not limited to, digital assistants, intelligent robots, search, online education, office software, e-commerce, and intelligent design. In the embodiments of the present disclosure, data processing through a pre-trained model in scenarios such as machine intelligence technology, basic visual intelligence, and video artificial intelligence (Artificial Intelligence, AI) is used as an example for explanation.

First, some of the nouns or terms that appear in the description of the embodiments of the present disclosure are subject to the following explanations:

Pre-trained Model: a model produced after large-scale training and tuning on a representative data set; it can be used as an initialization model when training other downstream tasks, to speed up training or obtain better results;

Foundation Model: a model trained in different fields with a large amount of data, powerful computing power, and a carefully designed structure; it can be used to process downstream tasks;

Transfer Learning: improving the learning of a new task by transferring knowledge from related tasks that have already been learned;

Parameter-efficient Transfer Learning (PETL): a tuning training method that, on the basis of a pre-trained model, modifies a small number of parameters or adds a small number of extra parameters;

Memory-efficient Transfer Learning (METL): a tuning training method that, on the basis of a pre-trained model, uses a relatively small amount of memory for tuning;

Transformer: a deep learning network architecture that can consist of several layers of encoders and decoders;

Vision Transformer: a network architecture that is a migration of the Transformer architecture to the visual field;

Transformer Block: a sub-module of a Transformer, which can consist of multi-head attention and a feed-forward network;

Multi-head Attention (MHA): a sub-module in a Transformer, which can be a module that performs related vector calculations on queries (Query), keys (Key), and values (Value);

Feed-forward Network (FFN): a sub-module in a Transformer, which can consist of multiple fully connected layers and activation functions;

Multi-Layer Perceptron (MLP): a network module that can consist of one or more hidden layers and activation functions;

Adapter: a tuning training method that can act in the feed-forward network and can contain a small module consisting of two fully connected layers and an activation function (a minimal sketch is given after this list);

Prompt: a tuning training method that performs tuning training with the help of a learnable parameter concatenated with the input;

Prefix: a tuning method that performs tuning training with the help of two learnable parameters concatenated with the keys (Key) and values (Value) in the multi-head attention layer;

Discriminative task: a task that performs a discrimination operation on input data, i.e., discriminating Y from X, for example, an image classification task;

Generative task: a task that performs a generation operation on input data, i.e., generating Y from X, for example, an image generation task;

U-Net: a convolutional network architecture that can consist of a down-sampling encoder, an up-sampling decoder, and skip connections;

Stable Diffusion: a deep learning text-to-image generation model, which can be a network structure for generation tasks consisting of an autoencoder, a text encoder, and a U-Net structure.
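As a concrete illustration of the Adapter entry above, the following is a minimal sketch of a bottleneck adapter in PyTorch. The bottleneck dimension and the residual placement are assumptions for illustration; implementations vary.

```python
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim, bottleneck=16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # first fully connected layer
        self.act = nn.ReLU()                    # activation function
        self.up = nn.Linear(bottleneck, dim)    # second fully connected layer

    def forward(self, x):
        # residual connection around the small adapter module (an assumption)
        return x + self.up(self.act(self.down(x)))
```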

According to an embodiment of the present disclosure, a method for updating parameters in a pre-trained model is provided. It should be noted that the steps shown in the flowchart of the accompanying drawings can be executed in a computer system such as a set of computer-executable instructions, and, although a logical order is shown in the flowchart, in some cases the steps shown or described can be executed in an order different from that shown here.

Considering that large models have a huge number of model parameters and that mobile terminals have limited computing resources, the method for updating parameters in a pre-trained model provided by the embodiments of the present disclosure can be applied to the application scenario shown in FIG. 1, but is not limited thereto. FIG. 1 is a schematic diagram of an application scenario of a method for updating parameters in a pre-trained model according to an embodiment of the present disclosure. In the application scenario shown in FIG. 1, the large model is deployed in a server 10, and the server 10 can be connected to one or more client devices 20 through a local area network connection, a wide area network connection, an Internet connection, or other types of data networks. The client devices 20 here may include, but are not limited to: smart phones, tablet computers, laptop computers, PDAs, personal computers, smart home devices, vehicle-mounted devices, etc. The client device 20 can interact with the user through a graphical user interface, for example, by inputting classification conditions in the graphical user interface, so as to call the large model and thereby implement the method provided by the embodiments of the present disclosure.

In the embodiments of the present disclosure, the system composed of the client device and the server can execute the following steps: the client device obtains an image to be classified; the server executes step S101, obtaining feature data obtained by processing the image to be classified by the backbone network in the pre-trained model; step S102, calling the target bypass network to convert the feature data, wherein the target bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the backbone network; and step S103, determining, based on the converted feature data, the category to which the image to be classified belongs. The server 10 can transmit the category to which the image to be classified belongs to the client and display it in the client's graphical user interface. It should be noted that, when the operating resources of the client device can meet the deployment and operating conditions of the large model, the embodiments of the present disclosure can be performed in the client device.

In the above operating environment, for the case of discriminative tasks, the present disclosure provides a method for updating parameters of a pre-trained model as shown in Figure 2. Figure 2 is a flowchart of a method for updating parameters of a pre-trained model according to an embodiment of the present disclosure. As shown in Figure 2, the method may include the following steps.

Step S202: obtaining feature data output by a backbone network of the pre-trained model, where the backbone network comes from an initial backbone network.

In the technical solution provided in the above step S202 of the present disclosure, the backbone network may be obtained from the initial backbone network, and the feature data output by the backbone network may be obtained. Here, the pre-trained model, which may also be called a pre-trained network, may be a model produced by large-scale training and tuning on a representative data set; it may be used as an initialization model when training other downstream tasks, to speed up training or obtain better results. The feature data, which may also be called output features, may be data obtained by the backbone network processing speech, images, or other information, and may be denoted x0, x1; this is merely an example and does not specifically restrict how the feature data is represented. The backbone network may be used for tasks such as image classification, object detection, and semantic segmentation; it may contain feed-forward network modules, multi-head attention modules, and the like, and may be a Vision Transformer (ViT), a Convolutional Neural Network (CNN), etc. This is merely an example and does not specifically restrict the contents or the type of the backbone network. The feature data may be an abstract feature representation, such as vectors, pixel-level labels, bounding boxes, or class probabilities; this is merely an example and does not specifically restrict the type of the feature data.

Optionally, the backbone network may output the feature data during forward propagation.

For example, taking an image classification task, the backbone network may be a convolutional neural network whose input is an image; after a series of operations such as convolutional layers, pooling layers, and activation functions, a set of abstract feature representations is obtained.
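As a concrete illustration of obtaining such intermediate feature data, the following is a minimal PyTorch sketch using forward hooks. The toy backbone and the per-layer granularity are assumptions made for illustration; the patent does not prescribe how the features are captured.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(                      # stand-in for a pre-trained CNN backbone
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
)

features = []                                  # per-layer feature data x0, x1, ...

def save_feature(module, inputs, output):
    # Record the intermediate feature data emitted during forward propagation.
    features.append(output)

for layer in backbone:
    layer.register_forward_hook(save_feature)

image = torch.randn(1, 3, 224, 224)            # an image to be classified
_ = backbone(image)
print([tuple(f.shape) for f in features])
```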

Step S204: calling an initial bypass network to convert the feature data, where the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network.

In the technical solution provided in the above step S204 of the present disclosure, at least one tuning module may be extracted from the initial backbone network, and the initial bypass network may be constructed from the extracted tuning module(s). The constructed initial bypass network may then be called to convert the feature data. Here, a tuning module, which may also be called a sub-module, may be extracted from the initial backbone network and may be used to improve the prediction accuracy of the pre-trained model. The initial bypass network, which may also be called a tuning framework (Residual Tuning, Res-Tuning for short), may be denoted Bypass.

Optionally, the tuning module may be extracted from the initial backbone network, and a bypass network may be constructed using the tuning module, yielding an independent bypass network and backbone network. The feature data output by the backbone network may be used as the input of the initial bypass network, and the initial bypass network may be called to convert the feature data.

Step S206: updating the parameters of the initial bypass network based on the converted feature data to obtain a target bypass network, where, during the updating of the parameters of the initial bypass network, the data flow of the initial bypass network is independent of the data flow of the backbone network, and the parameters of the initial bypass network are used to characterize the influence of the tuning module on the feature data.

In the technical solution provided in the above step S206 of the present disclosure, the converted feature data may be output to the pre-trained model to produce a corresponding output result, and, based on the output result, the parameters of the initial bypass network are adjusted to obtain the target bypass network. Here, the target bypass network may be a new tuning network independent of the backbone network. The parameters of the initial bypass network may be used to characterize the influence of different tuning modules on the feature data, and may be weights, biases, regularization parameters, and the like; this is merely an example and does not specifically restrict the type of the parameters. The output result may be data such as images, text, or speech recognized by the pre-trained model; this is merely an example and does not specifically restrict the type of the output result. During training, the data flow of the initial bypass network is independent of the data flow of the backbone network; therefore, the data flow of the initial bypass network cannot be transmitted into the backbone network.

Since existing approaches usually embed the tuning module inside the backbone network, the backbone network still has to perform redundant computation in each training pass; for example, during back-propagation its gradients still have to be computed, which wastes memory and inference overhead and makes the training of the pre-trained model time-consuming. To solve this problem, in this embodiment the tuning module is extracted from the initial backbone network and used to construct the initial bypass network, yielding an independent initial bypass network and backbone network. During training of the initial bypass network, the data flow of the initial bypass network is prohibited from being transmitted into the backbone network, so that the initial bypass network is trained independently to obtain the target bypass network. This achieves the technical effect of reducing resource consumption and improving computational efficiency during model training, and solves the technical problems of high resource consumption and low computational efficiency in the model training process.

Optionally, during the updating of the parameters of the initial bypass network, the data flow of the initial bypass network may be prohibited from being transmitted into the backbone network, and the parameters of the initial bypass network may be adjusted based on the converted feature data by means such as back-propagation or stochastic gradient descent to obtain the target bypass network.
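As a concrete illustration of cutting the bypass-to-backbone data flow, the following is a minimal PyTorch sketch. All names are illustrative, and using detach() on the backbone feature is one assumed way to realize the independence of the two data flows, not necessarily the patent's exact mechanism.

```python
import torch
import torch.nn as nn

backbone_layer = nn.Linear(8, 8)           # stand-in for one (frozen) backbone layer
tuner = nn.Linear(8, 8)                    # stand-in for a trainable tuning module
opt = torch.optim.SGD(tuner.parameters(), lr=1e-2)

x = torch.randn(4, 8)
feat = backbone_layer(x).detach()          # cut the gradient path into the backbone
out = tuner(feat)                          # bypass conversion of the feature data
loss = out.pow(2).mean()                   # placeholder objective
loss.backward()                            # gradients reach only the tuner
opt.step()
assert backbone_layer.weight.grad is None  # the backbone received no gradient
```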

For example, the tuning modules may be extracted from the initial backbone network and combined to construct the initial bypass network. The initial backbone network from which the tuning modules have been extracted may serve as the backbone network, and the feature data it outputs may be obtained. The initial bypass network may be called to convert the feature data, the converted feature data may be output to the pre-trained model to produce a corresponding output result, and, based on the output result, the parameters of the initial bypass network may be adjusted to obtain the target bypass network.

In the disclosed embodiment, the tuning modules detached from the backbone network are used to construct the initial bypass network, and, during adjustment of the parameters of the initial bypass network, the data flow from the initial bypass network to the backbone network is cut off, yielding a target bypass network (that is, a new tuning module) relatively independent of the backbone network. As a result, during training of the initial bypass network, the parameter gradients of the backbone network do not need to be computed, which saves memory and speeds up training, thereby achieving the technical effect of reducing resource consumption and improving computational efficiency during model training and solving the technical problems of high resource consumption and low computational efficiency in the model training process.

The above method of this embodiment is further described below.

As an optional implementation, step S204, calling the initial bypass network to convert the feature data, includes: calling the tuning modules in the initial bypass network to adjust the feature data; and performing a weighted sum over the adjusted feature data based on the weights of the initial bypass network.

In this embodiment, the tuning modules in the initial bypass network may be called to adjust the feature data, and a weighted sum may be taken over the adjusted feature data based on the weights of the initial bypass network. The weights in the initial bypass network are learned continuously; they may be used to characterize the relative proportions of the different tuning modules, may also be called λ coefficients, and may remain in a state of continuous learning.

Optionally, during training of the initial bypass network, the intermediate output of the backbone network (that is, the feature data) may be used in forward propagation as the input of the tuning modules in the initial bypass network; the tuning modules may be called to adjust the feature data, and a weighted sum may be taken over the adjusted feature data based on the parameters of the initial bypass network.
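The following sketch illustrates one way such a learnable combination weight could look. The sigmoid parameterization, so that the two branches are weighted λ and 1 − λ, is an assumption made for illustration; the patent only states that the weight is learnable.

```python
import torch
import torch.nn as nn

class WeightedSum(nn.Module):
    """Combine the horizontal and vertical tuner outputs with a learnable weight."""

    def __init__(self):
        super().__init__()
        self.lam = nn.Parameter(torch.zeros(1))  # λ coefficient, updated during training

    def forward(self, horizontal_out: torch.Tensor, vertical_out: torch.Tensor) -> torch.Tensor:
        lam = torch.sigmoid(self.lam)            # keep the weight in (0, 1)
        return lam * horizontal_out + (1 - lam) * vertical_out
```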

In this embodiment, a framework is used in which the backbone network and the initial bypass network are trained together, the initial bypass network being constructed by combining tuning modules detached from the initial backbone network. During forward propagation in training, the initial bypass network uses the feature data output by the backbone network; during back-propagation, the initial bypass network is disconnected from the backbone network and gradients pass only through the initial bypass network, so that the gradients of the backbone network do not need to be computed. This saves memory and speeds up training, thereby reducing the memory occupied during data processing and yielding a new target bypass network independent of the backbone network.

As an optional implementation, calling the tuning modules in the initial bypass network to adjust the feature data includes: calling a first vertical tuning module corresponding to the first-layer backbone network in the backbone network to adjust the feature data output by the zeroth-layer backbone network, and calling a first horizontal tuning module associated with the first vertical tuning module to adjust the feature data output by the first-layer backbone network.

In this embodiment, there is a correspondence between the number of tuning modules in the initial bypass network and the number of layers of the backbone network. The first vertical tuning module corresponding to the first-layer backbone network may be called to adjust the feature data output by the zeroth-layer backbone network, and the first horizontal tuning module associated with the first vertical tuning module may be called to adjust the feature data output by the first-layer backbone network. Here, the first-layer backbone network may be a decoding layer (Decoder), a convolutional block, etc.; this is merely an example and does not specifically restrict the category of the first-layer backbone network. The zeroth-layer backbone network may be a middle-layer network (Middle), a convolutional layer, etc.; this is merely an example and does not specifically restrict the category of the zeroth-layer backbone network. The first vertical tuning module corresponds to the first-layer backbone network and may be a vertical tuning module, also called a vertical Res-Tuner. The first horizontal tuning module is associated with the first vertical tuning module and may be a horizontal tuning module, also called a horizontal Res-Tuner.

Optionally, tuning modules (Res-Tuners) may be extracted from the initial backbone network, and a complete initial bypass network may be constructed by combining different tuning modules. The first vertical tuning module in the initial bypass network corresponding to the first-layer backbone network may be called to adjust the feature data output by the zeroth-layer backbone network, and the first horizontal tuning module associated with the first vertical tuning module may be called to adjust the feature data output by the first-layer backbone network. Here, the feature data output by the zeroth-layer backbone network may be denoted x0, and the feature data output by the first-layer backbone network may be denoted x1.

As an optional implementation, performing a weighted sum over the adjusted feature data based on the parameters of the initial bypass network includes: based on the weights of the initial bypass network, taking a weighted sum of the adjusted feature data corresponding to the first vertical tuning module and the adjusted feature data corresponding to the first horizontal tuning module to obtain first feature data.

In this embodiment, the weights of the initial bypass network may be determined; the adjusted feature data corresponding to the first vertical tuning module and the adjusted feature data corresponding to the first horizontal tuning module are obtained; and, based on the weights of the initial bypass network, a weighted sum of the two is taken to obtain the first feature data.

As an optional implementation, updating the parameters of the initial bypass network based on the converted feature data to obtain the target bypass network includes: in response to the first-layer backbone network being the last layer of the backbone network, updating the parameters of the initial bypass network based on the first feature data to obtain the target bypass network.

In this embodiment, it is determined whether the first-layer backbone network is the last layer of the backbone network. If it is, the first feature data may be used as the output of the initial bypass network and as the training data for back-propagation, and the parameters of the initial bypass network are adjusted to obtain the target bypass network.

Optionally, the overall structure of the pre-trained model may be divided into two parts. One part is the backbone model, whose parameters are frozen during training. The other part is the trainable initial bypass network (also called the bypass structure), which may consist of several tuning modules. Each layer of the backbone network has a horizontal tuning module and a vertical tuning module, which receive, respectively, the intermediate feature data from that layer of the backbone network and the output data of the previous layer of the bypass network; the two are combined by a weighted sum and output to the next layer, until the features of the last layer are output to the pre-trained model (for example, a discriminative task network) to produce a corresponding output result. Based on the output result, the parameters of the initial bypass network may be updated to obtain the target bypass network.

For example, training data (such as labeled images or speech) may be obtained, and the feature data output by the backbone network processing the training data may be obtained. The first vertical tuning module in the initial bypass network corresponding to the first-layer backbone network may be called to adjust the feature data output by the zeroth-layer backbone network, and the first horizontal tuning module corresponding to the first vertical tuning module may be called to adjust the feature data output by the first-layer backbone network. Based on the weights of the initial bypass network, a weighted sum of the adjusted feature data corresponding to the first vertical tuning module and the adjusted feature data corresponding to the first horizontal tuning module is taken to obtain the first feature data, and it is determined whether the first-layer backbone network is the last layer of the backbone network. If it is, the first feature data may be used as the output of the initial bypass network. The first feature data may then be input into the output layer (for example, a Head) of the pre-trained model, which converts the first feature data into the final training result; back-propagation may be performed based on the training result to update the parameters of the initial bypass network and obtain the target bypass network.
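Putting the pieces above together, the following is a condensed end-to-end training sketch in PyTorch. The adapter shapes, the per-layer sigmoid weights, the classification head, and the random data are all assumptions made for illustration; only the overall wiring (frozen backbone, detached features, tuners combined by weighted sums, back-propagation through the bypass only) follows the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

L, D, C = 3, 16, 10                                  # layers, feature dim, classes
backbone = nn.ModuleList(nn.Linear(D, D) for _ in range(L))
for p in backbone.parameters():
    p.requires_grad = False                          # backbone stays locked (frozen)

h_tuners = nn.ModuleList(nn.Linear(D, D) for _ in range(L))  # horizontal Res-Tuners
v_tuners = nn.ModuleList(nn.Linear(D, D) for _ in range(L))  # vertical Res-Tuners
lam = nn.Parameter(torch.zeros(L))                   # one learnable weight per layer
head = nn.Linear(D, C)                               # output layer (Head)

opt = torch.optim.SGD(
    [*h_tuners.parameters(), *v_tuners.parameters(), lam, *head.parameters()], lr=1e-2
)

x = torch.randn(32, D)                               # a batch of training inputs
y = torch.randint(0, C, (32,))                       # their labels

feats, h = [], x
for layer in backbone:                               # backbone forward pass
    h = layer(h)
    feats.append(h.detach())                         # detach: no grads into the backbone

bypass = x                                           # layer-0 input to the whole module
for l in range(L):
    w = torch.sigmoid(lam[l])
    bypass = w * h_tuners[l](feats[l]) + (1 - w) * v_tuners[l](bypass)

loss = F.cross_entropy(head(bypass), y)              # compare with the true labels
loss.backward()                                      # reaches tuners, lam, and head only
opt.step()
```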

As an optional implementation, updating the parameters of the initial bypass network based on the converted feature data to obtain the target bypass network includes: in response to the first-layer backbone network not being the last layer of the backbone network, calling a second vertical tuning module corresponding to the second-layer backbone network to adjust the first feature data, and calling a second horizontal tuning module associated with the second vertical tuning module to adjust the feature data of the second-layer backbone network; based on the weights of the initial bypass network, taking a weighted sum of the adjusted first feature data corresponding to the second vertical tuning module and the adjusted feature data corresponding to the second horizontal tuning module to obtain second feature data; and, in response to the second-layer backbone network not being the last layer of the backbone network, executing the following steps: determining the second-layer backbone network as the first-layer backbone network and determining the third-layer backbone network as the second-layer backbone network; calling the second vertical tuning module to adjust the first feature data, and calling the second horizontal tuning module to adjust the feature data of the second-layer backbone network; and, based on the weights of the initial bypass network, taking a weighted sum of the adjusted first feature data corresponding to the second vertical tuning module and the adjusted feature data corresponding to the second horizontal tuning module to obtain the second feature data, until the second-layer backbone network is the last layer of the backbone network. Updating the parameters of the initial bypass network based on the converted feature data to obtain the target bypass network then includes: updating the parameters of the initial bypass network based on the second feature data to obtain the target bypass network.

In this embodiment, a weighted sum of the adjusted feature data corresponding to the first vertical tuning module and the adjusted feature data corresponding to the first horizontal tuning module is taken to obtain the first feature data, and it is determined whether the first-layer backbone network is the last layer of the backbone network. If it is not, the first feature data may be used as the input data of the second vertical tuning module corresponding to the second-layer backbone network; the second vertical tuning module may be called to adjust the first feature data, and the second horizontal tuning module associated with it may be called to adjust the feature data of the second-layer backbone network. Based on the weights of the initial bypass network, a weighted sum of the adjusted first feature data corresponding to the second vertical tuning module and the adjusted feature data corresponding to the second horizontal tuning module may be taken to obtain the second feature data.

Optionally, this embodiment determines whether the second-layer backbone network is the last layer of the backbone network. If it is, the second feature data may be output to the output layer of the pre-trained model, which converts the second feature data into the final output result; back-propagation may be performed based on the output result to update the parameters of the initial bypass network and obtain the target bypass network. If it is not, the second-layer backbone network may be determined as the first-layer backbone network, and the third-layer backbone network may be determined as the second-layer backbone network; the second vertical tuning module is called to adjust the first feature data, and the second horizontal tuning module is called to adjust the feature data of the second-layer backbone network; based on the weights of the initial bypass network, a weighted sum of the adjusted first feature data corresponding to the second vertical tuning module and the adjusted feature data corresponding to the second horizontal tuning module is taken to obtain the second feature data, until the second-layer backbone network is the last layer of the backbone network. The resulting second feature data is output to the output layer of the pre-trained model, and back-propagation is performed on the result output by the output layer to update the parameters of the initial bypass network and obtain the target bypass network.

Optionally, the second feature data output by the bypass network may be determined based on the following formula:

$$\hat{x}_l = \lambda \cdot \text{Res-Tuner}_h(x_l) + (1-\lambda) \cdot \text{Res-Tuner}_v(\hat{x}_{l-1}), \qquad \hat{x}_0 = x_0$$

where $\hat{x}_l$ denotes the output of the $l$-th layer of the bypass network, $\text{Res-Tuner}_h(x_l)$ denotes the output of the $l$-th layer horizontal tuning module, $\text{Res-Tuner}_v(\hat{x}_{l-1})$ denotes the output of the vertical tuning module applied to the $(l-1)$-th layer bypass output, $x_0$ denotes the feature data output by the zeroth-layer backbone network, $x_l$ denotes the output feature of the $l$-th layer, and $\lambda$ denotes the weight of the initial bypass network.

Optionally, the weight of the initial bypass network may be a learnable weight coefficient that changes continuously during training of the initial bypass network.

Optionally, in each network layer corresponding to the backbone network, the initial bypass network uses a horizontal tuning module and a vertical tuning module, which respectively process the feature data output by the l-th layer of the backbone network and the first feature data of the (l−1)-th layer of the bypass network; the two sets of feature data may be combined by a weighted sum using the weights.

Optionally, the initial input to the whole module is called layer 0.

As an optional implementation, step S206, updating the parameters of the initial bypass network based on the converted feature data to obtain the target bypass network, includes: determining the processing result obtained by the output layer of the pre-trained model processing the converted feature data; determining the difference value between the processing result and the true processing result corresponding to it; and adjusting the parameters of the initial bypass network based on the difference value to obtain the target bypass network.

In this embodiment, the feature data converted by the initial bypass network is obtained and may be processed by the output layer of the pre-trained network to obtain a processing result; the difference value between the processing result and the corresponding true processing result may be determined, and the parameters of the initial bypass network may be adjusted based on this difference value to obtain the target bypass network. Here, the processing result may be an image, text, sequence data, a classification, etc., and its category may vary flexibly with the usage scenario of the pre-trained model; this is merely an example and does not specifically restrict the category of the processing result. The difference value may be used to characterize the difference between the processing result and the corresponding true processing result, and may be used to determine a loss function.

Optionally, this embodiment obtains the feature data converted by the initial bypass network, processes it through the output layer of the pre-trained network to obtain a processing result, computes the loss function between the processing result and the corresponding true processing result, and computes the gradients of the parameters of the initial bypass network based on the loss function, thereby determining the direction and magnitude of the parameter adjustment and obtaining the updated target bypass network.

For example, in order for the model to better fit the training data, back-propagation may be performed based on the processing result of the output layer; the change of the loss function may be determined from the processing result and the true processing result, and the parameters of the initial bypass network may be adjusted accordingly, so that the processing result of the pre-trained model comes closer to the true processing result.

Because a large model has an enormous number of parameters, computing gradients directly by numerical methods is very time-consuming. In this embodiment, back-propagation computes the gradients efficiently, speeding up training of the initial bypass network. In addition, back-propagation enables the initial bypass network to learn the characteristics and regularities of the data; by updating the parameters of the initial bypass network, the performance of the pre-trained model is continuously optimized, improving its accuracy and generalization ability.

As an optional implementation, the backbone network and the at least one tuning module are connected based on residuals.

In this embodiment, the initial bypass network and the backbone network may be connected by residual connections. The initial bypass network may contain at least one horizontal tuning module or vertical tuning module; therefore, the backbone network and the at least one tuning module may be connected based on residuals. It should be noted that the number of tuning modules in the initial bypass network is not specifically restricted here.

Optionally, the tuning modules in the initial bypass network may be connected to one another by residual connections and act on the backbone network in parallel.
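As one illustration of what a residually connected tuning module might look like, the following sketch uses an adapter-style form (down-projection, nonlinearity, up-projection, plus a residual shortcut). This specific internal structure is an assumption; the patent leaves the tuner's components open.

```python
import torch
import torch.nn as nn

class ResTuner(nn.Module):
    """A small tuning module whose output is added back to its input (residual)."""

    def __init__(self, dim: int, bottleneck: int = 8):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # project to a low-dimensional space
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)    # project back to the feature dimension

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))  # residual connection

print(ResTuner(16)(torch.randn(2, 16)).shape)       # torch.Size([2, 16])
```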

Because in the initial backbone network the tuning module is embedded in the backbone, the backbone network also has to process the training data when the tuning module is trained by back-propagation, which occupies memory during training. In this embodiment, the tuning modules are extracted from the initial backbone network and connected by residual connections to construct the initial bypass network, and the initial bypass network may be connected to the backbone network in parallel; during back-propagation, the data flow from the initial bypass network to the backbone network is prohibited, thereby reducing memory usage.

It should be noted that the above residual connections may be changed according to the actual situation; for example, fully connected or skip connections may also be used, and the units inside a single tuning module may be combined or replaced. The connection mode between tuning modules and the components of a tuning module are not specifically restricted here.

As an optional implementation, during the transmission of the feature data to the initial bypass network, the parameters of the backbone network are in a locked state.

In this embodiment, during model training, the parameters of the backbone network are in a locked state, that is, they are not updated. The locked state, which may also be called a frozen state, may be used to characterize that the parameters of the backbone network do not change.
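In PyTorch-style terms, the locked state could be realized as follows (a minimal sketch; the one-layer stand-in is illustrative):

```python
import torch.nn as nn

backbone = nn.Linear(16, 16)      # stand-in for the pre-trained backbone
for p in backbone.parameters():
    p.requires_grad = False       # locked / frozen: no gradient updates
backbone.eval()                   # also fix layers with running statistics
```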

In this embodiment, the backbone network keeps its parameters frozen, and, during back-propagation, data no longer flows into the backbone network, thereby saving memory.

In the disclosed embodiment, the tuning modules detached from the backbone network are used to construct the initial bypass network, and, during adjustment of the parameters of the initial bypass network, the data flow from the initial bypass network to the backbone network is cut off, yielding a target bypass network (a new tuning module) relatively independent of the backbone network. As a result, during training of the initial bypass network, the parameter gradients of the backbone network do not need to be computed, which saves memory and speeds up training, thereby achieving the technical effect of reducing resource consumption during model training and solving the technical problems of high resource consumption and low computational efficiency in the model training process.

For Artificial Intelligence Generated Content (AIGC) systems, an embodiment of the present disclosure further provides a data processing method for a pre-trained model. Figure 3 is a flowchart of a data processing method for a pre-trained model according to an embodiment of the present disclosure. As shown in Figure 3, the method may include the following steps.

Step S302: responding to query information received in a generative interactive interface, where the query information at least includes keywords of text generation information.

In the technical solution provided in the above step S302 of the present disclosure, when the generative interactive interface receives query information, the query information may be responded to. Here, the generative interactive interface may be an interface that generates dialogue using a natural language generation model, and may be the operation interface of a mobile terminal or a client, etc.; this is merely an example and does not specifically restrict the type of the generative interactive interface. The query information may at least include keywords of the text generation information. The query information may be an instruction issued by the client, for example an instruction that poses a question in a dialogue or interaction scenario in order to obtain guidance.

Optionally, the user may enter a question or request in the generative interactive interface, such as "Hello, I would like to know how to unsubscribe from my service?"; the query information contains the keyword "how" of the text generation information. The query information received in the generative dialogue interface may then be responded to.

Step S304: calling the backbone network of the pre-trained model to analyze at least the text generation information and output text feature data, where the backbone network comes from an initial backbone network.

In the technical solution provided in the above step S304 of the present disclosure, in response to the query information received in the generative interactive interface, the backbone network of the pre-trained model may be called to analyze at least the text generation information and output text feature data. Here, the backbone network may come from the initial backbone network. The text generation information may be used to determine the question posed by the query information. The text feature data may be used to characterize the text content and may be, for example, vectors or texture features; this is merely an example and does not specifically restrict the type of the feature data.

Step S306: calling a target bypass network to convert the text feature data, where the target bypass network is obtained by updating the parameters of an initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network.

In the technical solution provided in the above step S306 of the present disclosure, the target bypass network may be called to convert the text feature data.

Step S308: generating, based on the converted text feature data, at least one reply result matching the query information.

In the technical solution provided in the above step S308 of the present disclosure, the pre-trained model may process the converted text feature data to generate a reply result matching the query information.

Step S310: displaying the reply result in the generative interactive interface.

In the technical solution provided in the above step S310 of the present disclosure, the reply result may be displayed in the generative interactive interface. The reply result may be an image, text, speech, etc.; this is merely an example and does not specifically restrict the content of the reply result.

For example, when the generative interactive interface receives the query "What is the recipe for pasta?", in response the backbone network of the pre-trained model may be called to analyze at least the text generation information "recipe for pasta" in the query and output text feature data, the target bypass network may be called to convert the text feature data, and, based on the converted text feature data, at least one reply result matching the query may be generated and displayed in the generative interactive interface.

As an optional implementation, the method further includes: during the updating of the parameters of the initial bypass network, the data flow of the initial bypass network is independent of the data flow of the backbone network, and the parameters of the initial bypass network are used to characterize the influence of the tuning module on the text feature data.

In this embodiment, during the updating of the parameters of the initial bypass network, the data flow of the initial bypass network is independent of the data flow of the backbone network, and the parameters of the initial bypass network may be used to characterize the influence of the tuning module on the text feature data.

Optionally, the backbone network keeps its parameters frozen; the initial bypass network uses tuning modules detached from the backbone, while the data flow from the tuning modules to the backbone network is cut off, forming an initial bypass network relatively independent of the backbone network.

In this embodiment, query information received in the generative interactive interface is responded to, where the query information at least includes keywords of text generation information; the backbone network of the pre-trained model is called to analyze at least the text generation information and output text feature data, where the backbone network comes from an initial backbone network; a target bypass network is called to convert the text feature data, where the target bypass network is obtained by updating the parameters of an initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; based on the converted text feature data, at least one reply result matching the query information is generated; and the reply result is displayed in the generative interactive interface. This achieves the technical effect of improving the efficiency of obtaining feedback results in generative dialogue products, and solves the technical problems of high resource consumption and low computational efficiency in the model training process.

As an optional embodiment, in the application scenario of discriminative tasks, the tuning modules may be extracted from the initial backbone network to obtain the tuning modules and the backbone network. The extracted tuning modules may be combined to obtain the initial bypass network, whose parameters may be updated to obtain the target bypass network. In the discriminative task scenario, the feature data obtained by the backbone network of the pre-trained model processing an image to be classified may be obtained, and the trained target bypass network is called to convert the feature data. The feature data converted by the target bypass network may be input into the output layer, which processes the feature data to obtain the category to which the image to be classified belongs.

Optionally, in discriminative tasks, the overall structure of the pre-trained model (for example, a discriminative task network) may be divided into two parts. One part is the backbone network, which is frozen during training. The other part is the trainable initial bypass network (also called the bypass structure), which may consist of several tuning modules; each network layer corresponding to the backbone network has a horizontal tuning module and a vertical tuning module, which respectively receive the feature data output by that layer of the backbone network and the output of the previous layer of the initial bypass network, take a weighted sum of the two, and output it to the next layer.

Optionally, this continues until the converted feature data of the last layer is output to the output layer of the pre-trained model to produce a corresponding result; a back-propagation pass may be performed based on that output, and during back-propagation the path from the bypass network to the backbone network is disconnected, yielding the trained target bypass network. When an image to be classified is obtained, the feature data produced by the backbone model processing the image may be obtained; the target bypass network may be called to convert the feature data, and the converted feature data may be output to the output layer of the pre-trained model to determine the category to which the image belongs.
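An inference-time sketch of this discriminative flow might look as follows (the modules are stand-ins consistent with the earlier training sketch; in practice they would be the trained backbone, tuners, weights, and head):

```python
import torch
import torch.nn as nn

L, D, C = 3, 16, 10
backbone = nn.ModuleList(nn.Linear(D, D) for _ in range(L))
h_tuners = nn.ModuleList(nn.Linear(D, D) for _ in range(L))
v_tuners = nn.ModuleList(nn.Linear(D, D) for _ in range(L))
lam = torch.zeros(L)                               # trained λ coefficients
head = nn.Linear(D, C)                             # trained output layer

@torch.no_grad()
def classify(image_feat: torch.Tensor) -> torch.Tensor:
    bypass, h = image_feat, image_feat
    for l in range(L):
        h = backbone[l](h)                         # backbone feature for layer l
        w = torch.sigmoid(lam[l])
        bypass = w * h_tuners[l](h) + (1 - w) * v_tuners[l](bypass)
    return head(bypass).argmax(dim=-1)             # category of the image

print(classify(torch.randn(1, D)))
```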

For example, given an image to be classified, the feature data of the image may be determined by the backbone network of the pre-trained model, the target bypass network may be called to convert the feature data, and, based on the converted feature data, the category of the image may be determined, for example, an image containing a kitten.

As an optional implementation, step S404, calling the target bypass network to convert the text feature data, includes: calling a first vertical tuning module corresponding to the first decoding layer in the backbone network to convert the text feature data output by the middle layer of the backbone network, and calling a first horizontal tuning module associated with the first vertical tuning module to convert the text feature data output by the first decoding layer.

In this embodiment, the first vertical tuning module corresponding to the first decoding layer in the backbone network may be called to convert the text feature data output by the middle layer of the backbone network, and the first horizontal tuning module associated with the first vertical tuning module may be called to convert the text feature data output by the first decoding layer.

Optionally, for generative tasks (for example, image generation), the overall structure of the pre-trained model may include a backbone network whose parameters are frozen during training and a target bypass network with updated parameters, where the backbone network may be a deep learning model (U-Net) structure based on StableDiffusion. The target bypass network may be a trainable bypass structure consisting of several tuning modules.

For example, the backbone network may contain three encoding layers, one middle layer, and four decoding layers. Each decoding layer corresponds to a horizontal tuning module and a vertical tuning module, which may be used to receive the text feature data output from the middle layer or a decoding layer and the text feature data converted by the previous layer of the target bypass network. The first vertical tuning module, corresponding to the first decoding layer, may receive the text feature data of the middle layer of the backbone network and convert it. The first horizontal tuning module associated with the first vertical tuning module may obtain the text feature data output by the first decoding layer and convert it.

As an optional implementation, generating, based on the converted text feature data, at least one reply result matching the query instruction includes: based on the weights of the target bypass network, taking a weighted sum of the text feature data adjusted by the first vertical tuning module and the text feature data adjusted by the first horizontal tuning module to obtain the converted text feature data; and determining the reply result based on the converted text feature data.

In this embodiment, a weighted sum of the text feature data adjusted by the first vertical tuning module and the text feature data adjusted by the first horizontal tuning module may be taken based on the weights of the target bypass network to obtain the converted text feature data, and the reply result may be determined based on the converted text feature data.

Optionally, each decoding layer of the backbone network corresponds to one horizontal tuning module and one vertical tuning module. These may respectively receive the text feature data output by the middle layer or a decoding layer of the backbone network and the text feature data output by the previous layer of the target bypass network; a weighted sum of the two may be taken and output to the vertical tuning module corresponding to the next layer of the backbone network, until the weighted-sum result of the horizontal and vertical tuning modules corresponding to the last decoding layer of the backbone network is output to the generative task network to produce the corresponding result and obtain the reply result.

For example, the backbone network may contain three encoding layers, one middle layer, and three decoding layers, each decoding layer corresponding to a horizontal tuning module and a vertical tuning module that receive, respectively, the text feature data output by the middle layer or a decoding layer and the text feature data converted by the previous layer of the target bypass network. The first vertical tuning module, corresponding to the first decoding layer, may receive and convert the text feature data of the middle layer of the backbone network, and the first horizontal tuning module associated with it may obtain and convert the text feature data output by the first decoding layer; based on the weights of the target bypass network, a weighted sum of the two adjusted results may be taken. This weighted sum may be input into the second vertical tuning module, corresponding to the second decoding layer, to obtain its converted text feature data, while the text feature data output by the second decoding layer is fed to the second horizontal tuning module; a weighted sum of the text feature data adjusted by the second horizontal tuning module and by the second vertical tuning module is then taken. That weighted sum is in turn transmitted to the third vertical tuning module, corresponding to the third decoding layer, while the text feature data output by the third decoding layer is transmitted to the third horizontal tuning module; a weighted sum of the text feature data adjusted by the third horizontal tuning module and by the third vertical tuning module yields the text feature data finally converted by the target bypass network. The text feature data may then be processed by the output layer of the pre-trained model to obtain the reply result, as sketched below.
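The layer-by-layer walk-through above can be condensed into a loop over the decoder layers; the following sketch does so, with the three-decoder shape and the linear stand-ins as assumptions made for illustration:

```python
import torch
import torch.nn as nn

D, n_dec = 16, 3
middle_out = torch.randn(1, D)                            # middle-layer feature data
decoder_outs = [torch.randn(1, D) for _ in range(n_dec)]  # per-decoder-layer features

h_tuners = nn.ModuleList(nn.Linear(D, D) for _ in range(n_dec))
v_tuners = nn.ModuleList(nn.Linear(D, D) for _ in range(n_dec))
lam = torch.zeros(n_dec)

bypass = middle_out                      # input to the first vertical tuning module
for l in range(n_dec):
    w = torch.sigmoid(lam[l])
    bypass = w * h_tuners[l](decoder_outs[l]) + (1 - w) * v_tuners[l](bypass)
# "bypass" now holds the finally converted text feature data for the output layer
```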

作为一种可选的实施方式,步骤S406,基于转换后的文本特征数据,确定答复结果,包括:将转换后的文本特征数据作为预训练模型中最后一个解码层的输出数据;将输出数据转换为答复结果。As an optional implementation, step S406, based on the converted text feature data, determines the reply result, including: using the converted text feature data as the output data of the last decoding layer in the pre-trained model; converting the output data into the reply result.

在该实施例中,可以将目标旁路网络作为预训练模型中最后一个解码层,则可以将转换后的文本特征数据作为预训练模型中最后一个解码层的输出数据,基于输出数据,输出层可以对输出数据进行转换,得到答复结果。In this embodiment, the target bypass network can be used as the last decoding layer in the pre-trained model, so the converted text feature data can be used as the output data of the last decoding layer in the pre-trained model. Based on the output data, the output layer can convert the output data to obtain the response result.

可选地,该实施例中目标旁路网络同样也可以作为预训练模型(比如,可控生成式任务)中的一个分支,即,可以为条件输入的编码模块。其中,可控生成式任务可以为依据可控的条件数据,进行生成的任务。Optionally, the target bypass network in this embodiment can also be used as a branch in a pre-trained model (e.g., a controllable generative task), that is, a coding module for conditional input. The controllable generative task can be a task generated based on controllable conditional data.

该实施例中,可控编码部分同样可以使用目标旁路网络作为分支,以达到减少内存消耗的目的。In this embodiment, the controllable coding part can also use the target bypass network as a branch to achieve the purpose of reducing memory consumption.

在该实施例中,获取预训练模型中主干网络处理待分类图像得到的特征数据,其中,主干网络来自初始主干网络;调用目标旁路网络对特征数据进行转换,其中,目标旁路网络为对初始旁路网络的参数进行更新后得到,初始旁路网络为基于至少一调优模块构建得到,调优模块为从初始主干网络中提取出;基于转换后的特征数据,确定待分类图像所属的类别,从而达到了减少模型的训练过程中的资源消耗的技术效果,解决了模型的训练过程资源消耗多、计算效率低的技术问题。In this embodiment, feature data obtained by processing an image to be classified by a backbone network in a pre-trained model is obtained, wherein the backbone network comes from an initial backbone network; a target bypass network is called to convert the feature data, wherein the target bypass network is obtained by updating parameters of an initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; based on the converted feature data, the category to which the image to be classified belongs is determined, thereby achieving the technical effect of reducing resource consumption in the training process of the model and solving the technical problems of high resource consumption and low computational efficiency in the training process of the model.

本公开实施例针对生成任务场景,比如,根据条件数据生成图像、视频或文字等场景,还提供了另一种预训练模型的数据处理方法。图4是根据本公开实施例的另一种预训练模型的数据处理方法的流程图。如图4所示,该方法可以包括以下步骤:For generative task scenarios, such as generating images, videos or text according to conditional data, the embodiments of the present disclosure further provide another data processing method for a pre-trained model. FIG4 is a flow chart of another data processing method for a pre-trained model according to an embodiment of the present disclosure. As shown in FIG4, the method may include the following steps:

步骤S402,获取预训练模型中主干网络对条件数据进行处理,得到的特征数据,其中,主干网络来自初始主干网络,条件数据用于确定目标数据的生成条件。Step S402, obtaining feature data obtained by processing the conditional data by the backbone network in the pre-trained model, wherein the backbone network comes from the initial backbone network, and the conditional data is used to determine the generation condition of the target data.

在本公开上述步骤S402提供的技术方案中,可以获取条件数据。主干网络对条件数据进行处理,得到特征数据。其中,条件数据可以用于确定目标数据的生成条件,比如,可以为“绘制一只蓝色的矮脚猫”,可以为线稿图、深度图、姿态等内容,可以包含文本条件、图像条件等,此处仅为举例说明,不对条件数据的内容做具体限制。主干网络可以为条件二维深度学习模型(Conditional 2D U-Net),可以包含编码层(Encoder)、解码层(Decoder)和中间层(Middle)。In the technical solution provided in the above step S402 of the present disclosure, conditional data can be obtained. The backbone network processes the conditional data to obtain feature data. The conditional data can be used to determine the generation conditions of the target data; for example, it can be "draw a blue munchkin cat", it can be a line drawing, a depth map, a pose, etc., and can include text conditions, image conditions, etc. This is only an example, and no specific restriction is placed on the content of the conditional data. The backbone network can be a conditional two-dimensional deep learning model (Conditional 2D U-Net), which can include an encoding layer (Encoder), a decoding layer (Decoder) and a middle layer (Middle).

举例而言,可以获取条件数据“写一套用于筛选数据的代码”。可以对条件数据中的关键词进行提取和转换,得到关键词对应的特征数据。需要说明的是,上述条件数据的内容、特征数据的确定仅为举例说明,此处不做具体限制。For example, the conditional data "write a set of code for filtering data" can be obtained. Keywords in the conditional data can be extracted and converted to obtain feature data corresponding to the keywords. It should be noted that the content of the above conditional data and the determination of the feature data are only examples and are not specifically limited here.

步骤S404,调用目标旁路网络对特征数据进行转换,其中,目标旁路网络为对初始旁路网络的参数进行更新后得到,初始旁路网络为基于至少一调优模块构建得到,调优模块为从初始主干网络中提取出。Step S404, calling a target bypass network to convert the characteristic data, wherein the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network.

在本公开上述步骤S404提供的技术方案中,可以从主干网络中提取出调优模块,可以基于至少一调优模块构建得到初始旁路网络,可以基于训练数据对初始旁路网络的参数进行更新,得到目标旁路网络。可以调用训练好的目标旁路网络对特征数据进行转换。其中,训练数据可以为带标识的数据,可以为预先获取的、用于对初始旁路网络进行训练的数据,比如,可以为图像数据、文字数据、语音数据等,此处仅为举例说明,不对训练数据的类别做具体限制。初始旁路网络的参数可以为权重、偏置数据等,此处仅为举例说明,不对参数的类型做具体限制。In the technical solution provided in the above step S404 of the present disclosure, a tuning module can be extracted from the backbone network, an initial bypass network can be constructed based on at least one tuning module, and the parameters of the initial bypass network can be updated based on training data to obtain a target bypass network. The trained target bypass network can be called to convert the feature data. The training data can be labeled data acquired in advance for training the initial bypass network, for example, image data, text data, voice data, etc., which is only an example and does not limit the type of training data. The parameters of the initial bypass network can be weights, bias data, etc., which is only an example and does not limit the type of parameters.

可选地,该实施例从初始主干网络中提取调优模块,得到调优模块和主干网络。可以将提取出的调优模块进行组合,得到初始旁路网络,可以对初始旁路网络中的参数进行更新,得到目标旁路网络。在生成任务的场景中,可以获取预训练模型中主干网络对条件数据进行处理得到的特征数据,并调用训练好的目标旁路网络对特征数据进行转换。Optionally, this embodiment extracts a tuning module from the initial backbone network to obtain the tuning module and the backbone network. The extracted tuning modules can be combined to obtain an initial bypass network, and the parameters in the initial bypass network can be updated to obtain a target bypass network. In a generative task scenario, the feature data obtained by the backbone network in the pre-trained model processing the conditional data can be acquired, and the trained target bypass network can be called to convert the feature data.
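A minimal sketch of this parameter-update setup follows, under stated assumptions: the backbone, bypass, head, and data below are toy stand-ins rather than the disclosure's actual networks; the point illustrated is that only the bypass (and task head) parameters are handed to the optimizer while the backbone stays frozen.

```python
import torch
import torch.nn as nn

# Toy stand-ins; in practice these come from the pre-trained checkpoint.
backbone = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))
bypass = nn.Linear(16, 16)   # placeholder for the tuner-based bypass network
head = nn.Linear(16, 4)      # task output layer

for p in backbone.parameters():      # freeze the backbone during training
    p.requires_grad = False

# Only bypass and head parameters are updated.
opt = torch.optim.AdamW(list(bypass.parameters()) + list(head.parameters()), lr=1e-4)

x = torch.randn(8, 16)               # hypothetical batch of conditional/training data
y = torch.randint(0, 4, (8,))        # hypothetical labels
feats = backbone(x).detach()         # forward pass uses backbone features...
logits = head(bypass(feats))         # ...but gradients never enter the backbone
loss = nn.functional.cross_entropy(logits, y)
loss.backward()                      # backpropagation runs through the bypass only
opt.step()
opt.zero_grad()
```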

步骤S406,基于转换后的特征数据,生成与条件数据对应的目标数据,其中,目标数据的类型包括如下至少之一:文本信息、图像信息、视频信息和语音信息。Step S406: generating target data corresponding to the conditional data based on the converted feature data, wherein the type of the target data includes at least one of the following: text information, image information, video information and voice information.

在本公开上述步骤S406提供的技术方案中,可以将目标旁路模型转换后的特征数据输入至输出层中,输出层对特征数据进行处理,生成与条件数据对应的目标数据。其中,目标数据可以为基于条件数据生成的动画、故事、风景画、人脸图像、对话等,目标数据的类型可以包括如下至少之一:文本信息、图像信息、视频信息和语音信息,此处仅为举例说明,不对目标数据的类别做具体限制。In the technical solution provided in the above step S406 of the present disclosure, the feature data converted by the target bypass model can be input into the output layer, and the output layer processes the feature data to generate target data corresponding to the conditional data. The target data can be an animation, story, landscape, face image, dialogue, etc. generated based on the conditional data, and the type of the target data can include at least one of the following: text information, image information, video information, and voice information. This is only for example, and the category of the target data is not specifically limited.

可选地,预训练模型可以为基于条件预训练的转换器模型(Chat Conditional Pretrained Transformer,简称为ChatCPT),可以用于生成对话回复和进行对话交互。当条件数据为对话上下文时,可以基于对话上下文,确定询问信息。可以调用预训练模型中的主干网络对条件数据进行处理,得到特征数据,调用目标旁路网络对特征数据进行转换;基于转换后的特征数据,可以生成与条件数据对应的对话回复(也即目标数据)。或者,当条件数据为用户提出的问题时,可以通过预训练模型生成与用户提问信息对应的回答。再或者,当条件数据为较长的文本信息和用户希望将文本信息简洁化的要求时,可以通过预训练模型生成简洁、概括性的摘要内容。又或者,当条件数据为用户提供的信息和指导时,预训练模型可以进行辅助写作,可以对条件数据进行处理,得到与条件数据对应的文章、故事情节、对话剧本等内容。Optionally, the pre-trained model can be a conditional pre-trained transformer model (Chat Conditional Pretrained Transformer, referred to as ChatCPT), which can be used to generate dialogue responses and conduct dialogue interactions. When the conditional data is a dialogue context, the inquiry information can be determined based on the dialogue context. The backbone network in the pre-trained model can be called to process the conditional data to obtain feature data, and the target bypass network is called to convert the feature data; based on the converted feature data, a dialogue response corresponding to the conditional data (i.e., the target data) can be generated. Alternatively, when the conditional data is a question raised by the user, the pre-trained model can generate an answer corresponding to the user's question. Alternatively, when the conditional data is a long text message together with the user's request to simplify it, concise and general summary content can be generated through the pre-trained model. Alternatively, when the conditional data is information and guidance provided by the user, the pre-trained model can assist in writing, processing the conditional data to obtain articles, storylines, dialogue scripts, etc. corresponding to the conditional data.

需要说明的是,上述只是对预训练模型的使用场景进行举例说明,此处不做具体限制,通过预训练模型对条件数据进行处理得到目标数据的方法都应该在本公开的保护范围内。It should be noted that the above is only an example of the use scenario of the pre-trained model, and no specific limitation is made here. The method of processing the conditional data through the pre-trained model to obtain the target data should be within the scope of protection of this disclosure.

在本公开实施例中,获取预训练模型中主干网络对条件数据进行处理,得到的特征数据,其中,主干网络来自初始主干网络,条件数据用于确定目标数据的生成条件;调用目标旁路网络对特征数据进行转换,其中,目标旁路网络为对初始旁路网络的参数进行更新后得到,初始旁路网络为基于至少一调优模块构建得到,调优模块为从初始主干网络中提取出;基于转换后的特征数据,生成与条件数据对应的目标数据,其中,目标数据的类型包括如下至少之一:文本信息、图像信息、视频信息和语音信息,从而达到减少模型的训练过程资源消耗的技术效果,解决了模型的训练过程资源消耗多、计算效率低的技术问题。In an embodiment of the present disclosure, feature data is obtained by processing conditional data by a backbone network in a pre-trained model, wherein the backbone network comes from an initial backbone network, and the conditional data is used to determine the generation condition of target data; a target bypass network is called to convert the feature data, wherein the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; based on the converted feature data, target data corresponding to the conditional data is generated, wherein the type of the target data includes at least one of the following: text information, image information, video information, and voice information, thereby achieving the technical effect of reducing resource consumption in the model training process and solving the technical problems of high resource consumption and low computational efficiency in the model training process.

根据本公开实施例,还提供了一种可以应用于虚拟现实VR设备、增强现实AR设备等虚拟现实场景下的预训练模型的数据处理方法,需要说明的是,在附图的流程图示出的步骤可以在诸如一组计算机可执行指令的计算机系统中执行,并且,虽然在流程图中示出了逻辑顺序,但是在某些情况下,可以以不同于此处的顺序执行所示出或描述的步骤。According to an embodiment of the present disclosure, a data processing method for a pre-trained model in a virtual reality scenario such as a virtual reality VR device and an augmented reality AR device is also provided. It should be noted that the steps shown in the flowchart in the accompanying drawings can be executed in a computer system such as a set of computer executable instructions, and although a logical order is shown in the flowchart, in some cases, the steps shown or described may be executed in an order different from that shown here.

图5是根据本公开实施例的另一种预训练模型的数据处理方法的流程图,如图5所示,该方法可以包括以下步骤。FIG5 is a flowchart of another method for processing data of a pre-training model according to an embodiment of the present disclosure. As shown in FIG5 , the method may include the following steps.

步骤S502,在虚拟现实VR设备或增强现实AR设备上输入待转换语音。Step S502: input the speech to be converted on a virtual reality VR device or an augmented reality AR device.

步骤S504,使用预训练模型中的主干模型从待转换语音中提取出特征数据,其中,主干网络来自初始主干网络。Step S504, using the backbone model in the pre-trained model to extract feature data from the speech to be converted, wherein the backbone network comes from the initial backbone network.

步骤S506,调用预训练模型中的目标旁路网络对特征数据进行转换,其中,目标旁路网络为对初始旁路网络的参数进行更新后得到,初始旁路网络为基于至少一调优模块构建得到,调优模块为从初始主干网络中提取出。Step S506, calling the target bypass network in the pre-trained model to convert the feature data, wherein the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network.

步骤S508,基于转换后的特征数据,确定待转换语音对应的图像信息。Step S508: determining the image information corresponding to the speech to be converted based on the converted feature data.

步骤S510,使用图像信息激活VR设备或AR设备,并将图像信息展示在VR设备或AR设备中。Step S510: activating the VR device or AR device using the image information, and displaying the image information in the VR device or AR device.

可选地,在本实施例中,上述预训练模型的数据处理方法可以应用于由服务器、虚拟现实设备所构成的硬件环境中,以控制VR设备或AR设备执行定位信息对应的人机交互操作。服务器可以为媒体文件运营商对应的服务器,上述网络包括但不限于:广域网、城域网或局域网,上述虚拟现实设备并不限定于:虚拟现实头盔、虚拟现实眼镜、虚拟现实一体机等。Optionally, in this embodiment, the above data processing method of the pre-trained model can be applied to a hardware environment composed of a server and a virtual reality device, so as to control the VR device or AR device to perform a human-computer interaction operation corresponding to the positioning information. The server can be a server corresponding to a media file operator. The network includes but is not limited to: a wide area network, a metropolitan area network or a local area network. The virtual reality device is not limited to: a virtual reality helmet, virtual reality glasses, a virtual reality all-in-one machine, etc.

可选地,虚拟现实设备可以包括:存储器、处理器和传输装置。存储器用于存储应用程序,该应用程序可以用于执行:在虚拟现实VR设备或增强现实AR设备上输入待转换语音;使用预训练模型中的主干模型从待转换语音中提取出特征数据,其中,主干网络来自初始主干网络;调用预训练模型中的目标旁路网络对特征数据进行转换,其中,目标旁路网络为对初始旁路网络的参数进行更新后得到,初始旁路网络为基于至少一调优模块构建得到,调优模块为从初始主干网络中提取出;基于转换后的特征数据,确定待转换语音对应的图像信息;使用图像信息激活VR设备或AR设备,并将图像信息展示在VR设备或AR设备中。Optionally, the virtual reality device may include: a memory, a processor, and a transmission device. The memory is used to store an application, which can be used to execute: inputting a speech to be converted on a virtual reality VR device or an augmented reality AR device; extracting feature data from the speech to be converted using a backbone model in a pre-trained model, wherein the backbone network comes from an initial backbone network; calling a target bypass network in the pre-trained model to convert the feature data, wherein the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; determining image information corresponding to the speech to be converted based on the converted feature data; using the image information to activate the VR device or AR device, and displaying the image information in the VR device or AR device.

需要说明的是,该实施例的上述应用在VR设备或AR设备中的预训练模型的处理方法可以包括图5所示实施例的方法,以控制VR设备或AR设备执行定位信息对应的人机交互操作。 It should be noted that the above-mentioned processing method of the pre-trained model applied in the VR device or AR device in this embodiment may include the method of the embodiment shown in Figure 5, so as to control the VR device or AR device to perform human-computer interaction operations corresponding to the positioning information.

可选地,该实施例的处理器可以通过传输装置调用上述存储器存储的应用程序以执行上述步骤。传输装置可以通过网络接收服务器发送的媒体文件,也可以用于上述处理器与存储器之间的数据传输。Optionally, the processor of this embodiment can call the application stored in the memory to execute the above steps through a transmission device. The transmission device can receive media files sent by the server through the network, and can also be used for data transmission between the processor and the memory.

可选地,虚拟现实设备可以为带有眼球追踪的头戴式显示器(HMD)。该HMD头显中的屏幕用于显示展示的视频画面;HMD中的眼球追踪模块用于获取用户眼球的实时运动路径;跟踪系统用于追踪用户在真实三维空间的位置信息与运动信息;计算处理单元用于从跟踪系统中获取用户的实时位置与运动信息,并计算出用户头部在虚拟三维空间中的三维坐标,以及用户在虚拟三维空间中的视野朝向等。Optionally, the virtual reality device may be a head-mounted display (HMD) with eye tracking. The screen in the HMD is used to display the presented video images; the eye tracking module in the HMD is used to obtain the real-time movement path of the user's eyes; the tracking system is used to track the user's position information and motion information in the real three-dimensional space; and the computing processing unit is used to obtain the user's real-time position and motion information from the tracking system, and to calculate the three-dimensional coordinates of the user's head in the virtual three-dimensional space, as well as the user's field-of-view direction in the virtual three-dimensional space.

在本公开实施例中,虚拟现实设备可以与终端相连接,终端与服务器通过网络进行连接,上述虚拟现实设备并不限定于:虚拟现实头盔、虚拟现实眼镜、虚拟现实一体机等,上述终端并不限定于PC、手机、平板电脑等,服务器可以为媒体文件运营商对应的服务器,上述网络包括但不限于:广域网、城域网或局域网。In the embodiments of the present disclosure, a virtual reality device may be connected to a terminal, and the terminal and the server are connected via a network. The virtual reality device is not limited to: a virtual reality helmet, virtual reality glasses, a virtual reality all-in-one machine, etc. The terminal is not limited to a PC, a mobile phone, a tablet computer, etc. The server may be a server corresponding to a media file operator, and the network includes but is not limited to: a wide area network, a metropolitan area network or a local area network.

图6是根据本公开实施例的一种预训练模型的数据处理结果的示意图,如图6所示,在VR设备或AR设备的呈现画面上展示待转换语音“画一只小鹿”,可以使用预训练模型中的主干模型从待转换语音中提取出特征数据,且调用预训练模型中的目标旁路网络对特征数据进行转换,基于转换后的特征数据,可以确定待转换语音对应的图像信息;可以使用图像信息激活VR设备或AR设备,并将图像信息“小鹿”展示在VR设备或AR设备中。Figure 6 is a schematic diagram of a data processing result of a pre-trained model according to an embodiment of the present disclosure. As shown in Figure 6, the speech to be converted "Draw a deer" is displayed on the presentation screen of the VR device or AR device, and the backbone model in the pre-trained model can be used to extract feature data from the speech to be converted, and the target bypass network in the pre-trained model is called to convert the feature data. Based on the converted feature data, the image information corresponding to the speech to be converted can be determined; the image information can be used to activate the VR device or AR device, and the image information "deer" can be displayed in the VR device or AR device.

通过上述步骤,在虚拟现实VR设备或增强现实AR设备上输入待转换语音;使用预训练模型中的主干模型从待转换语音中提取出特征数据,其中,主干网络来自初始主干网络;调用预训练模型中的目标旁路网络对特征数据进行转换,其中,目标旁路网络为对初始旁路网络的参数进行更新后得到,初始旁路网络为基于至少一调优模块构建得到,调优模块为从初始主干网络中提取出;基于转换后的特征数据,确定待转换语音对应的图像信息;使用图像信息激活VR设备或AR设备,并将图像信息展示在VR设备或AR设备中,从而解决了模型的训练过程资源消耗多、计算效率低的技术问题,实现了减少模型的训练过程资源消耗的技术效果。Through the above steps, the speech to be converted is input on the virtual reality VR device or the augmented reality AR device; the backbone model in the pre-trained model is used to extract feature data from the speech to be converted, wherein the backbone network comes from the initial backbone network; the target bypass network in the pre-trained model is called to convert the feature data, wherein the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; based on the converted feature data, the image information corresponding to the speech to be converted is determined; the image information is used to activate the VR device or AR device, and the image information is displayed in the VR device or AR device, thereby solving the technical problems of high resource consumption and low computing efficiency in the training process of the model, and achieving the technical effect of reducing resource consumption in the training process of the model.

根据本公开实施例,还提供了一种可以应用于人工智能生成内容(Artificial Intelligence Generated Content,简称为AIGC)系统的预训练模型的数据处理方法,需要说明的是,在附图的流程图示出的步骤可以在诸如一组计算机可执行指令的计算机系统中执行,并且,虽然在流程图中示出了逻辑顺序,但是在某些情况下,可以以不同于此处的顺序执行所示出或描述的步骤。According to an embodiment of the present disclosure, there is also provided a data processing method for a pre-trained model that can be applied to an artificial intelligence generated content (AIGC) system. It should be noted that the steps shown in the flowchart in the accompanying drawings can be executed in a computer system such as a set of computer executable instructions, and, although a logical order is shown in the flowchart, in some cases, the steps shown or described can be executed in an order different from that shown here.

图7是根据本公开实施例的另一种预训练模型的数据处理方法的流程图,如图7所示,该方法可以包括以下步骤。 FIG7 is a flowchart of another method for processing data of a pre-training model according to an embodiment of the present disclosure. As shown in FIG7 , the method may include the following steps.

步骤S702,在对话界面中输入多模态信息,其中,多模态信息的类型包括如下至少之一:包含字符信息的询问信息、包含帧图像信息的视频帧信息、音频信息。Step S702: input multimodal information in the dialogue interface, wherein the type of the multimodal information includes at least one of the following: inquiry information including character information, video frame information including frame image information, and audio information.

在本公开上述步骤S702提供的技术方案中,可以获取在对话界面中输入的多模态信息,其中,多模态信息的类型可以包括如下至少之一:包含字符信息的询问信息、包含帧图像信息的视频帧信息、音频信息,需要说明的是,此处仅为举例说明,不对多模态信息的类型做具体限制。对话界面可以为人工智能生成内容系统所在终端的显示界面,此处仅为举例说明,可以用于进行对话系统的对话界面都应在本公开的保护范围之内,此处不做具体限制。In the technical solution provided in the above step S702 of the present disclosure, multimodal information input in the dialogue interface can be obtained, wherein the type of multimodal information can include at least one of the following: query information containing character information, video frame information containing frame image information, and audio information. It should be noted that this is only an example and does not specifically limit the type of multimodal information. The dialogue interface can be the display interface of the terminal where the artificial intelligence content generation system is located. This is only an example. The dialogue interface that can be used for the dialogue system should be within the protection scope of the present disclosure and is not specifically limited here.

步骤S704,调用预训练模型中的主干网络至少对多模态信息进行分析处理,得到特征数据,其中,主干网络来自初始主干网络。Step S704, calling the backbone network in the pre-trained model to at least analyze and process the multimodal information to obtain feature data, wherein the backbone network comes from the initial backbone network.

在本公开上述步骤S704提供的技术方案中,AIGC系统可以获取在对话界面中输入多模态信息,并调用预训练模型中的主干网络至少对多模态信息进行分析处理,得到特征数据。In the technical solution provided in the above step S704 of the present disclosure, the AIGC system can obtain multimodal information input in the dialogue interface, and call the backbone network in the pre-trained model to at least analyze and process the multimodal information to obtain feature data.

步骤S706,调用目标旁路网络对特征数据进行转换,其中,目标旁路网络为对初始旁路网络的参数进行更新后得到,初始旁路网络为基于至少一调优模块构建得到,调优模块为从初始主干网络中提取出。Step S706, calling the target bypass network to convert the characteristic data, wherein the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network.

步骤S708,基于转换后的特征数据,生成与多模态信息对应的答复信息,其中,答复信息的类型包括如下至少之一:文本信息、图像信息、视频信息和语音信息。Step S708, generating reply information corresponding to the multimodal information based on the converted feature data, wherein the type of the reply information includes at least one of the following: text information, image information, video information and voice information.

在本公开上述步骤S708提供的技术方案中,预训练模型对多模态信息进行处理,得到转换后的特征数据,基于转换后的特征数据,可以生成与多模态信息对应的答复信息,其中,答复信息的类型包括如下至少之一:文本信息、图像信息、视频信息和语音信息,需要说明的是,此处不对答复信息的类型做具体限制。In the technical solution provided in the above step S708 of the present disclosure, the pre-trained model processes the multimodal information to obtain converted feature data. Based on the converted feature data, reply information corresponding to the multimodal information can be generated, wherein the type of the reply information includes at least one of the following: text information, image information, video information and voice information. It should be noted that there is no specific restriction on the type of reply information here.

举例而言,获取用户在对话界面中输入的多模态信息:“举例说明神经网络的使用场景”,可以调用训练好的预训练模型对多模态信息进行处理,得到与多模态信息对应的答复信息:“神经网络的使用场景可以为图像使用场景、语音使用场景和代码方向的使用场景”。For example, to obtain the multimodal information input by the user in the dialogue interface: "Give examples of usage scenarios of neural networks", the trained pre-trained model can be called to process the multimodal information to obtain response information corresponding to the multimodal information: "The usage scenarios of neural networks can be image usage scenarios, voice usage scenarios, and code direction usage scenarios."

作为一种可选的实施方式,步骤S706,调用目标旁路网络对特征数据进行转换,包括:调用目标旁路网络中的调优模块对特征数据进行调整;基于目标旁路网络的权重,对调整后的特征数据进行加权求和。As an optional implementation, step S706, calling the target bypass network to convert the characteristic data, includes: calling the tuning module in the target bypass network to adjust the characteristic data; and performing weighted summation on the adjusted characteristic data based on the weight of the target bypass network.

在该实施例中,可以获取用户在对话界面中输入的多模态信息,可以调用预训练模型中的主干网络对多模态信息进行分析处理,得到特征数据。可以将特征数据输入至训练好的目标旁路网络中,目标旁路网络中的调优模块对特征数据进行调整,且基于目标旁路网络的权重对调整后的特征数据进行加权求和,以得到最终转换后的特征数据。In this embodiment, the multimodal information input by the user in the dialogue interface can be obtained, and the backbone network in the pre-trained model can be called to analyze and process the multimodal information to obtain feature data. The feature data can be input into the trained target bypass network, where the tuning module in the target bypass network adjusts the feature data, and the adjusted feature data is weighted and summed based on the weight of the target bypass network to obtain the final converted feature data.

作为一种可选的实施方式,步骤S708,基于转换后的特征数据,生成与多模态信息对应的答复信息,包括:基于预训练模型中输出层对转换后的特征数据进行处理,得到与多模态信息对应的答复信息。As an optional implementation, step S708, based on the converted feature data, generates reply information corresponding to the multimodal information, including: processing the converted feature data based on the output layer in the pre-trained model to obtain reply information corresponding to the multimodal information.

在该实施例中,可以基于预训练模型中的输出层对转换后的特征数据进行处理,以将特征数据转换为与多模态信息对应的答复信息。可以将答复信息显示在对话界面中。In this embodiment, the converted feature data may be processed based on the output layer in the pre-trained model to convert the feature data into reply information corresponding to the multimodal information. The reply information may be displayed in the dialogue interface.

在该实施例中,在对话界面中输入多模态信息,其中,多模态信息的类型包括如下至少之一:包含字符信息的询问信息、包含帧图像信息的视频帧信息、音频信息;调用预训练模型中的主干网络至少对多模态信息进行分析处理,得到特征数据,其中,主干网络来自初始主干网络;调用目标旁路网络对特征数据进行转换,其中,目标旁路网络为对初始旁路网络的参数进行更新后得到,初始旁路网络为基于至少一调优模块构建得到,调优模块为从初始主干网络中提取出;基于转换后的特征数据,生成与多模态信息对应的答复信息,其中,答复信息的类型包括如下至少之一:文本信息、图像信息、视频信息和语音信息,从而解决了模型的训练过程资源消耗多、计算效率低的技术问题,实现了减少模型的训练过程资源消耗的技术效果。In this embodiment, multimodal information is input in the dialogue interface, wherein the type of the multimodal information includes at least one of the following: inquiry information containing character information, video frame information containing frame image information, and audio information; the backbone network in the pre-trained model is called to at least analyze and process the multimodal information to obtain feature data, wherein the backbone network comes from the initial backbone network; the target bypass network is called to convert the feature data, wherein the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; based on the converted feature data, reply information corresponding to the multimodal information is generated, wherein the type of the reply information includes at least one of the following: text information, image information, video information, and voice information, thereby solving the technical problems of high resource consumption and low computational efficiency in the training process of the model, and achieving the technical effect of reducing resource consumption in the training process of the model.

需要说明的是,本公开所涉及的用户信息(包括但不限于用户设备信息、用户个人信息等)和数据(包括但不限于用于分析的数据、存储的数据、展示的数据等),均为经用户授权或者经过各方充分授权的信息和数据,并且相关数据的收集、使用和处理需要遵守相关国家和地区的相关法律法规和标准,并提供有相应的操作入口,供用户选择授权或者拒绝。It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) involved in this disclosure are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of relevant data must comply with the relevant laws, regulations and standards of relevant countries and regions, and provide corresponding operation entrances for users to choose to authorize or refuse.

对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本公开并不受所描述的动作顺序的限制,因为依据本公开,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本公开所必须的。For the aforementioned method embodiments, for the sake of simplicity, they are all described as a series of action combinations, but those skilled in the art should know that the present disclosure is not limited by the order of the actions described, because according to the present disclosure, some steps can be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present disclosure.

通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到根据上述实施例的方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件。基于这样的理解,本公开的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,或者网络设备等)执行本公开各个实施例的方法。Through the description of the above implementation methods, those skilled in the art can clearly understand that the method according to the above embodiment can be implemented by means of software plus a necessary general hardware platform, and of course, it can also be implemented by hardware. Based on such an understanding, the technical solution of the present disclosure, or the part that contributes to the prior art, can be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, a disk, or an optical disk), and includes a number of instructions for a terminal device (which can be a mobile phone, a computer, a server, or a network device, etc.) to execute the methods of each embodiment of the present disclosure.

根据本公开实施例,还提供了一种预训练模型的数据处理系统的实施例,图8(a)是根据本公开实施例的一种预训练模型的数据处理系统的示意图,如图8(a)所示,预训练模型的数据处理系统800可以包括:客户端802和服务端804。According to an embodiment of the present disclosure, an embodiment of a data processing system for a pre-trained model is also provided. FIG8(a) is a schematic diagram of a data processing system for a pre-trained model according to an embodiment of the present disclosure. As shown in FIG8(a), the data processing system 800 of the pre-trained model may include: a client 802 and a server 804.

客户端802,设置为显示对话界面,并捕获在对话界面中输入的多模态信息,其中,多模态信息的类型包括如下至少之一:包含字符信息的询问信息、包含帧图像信息的视频帧信息、音频信息。The client 802 is configured to display a dialogue interface and capture multimodal information input in the dialogue interface, wherein the type of the multimodal information includes at least one of the following: query information containing character information, video frame information containing frame image information, and audio information.

可选地,在客户端802的对话界面中输入多模态信息。Optionally, the multimodal information is input in the dialogue interface of the client 802 .

服务端804,设置为调用预训练模型中的主干网络至少对多模态信息进行分析处理,得到特征数据,且调用目标旁路网络对特征数据进行转换,基于转换后的特征数据,生成与多模态信息对应的答复信息,其中,主干网络来自初始主干网络,目标旁路网络为对初始旁路网络的参数进行更新后得到,初始旁路网络为基于至少一调优模块构建得到,调优模块为从初始主干网络中提取出,答复信息的类型包括如下至少之一:文本信息、图像信息、视频信息和语音信息。The server 804 is configured to call the backbone network in the pre-trained model to at least analyze and process the multimodal information to obtain feature data, and call the target bypass network to convert the feature data, and generate reply information corresponding to the multimodal information based on the converted feature data, wherein the backbone network comes from the initial backbone network, the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network, and the type of the reply information includes at least one of the following: text information, image information, video information and voice information.

可选地,客户端802可以将获取到的多模态信息传输至服务端804中。服务端804可以调用预训练模型中的主干网络至少对多模态信息进行分析处理,得到特征数据,且调用目标旁路网络对特征数据进行转换,基于转换后的特征数据,生成与多模态信息对应的答复信息。Optionally, the client 802 may transmit the acquired multimodal information to the server 804. The server 804 may call the backbone network in the pre-trained model to at least analyze and process the multimodal information to obtain feature data, and call the target bypass network to convert the feature data, and generate reply information corresponding to the multimodal information based on the converted feature data.

可选地,服务端804可以将答复信息传输至客户端802的对话界面中进行显示。Optionally, the server 804 may transmit the reply information to the dialogue interface of the client 802 for display.

在该实施例中,通过客户端,显示对话界面,并捕获在对话界面中输入的多模态信息,其中,多模态信息的类型包括如下至少之一:包含字符信息的询问信息、包含帧图像信息的视频帧信息、音频信息;通过服务端,调用预训练模型中的主干网络至少对多模态信息进行分析处理,得到特征数据,且调用目标旁路网络对特征数据进行转换,基于转换后的特征数据,生成与多模态信息对应的答复信息,其中,主干网络来自初始主干网络,目标旁路网络为对初始旁路网络的参数进行更新后得到,初始旁路网络为基于至少一调优模块构建得到,调优模块为从初始主干网络中提取出,答复信息的类型包括如下至少之一:文本信息、图像信息、视频信息和语音信息,从而达到了减少模型的训练过程中的资源消耗的技术效果,解决了模型的训练过程资源消耗多、计算效率低的技术问题。In this embodiment, a dialogue interface is displayed through a client, and multimodal information input in the dialogue interface is captured, wherein the type of multimodal information includes at least one of the following: query information including character information, video frame information including frame image information, and audio information; through a server, a backbone network in a pre-trained model is called to at least analyze and process the multimodal information to obtain feature data, and a target bypass network is called to convert the feature data, and based on the converted feature data, reply information corresponding to the multimodal information is generated, wherein the backbone network comes from an initial backbone network, the target bypass network is obtained by updating the parameters of the initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network, and the type of reply information includes at least one of the following: text information, image information, video information, and voice information, thereby achieving the technical effect of reducing resource consumption in the training process of the model and solving the technical problems of high resource consumption and low computational efficiency in the training process of the model.

目前,基于大规模预训练基础模型的参数高效迁移学习方法在各种下游应用中均取得了巨大成功。现有的调优方法通常是在冻结基础模型的情况下,引入了参数轻量级可训练的结构来执行对于特定任务的微调,但是大多数现有的参数高效迁移学习方法都会改变预训练模型的中间状态,因此,训练时需要更多的内存和时间开销,并且在多任务推理时需要通过主干对不同任务进行冗余的计算,从而导致存在预训练模型的训练过程消耗大且计算效率低的问题。At present, parameter-efficient transfer learning methods based on large-scale pre-trained basic models have achieved great success in various downstream applications. Existing tuning methods usually introduce lightweight and trainable structures to perform fine-tuning for specific tasks while freezing the basic model. However, most existing parameter-efficient transfer learning methods change the intermediate state of the pre-trained model, so more memory and time are required for training, and redundant calculations of different tasks need to be performed through the backbone during multi-task reasoning, resulting in the problem of high consumption and low computational efficiency of the training process of the pre-trained model.

作为一种可选的实施例,图8(b)是根据本公开实施例的一种调优模块的示意图,如图8(b)所示,原有的调优方法会依据特定任务的不同,对原始训练架构中的不同部分进行轻量化(比如Prompt、Prefix、Adapter等)的调整,但是该方法将参数高效的调优模块深嵌于原有的主干网络中,其中,原有的主干网络(初始主干网络)中可以包含前馈网络和多头注意力。在训练时,需要通过初始主干网络对不同任务进行计算,从而导致存在浪费内存和推理时开销的问题。As an optional embodiment, FIG8(b) is a schematic diagram of a tuning module according to an embodiment of the present disclosure. As shown in FIG8(b), the original tuning methods adjust different parts of the original training architecture in a lightweight manner (such as Prompt, Prefix, Adapter, etc.) depending on the specific task, but such methods embed the parameter-efficient tuning modules deep inside the original backbone network, where the original backbone network (the initial backbone network) may contain a feedforward network and multi-head attention. During training, different tasks need to be computed through the initial backbone network, resulting in wasted memory and inference overhead.

为解决上述问题,该实施例提出了一种基于预训练模型的参数和内存的高效调优方法。该方法构建了一个参数和内存高效的旁路网络(又可以称为调优框架),将调优模块从主干网络中进行抽取,并构建独立于主干网络的前向和反向传播链路,在保持参数高效的优势下,减少了训练时的内存开销和推理多任务时的冗余计算,从而达到了减少模型的训练过程中的资源消耗且提高计算效率的目的。To solve the above problems, this embodiment proposes an efficient tuning method for parameters and memory based on a pre-trained model. This method constructs a bypass network (also called a tuning framework) with efficient parameters and memory, extracts the tuning module from the backbone network, and constructs forward and reverse propagation links independent of the backbone network. While maintaining the advantage of efficient parameters, it reduces the memory overhead during training and redundant calculations during reasoning multi-tasks, thereby achieving the purpose of reducing resource consumption during model training and improving computing efficiency.

下面对本公开实施例提出的一种基于预训练模型的参数和内存的高效调优方法进行进一步介绍。The following is a further introduction to an efficient tuning method for parameters and memory based on a pre-trained model proposed in an embodiment of the present disclosure.

图8(c)是根据本公开实施例的另一种调优模块的示意图,如图8(c)所示,该实施例将调优模块从初始主干网络中进行分离形成独立的子模块,可以将分离得到的子模块以残差的形式并行作用在主干网络上并进行自由地组合,形成一个独立的初始旁路网络。Figure 8(c) is a schematic diagram of another tuning module according to an embodiment of the present disclosure. As shown in Figure 8(c), this embodiment separates the tuning module from the initial backbone network to form an independent sub-module. The separated sub-modules can act in parallel on the backbone network in the form of residuals and can be freely combined to form an independent initial bypass network.

需要说明的是,分离得到的子模块可以以残差的形式作用在主干网络上,也可以以跳跃连接等其他连接方式作用在主干网络上,此处不对连接方式做具体限制。It should be noted that the separated sub-modules can act on the backbone network in the form of residuals, or in other connection modes such as skip connections. No specific restrictions are made on the connection mode here.

可选地,初始旁路网络在训练过程中,在前向传播时,可以使用主干网络的中间输出(特征数据),但在反向传播时,则与主干网络断开,数据流只经过旁路网络,从而在反向传播时,可以避免进一步计算主干网络的梯度,实现了内存的节省和训练速度的提升,进而达到了减少模型的训练过程中的资源消耗且提高计算效率的目的,解决了模型的训练过程资源消耗多、计算效率低的问题。Optionally, during the training process, the initial bypass network can use the intermediate output (feature data) of the backbone network during forward propagation, but during reverse propagation, it is disconnected from the backbone network, and the data flow only passes through the bypass network. Therefore, during reverse propagation, further calculation of the gradient of the backbone network can be avoided, thereby saving memory and improving training speed, thereby achieving the purpose of reducing resource consumption and improving computing efficiency during model training, and solving the problem of high resource consumption and low computing efficiency during model training.

如图8(c)所示,预训练模型可以由主干网络和残差连接的初始旁路网络组成,同时具备参数高效和内存高效的特点。其中,主干网络可以包括多个卷积层。As shown in Figure 8(c), the pre-trained model can be composed of a backbone network and an initial bypass network with residual connections, which is both parameter efficient and memory efficient. The backbone network can include multiple convolutional layers.

在该实施例中,预训练模型的整体结构可以分为两部分,一部分是主干模型,在训练过程中主干模型的参数被冻结;另一部分是可训练的初始旁路网络(又可以称为旁路结构),可以由若干个调优模块组成。可以将调优模块到主干网络的数据流进行截断,形成与主干网络相对独立的新调优模块,可以将不同的调优模块(Res-Tuner)进行组合设计构建出完整的初始旁路网络,基于输出结果可以对初始旁路网络的参数进行更新,得到目标旁路网络。In this embodiment, the overall structure of the pre-trained model can be divided into two parts, one part is the backbone model, the parameters of the backbone model are frozen during the training process; the other part is the trainable initial bypass network (also called bypass structure), which can be composed of several tuning modules. The data flow from the tuning module to the backbone network can be cut off to form a new tuning module that is relatively independent of the backbone network. Different tuning modules (Res-Tuner) can be combined and designed to construct a complete initial bypass network. Based on the output results, the parameters of the initial bypass network can be updated to obtain the target bypass network.

该实施例中,由于反向传播时,初始旁路网络和主干网络二者之间的数据流被截断,初始旁路网络中调优模块的数据流不再流向主干网络中,所以节省了内存,达到了减少内存损耗的目的。In this embodiment, since the data flow between the initial bypass network and the backbone network is cut off during back propagation, the data flow of the tuning modules in the initial bypass network no longer flows into the backbone network, thereby saving memory and achieving the purpose of reducing memory loss.

可选地,初始旁路网络第$l$层的输出可以通过以下公式确定:

$\hat{x}_l = \lambda \cdot \text{Res-Tuner}_{hor}(x_l) + (1-\lambda) \cdot \text{Res-Tuner}_{ver}(\hat{x}_{l-1}), \quad \hat{x}_0 = x_0$

其中,$\hat{x}_l$可以用于表征第$l$层旁路网络的输出,即第$l$层水平调优模块与垂直调优模块对应的调整后的特征数据的加权求和结果;$\text{Res-Tuner}_{hor}(x_l)$可以用于表征第$l$层水平调优模块的输出;$\text{Res-Tuner}_{ver}(\hat{x}_{l-1})$可以用于表征作用于第$l-1$层旁路输出的垂直调优模块的输出;$x_0$可以用于表征第零层主干网络输出的特征数据;$x_l$可以用于表征主干网络第$l$层的输出特征;$\lambda$可以用于表征初始旁路网络的可学习权重。Optionally, the output of layer $l$ of the initial bypass network can be determined by the above formula, where $\hat{x}_l$ characterizes the output of the bypass network at layer $l$, i.e., the weighted sum of the adjusted feature data from the horizontal and vertical tuning modules of layer $l$; $\text{Res-Tuner}_{hor}(x_l)$ characterizes the output of the horizontal tuning module at layer $l$; $\text{Res-Tuner}_{ver}(\hat{x}_{l-1})$ characterizes the output of the vertical tuning module applied to the bypass output of layer $l-1$; $x_0$ characterizes the feature data output by the zeroth layer of the backbone network; $x_l$ characterizes the output features of layer $l$ of the backbone network; and $\lambda$ characterizes the learnable weight of the initial bypass network.
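An illustrative sketch of this recurrence follows, assuming hypothetically that each Res-Tuner is an arbitrary PyTorch module and that the learnable weight is squashed through a sigmoid to keep it in (0, 1); the `detach()` calls realize the truncated backward path into the frozen backbone described above.

```python
import torch
import torch.nn as nn

def bypass_forward(backbone_feats, hor_tuners, ver_tuners, lams):
    """x_hat_l = lam * Tuner_hor(x_l) + (1 - lam) * Tuner_ver(x_hat_{l-1}), x_hat_0 = x_0."""
    # detach() cuts the backward path from the bypass into the frozen backbone,
    # so backpropagation never computes backbone gradients (memory saving).
    x_hat = backbone_feats[0].detach()
    for l in range(1, len(backbone_feats)):
        lam = torch.sigmoid(lams[l - 1])                     # learnable mixing weight
        hor = hor_tuners[l - 1](backbone_feats[l].detach())  # horizontal tuner on x_l
        ver = ver_tuners[l - 1](x_hat)                       # vertical tuner on previous bypass output
        x_hat = lam * hor + (1.0 - lam) * ver                # weighted summation
    return x_hat

# Toy usage with identity tuners standing in for real Res-Tuner modules:
feats = [torch.randn(2, 8) for _ in range(4)]   # x_0 .. x_3 from a frozen backbone
hor = [nn.Identity() for _ in range(3)]
ver = [nn.Identity() for _ in range(3)]
lams = nn.Parameter(torch.zeros(3))
out = bypass_forward(feats, hor, ver, lams)     # final bypass feature data
```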

可选地,如图8(c)所示,主干网络的每一个网络层在初始旁路网络中都存在对应的水平调优模块和垂直调优模块,水平调优模块和垂直调优模块可以分别用于处理来自主干网络第$l$层的输出特征和初始旁路网络中第$l-1$层的特征输出,同时可以将两者的输出使用可学习的权重($\lambda$系数)进行加权求和。Optionally, as shown in FIG8(c), each network layer of the backbone network has a corresponding horizontal tuning module and vertical tuning module in the initial bypass network, which can be used to process the output features from layer $l$ of the backbone network and the feature output of layer $l-1$ of the initial bypass network respectively, and the two outputs can be weighted and summed using a learnable weight (the $\lambda$ coefficient).

可选地,可以通过反向传播算法,对权重系数进行不断的调整。Optionally, the weight coefficients may be continuously adjusted through a back-propagation algorithm.

举例而言,假设主干网络有12层网络层,可以在12个网络层对输入数据全部转换完毕后,将特征数据输入至初始旁路网络中。但为提高数据的处理效率,可以由主干网络的第一个网络层将特征数据传输至初始旁路网络的第一个阶段;再由主干网络第二个网络层将特征数据传输至初始旁路网络的第二个阶段,以此类推。For example, assuming the backbone network has 12 network layers, the feature data can be input into the initial bypass network after all 12 network layers have finished converting the input data. However, to improve data processing efficiency, the first network layer of the backbone network can transmit feature data to the first stage of the initial bypass network; the second network layer of the backbone network can then transmit feature data to the second stage of the initial bypass network, and so on.

作为一种可选的实施例,预训练模型可以应用于判别式任务。As an optional embodiment, the pre-trained model can be applied to the discriminative task.

图9是根据本公开实施例的一种预训练模型的示意图,如图9所示,在判别式任务中,预训练模型的整体结构可以分为两部分。一部分为主干网络,在训练过程中对主干网络进行冻结,其中,主干网络可以包括多个训练层(比如,训练层_1、训练层_2……训练层_N),上述训练层可以为卷积层。另一部分为可训练的初始旁路网络(又可以称为旁路结构),初始旁路网络可以由若干个调优模块组成,其中,每一个相应主干网络的网络层上均有一个水平调优模块和一个垂直调优模块,分别用于接收来自主干网络中该层输出的特征数据和初始旁路网络中上一层的输出,两者进行加权求和,输出到下一层中,直到最后一层的转换后的特征数据输出给预训练模型中的输出层进行相应结果的输出,可以基于相应结果的输出进行反向传播过程,且在反向传播的过程中,会断开从旁路网络到主干网络的反向传播路径,以得到训练好的目标旁路网络。当获取到待分类图像时,可以获取主干模型对待分类图像进行处理得到的特征数据。可以调用目标旁路网络对特征数据进行转换,且将转换后的特征数据输出至预训练模型中的输出层,以确定待分类图像所属的类别。FIG9 is a schematic diagram of a pre-trained model according to an embodiment of the present disclosure. As shown in FIG9, in a discriminative task, the overall structure of the pre-trained model can be divided into two parts. One part is a backbone network, which is frozen during the training process, wherein the backbone network may include multiple training layers (e.g., training layer_1, training layer_2...training layer_N), and the above training layers may be convolutional layers. The other part is a trainable initial bypass network (also referred to as a bypass structure), which may be composed of several tuning modules, wherein each network layer of the corresponding backbone network has a horizontal tuning module and a vertical tuning module, which are respectively used to receive the feature data output from that layer in the backbone network and the output of the previous layer in the initial bypass network; the two are weighted and summed and output to the next layer, until the converted feature data of the last layer is output to the output layer in the pre-trained model for output of the corresponding result. A back-propagation process can be performed based on the output of the corresponding result, and during back propagation, the back-propagation path from the bypass network to the backbone network is disconnected, so as to obtain the trained target bypass network. When an image to be classified is obtained, the feature data obtained by the backbone model processing the image to be classified can be acquired. The target bypass network can be called to convert the feature data, and the converted feature data can be output to the output layer in the pre-trained model to determine the category to which the image to be classified belongs.
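The inference path of this discriminative setup can be sketched roughly as follows; the layer sizes, the 10-class output, and the module definitions are illustrative assumptions rather than the disclosure's actual architecture.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins for the frozen backbone, trained bypass, and output layer.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64)).eval()
bypass = nn.Linear(64, 64).eval()          # trained target bypass network (placeholder)
output_layer = nn.Linear(64, 10).eval()    # maps converted features to class scores

image = torch.randn(1, 3, 32, 32)          # an image to be classified (toy input)
with torch.no_grad():
    feats = backbone(image)                # feature data from the frozen backbone
    feats = bypass(feats)                  # converted by the target bypass network
    label = output_layer(feats).argmax(dim=-1)  # category the image belongs to
```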

作为一种可选的实施例,预训练模型可以应用于生成式任务。As an optional embodiment, the pre-trained model can be applied to the generative task.

图10是根据本公开实施例的另一种预训练模型的示意图,如图10所示,在生成式任务(比如,图像生成)中,预训练模型的整体结构可以包括在训练过程中进行参数冻结的主干网络和更新好参数的目标旁路网络,其中,主干网络可以为基于稳定扩散(Stable Diffusion)的深度学习模型(U-Net)结构,可以包含编码层64*64、编码层32*32、编码层16*16、编码层8*8、中间层8*8、解码层8*8、解码层16*16、解码层32*32和解码层64*64。目标旁路网络可以为可训练的旁路结构,可以由若干个调优模块组成。FIG. 10 is a schematic diagram of another pre-trained model according to an embodiment of the present disclosure. As shown in FIG. 10, in a generative task (e.g., image generation), the overall structure of the pre-trained model may include a backbone network whose parameters are frozen during training and a target bypass network whose parameters have been updated, wherein the backbone network may be a deep learning model (U-Net) structure based on Stable Diffusion, and may include an encoding layer of 64*64, an encoding layer of 32*32, an encoding layer of 16*16, an encoding layer of 8*8, an intermediate layer of 8*8, a decoding layer of 8*8, a decoding layer of 16*16, a decoding layer of 32*32, and a decoding layer of 64*64. The target bypass network may be a trainable bypass structure and may be composed of several tuning modules.

每一个相应主干网络的解码层上均对应一个水平调优模块和一个垂直调优模块。水平调优模块和垂直调优模块,可以分别接收来自主干网络中的中间层或解码层输出的特征数据和目标旁路网络中上一层输出的特征数据,可以对两者进行加权求和并输出到主干网络下一层对应的垂直调优模块中,直到主干网络最后一解码层对应的水平调优模块和垂直调优模块的加权求和结果输出给生成式任务网络中进行相应结果的输出,以得到目标数据。Each corresponding decoding layer of the backbone network corresponds to a horizontal tuning module and a vertical tuning module. The horizontal tuning module and the vertical tuning module can respectively receive feature data output from the intermediate layer or decoding layer in the backbone network and feature data output from the previous layer in the target bypass network, and can perform weighted summation on the two and output them to the vertical tuning module corresponding to the next layer of the backbone network, until the weighted summation result of the horizontal tuning module and the vertical tuning module corresponding to the last decoding layer of the backbone network is output to the generative task network for output of the corresponding result to obtain the target data.

可选地,可以将目标旁路网络作为预训练模型中最后一个解码层,则可以将转换后的特征数据作为预训练模型中最后一个解码层的输出数据,基于输出数据,输出层可以对输出数据进行转换,得到目标数据。Optionally, the target bypass network can be used as the last decoding layer in the pre-trained model, and the converted feature data can be used as the output data of the last decoding layer in the pre-trained model. Based on the output data, the output layer can convert the output data to obtain the target data.

可选地,该实施例中目标旁路网络同样也可以作为可控生成式任务(预训练模型)中的一个分支,即,可以为条件输入的编码模块。Optionally, the target bypass network in this embodiment can also be used as a branch in a controllable generative task (pre-training model), that is, it can be an encoding module for conditional input.

可控生成式任务为依据可控的条件数据进行生成的任务。该实施例中,可控编码部分同样可以使用目标旁路网络作为分支,以达到减少内存消耗的目的。A controllable generative task is a task that performs generation based on controllable conditional data. In this embodiment, the controllable encoding part can also use the target bypass network as a branch to achieve the purpose of reducing memory consumption.

举例而言,主干网络可以包含三个编码层、一个中间层和三个解码层。每个解码层上均对应一个水平调优模块和垂直调优模块,水平调优模块和垂直调优模块可以用于接收来自中间层或解码层的输出的特征数据和目标旁路网络中上一层转换的特征数据。与第一解码层对应的第一垂直调优模块可以用于接收主干网络的中间层的特征数据,且对中间层的特征数据进行转换。与第一垂直调优模块关联的第一水平调优模块可以获取第一解码层输出的特征数据,且对第一解码层输出的特征数据进行转换。可以基于目标旁路网络的权重,对第一垂直调优模块调整后的特征数据和第一水平调优模块调整后的特征数据二者之间进行加权求和。For example, the backbone network may include three encoding layers, one intermediate layer, and three decoding layers. Each decoding layer corresponds to a horizontal tuning module and a vertical tuning module, and the horizontal tuning module and the vertical tuning module can be used to receive feature data output from the intermediate layer or the decoding layer and feature data converted from the previous layer in the target bypass network. The first vertical tuning module corresponding to the first decoding layer can be used to receive feature data of the intermediate layer of the backbone network and convert the feature data of the intermediate layer. The first horizontal tuning module associated with the first vertical tuning module can obtain feature data output by the first decoding layer and convert feature data output by the first decoding layer. Based on the weight of the target bypass network, a weighted sum can be performed between the feature data adjusted by the first vertical tuning module and the feature data adjusted by the first horizontal tuning module.

可选地,可以将对第一垂直调优模块调整后的特征数据和第一水平调优模块调整后的特征数据二者之间进行加权求和得到的特征数据输入至与第二解码层对应的第二垂直调优模块中,得到第二垂直调优模块转换后的特征数据,且将第二解码层输出的特征数据输出至第二水平调优模块中,得到对应的第二水平调优模块调整后的特征数据,可以对第二水平调优模块调整后的特征数据和第二垂直调优模块调整后的特征数据进行加权求和。继续将对第二水平调优模块调整后的特征数据和第二垂直调优模块调整后的特征数据进行加权求和得到的特征数据传输至与第三解码层对应的第三垂直调优模块中,且将第三解码层输出的特征数据传输至与第三解码层对应的第三水平调优模块中,可以对第三水平调优模块调整后的特征数据和第三垂直调优模块调整后的特征数据进行加权求和,从而得到目标旁路网络最终转换得到的特征数据。可以基于预训练模型中的输出层对特征数据进行处理,得到目标数据。Optionally, the feature data obtained by weighted summation of the feature data adjusted by the first vertical tuning module and the feature data adjusted by the first horizontal tuning module can be input into the second vertical tuning module corresponding to the second decoding layer to obtain the feature data converted by the second vertical tuning module, and the feature data output by the second decoding layer is output to the second horizontal tuning module to obtain the corresponding feature data adjusted by the second horizontal tuning module; the feature data adjusted by the second horizontal tuning module and the feature data adjusted by the second vertical tuning module can then be weighted and summed. The feature data obtained by this weighted summation is further transmitted to the third vertical tuning module corresponding to the third decoding layer, and the feature data output by the third decoding layer is transmitted to the third horizontal tuning module corresponding to the third decoding layer; the feature data adjusted by the third horizontal tuning module and the feature data adjusted by the third vertical tuning module can be weighted and summed, thereby obtaining the feature data finally converted by the target bypass network. The feature data can be processed based on the output layer in the pre-trained model to obtain the target data.

在本公开实施例中,借助从初始主干分离的调优模块,构建了一个参数和内存高效的初始旁路网络,且对初始旁路网络中权重进行更新,得到目标旁路网络,该方法相比原有调优方法能够大大降低内存的消耗和多任务推理的成本,并适用于判别式任务和生成式任务。In the disclosed embodiment, a parameter- and memory-efficient initial bypass network is constructed with the help of tuning modules separated from the initial backbone, and the weights in the initial bypass network are updated to obtain a target bypass network. Compared with the original tuning methods, this method can greatly reduce memory consumption and the cost of multi-task inference, and is applicable to both discriminative and generative tasks.
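To illustrate the multi-task saving mentioned here, the following is a sketch under the assumption of a shared frozen backbone whose cached intermediate features are reused by several hypothetical task-specific bypasses, so the backbone runs only once per input; the task names and module shapes are placeholders.

```python
import torch
import torch.nn as nn

# Hypothetical frozen backbone shared by several task-specific bypass networks.
backbone = nn.ModuleList([nn.Linear(16, 16) for _ in range(3)]).eval()
task_bypasses = {"classify": nn.Linear(16, 16), "generate": nn.Linear(16, 16)}

with torch.no_grad():                  # backbone runs once; no gradients needed at inference
    x = torch.randn(4, 16)
    feats = []
    for layer in backbone:             # cache every intermediate output
        x = layer(x)
        feats.append(x)

# Each task reuses the same cached features; no redundant backbone passes.
# (A real bypass would consume all cached layers; feats[-1] keeps the sketch short.)
results = {name: byp(feats[-1]) for name, byp in task_bypasses.items()}
```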

在本公开实施例中,使用了从主干网络中抽离的调优模块构建初始旁路网络,同时,在初始旁路网络参数调整的过程中,将初始旁路网络到主干网络的数据流进行截断,得到与主干网络独立的目标旁路网络(新调优模块),从而使得在初始旁路网络训练过程中,不需要进一步计算主干网络的参数梯度,实现了内存的节省和训练速度的提升,进而达到了减少模型的训练过程中的资源消耗且提高计算效率的技术效果,解决了模型的训练过程资源消耗多、计算效率低的技术问题。In the disclosed embodiment, tuning modules extracted from the backbone network are used to construct the initial bypass network. Meanwhile, during the adjustment of the initial bypass network parameters, the data flow from the initial bypass network to the backbone network is cut off to obtain a target bypass network (a new tuning module) independent of the backbone network. This eliminates the need to further calculate the parameter gradients of the backbone network during the training of the initial bypass network, thereby saving memory and improving training speed, achieving the technical effect of reducing resource consumption and improving computational efficiency during model training, and solving the technical problems of high resource consumption and low computational efficiency in the training process of the model.

It should be noted that the user information (including but not limited to user device information and user personal information) and data (including but not limited to data used for analysis, stored data, and displayed data) involved in the present disclosure, such as weather forecast results, are all information and data authorized by the user or fully authorized by all parties; the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, and corresponding operation entries are provided for users to choose to grant or refuse authorization.

It should be noted that, for the sake of brevity, each of the foregoing method embodiments is described as a series of combined actions. However, those skilled in the art should appreciate that the present disclosure is not limited by the described order of actions, since according to the present disclosure certain steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also appreciate that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present disclosure.

From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the technical solution of the present disclosure, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods of the various embodiments of the present disclosure.

According to an embodiment of the present disclosure, an apparatus for updating parameters in a pre-trained model is further provided for implementing the method for updating parameters in a pre-trained model shown in FIG. 2 above.

FIG. 11 is a schematic diagram of an apparatus for updating parameters in a pre-trained model according to an embodiment of the present disclosure. As shown in FIG. 11, the apparatus 1100 for updating parameters in a pre-trained model may include: a first acquisition component 1102, a first calling component 1104, and an update component 1106.

The first acquisition component 1102 is configured to acquire feature data output by a backbone network of a pre-trained model, where the backbone network comes from an initial backbone network.

The first calling component 1104 is configured to call an initial bypass network to convert the feature data, where the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network.

The update component 1106 is configured to update the parameters of the initial bypass network based on the converted feature data to obtain a target bypass network, where, during the updating of the parameters of the initial bypass network, the data flow of the initial bypass network and the data flow of the backbone network are independent of each other, and the parameters of the initial bypass network are used to characterize the influence of the tuning module on the feature data.

It should be noted here that the first acquisition component 1102, the first calling component 1104, and the update component 1106 correspond to steps S202 to S206 in Embodiment 1; the three components implement the same examples and application scenarios as the corresponding steps, but are not limited to the content disclosed in Embodiment 1 above. It should be noted that these components may be hardware components or software components stored in a memory and processed by one or more processors, and they may also run, as part of the apparatus, in the computer terminal provided in Embodiment 1.

According to an embodiment of the present disclosure, another data processing apparatus for a pre-trained model is further provided for implementing the data processing method for a pre-trained model shown in FIG. 3 above.

FIG. 12 is a schematic diagram of a data processing apparatus for a pre-trained model according to an embodiment of the present disclosure. As shown in FIG. 12, the data processing apparatus 1200 for a pre-trained model may include: a first processing component 1202, a second processing component 1204, a first conversion component 1206, a first generation component 1208, and a display component 1210.

The first processing component 1202 is configured to respond to query information received in a generative interactive interface, where the query information at least includes keywords of text generation information.

The second processing component 1204 is configured to call a backbone network of a pre-trained model to analyze at least the text generation information and output text feature data, where the backbone network comes from an initial backbone network.

The first conversion component 1206 is configured to call a target bypass network to convert the text feature data, where the target bypass network is obtained by updating the parameters of an initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network.

The first generation component 1208 is configured to generate, based on the converted text feature data, at least one reply result matching the query instruction.

The display component 1210 is configured to display the reply result in the generative interactive interface.

It should be noted here that the first processing component 1202, the second processing component 1204, the first conversion component 1206, the first generation component 1208, and the display component 1210 correspond to steps S302 to S310 in Embodiment 1; the five components implement the same examples and application scenarios as the corresponding steps, but are not limited to the content disclosed in Embodiment 1 above. It should be noted that these components may be hardware components or software components stored in a memory and processed by one or more processors, and they may also run, as part of the apparatus, in the computer terminal provided in Embodiment 1.

According to an embodiment of the present disclosure, another data processing apparatus for a pre-trained model is further provided for implementing the data processing method for a pre-trained model shown in FIG. 4 above.

FIG. 13 is a schematic diagram of another data processing apparatus for a pre-trained model according to an embodiment of the present disclosure. As shown in FIG. 13, the data processing apparatus 1300 for a pre-trained model may include: a second acquisition component 1302, a second conversion component 1304, and a second generation component 1306.

The second acquisition component 1302 is configured to acquire feature data obtained by processing condition data with the backbone network in a pre-trained model, where the backbone network comes from an initial backbone network, and the condition data is used to determine the generation conditions of target data.

The second conversion component 1304 is configured to call a target bypass network to convert the feature data, where the target bypass network is obtained by updating the parameters of an initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network.

The second generation component 1306 is configured to generate, based on the converted feature data, target data corresponding to the condition data, where the type of the target data includes at least one of the following: text information, image information, video information, and voice information.

It should be noted here that the second acquisition component 1302, the second conversion component 1304, and the second generation component 1306 correspond to steps S402 to S406 in Embodiment 1; the three components implement the same examples and application scenarios as the corresponding steps, but are not limited to the content disclosed in Embodiment 1 above. It should be noted that these components may be hardware components or software components stored in a memory and processed by one or more processors, and they may also run, as part of the apparatus, in the computer terminal provided in Embodiment 1.

According to an embodiment of the present disclosure, another data processing apparatus for a pre-trained model is further provided for implementing the data processing method for a pre-trained model shown in FIG. 5 above.

FIG. 14 is a schematic diagram of another data processing apparatus for a pre-trained model according to an embodiment of the present disclosure. As shown in FIG. 14, the data processing apparatus 1400 for a pre-trained model may include: an input component 1402, a third processing component 1404, a third conversion component 1406, and a third generation component 1408.

The input component 1402 is configured to input multimodal information in a dialogue interface, where the type of the multimodal information includes at least one of the following: query information containing character information, video frame information containing frame image information, and audio information.

The third processing component 1404 is configured to call the backbone network in a pre-trained model to analyze and process at least the multimodal information to obtain feature data, where the backbone network comes from an initial backbone network.

The third conversion component 1406 is configured to call a target bypass network to convert the feature data, where the target bypass network is obtained by updating the parameters of an initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network.

The third generation component 1408 is configured to generate, based on the converted feature data, reply information corresponding to the multimodal information, where the type of the reply information includes at least one of the following: text information, image information, video information, and voice information.

It should be noted here that the input component 1402, the third processing component 1404, the third conversion component 1406, and the third generation component 1408 correspond to steps S702 to S708 in Embodiment 1; the four components implement the same examples and application scenarios as the corresponding steps, but are not limited to the content disclosed in Embodiment 1 above. It should be noted that these components may be hardware components or software components stored in a memory and processed by one or more processors, and they may also run, as part of the apparatus, in the computer terminal provided in Embodiment 1.

In the above apparatus, tuning modules extracted from the backbone network are used to construct the initial bypass network. Meanwhile, while adjusting the parameters of the initial bypass network, the data flow from the initial bypass network to the backbone network is cut off, yielding a target bypass network (a new tuning module) that is relatively independent of the backbone network. As a result, the parameter gradients of the backbone network need not be computed during training of the initial bypass network, which saves memory and speeds up training, thereby reducing resource consumption during model training and solving the technical problems of high resource consumption and low computational efficiency in the model training process.

An embodiment of the present disclosure may provide a computer terminal, which may be any computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced by a terminal device such as a mobile terminal.

Optionally, in this embodiment, the computer terminal may be located in at least one of a plurality of network devices of a computer network.

In this embodiment, the computer terminal may execute program code for the following steps of the method for updating parameters in a pre-trained model: acquiring feature data output by a backbone network of a pre-trained model, where the backbone network comes from an initial backbone network; calling an initial bypass network to convert the feature data, where the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; and updating the parameters of the initial bypass network based on the converted feature data to obtain a target bypass network, where, during the updating of the parameters of the initial bypass network, the data flow of the initial bypass network is independent of the data flow of the backbone network, and the parameters of the initial bypass network are used to characterize the influence of the tuning module on the feature data.

Optionally, FIG. 15 is a structural block diagram of a computer terminal according to an embodiment of the present disclosure. As shown in FIG. 15, the computer terminal A may include one or more processors 1502 (only one is shown in the figure), a memory 1504, and a transmission device 1506.

The memory may be used to store software programs and modules, such as the program instructions/modules corresponding to the method and apparatus for updating parameters in a pre-trained model in the embodiments of the present disclosure. The processor executes various functional applications and data processing by running the software programs and modules stored in the memory, that is, implements the above method for updating parameters in a pre-trained model. The memory may include a high-speed random access memory and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memories remotely located relative to the processor, and these remote memories may be connected to the computer terminal A via a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.

The processor may call, through the transmission device, the information and the application programs stored in the memory to perform the following steps: acquiring feature data output by a backbone network of a pre-trained model, where the backbone network comes from an initial backbone network; calling an initial bypass network to convert the feature data, where the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; and updating the parameters of the initial bypass network based on the converted feature data to obtain a target bypass network, where, during the updating of the parameters of the initial bypass network, the data flow of the initial bypass network is independent of the data flow of the backbone network, and the parameters of the initial bypass network are used to characterize the influence of the tuning module on the feature data.

Optionally, the processor may further execute program code for the following steps: calling the tuning module in the initial bypass network to adjust the feature data; and performing, based on the weights of the initial bypass network, a weighted summation on the adjusted feature data.

Optionally, the processor may further execute program code for the following step: performing, based on the weights of the initial bypass network, a weighted summation between the adjusted feature data corresponding to the first vertical tuning module and the adjusted feature data corresponding to the first horizontal tuning module to obtain first feature data.

Optionally, the processor may further execute program code for the following step: in response to the first-layer backbone network being the last layer of the backbone network, updating the parameters of the initial bypass network based on the first feature data to obtain the target bypass network.

Optionally, the processor may further execute program code for the following steps: in response to the first-layer backbone network not being the last layer of the backbone network, calling a second vertical tuning module corresponding to a second-layer backbone network in the backbone network to adjust the first feature data, and calling a second horizontal tuning module associated with the second vertical tuning module to adjust the feature data of the second-layer backbone network in the backbone network; performing, based on the weights of the initial bypass network, a weighted summation between the adjusted first feature data corresponding to the second vertical tuning module and the adjusted feature data corresponding to the second horizontal tuning module to obtain weighted-summed second feature data; and, in response to the second-layer backbone network not being the last layer of the backbone network, performing the following steps: determining the second-layer backbone network as the first-layer backbone network, and determining a third-layer backbone network in the backbone network as the second-layer backbone network; calling the second vertical tuning module to adjust the first feature data, and calling the second horizontal tuning module to adjust the feature data of the second-layer backbone network; and performing, based on the weights of the initial bypass network, a weighted summation between the adjusted first feature data corresponding to the second vertical tuning module and the adjusted feature data corresponding to the second horizontal tuning module to obtain the weighted-summed second feature data, until the second-layer backbone network is the last layer of the backbone network. In this case, updating the parameters of the initial bypass network based on the converted feature data to obtain the target bypass network includes: updating the parameters of the initial bypass network based on the second feature data to obtain the target bypass network.
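
Read procedurally, the recursion above is a loop over the backbone layers that carries one bypass stream forward. A minimal sketch, reusing the hypothetical FusionStep from the earlier example, might look like this:

```python
def run_bypass(stages, first_feat, layer_feats):
    """Propagate the bypass stream through every backbone layer.

    stages      -- one FusionStep per remaining backbone layer (hypothetical)
    first_feat  -- the first feature data from the initial weighted summation
    layer_feats -- detached feature data of the successive backbone layers
    """
    bypass_feat = first_feat
    for stage, layer_feat in zip(stages, layer_feats):
        # The vertical module adjusts the running stream, the horizontal
        # module adjusts the current layer's output, and a weighted
        # summation fuses them into the next "second feature data".
        bypass_feat = stage(bypass_feat, layer_feat)
    return bypass_feat  # feature data finally converted by the bypass network
```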

Optionally, the processor may further execute program code for the following steps: determining a processing result obtained by processing the converted feature data with the output layer of the pre-trained model; determining a difference value between the processing result and the true processing result corresponding to the processing result; and adjusting the parameters of the initial bypass network based on the difference value to obtain the target bypass network.
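
As a rough illustration of this update step, the sketch below scores the output layer's processing result against the true result and steps an optimizer that holds only the bypass parameters. The loss function, the optimizer, and the helper bypass_forward (carried over from the earlier sketch) are assumptions, not details fixed by the disclosure:

```python
import torch
import torch.nn.functional as F

def train_step(backbone, bypass, output_layer, optimizer, x, target):
    """One parameter update of the bypass network; the backbone stays frozen
    because `optimizer` holds only `bypass.parameters()`."""
    optimizer.zero_grad()
    feats = bypass_forward(backbone, bypass, x)  # see the earlier sketch
    result = output_layer(feats)                 # processing result of the output layer
    loss = F.cross_entropy(result, target)       # stand-in for the difference value
    loss.backward()                              # gradients reach only the bypass network
    optimizer.step()
    return loss.item()

# For example: optimizer = torch.optim.AdamW(bypass.parameters(), lr=1e-4)
```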

As an optional example, the processor may call, through the transmission device, the information and the application programs stored in the memory to perform the following steps: responding to query information received in a generative interactive interface, where the query information at least includes keywords of text generation information; calling a backbone network of a pre-trained model to analyze at least the text generation information and output text feature data, where the backbone network comes from an initial backbone network; calling a target bypass network to convert the text feature data, where the target bypass network is obtained by updating the parameters of an initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; generating, based on the converted text feature data, at least one reply result matching the query instruction; and displaying the reply result in the generative interactive interface.

Optionally, the processor may further execute program code in which, during the updating of the parameters of the initial bypass network, the data flow of the initial bypass network is independent of the data flow of the backbone network, and the parameters of the initial bypass network are used to characterize the influence of the tuning module on the feature data.

As another optional example, the processor may call, through the transmission device, the information and the application programs stored in the memory to perform the following steps: acquiring feature data obtained by processing condition data with the backbone network in a pre-trained model, where the backbone network comes from an initial backbone network, and the condition data is used to determine the generation conditions of target data; calling a target bypass network to convert the feature data, where the target bypass network is obtained by updating the parameters of an initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; and generating, based on the converted feature data, target data corresponding to the condition data, where the type of the target data includes at least one of the following: text information, image information, video information, and voice information.

Optionally, the processor may further execute program code for the following step: calling a first vertical tuning module corresponding to a first decoding layer in the backbone network to convert the text feature data output by an intermediate layer of the backbone network, and calling a first horizontal tuning module associated with the first vertical tuning module to convert the text feature data output by the first decoding layer.

Optionally, the processor may further execute program code for the following steps: performing, based on the weights of the target bypass network, a weighted summation between the text feature data adjusted by the first vertical tuning module and the text feature data adjusted by the first horizontal tuning module to obtain converted text feature data; and determining the reply result based on the converted text feature data.

Optionally, the processor may further execute program code for the following steps: using the converted text feature data as the output data of the last decoding layer in the pre-trained model; and converting the output data into the reply result.

As another optional example, the processor may call, through the transmission device, the information and the application programs stored in the memory to perform the following steps: inputting multimodal information in a dialogue interface, where the type of the multimodal information includes at least one of the following: text information containing character information, video frame information containing frame image information, and audio information; calling the backbone network in a pre-trained model to analyze and process at least the multimodal information to obtain feature data, where the backbone network comes from an initial backbone network; calling a target bypass network to convert the feature data, where the target bypass network is obtained by updating the parameters of an initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; and generating, based on the converted feature data, reply information corresponding to the multimodal information, where the type of the reply information includes at least one of the following: text information, image information, video information, and voice information.

Optionally, the processor may further execute program code for the following steps: calling the tuning module in the target bypass network to adjust the feature data; and performing, based on the weights of the target bypass network, a weighted summation on the adjusted feature data.

Optionally, the processor may further execute program code for the following step: processing the converted feature data with the output layer of the pre-trained model to obtain the reply information corresponding to the multimodal information.

With the embodiments of the present disclosure, tuning modules extracted from the backbone network are used to construct the initial bypass network. Meanwhile, while adjusting the parameters of the initial bypass network, the data flow from the initial bypass network to the backbone network is cut off, yielding a target bypass network (a new tuning module) that is independent of the backbone network. As a result, the parameter gradients of the backbone network need not be computed during training of the initial bypass network, which saves memory and speeds up training, thereby achieving the technical effect of reducing resource consumption during model training and solving the technical problems of high resource consumption and low computational efficiency in the model training process.

Those of ordinary skill in the art can understand that the structure shown in FIG. 15 is merely illustrative; the computer terminal A may also be a smartphone (such as an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), a PAD, or another terminal device. FIG. 15 does not limit the structure of the computer terminal A. For example, the computer terminal A may further include more or fewer components (such as a network interface or a display device) than shown in FIG. 15, or have a configuration different from that shown in FIG. 15.

Those of ordinary skill in the art can understand that all or some of the steps in the methods of the above embodiments may be completed by instructing, through a program, hardware related to a terminal device; the program may be stored in a computer-readable storage medium, and the storage medium may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.

An embodiment of the present disclosure further provides a computer-readable storage medium. Optionally, in this embodiment, the computer-readable storage medium may be used to store the program code executed by the method for updating parameters in a pre-trained model provided in Embodiment 1 above.

Optionally, in this embodiment, the computer-readable storage medium may be located in any computer terminal of a computer terminal group in a computer network, or in any mobile terminal of a mobile terminal group.

Optionally, in this embodiment, the computer-readable storage medium is configured to store the program code executed when the processor calls, through the transmission device, the information and the application programs stored in the memory.

In this embodiment of the present disclosure, tuning modules extracted from the backbone network are used to construct the initial bypass network. Meanwhile, while adjusting the parameters of the initial bypass network, the data flow from the initial bypass network to the backbone network is cut off, yielding a target bypass network (a new tuning module) that is independent of the backbone network. As a result, the parameter gradients of the backbone network need not be computed during training of the initial bypass network, which saves memory and speeds up training, thereby achieving the technical effect of reducing resource consumption during model training and solving the technical problems of high resource consumption and low computational efficiency in the model training process.

An embodiment of the present disclosure may provide an electronic device, which may include a memory and a processor.

FIG. 16 is a block diagram of an electronic device for a method for updating parameters in a pre-trained model according to an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the present disclosure described and/or claimed herein.

As shown in FIG. 16, the device 1600 includes a computing unit 1601, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1602 or a computer program loaded from a storage unit 1608 into a random access memory (RAM) 1603. The RAM 1603 may also store various programs and data required for the operation of the device 1600. The computing unit 1601, the ROM 1602, and the RAM 1603 are connected to one another via a bus 1604. An input/output (I/O) interface 1605 is also connected to the bus 1604.

A number of components in the device 1600 are connected to the I/O interface 1605, including: an input unit 1606, such as a keyboard and a mouse; an output unit 1607, such as various types of displays and speakers; a storage unit 1608, such as a magnetic disk and an optical disc; and a communication unit 1609, such as a network card, a modem, and a wireless communication transceiver. The communication unit 1609 allows the device 1600 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.

The computing unit 1601 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, or the like. The computing unit 1601 performs the various methods and processes described above, such as the method for updating parameters in a pre-trained model. For example, in some embodiments, the method for updating parameters in a pre-trained model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1600 via the ROM 1602 and/or the communication unit 1609. When the computer program is loaded into the RAM 1603 and executed by the computing unit 1601, one or more steps of the method for updating parameters in a pre-trained model described above may be performed. Alternatively, in other embodiments, the computing unit 1601 may be configured, by any other suitable means (for example, by means of firmware), to perform the method for updating parameters in a pre-trained model.

Various implementations of the systems and techniques described above herein may be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor capable of receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

The program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, partly on a machine and partly on a remote machine as a standalone software package, or entirely on a remote machine or server.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory, a read-only memory, an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display device (for example, a cathode-ray tube or a liquid-crystal display monitor) for displaying information to the user; and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and the input from the user may be received in any form (including acoustic input, voice input, or tactile input).

The systems and techniques described herein may be implemented in a computing system that includes a back-end component (for example, as a data server), or a computing system that includes a middleware component (for example, an application server), or a computing system that includes a front-end component (for example, a user computer with a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by digital data communication in any form or medium (for example, a communication network). Examples of communication networks include local area networks, wide area networks, and the Internet.

A computer system may include a client and a server. The client and the server are generally remote from each other and typically interact over a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

It should be noted that the serial numbers of the above embodiments of the present disclosure are for description only and do not indicate the relative merits of the embodiments.

In the above embodiments of the present disclosure, the description of each embodiment has its own emphasis; for parts not described in detail in a given embodiment, reference may be made to the relevant descriptions of other embodiments.

In the several embodiments provided in the present disclosure, it should be understood that the disclosed technical content may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into units is merely a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, units, or modules, and may be electrical or take other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods of the various embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a read-only memory, a random access memory, a removable hard disk, a magnetic disk, or an optical disc.

The above are merely preferred implementations of the present disclosure. It should be pointed out that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present disclosure, and such improvements and refinements shall also fall within the protection scope of the present disclosure.

Industrial Applicability

The solution provided by the embodiments of the present disclosure can be applied in the process of model training: acquiring feature data output by a backbone network of a pre-trained model, where the backbone network comes from an initial backbone network; calling an initial bypass network to convert the feature data, where the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; and updating the parameters of the initial bypass network based on the converted feature data to obtain a target bypass network, where, during the updating of the parameters of the initial bypass network, the data flow of the initial bypass network is independent of the data flow of the backbone network, and the parameters of the initial bypass network are used to characterize the influence of the tuning module on the feature data, thereby solving the technical problems of high resource consumption and low computational efficiency in the model training process.

Claims (20)

1. A method for updating parameters in a pre-trained model, comprising: acquiring feature data output by a backbone network of a pre-trained model, wherein the backbone network comes from an initial backbone network; calling an initial bypass network to convert the feature data, wherein the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; and updating parameters of the initial bypass network based on the converted feature data to obtain a target bypass network, wherein, in the process of updating the parameters of the initial bypass network, a data flow of the initial bypass network is independent of a data flow of the backbone network, and the parameters of the initial bypass network are used to characterize the influence of the tuning module on the feature data.

2. The method according to claim 1, wherein calling the initial bypass network to convert the feature data comprises: calling the tuning module in the initial bypass network to adjust the feature data; and performing, based on weights of the initial bypass network, a weighted summation on the adjusted feature data.

3. The method according to claim 2, wherein calling the tuning module in the initial bypass network to adjust the feature data comprises: calling a first vertical tuning module corresponding to a first-layer backbone network in the backbone network to adjust feature data output by a zeroth-layer backbone network in the backbone network, and calling a first horizontal tuning module associated with the first vertical tuning module to adjust feature data output by the first-layer backbone network.

4. The method according to claim 3, wherein performing, based on the weights of the initial bypass network, the weighted summation on the adjusted feature data comprises: performing, based on the weights of the initial bypass network, a weighted summation between the adjusted feature data corresponding to the first vertical tuning module and the adjusted feature data corresponding to the first horizontal tuning module to obtain first feature data.

5. The method according to claim 4, wherein updating the parameters of the initial bypass network based on the converted feature data to obtain the target bypass network comprises: in response to the first-layer backbone network being the last layer of the backbone network, updating the parameters of the initial bypass network based on the first feature data to obtain the target bypass network.
6. The method according to claim 4, wherein updating the parameters of the initial bypass network based on the converted feature data to obtain the target bypass network comprises: in response to the first-layer backbone network not being the last layer of the backbone network, calling a second vertical tuning module corresponding to a second-layer backbone network in the backbone network to adjust the first feature data, and calling a second horizontal tuning module associated with the second vertical tuning module to adjust feature data of the second-layer backbone network in the backbone network; performing, based on the weights of the initial bypass network, a weighted summation between the adjusted first feature data corresponding to the second vertical tuning module and the adjusted feature data corresponding to the second horizontal tuning module to obtain second feature data; and, in response to the second-layer backbone network not being the last layer of the backbone network, performing the following steps: determining the second-layer backbone network as the first-layer backbone network, and determining a third-layer backbone network in the backbone network as the second-layer backbone network; calling the second vertical tuning module to adjust the first feature data, and calling the second horizontal tuning module to adjust the feature data of the second-layer backbone network; and performing, based on the weights of the initial bypass network, a weighted summation between the adjusted first feature data corresponding to the second vertical tuning module and the adjusted feature data corresponding to the second horizontal tuning module to obtain the second feature data, until the second-layer backbone network is the last layer of the backbone network; wherein updating the parameters of the initial bypass network based on the converted feature data to obtain the target bypass network comprises: updating the parameters of the initial bypass network based on the second feature data to obtain the target bypass network.

7. The method according to claim 1, wherein updating the parameters of the initial bypass network based on the converted feature data to obtain the target bypass network comprises: determining a processing result obtained by processing the converted feature data with an output layer in the pre-trained model; determining a difference value between the processing result and a true processing result corresponding to the processing result; and adjusting the parameters of the initial bypass network based on the difference value to obtain the target bypass network.
8. The method according to claim 1, wherein the backbone network and the at least one tuning module are connected by residual connections.

9. The method according to claim 1, wherein, during transmission of the feature data to the initial bypass network, the parameters of the backbone network are in a locked state.

10. A method for processing data in a pre-trained model, comprising:
responding to query information received in a generative interactive interface, wherein the query information at least includes a keyword of text generation information;
calling a backbone network of the pre-trained model to at least analyze the text generation information and output text feature data, wherein the backbone network is derived from an initial backbone network;
calling a target bypass network to convert the text feature data, wherein the target bypass network is obtained by updating parameters of an initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network;
generating, based on the converted text feature data, at least one reply result matching the query information; and
displaying the reply result in the generative interactive interface.

11. The method according to claim 10, wherein, during the updating of the parameters of the initial bypass network, a data flow of the initial bypass network is independent of a data flow of the backbone network, and the parameters of the initial bypass network are used to characterize an influence of the tuning module on the text feature data.

12. The method according to claim 11, wherein calling the target bypass network to convert the text feature data comprises:
calling a first vertical tuning module corresponding to a first decoding layer of the backbone network to convert text feature data output by an intermediate layer of the backbone network, and calling a first horizontal tuning module associated with the first vertical tuning module to convert text feature data output by the first decoding layer.

13. The method according to claim 12, wherein generating, based on the converted text feature data, the at least one reply result matching the query information comprises:
performing a weighted summation between the text feature data adjusted by the first vertical tuning module and the text feature data adjusted by the first horizontal tuning module, based on weights of the target bypass network, to obtain weighted-summed text feature data; and
determining the reply result based on the converted text feature data.
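One way to realize the locked state of claim 9 and the independent data flow of claims 1 and 11 in PyTorch — again a sketch under assumptions, not the patent's code:

```python
import torch

def freeze_backbone(backbone: torch.nn.Module) -> None:
    # Claim 9: the backbone parameters are in a locked state while
    # feature data is transmitted to the bypass network.
    for p in backbone.parameters():
        p.requires_grad_(False)
    backbone.eval()

@torch.no_grad()
def collect_backbone_feats(backbone_layers, x):
    # Running under no_grad detaches every tapped feature, so gradients
    # flow only through the bypass network: its data flow stays
    # independent of the backbone's (claim 1). Each tap is one candidate
    # residual-style connection point between the backbone and the
    # tuning modules (claim 8).
    feats = [x]
    for layer in backbone_layers:
        x = layer(x)
        feats.append(x)
    return feats
```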
14. The method according to claim 11, wherein determining the reply result based on the converted text feature data comprises:
taking the converted text feature data as output data of a last decoding layer in the pre-trained model; and
converting the output data into the reply result.

15. A data processing method for a pre-trained model, comprising:
inputting multimodal information in a dialogue interface, wherein a type of the multimodal information includes at least one of the following: text information containing character information, video frame information containing frame image information, and audio information;
calling a backbone network in the pre-trained model to at least analyze and process the multimodal information to obtain feature data, wherein the backbone network is derived from an initial backbone network;
calling a target bypass network to convert the feature data, wherein the target bypass network is obtained by updating parameters of an initial bypass network, the initial bypass network is constructed based on at least one tuning module, and the tuning module is extracted from the initial backbone network; and
generating, based on the converted feature data, reply information corresponding to the multimodal information, wherein a type of the reply information includes at least one of the following: text information, image information, video information, and voice information.

16. The method according to claim 15, wherein calling the target bypass network to convert the feature data comprises:
calling the tuning module in the target bypass network to adjust the feature data; and
performing a weighted summation on the adjusted feature data based on weights of the target bypass network.

17. The method according to claim 15, wherein generating, based on the converted feature data, the reply information corresponding to the multimodal information comprises:
processing the converted feature data by an output layer in the pre-trained model to obtain the reply information corresponding to the multimodal information.
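On the inference side (claims 12 to 14), the converted features simply stand in for the last decoding layer's output before the head that produces the reply. A hypothetical end-to-end flow; `output_head` and `detokenize` are illustrative placeholders, not names from the patent:

```python
import torch

@torch.no_grad()
def generate_reply(text_feats, bypass_steps, output_head, detokenize):
    # text_feats: per-decoding-layer text feature data from the frozen
    # backbone, with text_feats[0] as the intermediate-layer tap.
    fused = text_feats[0]
    for i, step in enumerate(bypass_steps, start=1):
        fused = step(fused, text_feats[i])  # target bypass conversion
    # Claim 14: the converted features serve as the output data of the
    # last decoding layer and are turned into the reply result.
    logits = output_head(fused)
    return detokenize(logits.argmax(dim=-1))
```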
18. A data processing system for a pre-trained model, comprising:
a client, configured to display a dialogue interface and capture multimodal information input in the dialogue interface, wherein a type of the multimodal information includes at least one of the following: query information containing character information, video frame information containing frame image information, and audio information; and
a server, configured to call a backbone network in the pre-trained model to at least analyze and process the multimodal information to obtain feature data, call a target bypass network to convert the feature data, and generate, based on the converted feature data, reply information corresponding to the multimodal information, wherein the backbone network is derived from an initial backbone network, the target bypass network is obtained by updating parameters of an initial bypass network, the initial bypass network is constructed based on at least one tuning module, the tuning module is extracted from the initial backbone network, and a type of the reply information includes at least one of the following: text information, image information, video information, and voice information.

19. An electronic device, comprising a memory and a processor, wherein the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions, which, when executed by the processor, implement the steps of the method according to any one of claims 1 to 17.

20. A computer program product, comprising a non-volatile computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the method according to any one of claims 1 to 17 is implemented.
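Claim 18 splits the same pipeline across a client and a server. In outline, with transport and serialization omitted and all names illustrative:

```python
def handle_dialogue_request(multimodal_input, backbone, bypass, output_layer):
    # Server side of claim 18: analyze with the frozen backbone, convert
    # with the target bypass network, then build the reply information
    # to return to the client's dialogue interface.
    feature_data = backbone(multimodal_input)
    converted = bypass(feature_data)
    return output_layer(converted)
```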
PCT/CN2024/104765 2023-08-09 2024-07-10 Method for updating parameters of pre-trained model and data processing method for pre-trained model Pending WO2025031095A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202311003358.5 2023-08-09
CN202311003358.5A CN119473568A (en) 2023-08-09 2023-08-09 Updating parameters in pre-trained models and data processing methods for pre-trained models

Publications (1)

Publication Number Publication Date
WO2025031095A1 true WO2025031095A1 (en) 2025-02-13

Family

ID=94533538

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2024/104765 Pending WO2025031095A1 (en) 2023-08-09 2024-07-10 Method for updating parameters of pre-trained model and data processing method for pre-trained model

Country Status (2)

Country Link
CN (1) CN119473568A (en)
WO (1) WO2025031095A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020172974A1 (en) * 2019-02-25 2020-09-03 中国科学院自动化研究所 Artificial neural network optimization method and system based on orthogonal projection matrix, and apparatuses
CN114821247A (en) * 2022-06-30 2022-07-29 杭州闪马智擎科技有限公司 Model training method and device, storage medium and electronic device
CN114973285A (en) * 2022-05-26 2022-08-30 中国平安人寿保险股份有限公司 Image processing method and apparatus, device, and medium
CN115294396A (en) * 2022-08-12 2022-11-04 北京百度网讯科技有限公司 Backbone network training method and image classification method
CN115294961A (en) * 2022-07-29 2022-11-04 平安科技(深圳)有限公司 Voice synthesis method and device, electronic equipment and storage medium
CN116306791A (en) * 2023-03-22 2023-06-23 北京龙智数科科技服务有限公司 Text processing method and device for improving self-attention model

Also Published As

Publication number Publication date
CN119473568A (en) 2025-02-18

Similar Documents

Publication Publication Date Title
US12182507B2 (en) Text processing model training method, and text processing method and apparatus
CN113762322B (en) Video classification method, device and equipment based on multi-modal representation and storage medium
Liang et al. Generative AI-driven semantic communication networks: Architecture, technologies and applications
JP7429734B2 (en) Multimodal data federated learning model training method and device
CN110263324B (en) Text processing method, model training method and device
CN117992800B (en) Image-text data matching detection method, device, equipment and medium
US20230244879A1 (en) Language representation model system, pre-training method and apparatus, device, and medium
CN110795549B (en) Short text conversation method, device, equipment and storage medium
CN113674732B (en) Voice confidence detection method and device, electronic equipment and storage medium
CN115223020A (en) Image processing method, image processing device, electronic equipment and readable storage medium
CN113705315A (en) Video processing method, device, equipment and storage medium
WO2025241750A1 (en) Question and answer method and apparatus based on large model, device and storage medium
CN114399646A (en) Image description method and device based on Transformer structure
CN115908991A (en) Image description model method, system, device and medium based on feature fusion
CN113360683A (en) Method for training cross-modal retrieval model and cross-modal retrieval method and device
US20250299072A1 (en) Data processing method and apparatus, device, and readable storage medium
WO2025031067A1 (en) Image processing method and apparatus, device, and computer readable storage medium
CN117351387B (en) Video conversation and model training method, device, equipment and storage medium
CN118781234A (en) Image data processing method and model training method
CN113139608B (en) Feature fusion method and device based on multi-task learning
CN119580034A (en) Training method for generating picture description model, picture description generation method, device, equipment, medium and program product
WO2025031095A1 (en) Method for updating parameters of pre-trained model and data processing method for pre-trained model
Ma et al. An LLM-based Intelligent System for the Evaluation of Property Geographical Environment
CN116521832A (en) Dialogue interaction method, device and system, electronic equipment and storage medium
CN116975654B (en) Object interaction method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24850746

Country of ref document: EP

Kind code of ref document: A1