WO2025195159A1 - Data processing method and related apparatus - Google Patents

Data processing method and related apparatus

Info

Publication number
WO2025195159A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
model
data processing
feature information
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2025/080397
Other languages
French (fr)
Chinese (zh)
Inventor
黄晨宇
陈鹏
杨晓峰
黄丹青
张凡
饶华铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Publication of WO2025195159A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/43 Assembling or disassembling of packets, e.g. segmentation and reassembly [SAR]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present application relates to the field of artificial intelligence technology, and in particular to data security technology.
  • A data provider can supply a model provider with data to be used as model input.
  • The model provider uses the model to generate output based on the input data and returns the output to the data provider for use.
  • The data provider can split the data into multiple data shards, retain some of the shards, and send the others to the model provider. Each party inputs the shards it holds into the model to obtain the corresponding output results. Finally, the output results obtained by both parties are spliced together to restore the output result corresponding to the complete input data. Because the model provider never obtains the complete input data, the security of the data supplied by the data provider is protected.
  • The present application provides a data processing method for model application scenarios that maintains data processing accuracy while protecting the data security of both the data provider and the model provider. It also reduces the amount of computation in the data processing process and improves data processing efficiency, so that it can be applied to a wider range of data processing scenarios.
  • An embodiment of the present application discloses a data processing method, which is performed by a first device, the first device including a first model, and the method includes:
  • The second data feature information is sent to the second device to instruct the second device to generate, through a second model, a first data processing result according to the second data feature information. The first model and the second model together constitute a data processing model, and the first data processing result represents the data processing result obtained by processing the first data through the data processing model.
  • An embodiment of the present application discloses a data processing method, which is executed by a second device, the second device including a second model, and the method includes:
  • Acquiring second data feature information sent by the first device, where the second data feature information is obtained by the first device by adding noise information to first data feature information, and the first data feature information is generated by the first device based on first data using a first model;
  • A first data processing result is generated according to the second data feature information.
  • The first model and the second model together constitute a data processing model.
  • The first data processing result represents the data processing result obtained by processing the first data through the data processing model.
  • The first generating unit is configured to generate first data feature information according to the first data using the first model, where the first data feature information represents data features of the first data;
  • The first adding unit is configured to add noise information to the first data feature information to obtain second data feature information;
  • The first sending unit is configured to send the second data feature information to the second device to instruct the second device to generate, through a second model, a first data processing result according to the second data feature information.
  • The first model and the second model together constitute a data processing model, and the first data processing result represents the data processing result obtained by processing the first data through the data processing model.
  • An embodiment of the present application discloses a data processing device, comprising a third acquisition unit and a third generation unit:
  • The third acquisition unit is configured to acquire second data feature information sent by the first device, where the second data feature information is obtained by the first device by adding noise information to the first data feature information, and the first data feature information is generated by the first device based on the first data using the first model;
  • The third generation unit is configured to generate a first data processing result based on the second data feature information through the second model.
  • The first model and the second model together constitute a data processing model.
  • The first data processing result represents the data processing result obtained by processing the first data through the data processing model.
  • An embodiment of the present application discloses a computer device, comprising a processor and a memory:
  • The memory is used to store a computer program and transmit the computer program to the processor;
  • The processor is configured to execute the data processing method of the first aspect or the data processing method of the second aspect according to instructions in the computer program.
  • An embodiment of the present application discloses a computer-readable storage medium, wherein the computer-readable storage medium is used to store a computer program, and the computer program is used to execute the data processing method described in the first aspect or the data processing method described in the second aspect.
  • An embodiment of the present application discloses a computer program product including a computer program which, when running on a computer device, enables the computer device to execute the data processing method described in the first aspect or the data processing method described in the second aspect.
  • The present application places only the first model, which is close to the input side of the data processing model, in the first device. The data provider therefore obtains only part of the model, which protects the security of the model.
  • Through the first model, the first device can generate first data feature information from the first data, which serves as the input data.
  • The first device can then add noise information to the first data feature information to obtain second data feature information, and send the second data feature information to the second device, instructing the second device to determine the first data processing result through the second model, which is close to the output side of the data processing model.
  • The present application can thus protect the data security of the data provider and the model security of the model provider while maintaining data processing accuracy. In addition, the data processing process does not need to be repeated, which reduces the overall amount of computation and keeps data processing efficient, so that the data processing method of the present application can be applied to a wider range of data processing scenarios.
  • FIG. 1 is a schematic diagram of a data processing method in a practical application scenario provided by an embodiment of the present application.
  • FIG. 2 is a signaling diagram of a data processing method provided in an embodiment of the present application.
  • FIG. 3 is a signaling diagram of a data processing method provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a data processing method provided in an embodiment of the present application.
  • FIG. 5 is a signaling diagram of a data processing method provided in an embodiment of the present application.
  • FIG. 6 is a signaling diagram of a data processing method in a practical application scenario provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a data processing method in a practical application scenario provided by an embodiment of the present application.
  • FIG. 8 is a structural block diagram of a data processing device provided in an embodiment of the present application.
  • FIG. 9 is a structural block diagram of a data processing device provided in an embodiment of the present application.
  • FIG. 10 is a structural diagram of a terminal provided in an embodiment of the present application.
  • FIG. 11 is a structural diagram of a server provided in an embodiment of the present application.
  • Model-based data processing has a wide range of applications. For example, in artificial intelligence scenarios, models can be used to generate corresponding text processing results based on input text. Data processing typically requires the participation of both data providers and model providers. Data providers provide data to be input into the data processing model to obtain the corresponding processing results, while model providers provide data processing models to complete the data processing process.
  • To ensure the data security of the data provider and prevent the model provider from obtaining the complete input data, the data provider can divide the data to be input into two parts, which the data provider and the model provider each input into the model for processing. Finally, the results are spliced together to obtain the data processing result corresponding to the complete data.
  • The data processing method in the related technology thus replaces a single pass over the complete data with a separate pass over each of the two data parts, which significantly increases the amount of computation. Although it protects the data provider's data to a certain extent, it significantly reduces data processing efficiency, making it difficult to provide efficient data processing services and therefore difficult to apply in scenarios with high requirements for data processing efficiency.
  • The present application provides a data processing method that splits the data processing model into a first model close to the data input side and a second model close to the data output side.
  • The first device, as the data provider, can generate first data feature information through the first model from the first data, which serves as the input data, and then obtain second data feature information by adding noise information to the first data feature information.
  • The second device, as the model provider, can obtain a first data processing result based on the second data feature information through the second model.
  • The first data processing result is used to characterize the data processing result obtained by the data processing model by processing the first data.
  • Since the data provider can obtain only part of the model, the security of the model provided by the model provider is protected. Since noise information is added to the second data feature information, the second device cannot restore the first data from the second data feature information, which protects the data provider's data. At the same time, since the noise information affects only the part of the data processing process close to the output side, it interferes less with the data processing result, so the accuracy of the first data processing result is higher. In addition, the present application does not need to repeat the entire data processing process, so the overall amount of computation is small and the data processing efficiency is high. In summary, this application accounts for both data security and model security while maintaining data processing accuracy and efficiency, yielding better data processing results.
  • The method can be executed by a computer device with data processing capabilities, such as a terminal device or a server.
  • The method can be executed independently by a terminal device or a server. It can also be applied to a network scenario in which a terminal device and a server communicate, where it is executed by the terminal device and the server in cooperation.
  • The terminal device can be a mobile phone, a tablet computer, a laptop computer, a desktop computer, or another device.
  • The terminal device can also include augmented reality (AR) devices, such as AR glasses and AR screens, and virtual reality (VR) devices, such as head-mounted VR glasses.
  • The server can be understood as an application server or a web server. In actual deployment, the server can be an independent server, a cluster server, or a cloud server.
  • This application also relates to technologies in the field of large models, specifically model compression and quantization and model parallel computing technologies.
  • Model compression and quantization uses compression and quantization techniques to reduce model size and accelerate model inference, thereby lowering model storage and computational costs.
  • Model compression typically includes pruning, low-rank decomposition, and knowledge distillation.
  • Model quantization converts floating-point parameters in the model to fixed-point or integer parameters, reducing model size and accelerating model inference.
  • Model parallel computing involves distributing model computational tasks across multiple computing devices (such as CPUs, GPUs, and TPUs) to perform simultaneous computations, thereby accelerating model training and inference. Model parallel computing can effectively utilize computing resources, improving model computational efficiency and training speed.
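The quantization step described above can be illustrated with a minimal sketch. It assumes symmetric per-tensor quantization to the integer range [-127, 127], which is one common scheme and is not stated in the application; the function names and the sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Assumption: symmetric quantization, so zero maps exactly to zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.03, 1.0]   # hypothetical floating-point parameters
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each restored weight differs from the original by at most half a quantization step (`scale / 2`), which is the accuracy cost paid for the smaller integer representation.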
  • The model architecture of the data processing model used for data processing can be divided into a first model architecture and a second model architecture, wherein the first model architecture is close to the input side and is used to construct the first model, and the second model architecture is close to the output side and is used to construct the second model.
  • The model provider can send the first model to the first server 101.
  • The first server 101 can process the first data through the first model to obtain first data feature information, which characterizes the data features of the first data.
  • The first server 101 can add noise information to the first data feature information to obtain second data feature information, and then send the second data feature information to the second server 102. Because of the added noise information, the second server 102 cannot obtain the first data feature information and thus cannot restore the first data, which protects the data security of the data provider.
  • First, this method can ensure the security of the data provided by the first server 101. Second, since the first server 101 can obtain only part of the model architecture and cannot restore the complete data processing model, the security of the model provided by the second server 102 is protected. Third, the present application adds the noise information after the first data has been processed by the first model, so the noise information acts only on the part of the data processing flow close to the data output side and has little impact on the flow as a whole. The first data processing result is therefore closer to the result obtained when the data processing model processes the first data directly, and has higher accuracy. Fourth, the present application does not need to execute the data processing flow for the first data repeatedly, so the overall computational cost is relatively small and the data processing efficiency is higher.
  • The first device generates first data feature information according to first data using a first model.
  • The model architecture of the data processing model can first be split into a first model architecture close to the data input side and a second model architecture close to the output side. The first model architecture is used to determine the data feature information corresponding to the input data, which characterizes the input data; the second model architecture is used to determine the data processing result based on the data feature information. Combined, the first model architecture and the second model architecture realize the complete data processing function of the data processing model.
  • The first device can generate first data feature information according to the first data through the first model, where the first data feature information characterizes data features of the first data.
  • The first model corresponds to the first model architecture.
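As a rough illustration of the architecture split described above, the sketch below treats a model as an ordered list of layers and cuts it at an index. The layer functions, the cut position, and all names are hypothetical; the application does not specify how or where the architecture is divided.

```python
import math

# Hypothetical four-layer model: each layer maps a list of floats to a list of floats.
def scale_layer(x):
    return [2.0 * v for v in x]

def shift_layer(x):
    return [v + 1.0 for v in x]

def tanh_layer(x):
    return [math.tanh(v) for v in x]

def sum_layer(x):
    return [sum(x)]

LAYERS = [scale_layer, shift_layer, tanh_layer, sum_layer]


def run(layers, x):
    """Apply the layers in order."""
    for layer in layers:
        x = layer(x)
    return x


def split_model(layers, cut):
    """Input-side layers form the first model, output-side layers the second."""
    return layers[:cut], layers[cut:]


first_model, second_model = split_model(LAYERS, cut=2)  # cut index is an assumption
x = [0.1, -0.4, 0.7]
features = run(first_model, x)          # data feature information
result = run(second_model, features)    # data processing result
full_result = run(LAYERS, x)            # unsplit model, for comparison
```

Without noise, running the two halves in sequence gives the same result as running the complete model, which is the sense in which the two architectures together realize the full data processing function.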
  • S202 The first device adds noise information to the first data feature information to obtain second data feature information.
  • The second model corresponds to the second model architecture.
  • The first model architecture and the second model architecture together constitute the model architecture of the data processing model.
  • The first data processing result represents the data processing result obtained by the data processing model by processing the first data.
  • S204 The second device obtains the second data feature information sent by the first device.
  • The second data feature information is obtained by the first device by adding noise information to the first data feature information, and the first data feature information is generated by the first device according to the first data through the first model.
  • S205 The second device generates a first data processing result according to the second data feature information through the second model.
  • Since the first model architecture and the second model architecture together constitute the complete model architecture of the data processing model, and the first data processing result is obtained from the first data through data processing by the first model, noise addition, and data processing by the second model, the only difference between the first data processing result and the result obtained by directly processing the first data through the data processing model is the influence of the noise information. The first data processing result can therefore be used to characterize the data processing result obtained by the data processing model by processing the first data.
  • The present application adds the noise information after processing by the first model, so the noise information affects only the data processing performed by the second model close to the output side. This reduces the impact of the noise information on the processing of the first data and, in turn, on the data processing result, so that the first data processing result is closer to the actual data processing result, ensuring the accuracy of data processing.
  • While ensuring the security of the data provided by the data provider and the security of the model provided by the model provider, this application reduces the impact of noise information on the data processing result, so that the final first data processing result is closer to the actual data processing result obtained by directly processing the first data through the data processing model, ensuring the accuracy of data processing.
  • This application does not require the data processing process to be repeated multiple times on a single device. For example, the data processing process corresponding to the second model needs to be performed only once on the second device, and the data processing process corresponding to the first model needs to be performed only once on the first device.
  • The overall amount of computation required for the data processing process is therefore relatively small, ensuring data processing efficiency, so that the data processing method of this application can be applied to a wider range of data processing scenarios.
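The flow of steps S201 to S205 can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the two model halves are stand-in arithmetic functions, the noise is zero-mean Gaussian with a hypothetical standard deviation, and a direct function call stands in for the transmission between devices. The application does not specify the noise distribution, and the sketch says nothing about how much noise is needed for a given level of protection.

```python
import random


def first_model(data):
    """Hypothetical input-side half: maps raw data to feature values."""
    return [3.0 * v - 1.0 for v in data]


def add_noise(features, sigma, rng):
    """Assumption: independent zero-mean Gaussian noise on every feature."""
    return [f + rng.gauss(0.0, sigma) for f in features]


def second_model(features):
    """Hypothetical output-side half: reduces features to a single score."""
    return sum(features) / len(features)


rng = random.Random(0)                      # seeded so the run is repeatable
first_data = [0.2, 0.5, 0.9, 1.4]

# First device (data provider): S201 and S202, then send (S203).
first_features = first_model(first_data)
second_features = add_noise(first_features, sigma=0.05, rng=rng)

# Second device (model provider): S204 and S205.
first_result = second_model(second_features)

# Reference: what the unsplit model would return without noise.
reference = second_model(first_model(first_data))
deviation = abs(first_result - reference)
```

Each half of the model runs exactly once, and `deviation` measures how far the noise moved the result from the noise-free reference. A larger `sigma` would obscure the features more at the cost of a larger deviation.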
  • FIG. 3 is a signaling diagram of a data processing method provided in an embodiment of the present application, wherein steps S301 to S303 and S307 to S308 are a possible implementation of step S201.
  • The method includes:
  • S301 The first device performs sharding on the first data to obtain a first data shard and a second data shard.
  • Data sharding refers to dividing data into multiple data parts, each of which can be used as a data shard. In this application, it can be divided into two shards, namely the first data shard and the second data shard. It can also be divided into more shards, which is not limited here.
  • The first data shard and the second data shard together constitute the first data, that is, combined they contain the complete data content of the first data. Therefore, by performing data processing on the first data shard and on the second data shard, the result of performing data processing on the first data can be reproduced.
  • The first device generates first sub-feature information according to the first data shard using the first model.
  • The target model parameters can be divided into first model parameters and second model parameters.
  • The target model parameters are the model parameters of the first model architecture in the data processing model.
  • The computer device can assign the first model parameters to the first model of the data provider, thereby preventing the data provider from obtaining the complete target model parameters and, therefore, from restoring the model portion corresponding to the first model architecture in the data processing model.
  • The first sub-feature information characterizes the data features of the first data shard.
  • The first device sends the second data shard to the second device to instruct the second device to generate second sub-feature information according to the second data shard using the third model.
  • S304 The second device obtains the second data shard sent by the first device.
  • S305 The second device generates second sub-feature information according to the second data shard using the third model.
  • The second device sends the second sub-feature information to the first device to instruct the first device to determine the first data feature information according to the first sub-feature information and the second sub-feature information.
  • The first sub-feature information is generated by the first device through the first model according to the first data shard.
  • The first model has the first model parameters.
  • The first model parameters and the second model parameters together constitute the target model parameters.
  • The model part corresponding to the first model architecture in the data processing model has the target model parameters.
  • S307 The first device obtains the second sub-feature information sent by the second device.
  • The first device determines first data feature information according to the first sub-feature information and the second sub-feature information.
  • Since the first model parameters and the second model parameters together constitute the target model parameters, the first data shard and the second data shard together constitute the first data, and the first model and the third model both correspond to the first model architecture, combining the first model's processing of the first data shard with the third model's processing of the second data shard reproduces the processing of the first data by the first-model part of the data processing model. The first data feature information can therefore be determined from the first sub-feature information and the second sub-feature information.
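One arrangement in which the combination described above holds exactly is a linear layer whose weight matrix is partitioned by columns to match the two data shards. The sketch below assumes this arrangement purely for illustration; the application does not state how the parameters are divided or how the sub-feature information is combined, and the matrix, data, and cut position are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


# Hypothetical target model parameters of the first model architecture.
target_params = [[1.0, 2.0, 3.0, 4.0],
                 [0.5, -1.0, 0.0, 2.0]]
first_data = [1.0, 2.0, 3.0, 4.0]
cut = 2  # assumption: shards of equal size

# Shard the data and partition the parameters column-wise at the same position.
first_shard, second_shard = first_data[:cut], first_data[cut:]
first_params = [row[:cut] for row in target_params]
second_params = [row[cut:] for row in target_params]

# First device: first model (first parameters) on the first shard.
first_sub = matvec(first_params, first_shard)
# Second device: third model (second parameters) on the second shard.
second_sub = matvec(second_params, second_shard)

# First device: combine the sub-feature information (assumption: element-wise sum).
first_features = [a + b for a, b in zip(first_sub, second_sub)]

# Reference: the complete parameters applied to the complete data.
reference = matvec(target_params, first_data)
```

In this arrangement neither device holds the complete parameters, the second device sees only its own shard, and the summed sub-features equal the output of the unpartitioned layer.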
  • S309 The first device adds noise information to the first data feature information to obtain second data feature information.
  • The first device sends the second data feature information to the second device to instruct the second device to generate a first data processing result according to the second data feature information through the second model.
  • S311 The second device obtains the second data feature information sent by the first device.
  • S312 The second device generates a first data processing result according to the second data feature information through the second model.
  • the first device acting as the data provider
  • the second device acting as the model provider
  • This application needs to perform data processing twice only for the first model architecture part, without repeating the entire data processing flow. It therefore still reduces the amount of computation required to a certain extent and improves data processing efficiency.
  • The speed at which a model processes data is usually related to the size of the data: the larger the data, the slower the processing.
  • The first model and the third model have the same model architecture, so when the data processing process is the same, the data size determines the data processing speed of each.
  • The first device can make the first data shard and the second data shard the same size, so that neither device takes noticeably longer than the other, which achieves the best overall data processing speed and keeps data processing efficient.
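The benefit of equal shard sizes can be illustrated with a simple cost model. The sketch assumes that the two devices process their shards at the same time and that processing time grows in proportion to shard size; both are simplifying assumptions and are not stated in the application.

```python
def parallel_cost(total_size, first_size):
    """Time until both devices finish, in units of 'time per data element'."""
    second_size = total_size - first_size
    # Assumption: the devices work concurrently, so the slower one dominates.
    return max(first_size, second_size)


total = 10  # hypothetical size of the first data
costs = {k: parallel_cost(total, k) for k in range(1, total)}
best_first_size = min(costs, key=costs.get)
```

Under these assumptions the cost is lowest when both shards hold half of the data, because any imbalance makes one device the bottleneck.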
  • The above process mainly protects the data provider's data on the input side.
  • The present application can also protect the data processing results output by the model, to prevent parties other than the data provider from learning the data processing results.
  • FIG. 5 is a signaling diagram of a data processing method provided in an embodiment of the present application, wherein step S504 is a possible implementation of step S203, and step S505 is a possible implementation of step S204.
  • The method includes:
  • The first device determines encryption information and decryption information corresponding to the first data.
  • The encryption information represents the encryption method for the data processing result corresponding to the first data, and the decryption information is used to decrypt data encrypted by that encryption method.
  • The encryption method can take multiple forms and is not limited here.
  • The first device generates first data feature information according to the first data using the first model.
  • S503 The first device adds noise information to the first data feature information to obtain second data feature information.
  • S504 The first device sends the second data feature information and the encryption information to the second device, to instruct the second device to generate a first data processing result according to the second data feature information and the encryption information through the second model.
  • Because the first device also sends the encryption information, the second model in the second device can encrypt its data processing result based on the encryption information and output only the encrypted first data processing result. The model provider therefore cannot learn the actual data processing result, which protects the data provider's data on the output side.
  • S505 The second device obtains the second data feature information and the encryption information sent by the first device.
  • S506 The second device generates a first data processing result according to the second data feature information and the encryption information through the second model.
  • The second device can generate an initial data processing result based on the second data feature information using the second model, and then encrypt the initial data processing result based on the encryption information to output the first data processing result.
  • The initial data processing result is the data processing result obtained by processing the second data feature information.
  • This embodiment aims to prevent the model provider from learning this data processing result.
  • The second model's input is the second data feature information and its output is the first data processing result. The initial data processing result is not output once it is obtained; the encryption process is executed on it directly, so the second device cannot obtain the initial data processing result. Because the decryption information is held by the first device, the second device cannot decrypt the first data processing result, which ensures data security.
  • S507 The second device sends the first data processing result to the first device, to instruct the first device to decrypt the first data processing result by using the decryption information to obtain the initial data processing result.
  • S508 The first device obtains the first data processing result sent by the second device.
  • S509 The first device decrypts the first data processing result by using the decryption information to obtain an initial data processing result.
  • When the second model processes the second data feature information, it does not directly output the initial data processing result. Instead, it encrypts that result to obtain the encrypted first data processing result, so the second device cannot obtain the initial data processing result. Furthermore, the decryption information is held by the first device, which prevents the second device from decrypting the first data processing result. This ensures that the second device, as the model provider, cannot access the data provider's data, thus ensuring data security.
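The output-side protection in the steps above can be sketched as follows. This is only a minimal illustration under stated assumptions: the application does not limit the encryption method, so textbook RSA with tiny fixed primes (insecure, for demonstration only) is used here as a stand-in, and the function names and the toy "second model" (an argmax over the features) are hypothetical.

```python
def determine_key_pair() -> tuple[tuple[int, int], tuple[int, int]]:
    """First device: return (encryption_info, decryption_info).

    Assumption: toy textbook RSA with tiny fixed primes. Insecure; shown
    only so that the two kinds of information are visibly different.
    """
    p, q, e = 61, 53, 17
    n, phi = p * q, (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e (Python 3.8+)
    return (e, n), (d, n)


def second_model(features: list[float], encryption_info: tuple[int, int]) -> int:
    """Second device: compute the initial result and encrypt it before
    returning, so only the encrypted result leaves the model."""
    e, n = encryption_info
    # Toy stand-in for the real model: index of the largest feature.
    initial_result = max(range(len(features)), key=features.__getitem__)
    return pow(initial_result, e, n)


def decrypt_result(first_result: int, decryption_info: tuple[int, int]) -> int:
    """First device: recover the initial result with the decryption info."""
    d, n = decryption_info
    return pow(first_result, d, n)


encryption_info, decryption_info = determine_key_pair()
second_features = [0.1, 0.7, 2.5]  # noisy feature information (made-up values)
first_result = second_model(second_features, encryption_info)
initial_result = decrypt_result(first_result, decryption_info)
```

Only `encryption_info` is sent to the second device; `decryption_info` never leaves the first device, which is what keeps the second device from reading the result.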
  • This application can be applied to data processing scenarios involving various kinds of data.
  • For example, when the data to be processed is text information, this application can process the text information in the following manner to obtain first data for input into the model.
  • The first device can obtain text information to be processed, that is, information for which a data processing result needs to be obtained through a data processing model, such as query text that needs to be input into a query model.
  • the first device can perform word segmentation processing on the text information to be processed, and obtain a word segmentation set corresponding to the text information to be processed.
  • the word segmentation set includes multiple word segments, and the word segments can be characters, phrases, short sentences, etc. in the text information, which are not limited here.
  • Word segmentation can use a variety of methods, such as Byte-Pair Encoding (BPE), WordPiece, and SentencePiece, which are not limited here.
  • The first device can determine the word segmentation codes corresponding to the multiple word segments, thereby converting the text information into coded information that the model can process.
  • A word segmentation code represents its corresponding word segment, and different word segments have different word segmentation codes, so that each word segment is unambiguously represented by its code.
  • Multiple coding schemes can be used for the word segmentation codes, for example UTF-8 (8-bit Universal Character Set/Unicode Transformation Format), which are not limited here.
  • The first device can generate the first data according to the word segmentation codes corresponding to the multiple word segments, so that the data processing model can fully capture the content of the text information to be processed.
  • The first device can also integrate other information that is helpful for data processing into the first data.
  • When determining the codes corresponding to the multiple word segments, the first device can determine not only the word segmentation code of each word segment but also the segment code of each word segment.
  • The text information to be processed includes text information corresponding to a plurality of units, where a unit is a unit of text that can include multiple word segments, such as a sentence or a paragraph.
  • Word segments in the same unit are more closely associated than word segments in different units. Identifying the unit to which each word segment belongs therefore helps the data processing model analyze the associations between word segments and process the data more accurately.
  • The first device can determine, for each of the multiple word segments, a segment code corresponding to the word segment, where the segment code represents the unit in which the word segment is located. Accordingly, the first device can generate the first data based on both the word segmentation codes and the segment codes of the multiple word segments, so that the model can identify the associations between the word segments from the segment codes, improving data processing accuracy.
  • The first device can also determine the position code corresponding to each of the multiple word segments.
  • The position code identifies the position of the word segment in the text information to be processed. For example, it can identify the word segment as the first word segment, the second word segment, and so on.
  • In one option, the position code directly identifies the position of the word segment within the whole text information to be processed.
  • In another option, the position code identifies the position of the word segment within the unit to which it belongs, such as the positions of multiple words in the same sentence.
  • The first device can generate the first data based on the word segmentation codes and position codes corresponding to the multiple word segments, so that the model knows the positional relationship between the word segments, which helps it further understand the semantics of the text information to be processed and thus improves data processing accuracy.
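The segment codes and position codes described above can be sketched as follows. This is a minimal sketch under simplifying assumptions that the application does not prescribe: units are sentences split on ".", word segments are whitespace-separated words, and positions are counted across the whole text. The function name `build_codes` is hypothetical.

```python
def build_codes(text: str) -> list[tuple[str, int, int]]:
    """Return (word_segment, segment_code, position_code) triples.

    Assumptions: a unit is a sentence ending in '.', a word segment is a
    whitespace-separated word, and the position code is the index of the
    word segment within the whole text.
    """
    triples = []
    position = 0
    units = [unit.strip() for unit in text.split(".") if unit.strip()]
    for segment_code, unit in enumerate(units):
        for word in unit.split():
            triples.append((word, segment_code, position))
            position += 1
    return triples


codes = build_codes("the cat sat. it purred.")
```

Resetting `position` to zero at the start of each unit would give the per-unit variant of the position code instead.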
  • A code mapping relationship can be established so that the first device can quickly determine word segmentation codes.
  • The first device can determine the word segmentation codes corresponding to the multiple word segments based on the code mapping relationship, which records the mapping between word segments and word segmentation codes. A word segmentation code can then be determined by a simple lookup, which simplifies the process.
  • On the one hand, this further improves data processing efficiency; on the other hand, it reduces the processing load on the first device, so that the data processing method of the present application can be applied to more devices with limited processing performance, broadening its applicability.
  • For word segments that are not included in the code mapping relationship, the first device can generate the corresponding word segmentation codes in other ways, such as through a coding algorithm.
  • The first device can then add the mapping between such a word segment and its word segmentation code to the code mapping relationship, so that the next time a code is needed for that word segment, it can be obtained directly from the code mapping relationship without other coding processing.
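The lookup-then-update behaviour of the code mapping relationship can be sketched as follows. This is a minimal sketch: the class name `CodeMapping` and the fallback coding algorithm (reading the UTF-8 bytes of the word segment as one integer) are assumptions chosen for illustration, not the method of the application.

```python
class CodeMapping:
    """Code mapping relationship: word segment -> word segmentation code."""

    def __init__(self) -> None:
        self.table: dict[str, int] = {}
        self.fallback_calls = 0  # counts how often the slower path runs

    def _encode_with_algorithm(self, segment: str) -> int:
        # Assumed coding algorithm: interpret the UTF-8 bytes as one
        # big-endian integer, so different word segments get different codes.
        self.fallback_calls += 1
        return int.from_bytes(segment.encode("utf-8"), "big")

    def encode(self, segment: str) -> int:
        code = self.table.get(segment)
        if code is None:  # word segment not yet in the mapping
            code = self._encode_with_algorithm(segment)
            self.table[segment] = code  # update the mapping for next time
        return code


mapping = CodeMapping()
first_data = [mapping.encode(s) for s in ["data", "model", "data"]]
```

The second occurrence of "data" is served from the table, so the coding algorithm runs only twice for three word segments.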
  • When splitting the model architecture, the second device can set a first degree threshold, which is used to measure the impact of splitting the model architecture on data processing accuracy.
  • The second device can ensure that the degree of difference between the first data processing result and the second data processing result is less than the first degree threshold, where the second data processing result is the data processing result obtained by processing the first data through the data processing model. That is, when splitting the model architecture, the second device can ensure that the result obtained with the split model architecture is close to the result obtained with the complete model architecture, thereby ensuring the accuracy of data processing performed in the manner of the present application.
  • the second device may perform model splitting in the following manner:
  • the second device can obtain the second data, and the second data has a corresponding sample data processing result.
  • the sample data processing result is the result of processing the second data through the data processing model, that is, the data processing result obtained after data processing through the complete model architecture.
  • the second device can segment the model architecture corresponding to the data processing model based on the initial structure segmentation method to obtain a first initial model architecture and a second initial model architecture.
  • the first initial model architecture and the second initial model architecture can constitute the model architecture corresponding to the data processing model.
  • the second device can construct a first initial model and a second initial model, wherein the first initial model architecture is the model architecture corresponding to the first initial model, and the second initial model architecture is the model architecture corresponding to the second initial model, so that data processing by the first initial model and the second initial model can simulate data processing by the data processing model.
  • the second device can generate third data feature information based on the second data through the first initial model.
  • the third data feature information is used to characterize the data features corresponding to the second data.
  • The second device generates a pending data processing result based on the fourth data feature information through the second initial model.
  • The fourth data feature information is obtained by adding noise information to the third data feature information, thereby characterizing the impact of the noise information on the third data feature information.
  • If the noise information needs to be kept confidential, it can be added by the first device to ensure that the second device cannot restore the data based on the noise information in subsequent processing. If the noise information does not need to be kept confidential, it can be added by the second device.
  • The noise information used when analyzing the structural segmentation method can be consistent with the noise information used in actual applications, so that its impact on the model's data processing is comparable in both cases and interference from additional factors is avoided.
  • The sample data processing result is the result of processing the second data with the complete model architecture, whereas the pending data processing result is the result of processing the second data with the two-part model architecture. The difference between the pending data processing result and the sample data processing result therefore characterizes the impact of the segmented model architecture on data processing accuracy. The smaller the difference, the smaller the impact on accuracy when the model architecture is segmented using the initial structural segmentation method. The second device can thus adjust the initial structural segmentation method according to the difference to obtain the structural segmentation method.
  • the degree of difference between the pending data processing result and the sample data processing result determined by the structural segmentation method is less than the first degree threshold, thereby ensuring that after the data processing model is segmented based on the structural segmentation method, the first model and the second model obtained can process the data more accurately.
  • the second device can segment the model architecture corresponding to the data processing model based on a structural segmentation method to obtain a first model architecture and a second model architecture, wherein the first model architecture can be used to construct the above-mentioned first model and third model, and the second model architecture can be used to construct the second model.
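The adjustment of the structural segmentation method can be sketched as follows. This is a minimal sketch under strong assumptions: the model is a chain of scalar layers, a segmentation method is just a split index, the noise is a fixed scalar, and "adjusting" means trying the next split index. The names `run` and `choose_split` and all numeric values are hypothetical.

```python
from typing import Callable, Optional, Sequence

Layer = Callable[[float], float]


def run(layers: Sequence[Layer], x: float) -> float:
    """Apply the layers in order."""
    for layer in layers:
        x = layer(x)
    return x


def choose_split(
    layers: Sequence[Layer],
    second_data: Sequence[float],
    noise: float,
    first_degree_threshold: float,
    initial_split: int = 1,
) -> Optional[int]:
    """Return the first split index whose worst-case difference from the
    complete model stays below the threshold, or None if there is none."""
    for split in range(initial_split, len(layers)):
        first_initial, second_initial = layers[:split], layers[split:]
        worst = max(
            abs(run(second_initial, run(first_initial, x) + noise) - run(layers, x))
            for x in second_data
        )
        if worst < first_degree_threshold:
            return split
    return None


# Toy data processing model with four layers.
model = [lambda x: x * 2, lambda x: x + 1, lambda x: x * 3, lambda x: x * 0.1]
split = choose_split(model, second_data=[1.0, 2.0], noise=0.2,
                     first_degree_threshold=0.05)
```

With these toy layers, splitting early lets the later multiplication amplify the noise, so only the last split point keeps the difference below the threshold.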
  • Noise information is also one of the factors that affect the accuracy of data processing. Since the purpose of the noise information in this application is to alter the data rather than to affect the data processing results, the type of noise information can be selected based on how strongly it influences the data processing results.
  • A second degree threshold can be preset to measure whether the noise information has a large impact on the data processing results. The selected noise information ensures that the degree of difference between the first data processing result and the second data processing result is less than the second degree threshold.
  • The second data processing result is the data processing result obtained by processing the first data through the data processing model. This ensures that adding noise information does not make the data processing results inaccurate and thereby degrade the data provider's experience of using the model.
  • FIG. 6 is a signaling diagram of a data processing method in an actual application scenario provided by an embodiment of the present application.
  • In this scenario, the computer devices include a first device acting as the data provider and a second device acting as the model provider.
  • the method includes:
  • S601 The first device determines encryption information and decryption information corresponding to first data.
  • The original data may be text information, and the first data may be the word segmentation encoding result U obtained after word segmentation and encoding.
  • Segment encoding information (segment encoding), position encoding information (position encoding), etc. may be added to U, which is not limited here.
  • S602 The first device performs slicing processing on the first data to obtain a first data slice and a second data slice.
  • The first device may slice the word segmentation encoding result U to obtain a first data slice, which is held by the first device, and a second data slice, which is held by the second device.
  • Data written in the $\langle \cdot \rangle^{A}$ format all represent data slices: the first data slice is processed by the first model, and the second data slice is processed by the third model.
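The slicing step can be sketched as follows. This sketch assumes additive sharing over a fixed modulus, which is one common way to split data into two equally sized slices that together constitute the original; the text does not spell out the actual slicing scheme, and `MODULUS`, `slice_data`, and `combine` are hypothetical names.

```python
import secrets

MODULUS = 2**32  # assumed ring size; not specified in the application


def slice_data(first_data: list[int]) -> tuple[list[int], list[int]]:
    """Split the first data into two slices of the same size.

    Each slice on its own looks random; only their sum (mod MODULUS)
    reveals the original values.
    """
    first_slice = [secrets.randbelow(MODULUS) for _ in first_data]
    second_slice = [(u - r) % MODULUS for u, r in zip(first_data, first_slice)]
    return first_slice, second_slice


def combine(first_slice: list[int], second_slice: list[int]) -> list[int]:
    """Recombine two slices into the data they constitute."""
    return [(a + b) % MODULUS for a, b in zip(first_slice, second_slice)]


U = [101, 2045, 7]  # made-up word segmentation encoding result
first_slice, second_slice = slice_data(U)
```

Under this assumption, the second device receives only `second_slice`, which reveals nothing about `U` without `first_slice`.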
  • S603 The first device generates first sub-feature information according to the first data slice using the first model.
  • The first model architecture can consist of N repetitions of the model architecture shown in FIG. 7.
  • In each repetition, the input data passes through a multi-head attention layer, a normalization layer, a feed-forward layer, and a second normalization layer.
  • The first and third models both correspond to this first model architecture.
  • S604 The first device sends the second data slice to the second device, instructing the second device to generate second sub-feature information according to the second data slice using the third model.
  • S605 The second device obtains the second data slice sent by the first device.
  • S606 The second device generates second sub-feature information according to the second data slice using the third model.
  • Multi-head attention layer: this layer is composed of multiple attention layers (Attention), and its processing method is shown in the following formula:
  • $W_0$ is the model weight.
  • The calculation of each attention is shown in the following formula:
  • $\langle Q \rangle^{A} = \langle X \rangle^{A} \langle W_Q \rangle^{A}$
  • $\langle K \rangle^{A} = \langle X \rangle^{A} \langle W_K \rangle^{A}$
  • $\langle V \rangle^{A} = \langle X \rangle^{A} \langle W_V \rangle^{A}$
  • $\langle X \rangle^{A}$ is the data slice input to the model, and $\langle W_Q \rangle^{A}$, $\langle W_K \rangle^{A}$, and $\langle W_V \rangle^{A}$ are all model weights, i.e., model parameters.
  • The first model architecture is applied to both the first model and the third model, and the model parameters of the data processing model corresponding to the first model architecture can be divided into two data slices: the first model parameters and the second model parameters.
  • The first model parameters correspond to the first model, and the second model parameters correspond to the third model.
  • $\langle A \rangle^{A}$ is the output of each attention calculation.
  • Through the attention mechanism, the model can strengthen the relationship between each word segment and the overall text information, thereby deepening the model's understanding of the input data.
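For reference, the widely used forms of multi-head attention and scaled dot-product attention are given below. They are assumed here to correspond to the attention calculations described above; the number of heads $h$, the per-head index $i$, and the key dimension $d_k$ are symbols introduced for illustration, and in the sliced setting each quantity would be a data slice held by the first or third model.

```latex
\begin{aligned}
\mathrm{MultiHead}(X) &= \mathrm{Concat}(A_1, \dots, A_h)\, W_0, \\
A_i &= \mathrm{softmax}\!\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right) V_i, \\
Q_i &= X W_{Q,i}, \qquad K_i = X W_{K,i}, \qquad V_i = X W_{V,i}.
\end{aligned}
```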
  • Normalization layer: the following calculations can be performed through the first and third models respectively:
  • $A$ is the output of each attention layer, and the final normalization layer output is as follows:
  • $G$ and $B$ are hyperparameters in the model parameters.
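A common form of layer normalization that fits this description is sketched below. It is an assumption that the normalization layer follows this form; `g` and `b` stand for G and B and are taken as scalars for brevity, and the small constant `eps` is added only for numerical stability.

```python
import math


def layer_norm(a: list[float], g: float = 1.0, b: float = 0.0,
               eps: float = 1e-5) -> list[float]:
    """Normalize `a` to zero mean and unit variance, then scale by g
    and shift by b."""
    mean = sum(a) / len(a)
    variance = sum((v - mean) ** 2 for v in a) / len(a)
    return [g * (v - mean) / math.sqrt(variance + eps) + b for v in a]


normalized = layer_norm([1.0, 2.0, 3.0])
shifted = layer_norm([1.0, 2.0, 3.0], g=1.0, b=2.0)
```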
  • Feedforward layer: the following calculations can be performed through the first and third models respectively:
  • $W_0$ and $W_1$ are weight parameters in the model parameters, and $f(\cdot)$ is an activation function; for example, the activation function can be the Gaussian error linear unit (GELU).
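A feed-forward layer with a GELU activation can be sketched as follows. The form `f(x W0) W1` without bias terms is assumed here as the usual two-layer feed-forward computation; the helper names `gelu`, `matvec`, and `feed_forward` and the example weights are hypothetical.

```python
import math


def gelu(x: float) -> float:
    """Exact Gaussian error linear unit: x * Phi(x)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(x: list[float], w: list[list[float]]) -> list[float]:
    """Multiply row vector x (length n) by matrix w (n rows, m columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def feed_forward(x: list[float], w0: list[list[float]],
                 w1: list[list[float]]) -> list[float]:
    """Assumed form: f(x W0) W1 with f = GELU and no bias terms."""
    hidden = [gelu(h) for h in matvec(x, w0)]
    return matvec(hidden, w1)


w0 = [[1.0, 0.0], [0.0, 1.0]]  # made-up 2x2 weights
w1 = [[1.0], [1.0]]            # made-up 2x1 weights
output = feed_forward([10.0, 0.0], w0, w1)
```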
  • the first model can output the first sub-feature information
  • the third model can output the second sub-feature information
  • S607 The second device sends the second sub-feature information to the first device to instruct the first device to determine the first data feature information according to the first sub-feature information and the second sub-feature information.
  • S608 The first device obtains the second sub-feature information sent by the second device.
  • S609 The first device determines first data feature information according to the first sub-feature information and the second sub-feature information.
  • the combination method of the first data feature information X can be expressed as follows:
  • S610 The first device adds noise information to the first data feature information to obtain second data feature information.
  • The first device may add Gaussian noise to the first data feature information X as the noise information, where $\sigma$ is a configurable parameter, to obtain the second data feature information X′.
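Adding Gaussian noise to the feature information can be sketched as follows. This sketch assumes zero-mean noise whose configurable parameter is the standard deviation, called `sigma` here; the function name and the example values are hypothetical.

```python
import random
from typing import Optional


def add_gaussian_noise(features: list[float], sigma: float,
                       rng: Optional[random.Random] = None) -> list[float]:
    """Return features with independent N(0, sigma^2) noise added to
    each element."""
    rng = rng or random.Random()
    return [x + rng.gauss(0.0, sigma) for x in features]


X = [0.5, -1.2, 3.0]  # made-up first data feature information
X_prime = add_gaussian_noise(X, sigma=0.1, rng=random.Random(0))
```

A larger `sigma` hides the features more strongly but moves the data processing result further from the noise-free result, which is the trade-off that the second degree threshold controls.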
  • S611 The first device sends the second data feature information and the encryption information to the second device to instruct the second device to generate a first data processing result according to the second data feature information and the encryption information through a second model.
  • S612 The second device obtains the second data feature information sent by the first device.
  • S613 The second device generates an initial data processing result according to the second data feature information through the second model, encrypts the initial data processing result according to the encryption information, and outputs the first data processing result.
  • S614 The second device sends the first data processing result to the first device to instruct the first device to decrypt the first data processing result through decryption information to obtain an initial data processing result.
  • S615 The first device obtains the first data processing result sent by the second device.
  • S616 The first device decrypts the first data processing result by using the decryption information to obtain an initial data processing result.
  • The models involved in this application can adopt any of multiple model architectures with similar functions.
  • The above uses only one model architecture as an example, which is not limiting.
  • This application can prevent data providers from obtaining a complete data processing model by splitting the model architecture and model parameters, thereby ensuring the model security of the model provider.
  • This application can ensure the data security of data providers in both input data and output results by adding noise information, encrypting data processing results, and sharding input data.
  • This application can improve the data processing efficiency of the data provider and reduce the data processing pressure of the data provider by establishing a coding mapping relationship, thereby further improving the versatility of the data processing method.
  • This application can reduce the impact of model architecture segmentation and noise information addition on the accuracy of data processing results by adjusting the architecture segmentation method and noise information, thereby ensuring data processing accuracy.
  • FIG. 8 is a structural block diagram of a data processing device provided in an embodiment of the present application.
  • the device 800 includes a first generating unit 801, a first adding unit 802, and a first sending unit 803:
  • the first generating unit 801 is configured to generate first data feature information according to the first data using the first model, where the first data feature information is used to represent data features of the first data;
  • the first adding unit 802 is configured to add noise information to the first data feature information to obtain second data feature information
  • the first sending unit 803 is used to send the second data feature information to the second device to instruct the second device to generate a first data processing result according to the second data feature information through a second model.
  • the first model and the second model are used to constitute a data processing model, and the first data processing result is used to represent the data processing result obtained by processing the first data through the data processing model.
  • the first generating unit 801 is specifically configured to:
  • Slice the first data to obtain a first data slice and a second data slice, wherein the first data slice and the second data slice are used to constitute the first data;
  • the first data feature information is determined according to the first sub-feature information and the second sub-feature information.
  • the first data slice and the second data slice have the same data size.
  • the apparatus further includes a first acquisition unit, a word segmentation unit, a first determination unit, and a second generation unit:
  • the first acquiring unit is used to acquire the text information to be processed
  • the word segmentation unit is used to perform word segmentation processing on the text information to be processed to obtain a plurality of word segments included in the text information to be processed;
  • the first determining unit is used to determine the word segmentation codes corresponding to the multiple word segmentations respectively;
  • the second generating unit is configured to generate the first data according to the word segmentation codes corresponding to the multiple word segmentations.
  • the text information to be processed includes text information corresponding to a plurality of units respectively, and the apparatus further includes a second determining unit:
  • the second determining unit is configured to determine, for each of the multiple word segments, a segment code corresponding to the word segment, wherein the segment code is used to represent the unit in which the word segment is located;
  • the second generating unit is specifically configured to:
  • the first data is generated according to the word segmentation codes and segment codes corresponding to the multiple word segments.
  • the apparatus further includes a third determining unit:
  • the third determining unit is configured to determine, for each of the multiple word segments, a position code corresponding to the word segment, wherein the position code is used to represent the position of the word segment in the text information to be processed;
  • the second generating unit is specifically configured to:
  • the first data is generated according to the word segmentation codes and position codes corresponding to the multiple word segmentations.
  • the first determining unit is specifically configured to:
  • the device further comprises an updating unit:
  • the updating unit is configured to update the mapping relationship between the word segmentation and the word segmentation encoding into the encoding mapping relationship when the word segmentation is not included in the encoding mapping relationship.
  • the apparatus further includes a fourth determining unit:
  • the fourth determining unit is configured to determine encryption information and decryption information corresponding to the first data, wherein the encryption information is used to represent an encryption method for a data processing result corresponding to the first data, and the decryption information is used to decrypt data encrypted using the encryption method;
  • the first sending unit 803 is specifically configured to:
  • the device further includes a second acquisition unit and a decryption unit:
  • the second acquiring unit is configured to acquire the first data processing result sent by the second device
  • the decryption unit is used to decrypt the first data processing result using the decryption information to obtain the initial data processing result.
  • The present application further provides a data processing device. See FIG. 9, which is a structural block diagram of a data processing device provided in an embodiment of the present application.
  • the device 900 includes a third acquisition unit 901 and a third generation unit 902:
  • the third acquiring unit 901 is configured to acquire second data feature information sent by the first device, where the second data feature information is obtained by the first device by adding noise information to the first data feature information, and the first data feature information is generated by the first device based on the first data using the first model;
  • the third generation unit 902 is used to generate a first data processing result based on the second data feature information through the second model.
  • the first model and the second model are used to constitute a data processing model.
  • the first data processing result is used to represent the data processing result obtained by processing the first data through the data processing model.
  • the degree of difference between the first data processing result and the second data processing result is less than a first degree threshold
  • the second data processing result is a data processing result obtained by processing the first data through the data processing model.
  • the apparatus further includes a fourth acquiring unit, a first segmenting unit, a fourth generating unit, a fifth generating unit, an adjusting unit, and a second segmenting unit:
  • the fourth acquiring unit is configured to acquire second data, where the second data has a corresponding sample data processing result, where the sample data processing result is a result of processing the second data using the data processing model;
  • the first segmentation unit is configured to segment the data processing model based on the initial structure segmentation method to obtain a first initial model and a second initial model;
  • the fourth generating unit is configured to generate third data feature information based on the second data using the first initial model, where the third data feature information is used to represent data features corresponding to the second data;
  • the fifth generating unit is configured to generate a pending data processing result according to fourth data feature information using the second initial model, where the fourth data feature information is obtained by adding the noise information to the third data feature information;
  • the adjusting unit is configured to adjust the initial structural segmentation mode according to a difference between the pending data processing result and the sample data processing result, so as to obtain a structural segmentation mode, wherein a degree of difference between the pending data processing result and the sample data processing result determined by the structural segmentation mode is less than a first degree threshold;
  • the second segmentation unit is used to segment the data processing model based on the structural segmentation method to obtain the first model and the second model.
  • the noise information satisfies that the degree of difference between the first data processing result and the second data processing result is less than a second degree threshold
  • the second data processing result is a data processing result obtained by processing the first data through the data processing model.
  • the first device is further configured to perform slicing processing on the first data to obtain a first data slice and a second data slice, wherein the first data slice and the second data slice are used to constitute the first data;
  • the second device further includes a third model, the third model having the same architecture as the first model, and the third model having second model parameters;
  • the apparatus further includes a fifth acquiring unit, a sixth generating unit, and a second sending unit:
  • the fifth acquiring unit is configured to acquire the second data slice sent by the first device
  • the sixth generating unit is configured to generate second sub-feature information according to the second data slice using the third model
  • the second sending unit is used to send the second sub-feature information to the first device to instruct the first device to determine the first data feature information based on the first sub-feature information and the second sub-feature information, where the first sub-feature information is feature information generated by the first device according to the first data slice through the first model, and the first model has first model parameters.
  • the first model parameters and the second model parameters are used to constitute target model parameters, and the target model parameters are model parameters of the first model in the data processing model.
  • the third acquiring unit 901 is specifically configured to:
  • the third generating unit 902 is specifically configured to: generate an initial data processing result according to the second data feature information using the second model, encrypt the initial data processing result according to the encryption information, and output the first data processing result;
  • the third sending unit is configured to send the first data processing result to the first device to instruct the first device to decrypt the first data processing result using the decryption information to obtain the initial data processing result, where the decryption information is used to decrypt data encrypted with the encryption information.
  • FIG10 is a block diagram showing a partial structure of a mobile phone related to a terminal device provided in an embodiment of the present application.
  • the mobile phone includes components such as a radio frequency (RF) circuit 710, a memory 720, an input unit 730, a display unit 740, a sensor 750, an audio circuit 760, a wireless fidelity (WiFi) module 770, a processor 780, and a power supply 790.
  • the RF circuit 710 can be used to receive and transmit signals during information transmission or calls. Specifically, it receives downlink information from the base station and sends it to the processor 780 for processing. It also transmits uplink data to the base station.
  • the RF circuit 710 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier (LNA), a duplexer, and the like.
  • the RF circuit 710 can communicate with the network and other devices via wireless communications.
  • Such wireless communications can use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, and Short Messaging Service (SMS).
  • the memory 720 can be used to store software programs and modules.
  • the processor 780 executes the various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 720.
  • the memory 720 can mainly include a program storage area and a data storage area.
  • the program storage area can store an operating system and at least one application required for a function (such as a sound playback function, an image playback function, etc.); the data storage area can store data created based on the use of the mobile phone (such as audio data, a phone book, etc.).
  • the memory 720 can include high-speed random access memory and non-volatile memory, such as at least one disk storage device, a flash memory device, or other volatile solid-state storage device.
  • the input unit 730 can be used to receive input digital or character information, and to generate key signal input related to the user settings and function control of the mobile phone.
  • the input unit 730 may include a touch panel 731 and other input devices 732.
  • the touch panel 731, also known as a touch screen, can collect the user's touch operations on or near it (such as operations performed by the user on or near the touch panel 731 using any suitable object or accessory such as a finger or a stylus) and drive the corresponding connected device according to a preset program.
  • the touch panel 731 may include two parts: a touch detection device and a touch controller.
  • the touch detection device detects the position of the user's touch and the signal caused by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, and sends them to the processor 780. The touch controller can also receive and execute commands sent by the processor 780.
  • the touch panel 731 can be implemented using various types such as resistive, capacitive, infrared and surface acoustic wave.
  • the input unit 730 may further include other input devices 732.
  • the other input devices 732 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick.
  • the display unit 740 can be used to display information input by the user or information provided to the user, as well as various menus of the mobile phone.
  • the display unit 740 may include a display panel 741.
  • the display panel 741 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
  • a touch panel 731 may overlay the display panel 741. When the touch panel 731 detects a touch operation on or near it, it transmits the information to the processor 780 to determine the type of touch event. The processor 780 then provides a corresponding visual output on the display panel 741 based on the type of touch event.
  • although the touch panel 731 and the display panel 741 are described here as two separate components that implement the input and output functions of the mobile phone, in some embodiments the touch panel 731 and the display panel 741 can be integrated to implement those functions.
  • the mobile phone may also include at least one sensor 750, such as a light sensor, a motion sensor, and other sensors.
  • the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display panel 741 according to the brightness of the ambient light, and the proximity sensor may turn off the display panel 741 and/or the backlight when the mobile phone is moved to the ear.
  • as one type of motion sensor, an accelerometer can detect the magnitude of acceleration in all directions (generally three axes), and can detect the magnitude and direction of gravity when stationary.
  • Audio circuit 760, speaker 761, and microphone 762 provide an audio interface between the user and the phone. Audio circuit 760 converts received audio data into electrical signals and transmits them to speaker 761, which then converts them into sound signals for output. Microphone 762, on the other hand, converts collected sound signals into electrical signals, which are then received by audio circuit 760 and converted into audio data. The audio data is then processed by processor 780 and transmitted to, for example, another phone via RF circuit 710, or stored in memory 720 for further processing.
  • WiFi is a short-range wireless transmission technology.
  • a mobile phone uses WiFi module 770 to help users send and receive emails, browse the web, and access streaming media, providing wireless broadband internet access.
  • although FIG10 illustrates the WiFi module 770, it is understood that the module is not a required component of the mobile phone and can be omitted as needed without changing the essence of the invention.
  • Processor 780 is the control center of the phone, connecting all parts of the phone using various interfaces and circuits. By running or executing software programs and/or modules stored in memory 720 and accessing data stored in memory 720, it performs the various phone functions and processes data, thereby monitoring the phone as a whole.
  • processor 780 may include one or more processing units; preferably, processor 780 may integrate an application processor and a modem processor, where the application processor primarily handles the operating system, user interface, and application programs, while the modem processor primarily handles wireless communications. It is understood that the modem processor may not be integrated into processor 780.
  • the mobile phone also includes a power supply 790 (such as a battery) for supplying power to various components.
  • the power supply can be logically connected to the processor 780 through a power management system, so that charging, discharging, and power consumption are managed through the power management system.
  • the mobile phone may also include a camera, a Bluetooth module, etc., which will not be described in detail here.
  • the processor 780 included in the terminal device is also used to execute the above-mentioned data processing method on the first device side or the second device side.
  • FIG. 11 is a structural diagram of the server 800 provided in the embodiment of the present application.
  • the server 800 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 822 (for example, one or more processors) and memories 832, and one or more storage media 830 (for example, one or more mass storage devices) for storing application programs 842 or data 844.
  • the memories 832 and the storage media 830 can be temporary storage or permanent storage.
  • the program stored in the storage medium 830 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server.
  • the central processing unit 822 can be configured to communicate with the storage medium 830 to execute a series of instruction operations in the storage medium 830 on the server 800.
  • the server 800 may also include one or more power supplies 826, one or more wired or wireless network interfaces 850, one or more input and output interfaces 858, and/or one or more operating systems 841.
  • the steps executed by the server in the above embodiment may be based on the server structure shown in FIG11.
  • An embodiment of the present application further provides a computer-readable storage medium for storing a computer program, which is used to execute any one of the data processing methods described in the aforementioned embodiments.
  • An embodiment of the present application further provides a computer program product including a computer program, which, when executed on a computer device, enables the computer device to execute the data processing method described in any one of the above embodiments.
  • the various embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments can be referred to one another, and each embodiment focuses on its differences from the others.
  • in particular, the description of the device and system embodiments is relatively brief, and for the relevant parts reference can be made to the corresponding description of the method embodiments.
  • the device and system embodiments described above are merely schematic, wherein the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of this embodiment. A person of ordinary skill in the art can understand and implement it without expending creative work.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Bioethics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Storage Device Security (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Embodiments of the present application disclose a data processing method and a related apparatus. A data processing model is divided into a first model and a second model. A first device generates first data feature information by means of the first model and adds noise information to the first data feature information to obtain second data feature information. A second device obtains a data processing result by means of the second model on the basis of the second data feature information. Because the first device can obtain only part of the model, the security of the model provided by the model provider is ensured. Because noise information is added to the second data feature information, the second device cannot restore the input data, which ensures the security of the data provided by the data provider. Because the noise information affects only the part of the data processing flow near the output side, its interference with the data processing result is small, so data processing accuracy remains high. In addition, the present application does not require repeated execution of the entire data processing flow, so data processing efficiency is high.

Description

Data processing method and related apparatus

This application claims priority to the Chinese patent application filed with the China Patent Office on March 18, 2024, with application number 2024103074890 and entitled "Data processing method and related apparatus", the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the field of artificial intelligence technology, and in particular to data security technology.

Background Art

With the continuous development of computer technology, computer models are being applied in a growing number of technical fields to improve processing accuracy and efficiency and to deliver a more intelligent experience. In such applications, a data provider can supply the data to be input into a model to a model provider. The model provider runs the model on the input data to obtain the model output and returns that output to the data provider for use.

In the related art, the data provider can split the data into multiple data shards, retain some of the shards, and send the remaining shards to the model provider. Each party inputs the shards it holds into the model to obtain a corresponding output. The outputs of both parties are then spliced and restored to obtain the output corresponding to the complete input data. Because the model provider never obtains the complete input data, the security of the data supplied by the data provider is protected.

However, the methods in the related art greatly increase the amount of computation needed to process data through the model and significantly reduce data processing efficiency, which makes them difficult to apply widely across data processing scenarios.

Summary of the Invention

To solve the above technical problem, the present application provides a data processing method that, in model application scenarios, protects the data security of both the data provider and the model provider while maintaining data processing accuracy. It also reduces the amount of computation in the data processing flow and improves data processing efficiency, so that it can be applied to a wider range of data processing scenarios.

The embodiments of the present application disclose the following technical solutions:

In a first aspect, an embodiment of the present application discloses a data processing method. The method is performed by a first device that includes a first model, and the method includes:

generating, by means of the first model, first data feature information according to first data, where the first data feature information is used to characterize data features of the first data;

adding noise information to the first data feature information to obtain second data feature information; and

sending the second data feature information to a second device to instruct the second device to generate, by means of a second model, a first data processing result according to the second data feature information, where the first model and the second model together constitute a data processing model, and the first data processing result is used to represent the data processing result obtained by processing the first data through the data processing model.
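The first-device steps above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the single dense layer with ReLU standing in for the first model, the Gaussian noise, and the names `first_model`, `add_noise`, `first_device_step`, and `noise_scale` are all hypothetical and do not come from the application, which does not fix a model architecture or a noise distribution here.

```python
import random

def first_model(x, weights):
    """Hypothetical first model: one dense layer followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

def add_noise(features, scale, rng):
    """Add zero-mean Gaussian noise to each feature (assumed noise type)."""
    return [f + rng.gauss(0.0, scale) for f in features]

def first_device_step(first_data, weights, noise_scale, rng):
    # Step 1: first data -> first data feature information.
    first_features = first_model(first_data, weights)
    # Step 2: add noise -> second data feature information.
    second_features = add_noise(first_features, noise_scale, rng)
    # Step 3: this value is what would be sent to the second device.
    return second_features

rng = random.Random(0)
weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]  # illustrative parameters
payload = first_device_step([1.0, 2.0, 3.0], weights, noise_scale=0.05, rng=rng)
```

Only `payload` leaves the first device in this sketch; the raw input and the noise-free features stay local.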

In a second aspect, an embodiment of the present application discloses a data processing method. The method is performed by a second device that includes a second model, and the method includes:

obtaining second data feature information sent by a first device, where the second data feature information is obtained by the first device by adding noise information to first data feature information, and the first data feature information is generated by the first device from first data by means of a first model; and

generating, by means of the second model, a first data processing result according to the second data feature information, where the first model and the second model together constitute a data processing model, and the first data processing result is used to represent the data processing result obtained by processing the first data through the data processing model.
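The second-device side can be sketched in the same spirit. The dense layer plus softmax used for the second model, the received feature values, and the names `second_model` and `second_device_step` are assumptions made for illustration only; the application does not specify them.

```python
import math

def second_model(features, weights):
    """Hypothetical second model: dense layer followed by softmax."""
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def second_device_step(second_features, weights):
    # The second device sees only the noisy features, never the first data.
    return second_model(second_features, weights)

received = [0.41, 0.38]                      # illustrative noisy features
head_weights = [[1.0, -1.0], [-1.0, 1.0]]    # illustrative parameters
result = second_device_step(received, head_weights)
```

In this sketch `result` plays the role of the first data processing result that would be returned to the first device.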

In a third aspect, an embodiment of the present application discloses a data processing apparatus that includes a first generating unit, a first adding unit, and a first sending unit:

the first generating unit is configured to generate, by means of the first model, first data feature information according to first data, where the first data feature information is used to characterize data features of the first data;

the first adding unit is configured to add noise information to the first data feature information to obtain second data feature information; and

the first sending unit is configured to send the second data feature information to a second device to instruct the second device to generate, by means of a second model, a first data processing result according to the second data feature information, where the first model and the second model together constitute a data processing model, and the first data processing result is used to represent the data processing result obtained by processing the first data through the data processing model.

In a fourth aspect, an embodiment of the present application discloses a data processing apparatus that includes a third acquiring unit and a third generating unit:

the third acquiring unit is configured to acquire second data feature information sent by a first device, where the second data feature information is obtained by the first device by adding noise information to first data feature information, and the first data feature information is generated by the first device from first data by means of a first model; and

the third generating unit is configured to generate, by means of the second model, a first data processing result according to the second data feature information, where the first model and the second model together constitute a data processing model, and the first data processing result is used to represent the data processing result obtained by processing the first data through the data processing model.

In a fifth aspect, an embodiment of the present application discloses a computer device that includes a processor and a memory:

the memory is configured to store a computer program and transmit the computer program to the processor; and

the processor is configured to execute, according to instructions in the computer program, the data processing method described in the first aspect or the data processing method described in the second aspect.

In a sixth aspect, an embodiment of the present application discloses a computer-readable storage medium for storing a computer program, where the computer program is used to execute the data processing method described in the first aspect or the data processing method described in the second aspect.

In a seventh aspect, an embodiment of the present application discloses a computer program product including a computer program which, when run on a computer device, causes the computer device to execute the data processing method described in the first aspect or the data processing method described in the second aspect.

As can be seen from the above technical solutions, first, to prevent the first device, acting as the data provider, from obtaining the complete data processing model, the present application places only the first model, the part of the data processing model close to the input side, on the first device. The data provider can therefore obtain only part of the model, which protects the security of the model. The first device can use the first model to generate first data feature information from the first data, which serves as the input data. Second, to prevent the second device, acting as the model provider, from learning the first data, the first device can add noise information to the first data feature information to obtain second data feature information and send the second data feature information to the second device, instructing the second device to determine a first data processing result through the second model, the part of the data processing model close to the output side. The first data processing result is used to represent the data processing result obtained by processing the first data through the data processing model. Because the second data feature information contains noise information, the second device cannot accurately restore the first data from it, so the input data supplied by the data provider is not obtained by the model provider and the security of the data is protected. At the same time, because the noise information influences the data processing flow only within the second model, close to the output side, its effect on the flow is relatively small. The first data processing result is therefore close to the actual data processing result that would be obtained by processing the first data through the complete data processing model, which preserves data processing accuracy. In summary, the present application protects both the data security of the data provider and the model security of the model provider while preserving data processing accuracy. It also avoids repeating the data processing flow multiple times, which reduces the overall amount of computation, preserves data processing efficiency, and allows the data processing method of the present application to be applied to a wider range of data processing scenarios.
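The claim that noise added at the split point only slightly perturbs the final result can be checked numerically on a toy model. The sketch below is an assumption-laden illustration: the two-layer model, the bounded uniform noise, the maximum-absolute-difference metric, and the value of `threshold` are all hypothetical choices, not details taken from the application.

```python
import math
import random

def dense(x, weights):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def softmax(z):
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def full_model(x, w1, w2, noise=None):
    """First model (dense + ReLU), optional noise at the split, second model."""
    features = [max(0.0, v) for v in dense(x, w1)]
    if noise is not None:
        features = [f + n for f, n in zip(features, noise)]
    return softmax(dense(features, w2))

def max_abs_diff(a, b):
    return max(abs(p - q) for p, q in zip(a, b))

w1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
w2 = [[1.0, -1.0], [-1.0, 1.0]]
x = [1.0, 2.0, 3.0]
rng = random.Random(0)
noise = [rng.uniform(-0.01, 0.01) for _ in range(2)]  # bounded noise (assumed)

clean = full_model(x, w1, w2)          # result without noise
noisy = full_model(x, w1, w2, noise)   # result with noise at the split point
threshold = 0.05                       # hypothetical difference threshold
within = max_abs_diff(clean, noisy) < threshold
```

A check of this kind could be run offline by whichever party holds both halves of the model in order to choose a noise magnitude; who performs that calibration is not stated in this passage.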

BRIEF DESCRIPTION OF THE DRAWINGS

To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of a data processing method in a practical application scenario provided by an embodiment of the present application;

FIG. 2 is a signaling diagram of a data processing method provided by an embodiment of the present application;

FIG. 3 is a signaling diagram of a data processing method provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of a data processing method provided by an embodiment of the present application;

FIG. 5 is a signaling diagram of a data processing method provided by an embodiment of the present application;

FIG. 6 is a signaling diagram of a data processing method in a practical application scenario provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of a data processing method in a practical application scenario provided by an embodiment of the present application;

FIG. 8 is a structural block diagram of a data processing apparatus provided by an embodiment of the present application;

FIG. 9 is a structural block diagram of a data processing apparatus provided by an embodiment of the present application;

FIG. 10 is a structural diagram of a terminal provided by an embodiment of the present application;

FIG. 11 is a structural diagram of a server provided by an embodiment of the present application.

DETAILED DESCRIPTION

The embodiments of the present application are described below with reference to the accompanying drawings.

Model-based data processing has a wide range of applications. For example, in artificial intelligence scenarios, a model can produce a text processing result from input text. Data processing typically involves both a data provider and a model provider. The data provider supplies the data to be input into the data processing model in order to obtain the corresponding processing result, and the model provider supplies the data processing model that carries out the data processing flow on that data.

In the related art, to protect the data security of the data provider so that the model provider cannot learn the complete input data, the data provider can divide the data to be input into two parts. The data provider and the model provider each input one part into the model for processing, and the results are finally spliced together to obtain the data processing result corresponding to the complete data.

However, this approach turns what was originally a single pass over the complete data into one pass for each of the two data parts, which greatly increases the amount of computation. Although it protects the data provider's data to a certain extent, it causes a significant drop in data processing efficiency, makes it difficult to provide an efficient data processing service, and is therefore hard to apply in scenarios that demand high data processing efficiency.
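The extra cost of the related-art approach can be made concrete with a small counting experiment. The sketch assumes additive sharing of the input and a purely linear model, so that the two partial outputs can be recombined by simple addition; both assumptions, along with the helper `linear` and its operation counter, are illustrative and are not taken from the application, which only says the outputs are spliced and restored.

```python
def linear(x, weights, counter):
    """Dense layer that also counts multiply-accumulate operations."""
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            acc += w * xi
            counter[0] += 1
        out.append(acc)
    return out

weights = [[0.5, -0.25, 1.0], [2.0, 0.5, -1.0]]
x = [4.0, 8.0, 2.0]

# Additive sharing (assumed): shard_a + shard_b == x element-wise.
shard_a = [1.0, 5.0, -3.0]
shard_b = [xi - ai for xi, ai in zip(x, shard_a)]

single = [0]
direct = linear(x, weights, single)        # one pass over the complete data

split = [0]
part_a = linear(shard_a, weights, split)   # pass run by the data provider
part_b = linear(shard_b, weights, split)   # pass run by the model provider
combined = [a + b for a, b in zip(part_a, part_b)]
```

Under these assumptions `combined` matches `direct`, but the operation count in `split` is twice that in `single`, which is the overhead the paragraph above describes.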

To solve the above technical problem, the present application provides a data processing method in which the data processing model is split into a first model close to the data input side and a second model close to the data output side. The first device, acting as the data provider, can use the first model to generate first data feature information from the first data, which serves as the input data, and then add noise information to the first data feature information to obtain second data feature information. The second device, acting as the model provider, can use the second model to obtain a first data processing result based on the second data feature information. The first data processing result is used to represent the data processing result that the data processing model obtains by processing the first data. Because the data provider can obtain only part of the model, the security of the model supplied by the model provider is protected. Because noise information is added to the second data feature information, the second device cannot restore the first data from it, which protects the security of the data provider's data. Because the noise information takes effect only in the part of the data processing flow close to the output side, it interferes little with the data processing result, so the first data processing result is highly accurate. In addition, the present application does not need to execute the entire data processing flow repeatedly, so the overall amount of computation is small and data processing efficiency is high. In summary, the present application balances data security and model security while preserving data processing accuracy and efficiency.

It can be understood that the method may be executed by a computer device with data processing capabilities, for example a terminal device or a server. The method may be executed independently by a terminal device or a server, or it may be applied to a network scenario in which a terminal device and a server communicate and be executed by the two in cooperation. The terminal device may be a mobile phone, a tablet computer, a laptop computer, a desktop computer, or a similar device. The terminal device may also include augmented reality (AR) devices, such as AR glasses and AR screens, and virtual reality (VR) devices, such as head-mounted VR glasses. The server may be an application server or a web server; in actual deployment it may be an independent server, a server cluster, or a cloud server.

The present application also relates to technologies in the field of large models, specifically model compression and quantization and model parallel computing.

Model compression and quantization refers to techniques that reduce model size and accelerate model inference, thereby lowering the storage and computation cost of a model. Model compression typically includes pruning, low-rank decomposition, and knowledge distillation. Model quantization converts the floating-point parameters of a model into fixed-point or integer parameters, which likewise reduces model size and accelerates inference.
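
As an illustration of the quantization idea described above, the following is a minimal sketch of symmetric per-tensor int8 quantization using NumPy. The function names `quantize_int8` and `dequantize` and the sample weights are assumptions made for this example; the application does not specify a particular quantization scheme.

```python
import numpy as np


def quantize_int8(weights: np.ndarray):
    """Map float weights to int8 with a single scale factor (symmetric scheme)."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 values."""
    return q.astype(np.float32) * scale


# Hypothetical float parameters of one layer.
weights = np.array([[0.5, -1.27], [0.01, 1.0]], dtype=np.float32)
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)
```

Each parameter is stored in one byte instead of four, at the cost of a rounding error of at most half the scale factor per value.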

Model parallel computing refers to distributing the computation tasks of a model across multiple computing devices (such as CPUs, GPUs, and TPUs) that compute simultaneously, thereby accelerating model training and inference. Model parallel computing makes effective use of computing resources and improves the computational efficiency and training speed of the model.

In the present application, model compression and quantization is reflected mainly in the fact that a compressed or quantized model can be used to process the data supplied by the data provider, which speeds up data processing and reduces the size of the models deployed on the first device and the second device. Model parallel computing is reflected mainly in the fact that multiple first devices can process input data in parallel through their respective first models, or multiple second devices can process input data in parallel through their respective second models, which improves data processing efficiency.

To facilitate understanding of the technical solution provided by the present application, the data processing method is introduced below in combination with an actual application scenario.

Refer to FIG. 1, which is a schematic diagram of a data processing method in an actual application scenario provided by an embodiment of the present application. In this scenario, the first device is a first server 101 and the second device is a second server 102. The first server 101 is the server acting as the data provider, and the second server 102 is the server acting as the model provider.

The model architecture of the data processing model can be split into a first model architecture and a second model architecture. The first model architecture is close to the input side and is used to construct the first model; the second model architecture is close to the output side and is used to construct the second model. The model provider can send the first model to the first server 101. Through the first model, the first server 101 can process the first data to obtain first data feature information, which characterizes the data features of the first data. The first server 101 can add noise information to the first data feature information to obtain second data feature information, and then send the second data feature information to the second server 102. Because noise information has been added to the second data feature information, the second server 102 cannot learn the first data feature information and therefore cannot restore the first data, which protects the data provider's data.

The second server 102 can use the second model to determine the first data processing result from the second data feature information. Because the first data has been processed by both the first model and the second model, and the first model and the second model together constitute the complete model architecture of the data processing model, the first data processing result can be used to characterize the data processing result that the data processing model would obtain by processing the first data.

First, as described above, this approach protects the data provided by the first server 101. Second, because the first server 101 obtains only part of the model architecture and cannot restore the complete data processing model, the model provided by the second server 102 is protected. Third, the present application adds the noise information after the first data has been processed by the first model, so the noise information acts only on the part of the data processing flow close to the output side and has little effect on the flow as a whole; the first data processing result is therefore close to the result the data processing model would obtain by processing the first data directly, and its accuracy is high. Fourth, the present application does not need to repeat the data processing flow for the first data, so the overall amount of computation is small and the data processing efficiency is high.

Next, the data processing method provided by the present application is described in detail with reference to the accompanying drawings.

Refer to FIG. 2, which is a signaling diagram of a data processing method provided by an embodiment of the present application. In this embodiment, the computer devices may include a first device acting as the data provider and a second device acting as the model provider; both may be any computer device with data processing capabilities. The method includes:

S201: The first device generates first data feature information from first data through a first model.

It can be understood that a model supplied by a model provider is embodied mainly in two dimensions: model architecture and model parameters. For a data processing model, for example, the model architecture describes the processing flow, and the model parameters determine the specific way in which data is processed along that flow. Both are obtained by the model provider through model research and development.

Based on this, in this embodiment, to protect the model provider's model and prevent the model architecture and model parameters from being obtained in full by the first device acting as the data provider, the model architecture of the data processing model is first split into a first model architecture close to the data input side and a second model architecture close to the output side. The first model architecture is used to determine data feature information corresponding to the input data, and this data feature information characterizes the input data. The second model architecture is used to determine the data processing result from the data feature information. Combined, the first model architecture and the second model architecture implement the complete data processing function of the data processing model.

The first model is constructed from the first model architecture, and the second model is constructed from the second model architecture. In this embodiment, only the first model is provided to the first device. The first device, acting as the data provider, therefore cannot learn the complete model architecture and model parameters and cannot reconstruct the data processing model by itself, which protects the data processing model.

The first device can generate first data feature information from the first data through the first model. The first data feature information characterizes the data features of the first data, and the first model corresponds to the first model architecture.

S202: The first device adds noise information to the first data feature information to obtain second data feature information.

To prevent the second device, acting as the model provider, from learning the exact input data, the first device can add noise information to the first data feature information to generate second data feature information, so that the second device cannot restore the exact first data from the second data feature information. The noise information may take various forms, for example Gaussian-distributed noise, which is not limited here.
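
As a minimal sketch of step S202, the following adds zero-mean Gaussian noise to a feature vector with NumPy. The function name `add_gaussian_noise`, the standard deviation of 0.1, and the sample feature values are assumptions chosen for illustration; the application does not prescribe a noise scale.

```python
import numpy as np


def add_gaussian_noise(features: np.ndarray, std: float,
                       rng: np.random.Generator) -> np.ndarray:
    """Return features perturbed by zero-mean Gaussian noise of the given std."""
    noise = rng.normal(loc=0.0, scale=std, size=features.shape)
    return features + noise


rng = np.random.default_rng(seed=0)

# Hypothetical first data feature information produced by the first model.
first_features = np.array([0.2, -1.3, 0.7, 2.1])

# Second data feature information: the only thing sent to the second device.
second_features = add_gaussian_noise(first_features, std=0.1, rng=rng)
```

A larger `std` hides the features more strongly but moves the final result further from the noise-free result, so in practice the scale would be a trade-off between privacy and accuracy.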

S203: The first device sends the second data feature information to the second device, to instruct the second device to generate a first data processing result from the second data feature information through a second model.

The second model corresponds to the second model architecture. The first model architecture and the second model architecture together constitute the model architecture of the data processing model, and the first data processing result is used to characterize the data processing result that the data processing model would obtain by processing the first data.

S204: The second device obtains the second data feature information sent by the first device.

As described above, the second data feature information is obtained by the first device by adding noise information to the first data feature information, and the first data feature information is generated by the first device from the first data through the first model.

S205: The second device generates the first data processing result from the second data feature information through the second model.

The first model architecture and the second model architecture together constitute the complete model architecture of the data processing model, and the first data processing result is obtained by passing the first data through the first model, adding noise, and then passing the result through the second model. The only difference between the first data processing result and the result obtained by processing the first data directly with the data processing model is therefore the influence of the noise information, so the first data processing result can be used to characterize the data processing result that the data processing model would obtain by processing the first data.
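
Steps S201 to S205 can be sketched end to end as follows. This is an illustrative toy, not the model of the application: the three-layer network, its random weights `W1`, `W2`, `W3`, the split point after the second layer, and the noise scale are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical data processing model with three layers.
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(8, 8))
W3 = rng.normal(size=(3, 8))


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def first_model(x: np.ndarray) -> np.ndarray:
    """Input-side part (layers 1 and 2), deployed on the first device."""
    return relu(W2 @ relu(W1 @ x))


def second_model(h: np.ndarray) -> np.ndarray:
    """Output-side part (layer 3), kept on the second device."""
    return W3 @ h


first_data = rng.normal(size=4)

# S201: first device computes the first data feature information.
first_features = first_model(first_data)
# S202: first device adds noise to obtain the second data feature information.
second_features = first_features + rng.normal(scale=0.01, size=first_features.shape)
# S203 to S205: second device receives the noisy features and runs the second model.
first_result = second_model(second_features)

# Reference: what the unsplit model would output on the raw first data.
reference = second_model(first_model(first_data))
deviation = float(np.max(np.abs(first_result - reference)))
```

In this toy the second model is a single linear layer, so `first_result - reference` equals `W3 @ noise` exactly: the deviation comes only from the noise passing through the output-side layers.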

It can be understood that, in general, the longer the portion of the data processing flow in which the noise information participates, the greater its effect on the data processing result. The closer the affected portion of the flow is to the data input side, the greater the effect on the input data and hence on the data processing result; conversely, the closer it is to the data output side, the smaller the effect on the input data and hence on the data processing result. The present application therefore adds the noise information after processing by the first model, so that the noise information affects only the second model's portion of the flow close to the output side. This reduces the effect of the noise information on the first data and, in turn, on the data processing result, so that the first data processing result stays close to the true data processing result and data processing accuracy is maintained.

As can be seen from the above technical solution, the present application protects both the data supplied by the data provider and the model supplied by the model provider while reducing the effect of the noise information on the data processing result, so that the final first data processing result is close to the actual result obtained by processing the first data directly with the data processing model, which maintains data processing accuracy. At the same time, the present application does not require the data processing flow to be repeated on any single device: the second device runs the data processing flow of the second model only once, and the first device runs the data processing flow of the first model only once. The overall amount of computation is therefore small and data processing efficiency is maintained, so the data processing method of the present application can be applied to a wider range of data processing scenarios.

As described above, a model comprises a model architecture and model parameters. Although the above method splits the complete model architecture and model parameters as a whole, the first model on the first device may still contain the complete model parameters that the data processing model has for the first model architecture. In one possible implementation, the computer device may further process the model parameters corresponding to the first model architecture to prevent the data provider from obtaining the complete model parameters for the first model architecture.

Refer to FIG. 3, which is a signaling diagram of a data processing method provided by an embodiment of the present application, in which steps S301 to S303 and S307 to S308 are one possible implementation of step S201. The method includes:

S301: The first device shards the first data to obtain a first data shard and a second data shard.

Data sharding refers to dividing data into multiple parts, each of which serves as a data shard. In the present application, the data may be divided into two shards, namely the first data shard and the second data shard, or into more shards, which is not limited here. The first data shard and the second data shard together constitute the first data; that is, combined, they contain the complete content of the first data. Therefore, by processing the first data shard and processing the second data shard, the result of processing the first data itself can be reproduced.
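
A minimal sketch of the sharding step S301 is shown below, assuming the first data is a one-dimensional array that is cut into contiguous shards. The helper names `split_into_shards` and `merge_shards` are hypothetical; the application does not fix how the data is divided.

```python
import numpy as np


def split_into_shards(data: np.ndarray, num_shards: int = 2) -> list:
    """Cut the data into contiguous shards of (near-)equal size."""
    return np.array_split(data, num_shards)


def merge_shards(shards: list) -> np.ndarray:
    """Reassemble the original data from its shards."""
    return np.concatenate(shards)


# Hypothetical first data with ten elements.
first_data = np.arange(10)
first_shard, second_shard = split_into_shards(first_data)
```

Here both shards have five elements, and concatenating them in order reproduces the first data, which is the property the method relies on.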

S302: The first device generates first sub-feature information from the first data shard through the first model.

In this embodiment of the present application, to protect the model parameters supplied by the model provider, as shown in FIG. 4, target model parameters can be split into first model parameters and second model parameters, where the target model parameters are the model parameters of the first model architecture in the data processing model. The computer device can assign the first model parameters to the data provider's first model, so that the data provider cannot learn the complete target model parameters and therefore cannot restore the part of the data processing model that corresponds to the first model architecture. The first sub-feature information characterizes the data features of the first data shard.

S303: The first device sends the second data shard to the second device, to instruct the second device to generate second sub-feature information from the second data shard through a third model.

The second model parameters can be assigned to the third model, which is located on the second device acting as the model provider. The third model also corresponds to the first model architecture, and the second sub-feature information characterizes the data features of the second data shard.

S304: The second device obtains the second data shard sent by the first device.

S305: The second device generates the second sub-feature information from the second data shard through the third model.

S306: The second device sends the second sub-feature information to the first device, to instruct the first device to determine the first data feature information from the first sub-feature information and the second sub-feature information.

As described above, the first sub-feature information is generated by the first device from the first data shard through the first model. The first model has the first model parameters; the first model parameters and the second model parameters together constitute the target model parameters; and the part of the data processing model corresponding to the first model architecture has the target model parameters.

S307: The first device obtains the second sub-feature information sent by the second device.

S308: The first device determines the first data feature information from the first sub-feature information and the second sub-feature information.

The first model parameters and the second model parameters together constitute the target model parameters, the first data shard and the second data shard together constitute the first data, and the first model and the third model both correspond to the first model architecture. Therefore, combining the first model's processing of the first data shard with the third model's processing of the second data shard reproduces the processing that the first-model-architecture part of the data processing model would apply to the first data, so the first data feature information can be determined from the first sub-feature information and the second sub-feature information.
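
One way to see how partial parameters and partial data can recombine is a single linear layer, sketched below. This is an assumed instantiation for illustration only: the application does not state how the target model parameters are split or how the sub-feature information is merged. Here the weight matrix `target_W` is split column-wise to match the two data shards, and the sub-features are summed.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical target model parameters of a linear first-architecture layer.
target_W = rng.normal(size=(6, 10))
first_data = rng.normal(size=10)

# S301: split the first data into two equal-size shards.
half = first_data.shape[0] // 2
first_shard, second_shard = first_data[:half], first_data[half:]

# Column-wise split of the parameters, aligned with the shards.
first_W = target_W[:, :half]    # first model parameters (first device)
second_W = target_W[:, half:]   # second model parameters (third model, second device)

# S302: first device processes its shard with its part of the parameters.
first_sub_features = first_W @ first_shard
# S305: second device processes the other shard with the remaining parameters.
second_sub_features = second_W @ second_shard
# S308: first device combines the two sub-features.
first_features = first_sub_features + second_sub_features
```

For this linear case the sum equals `target_W @ first_data` exactly, while neither device ever holds both the full parameters and the full data. Nonlinear layers would need a different combination rule, which this sketch does not cover.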

S309: The first device adds noise information to the first data feature information to obtain second data feature information.

S310: The first device sends the second data feature information to the second device, to instruct the second device to generate a first data processing result from the second data feature information through the second model.

S311: The second device obtains the second data feature information sent by the first device.

S312: The second device generates the first data processing result from the second data feature information through the second model.

As can be seen from the above process, throughout the data processing the first device, acting as the data provider, never obtains the complete model parameters of any model architecture in the data processing model, which further protects the model provider's model. At the same time, the second device, acting as the model provider, never obtains the complete first data, which protects the data provider's data. In addition, compared with the related art, the present application only needs to perform data processing twice on the first model architecture part and does not need to repeat the entire data processing flow, so it still reduces the amount of computation to a certain extent and improves data processing efficiency.

It can be understood that the speed at which a model processes data generally depends on the data size: the larger the data, the slower the processing. When the first model and the third model have the same model architecture, that is, the same data processing flow, the data size determines their processing speeds. The first data feature information can only be synthesized after both the first model and the third model have finished processing. If the shards differ in size, the model handling the smaller shard finishes first and must wait for the model handling the larger shard. To avoid this waiting, in one possible implementation the first device makes the first data shard and the second data shard the same size, which gives the best overall processing speed and maintains data processing efficiency.

The above process protects the data provider's data mainly on the input side. In another possible implementation, the present application can also protect the data processing result output by the model, so that no party other than the data provider learns the data processing result.

Refer to FIG. 5, which is a signaling diagram of a data processing method provided by an embodiment of the present application, in which step S504 is one possible implementation of step S203 and step S505 is one possible implementation of step S204. The method includes:

S501: The first device determines encryption information and decryption information corresponding to the first data.

The encryption information characterizes the encryption method applied to the data processing result corresponding to the first data, and the decryption information is used to decrypt data encrypted with that method. Various encryption methods may be used, which is not limited here.

S502: The first device generates first data feature information from the first data through the first model.

S503: The first device adds noise information to the first data feature information to obtain second data feature information.

S504: The first device sends the second data feature information and the encryption information to the second device, to instruct the second device to generate a first data processing result from the second data feature information and the encryption information through the second model.

When sending the second data feature information to the second device, the first device can also send the encryption information, so that the second model on the second device can encrypt the data processing result it computes based on the encryption information and output the encrypted first data processing result. The model provider therefore cannot learn the exact data processing result, which protects the data provider's data on the output side.

S505: The second device obtains the second data feature information and the encryption information sent by the first device.

S506: The second device generates the first data processing result from the second data feature information and the encryption information through the second model.

Specifically, the second device can use the second model to generate an initial data processing result from the second data feature information, then encrypt the initial data processing result according to the encryption information and output the first data processing result. The initial data processing result is the result obtained by processing the second data feature information; the purpose of this embodiment is to prevent the model provider from learning this result.

It should be emphasized that the input of the second model is the second data feature information and its output is the first data processing result. The initial data processing result obtained in between is not output; the encryption step is applied to it directly, so the second device cannot obtain the initial data processing result. Because the decryption information is held by the first device, the second device cannot decrypt the first data processing result, which protects the data.

S507: The second device sends the first data processing result to the first device, to instruct the first device to decrypt the first data processing result with the decryption information and obtain the initial data processing result.

S508: The first device obtains the first data processing result sent by the second device.

S509: The first device decrypts the first data processing result with the decryption information to obtain the initial data processing result.

When the second model processes the second data feature information, it does not output the intermediate initial data processing result directly; instead, it encrypts it to obtain the encrypted first data processing result, so the second device cannot obtain the initial data processing result. Moreover, the decryption information is held by the first device, so the second device cannot decrypt the first data processing result. This ensures that the second device, acting as the model provider, cannot obtain the data processing result belonging to the data provider, which protects the data.
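
The flow of steps S501 to S509 can be sketched with a toy public-key scheme. The application does not name an encryption method; the example below uses textbook RSA with tiny primes purely as an assumed stand-in, and it is not secure (no padding, tiny modulus, and a small result space that could be guessed by anyone holding the public key). The stand-in "second model" that returns the index of the largest feature is also hypothetical.

```python
# Toy textbook RSA, for illustration only. NOT secure.
P, Q = 61, 53
N = P * Q                      # modulus
PHI = (P - 1) * (Q - 1)
E = 17                         # public exponent
D = pow(E, -1, PHI)            # private exponent (requires Python 3.8+)

# S501: first device creates the key pair and keeps the decryption information.
encryption_info = (E, N)       # sent to the second device in S504
decryption_info = (D, N)       # never leaves the first device


def second_model_with_encryption(second_features, enc_info):
    """Stand-in second model: compute a result, then encrypt before output."""
    # Initial data processing result: index of the largest feature (hypothetical).
    initial_result = max(range(len(second_features)),
                         key=lambda i: second_features[i])
    e, n = enc_info
    # S506: only the encrypted value is returned as the first data processing result.
    return pow(initial_result, e, n)


def decrypt(first_result, dec_info):
    """S509: first device recovers the initial data processing result."""
    d, n = dec_info
    return pow(first_result, d, n)


first_result = second_model_with_encryption([0.1, 0.3, 1.2, 2.5], encryption_info)
initial_result = decrypt(first_result, decryption_info)
```

A practical deployment would use a vetted, randomized encryption scheme from an established cryptographic library, and would also need the encryption step to run where the second device cannot observe the intermediate value, as the description requires.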

Next, the technical details on the first device side and on the second device side are described in turn.

On the first device side, the present application can be applied to data processing scenarios involving many kinds of data. Taking text data as an example, in one possible implementation the first device can process text information in the following way to obtain the first data to be input into the model.

The first device may obtain text information to be processed, that is, information for which a data processing result is to be obtained through the data processing model, for example query text information to be input into a query model. So that the data processing model can better understand the composition of the text information and therefore process it more accurately, the first device may perform tokenization (word segmentation) on the text information to be processed to obtain a corresponding token set. The token set includes multiple tokens; a token may be a character, a word, a phrase, a short sentence, or the like in the text information, which is not limited here. Various tokenization methods may be used, for example Byte-Pair Encoding (BPE), WordPiece, or SentencePiece, which is not limited here.

Then, so that the data processing model can understand the meaning of each token, the first device may determine the token code corresponding to each of the multiple tokens, thereby converting the text information into coded information that the model can understand. A token code represents its corresponding token, and different tokens have different token codes, so that each token is effectively represented by its token code. Various coding schemes may be used for the token codes, for example the 8-bit Universal Character Set/Unicode Transformation Format (UTF-8), which is not limited here. The first device may generate the first data according to the token codes corresponding to the multiple tokens, so that the data processing model can fully understand the information content of the text information to be processed.

To further improve data processing accuracy, in addition to the information content of the tokens themselves, the first device may also incorporate other information that is helpful for data processing into the first data.

For example, in one possible implementation, when determining the token codes corresponding to the multiple tokens, the first device may determine, for each of the multiple tokens, not only its token code but also its segment code. Specifically, the text information to be processed includes text information corresponding to multiple units, where a unit is a unit of text information that can contain multiple tokens, for example a sentence unit or a paragraph unit. Tokens in the same unit are usually more closely related to one another, while tokens in different units are more weakly related. Identifying the unit to which each token belongs therefore helps the data processing model analyze the relationships between tokens and process the data more accurately.

On this basis, the first device may determine, for each of the multiple tokens, the segment code corresponding to that token, where the segment code represents the unit in which the token is located. Accordingly, when generating the first data according to the token codes corresponding to the multiple tokens, the first device may generate the first data according to both the token codes and the segment codes, so that the model can clearly learn the relationships between the tokens from the segment codes, which improves data processing accuracy.

Besides the relationship between units, the positional relationship between tokens is also an important factor affecting the content of text information. The same token may convey different information when it appears at different positions in the text, and the positional relationship among multiple tokens also affects, to some extent, the information content of the text they form. Therefore, in another possible implementation, the first device may also determine, for each of the multiple tokens, the position code corresponding to that token. The position code identifies where the token is located in the text information to be processed, for example whether it is the first token, the second token, and so on. When the text information to be processed contains only one unit, the position code may directly identify the position of the token within the text information to be processed. When it contains multiple units, the position code may identify the position of the token within the unit to which it belongs, for example the positions of multiple words within the same sentence.

When generating the first data according to the token codes corresponding to the multiple tokens, the first device may generate the first data according to both the token codes and the position codes, so that the model can learn the positional relationships between the tokens. This helps the model further understand the semantics of the text information to be processed and thus improves data processing accuracy.
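The combination of token codes, segment codes, and position codes described above can be sketched as follows. This is only an illustrative sketch: the function name `build_first_data`, the dictionary layout, and the toy vocabulary are assumptions made here, and the application does not prescribe any particular data layout.

```python
from typing import Dict, List


def build_first_data(units: List[List[str]], vocab: Dict[str, int]) -> List[Dict[str, int]]:
    """Attach a token code, a segment code and a position code to every token.

    `units` is a list of units (e.g. sentences), each a list of tokens.
    The segment code is the index of the unit; the position code is the
    index of the token inside its own unit.
    """
    first_data = []
    for segment_id, unit in enumerate(units):
        for position, token in enumerate(unit):
            first_data.append({
                "token_code": vocab[token],
                "segment_code": segment_id,
                "position_code": position,
            })
    return first_data


# Hypothetical two-sentence input and toy vocabulary.
units = [["how", "are", "you"], ["fine", "thanks"]]
vocab = {"how": 0, "are": 1, "you": 2, "fine": 3, "thanks": 4}
first_data = build_first_data(units, vocab)
```

Here the position code restarts in each unit, which corresponds to the multi-unit case in which the position code identifies the token's position within its own unit.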

As mentioned above, tokens and token codes correspond closely: in general, each token corresponds to exactly one token code, which reflects the uniqueness of each token. To further improve data processing efficiency, in one possible implementation a code mapping relationship may be constructed, so that the first device can quickly determine token codes based on the code mapping relationship.

When determining the token codes corresponding to the multiple tokens, the first device may determine them according to the code mapping relationship, which records the mapping between tokens and token codes. A token code can then be determined by a simple lookup in the mapping, which simplifies the process of determining token codes. On the one hand, this further improves data processing efficiency; on the other hand, it reduces the data processing load on the first device, so that the data processing method of this application can be applied to more devices with limited data processing performance, broadening its applicability.

It can be understood that, because tokens are numerous and varied, some tokens may not yet have a generated token code, so the code mapping relationship may lack the token codes for those tokens. In that case, the first device may generate the token code for such a token in another way, for example through a coding algorithm. To make the code mapping relationship more complete, when a token is not present in the code mapping relationship, the first device may add the mapping between that token and its token code to the code mapping relationship. The next time a token code is needed for that token, it can be obtained directly from the code mapping relationship without any further coding processing.
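The lookup-then-update behavior described above can be sketched as a small cache. The names `lookup_token_code` and `fallback_code` are hypothetical, and the hash-based fallback merely stands in for whatever coding algorithm the first device actually uses; unlike a real coding scheme, a truncated hash can in principle collide.

```python
import hashlib
from typing import Dict


def fallback_code(token: str) -> int:
    """Stand-in coding algorithm (an assumption): hash the UTF-8 bytes of the token."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def lookup_token_code(token: str, code_map: Dict[str, int]) -> int:
    """Return the token code from the mapping, generating and storing it if missing."""
    code = code_map.get(token)
    if code is None:
        code = fallback_code(token)
        code_map[token] = code  # update the mapping for the next lookup
    return code


code_map = {"hello": 1}
known = lookup_token_code("hello", code_map)   # found in the mapping
fresh = lookup_token_code("world", code_map)   # generated, then stored
```

After the second call, `code_map` contains an entry for `"world"`, so later lookups for that token no longer invoke the fallback.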

Next, the second device, which serves as the model provider, is described in detail.

First, it can be understood that, because this application splits the originally complete model architecture into a first model architecture and a second model architecture, the data processing result may be affected to some extent. On this basis, in one possible implementation, when splitting the model architecture the second device may set a first degree threshold, which is used to measure the impact of the architecture split on data processing accuracy. The second device may ensure that the degree of difference between the first data processing result and a second data processing result is less than the first degree threshold, where the second data processing result is the data processing result obtained by processing the first data through the data processing model. That is, when splitting the model architecture, the second device may ensure that the result obtained with the split architecture is close to the result obtained with the complete architecture, which safeguards the accuracy of data processing performed in the manner of this application.

Specifically, in one possible implementation, the second device may split the model in the following manner:

First, the second device may obtain second data. The second data has a corresponding sample data processing result, which is the result of processing the second data through the data processing model, that is, the data processing result obtained with the complete model architecture.

The second device may segment the model architecture corresponding to the data processing model based on an initial structure segmentation method to obtain a first initial model architecture and a second initial model architecture, which together constitute the model architecture corresponding to the data processing model. The second device may then construct a first initial model and a second initial model, where the first initial model architecture is the architecture of the first initial model and the second initial model architecture is the architecture of the second initial model. Processing data through the first initial model and then the second initial model thus simulates processing it through the data processing model.

The second device may use the first initial model to generate third data feature information from the second data, where the third data feature information represents the data features of the second data. It may then use the second initial model to generate a pending data processing result from fourth data feature information, where the fourth data feature information is obtained by adding noise information to the third data feature information, so that the effect of the noise information on the third data feature information is reflected. The noise information may be added in several ways. For example, if the noise information needs to be kept confidential, it may be added by the first device, which ensures that the second device cannot restore the data from the noise information in subsequent processing. If the noise information does not need to be kept confidential, it may be added by the second device.

In addition, to safeguard the accuracy of data processing in actual use, the noise information applied when analyzing the structure segmentation method may be the same as the noise information applied in actual use. The noise information then affects the model's data processing to a similar degree in both cases, which avoids interference from additional factors.

As mentioned above, the sample data processing result is the result of processing the second data with the complete model architecture, whereas the pending data processing result is the result of processing the second data with the two separated parts of the model architecture. The difference between the pending data processing result and the sample data processing result therefore reflects the impact of the segmented model architecture on data processing accuracy: the smaller the difference, the smaller the impact of segmenting the model architecture with the initial structure segmentation method. The second device may accordingly adjust the initial structure segmentation method based on this difference to obtain a structure segmentation method for which the degree of difference between the pending data processing result and the sample data processing result is less than the first degree threshold. This ensures that, after the data processing model is segmented based on this structure segmentation method, the resulting first model and second model can process data relatively accurately.
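One way to picture the adjustment described above is a search over candidate split points. The sketch below is an assumption-laden illustration rather than the method of the application: it models the architecture as a plain list of layer functions, treats the split point as the only adjustable part of the segmentation method, measures the difference as a maximum absolute deviation, and uses the hypothetical names `run` and `choose_split`.

```python
import random
from typing import Callable, List, Optional, Sequence

Layer = Callable[[List[float]], List[float]]


def run(layers: Sequence[Layer], x: List[float]) -> List[float]:
    """Apply the given layers in order."""
    for layer in layers:
        x = layer(x)
    return x


def choose_split(layers: Sequence[Layer], sample: List[float], sigma: float,
                 threshold: float, seed: int = 0) -> Optional[int]:
    """Return the first split index whose noisy two-part result stays within threshold."""
    rng = random.Random(seed)
    reference = run(layers, sample)               # sample result from the complete model
    for split in range(1, len(layers)):
        features = run(layers[:split], sample)    # first initial model
        noisy = [v + rng.gauss(0.0, sigma) for v in features]  # add noise information
        pending = run(layers[split:], noisy)      # second initial model
        diff = max(abs(p - r) for p, r in zip(pending, reference))
        if diff < threshold:
            return split
    return None                                   # no candidate met the threshold


# Toy two-layer "model" used only to exercise the sketch.
toy_layers = [lambda x: [2.0 * v for v in x], lambda x: [v + 1.0 for v in x]]
best = choose_split(toy_layers, [1.0, 2.0], sigma=0.0, threshold=1e-9)
```

With `sigma=0.0` the split result equals the complete-model result, so the first candidate is accepted; with real noise, the threshold decides which split points remain acceptable.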

The second device may segment the model architecture corresponding to the data processing model based on the structure segmentation method to obtain the first model architecture and the second model architecture, where the first model architecture may be used to construct the first model and the third model described above, and the second model architecture may be used to construct the second model.

In addition, as can be seen from the above, the noise information is also one of the factors affecting data processing accuracy. Because the purpose of the noise information in this application is to alter the data rather than to affect the data processing result, the type of noise information may be selected according to how strongly it affects the data processing result. In this application, a second degree threshold may be preset to measure whether the noise information has a large impact on the data processing result. The noise information is selected such that the degree of difference between the first data processing result and the second data processing result is less than the second degree threshold, where the second data processing result is the data processing result obtained by processing the first data through the data processing model. This ensures that adding noise information does not make the data processing result inaccurate and does not impair the data provider's use of the model.

To facilitate understanding of the technical solution provided by this application, the solution is described below in combination with an actual application scenario.

Referring to FIG. 6, FIG. 6 is a signaling diagram of a data processing method in an actual application scenario provided by an embodiment of this application. In this scenario, the computer devices include a first device serving as the data provider and a second device serving as the model provider, and the method includes:

S601: The first device determines encryption information and decryption information corresponding to first data.

In this application, the original data may be text information, and the first data may be the token encoding result U obtained after tokenization and encoding. Segment encoding information, position encoding information, and the like may be added to U, which is not limited here.

S602: The first device slices the first data to obtain a first data slice and a second data slice.

The first device may slice the token encoding result U to obtain a first data slice and a second data slice; the first device holds the first data slice, and the second device holds the second data slice. In this actual application scenario, data written in the $\langle\cdot\rangle^A$ format denotes a data slice, with one slice processed by the first model and the other slice processed by the third model.
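The slicing step can be illustrated with additive sharing over a ring. This is an assumption: the text only states that the two slices together constitute the first data, and additive sharing is one common scheme with that property. The modulus and the function names below are hypothetical.

```python
import secrets
from typing import List, Tuple

MODULUS = 2 ** 32  # assumed ring size for the slices


def split_into_slices(values: List[int]) -> Tuple[List[int], List[int]]:
    """Split each value into two random-looking slices that sum to it modulo MODULUS."""
    first = [secrets.randbelow(MODULUS) for _ in values]
    second = [(v - r) % MODULUS for v, r in zip(values, first)]
    return first, second


def combine_slices(first: List[int], second: List[int]) -> List[int]:
    """Recover the original values from the two slices."""
    return [(a + b) % MODULUS for a, b in zip(first, second)]


U = [5, 17, 42]                       # toy token encoding result
slice_1, slice_2 = split_into_slices(U)
restored = combine_slices(slice_1, slice_2)
```

Under this scheme both slices have the same size as U, and neither slice alone reveals U, because the first slice is uniformly random and the second is U masked by it.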

S603: The first device uses the first model to generate first sub-feature information from the first data slice.

In this application, the first model architecture may be N repetitions of the model architecture shown in FIG. 7. In one repetition, the input data passes in turn through a multi-head attention layer (Multi-head Attention), a normalization layer (Layer Norm), a feed-forward layer (Feed Forward), and a second normalization layer. The first model and the third model both correspond to this first model architecture.

S604: The first device sends the second data slice to the second device, instructing the second device to use the third model to generate second sub-feature information from the second data slice.

S605: The second device obtains the second data slice sent by the first device.

S606: The second device uses the third model to generate the second sub-feature information from the second data slice.

The layers in the first model architecture function as follows:

1. Multi-head attention layer: formed by concatenating multiple attention layers (Attention), and processed as shown in the following formula:

$$\left(\langle A_0\rangle^A \,\|\, \langle A_1\rangle^A \,\|\, \dots \,\|\, \langle A_h\rangle^A\right)\langle W_O\rangle^A$$

Here, $W_O$ is a model weight, and each attention is calculated as shown in the following formula:
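The per-head formula is not reproduced in this text. As a hedged point of reference only, the standard scaled dot-product attention, assumed here and written on plaintext values rather than on slices, reads as follows, where the key dimension $d_k$ is a symbol introduced for illustration:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$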

where $\langle Q\rangle^A = \langle X\rangle^A \langle W_Q\rangle^A$, $\langle K\rangle^A = \langle X\rangle^A \langle W_K\rangle^A$, and $\langle V\rangle^A = \langle X\rangle^A \langle W_V\rangle^A$; $\langle X\rangle^A$ is the data slice input to the model, and $\langle W_Q\rangle^A$, $\langle W_K\rangle^A$ and $\langle W_V\rangle^A$ are all model weights, that is, model parameters. The first model architecture is used by both the first model and the third model, and the model parameters of the data processing model that correspond to the first model architecture can be split into two slices, namely the first model parameters and the second model parameters: the first model holds the first model parameters, and the third model holds the second model parameters. $\langle A\rangle^A$ is the output computed by each attention mechanism. The multi-head attention layer strengthens the model's grasp of the relationship between each token and the text information as a whole, which deepens the model's understanding of the input data.

2. Normalization layer: the first model and the third model can each perform the following calculation:

$\langle X_i\rangle^A$ is the output after the attention layers, and the final output of the normalization layer is as follows:

$$\mathrm{LayerNorm}(\langle X\rangle^A) = \langle G\rangle^A / \langle \sigma\rangle^A \odot \left(\langle X_i\rangle^A - \langle \mu\rangle^A\right) + \langle B\rangle^A$$

Here, G and B are hyperparameters among the model parameters.

3. Feed-forward layer: the first model and the third model can each perform the following calculation:

$$\langle Z\rangle^A = \langle W_1\rangle^A f\!\left(\langle X\rangle^A \langle W_0\rangle^A\right)$$

Here, $W_0$ and $W_1$ are weight parameters among the model parameters, and $f(\cdot)$ is an activation function, for example the Gaussian Error Linear Unit (GELU).

Through the above process, the first model outputs the first sub-feature information, and the third model outputs the second sub-feature information.
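For orientation, one repetition of the layer sequence described above (multi-head attention, normalization, feed-forward, second normalization) can be sketched in NumPy on plaintext values. This is not the sliced computation of the application: evaluating softmax, normalization, and the activation on slices would require a secure-computation protocol that is not shown here. All shapes, parameter names, the tanh approximation of GELU, and the `eps` term are assumptions, and the sketch follows the described sequence literally, without residual connections.

```python
import numpy as np


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)   # subtract the row maximum for stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def attention(x, wq, wk, wv):
    q, k, v = x @ wq, x @ wk, x @ wv
    return softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v


def multi_head(x, heads, wo):
    # Concatenate the head outputs, then apply the output weight.
    return np.concatenate([attention(x, *h) for h in heads], axis=-1) @ wo


def layer_norm(x, g, b, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    sigma = np.sqrt(x.var(axis=-1, keepdims=True) + eps)
    return g / sigma * (x - mu) + b


def gelu(x):
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def feed_forward(x, w0, w1):
    return gelu(x @ w0) @ w1


def block(x, p):
    x = layer_norm(multi_head(x, p["heads"], p["wo"]), p["g1"], p["b1"])
    return layer_norm(feed_forward(x, p["w0"], p["w1"]), p["g2"], p["b2"])


rng = np.random.default_rng(0)
d, h, dh = 8, 2, 4                           # model width, heads, head width (toy sizes)
params = {
    "heads": [tuple(rng.normal(size=(d, dh)) for _ in range(3)) for _ in range(h)],
    "wo": rng.normal(size=(h * dh, d)),
    "g1": np.ones(d), "b1": np.zeros(d),
    "w0": rng.normal(size=(d, 4 * d)), "w1": rng.normal(size=(4 * d, d)),
    "g2": np.ones(d), "b2": np.zeros(d),
}
out = block(rng.normal(size=(5, d)), params)  # 5 tokens in, 5 feature rows out
```

Repeating `block` N times on its own output would correspond to the N repetitions of the first model architecture.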

S607: The second device sends the second sub-feature information to the first device, to instruct the first device to determine the first data feature information according to the first sub-feature information and the second sub-feature information.

S608: The first device obtains the second sub-feature information sent by the second device.

S609: The first device determines the first data feature information according to the first sub-feature information and the second sub-feature information.

The first data feature information X can be obtained by combining the two pieces of sub-feature information as shown in the following formula:

S610: The first device adds noise information to the first data feature information to obtain second data feature information.

The first device may add Gaussian noise to the first data feature information X as the noise information, where σ is a configurable parameter, to obtain the second data feature information X′.
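The noise step can be sketched as below. The zero mean of the noise is an assumption (the distribution's symbol is not reproduced in this text), as are the function name and the nested-list representation of X.

```python
import random
from typing import List, Optional


def add_gaussian_noise(features: List[List[float]], sigma: float,
                       seed: Optional[int] = None) -> List[List[float]]:
    """Return X' = X + noise, with independent zero-mean Gaussian noise per entry."""
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in features]


X = [[1.0, 2.0], [3.0, 4.0]]                  # toy first data feature information
X_noisy = add_gaussian_noise(X, sigma=0.1, seed=0)
X_same = add_gaussian_noise(X, sigma=0.0)
```

A larger σ hides X more strongly from the second device but also moves the result further from the noise-free result, which is the trade-off that the second degree threshold is meant to bound.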

S611: The first device sends the second data feature information and the encryption information to the second device, to instruct the second device to use the second model to generate the first data processing result according to the second data feature information and the encryption information.

S612: The second device obtains the second data feature information sent by the first device.

S613: The second device uses the second model to generate the initial data processing result according to the second data feature information, encrypts the initial data processing result according to the encryption information, and outputs the first data processing result.

S614: The second device sends the first data processing result to the first device, to instruct the first device to decrypt the first data processing result by using the decryption information to obtain the initial data processing result.

S615: The first device obtains the first data processing result sent by the second device.

S616: The first device decrypts the first data processing result by using the decryption information to obtain the initial data processing result.

It should be emphasized that each model involved in this application may use any of various model architectures with similar functions. The embodiments of this application use only one of them as an example, which does not constitute a limitation.

As can be seen from the above process, compared with the related art, the technical solution of this application achieves improvements in the following respects:

1. By splitting the model architecture, splitting the model parameters, and similar means, this application prevents the data provider from obtaining the complete data processing model, which protects the model provider's model.

2. By adding noise information, encrypting the data processing result, slicing the input data, and similar means, this application protects the data provider's data in both the input data and the output result.

3. By establishing a code mapping relationship and similar means, this application improves the data provider's data processing efficiency and reduces its data processing load, which further broadens the applicability of the data processing method.

4. By adjusting the architecture segmentation method and the noise information, this application reduces the impact of architecture segmentation and added noise on the accuracy of the data processing result, which safeguards data processing accuracy.

Based on the data processing method applied to the first device provided in the foregoing embodiments, this application further provides a data processing apparatus. Referring to FIG. 8, FIG. 8 is a structural block diagram of a data processing apparatus provided by an embodiment of this application. The apparatus 800 includes a first generating unit 801, a first adding unit 802, and a first sending unit 803:

The first generating unit 801 is configured to generate, through the first model, first data feature information according to first data, where the first data feature information represents data features of the first data;

The first adding unit 802 is configured to add noise information to the first data feature information to obtain second data feature information;

The first sending unit 803 is configured to send the second data feature information to a second device, to instruct the second device to generate, through a second model, a first data processing result according to the second data feature information, where the first model and the second model constitute a data processing model, and the first data processing result represents the data processing result obtained by processing the first data through the data processing model.

In a possible implementation, the first generating unit 801 is specifically configured to:

slice the first data to obtain a first data slice and a second data slice, where the first data slice and the second data slice constitute the first data;

generate, through the first model, first sub-feature information according to the first data slice, where the first sub-feature information represents data features of the first data slice, and the first model has first model parameters;

send the second data slice to the second device, instructing the second device to generate, through a third model, second sub-feature information according to the second data slice, where the second sub-feature information represents data features of the second data slice, the third model has the same architecture as the first model, the third model has second model parameters, the first model parameters and the second model parameters constitute target model parameters, and the target model parameters are the model parameters of the first model in the data processing model;

obtain the second sub-feature information sent by the second device; and

determine the first data feature information according to the first sub-feature information and the second sub-feature information.

在一种可能的实现方式中,所述第一数据分片与所述第二数据分片的数据大小相同。In a possible implementation, the first data slice and the second data slice have the same data size.

在一种可能的实现方式中,所述装置还包括第一获取单元、分词单元、第一确定单元和第二生成单元:In a possible implementation, the apparatus further includes a first acquisition unit, a word segmentation unit, a first determination unit, and a second generation unit:

所述第一获取单元,用于获取待处理文本信息;The first acquiring unit is used to acquire the text information to be processed;

所述分词单元,用于对所述待处理文本信息进行分词处理,得到所述待处理文本信息中包括的多个分词;The word segmentation unit is used to perform word segmentation processing on the text information to be processed to obtain a plurality of word segments included in the text information to be processed;

所述第一确定单元,用于确定所述多个分词分别对应的分词编码;The first determining unit is configured to determine the word segment code corresponding to each of the plurality of word segments;

所述第二生成单元,用于根据所述多个分词分别对应的分词编码,生成所述第一数据。The second generating unit is configured to generate the first data according to the word segment codes corresponding to the plurality of word segments.

在一种可能的实现方式中,In one possible implementation,

所述待处理文本信息包括多个单元分别对应的文本信息,所述装置还包括第二确定单元:The text information to be processed includes text information corresponding to a plurality of units respectively, and the apparatus further includes a second determining unit:

所述第二确定单元,用于针对所述多个分词中的每个分词,确定所述分词对应的分段编码,所述分段编码用于表征所述分词所处的单元;The second determining unit is configured to determine, for each of the plurality of word segments, a segment code corresponding to the word segment, where the segment code represents the unit in which the word segment is located;

所述第二生成单元具体用于:The second generating unit is specifically configured to:

根据所述多个分词分别对应的分词编码和分段编码,生成所述第一数据。The first data is generated according to the word segment codes and segment codes corresponding to the plurality of word segments.

在一种可能的实现方式中,所述装置还包括第三确定单元:In a possible implementation, the apparatus further includes a third determining unit:

所述第三确定单元,用于针对所述多个分词中的每个分词,确定所述分词对应的位置编码,所述位置编码用于表征所述分词在所述待处理文本信息中的位置分布;The third determining unit is configured to determine, for each of the plurality of word segments, a position code corresponding to the word segment, where the position code represents the position of the word segment in the text information to be processed;

所述第二生成单元具体用于:The second generating unit is specifically configured to:

根据所述多个分词分别对应的分词编码和位置编码,生成所述第一数据。The first data is generated according to the word segment codes and position codes corresponding to the plurality of word segments.
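As a rough illustration of how word segment codes, segment codes, and position codes might be combined into the first data, the sketch below encodes a text made of several units. Whitespace tokenisation, the triple layout, and the names `encode_units` and `code_map` are assumptions for illustration; the text does not prescribe any of them.

```python
# Hypothetical sketch: whitespace tokenisation and (word, segment, position)
# triples are illustrative assumptions, not the format used in the application.

def encode_units(units, code_map):
    """Return one (word segment code, segment code, position code) triple per word segment."""
    first_data = []
    position = 0
    for segment_code, unit_text in enumerate(units):
        for word in unit_text.split():
            # segment_code identifies the unit; position counts across the whole text.
            first_data.append((code_map[word], segment_code, position))
            position += 1
    return first_data

code_map = {"hello": 0, "world": 1, "again": 2}
first_data = encode_units(["hello world", "hello again"], code_map)
print(first_data)  # [(0, 0, 0), (1, 0, 1), (0, 1, 2), (2, 1, 3)]
```

The same word ("hello") keeps its word segment code in both units, while the segment code and position code distinguish the two occurrences.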

在一种可能的实现方式中,所述第一确定单元具体用于:In a possible implementation manner, the first determining unit is specifically configured to:

根据编码映射关系,确定所述多个分词分别对应的分词编码,所述编码映射关系用于记录分词与分词编码之间的映射关系;Determine, according to a code mapping relationship, the word segment codes corresponding to the plurality of word segments, where the code mapping relationship records the mapping between word segments and word segment codes;

所述装置还包括更新单元:The device further comprises an updating unit:

所述更新单元,用于在所述编码映射关系中不具有所述分词的情况下,将所述分词与所述分词编码之间的映射关系,更新到所述编码映射关系中。The updating unit is configured to, when the code mapping relationship does not contain the word segment, add the mapping between the word segment and its word segment code to the code mapping relationship.
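The lookup-then-update behaviour of the code mapping relationship could look like the following sketch. Assigning the next unused integer to an unseen word segment is an assumption; the text only states that the missing mapping is added. The name `lookup_code` is hypothetical.

```python
def lookup_code(word, code_map):
    """Return the code for a word segment, adding a new mapping if it is missing."""
    if word not in code_map:
        # Assumption: a new word segment receives the next unused integer code.
        code_map[word] = len(code_map)
    return code_map[word]

code_map = {}
codes = [lookup_code(word, code_map) for word in ["data", "model", "data"]]
print(codes)     # [0, 1, 0]
print(code_map)  # {'data': 0, 'model': 1}
```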

在一种可能的实现方式中,所述装置还包括第四确定单元:In a possible implementation, the apparatus further includes a fourth determining unit:

所述第四确定单元,用于确定所述第一数据对应的加密信息和解密信息,所述加密信息用于表征对所述第一数据对应的数据处理结果的加密方式,所述解密信息用于解密通过所述加密方式加密得到的数据;The fourth determining unit is configured to determine encryption information and decryption information corresponding to the first data, wherein the encryption information is used to represent an encryption method for a data processing result corresponding to the first data, and the decryption information is used to decrypt data encrypted using the encryption method;

所述第一发送单元803具体用于:The first sending unit 803 is specifically configured to:

向所述第二设备发送所述第二数据特征信息和所述加密信息,以指示所述第二设备通过第二模型,根据所述第二数据特征信息和所述加密信息生成所述第一数据处理结果,所述第二模型用于根据所述第二数据特征信息生成初始数据处理结果,以及根据所述加密信息对所述初始数据处理结果进行加密,输出所述第一数据处理结果;sending the second data characteristic information and the encryption information to the second device to instruct the second device to generate the first data processing result based on the second data characteristic information and the encryption information using a second model, wherein the second model is configured to generate an initial data processing result based on the second data characteristic information, encrypt the initial data processing result based on the encryption information, and output the first data processing result;

所述装置还包括第二获取单元和解密单元:The device further includes a second acquisition unit and a decryption unit:

所述第二获取单元,用于获取所述第二设备发送的所述第一数据处理结果;The second acquiring unit is configured to acquire the first data processing result sent by the second device;

所述解密单元,用于通过所述解密信息,对所述第一数据处理结果进行解密,得到所述初始数据处理结果。The decryption unit is used to decrypt the first data processing result using the decryption information to obtain the initial data processing result.
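The exchange of encryption information and decryption information can be sketched with a toy public/private key pair. The text does not name an encryption scheme, so textbook RSA with tiny primes is used here purely as a stand-in; it is not secure, and the function names and the integer-valued result are assumptions.

```python
# Toy textbook RSA with tiny primes, for illustration only. It is NOT secure,
# and the application does not specify any particular encryption scheme.

def make_key_pair(p=61, q=53, e=17):
    """First device: derive encryption information (public) and decryption information (private)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse, available in Python 3.8+
    return (e, n), (d, n)

def encrypt(initial_result, encryption_info):
    """Second device: encrypt the initial data processing result."""
    e, n = encryption_info
    return pow(initial_result, e, n)

def decrypt(first_result, decryption_info):
    """First device: recover the initial data processing result."""
    d, n = decryption_info
    return pow(first_result, d, n)

encryption_info, decryption_info = make_key_pair()      # only encryption_info is sent
initial_result = 42                                     # hypothetical model output
first_result = encrypt(initial_result, encryption_info)
recovered = decrypt(first_result, decryption_info)
```

Because only the encryption information leaves the first device, a party that intercepts the first data processing result cannot recover the initial result without the decryption information.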

基于上述实施例提供的应用于第二设备的数据处理方法,本申请还提供了一种数据处理装置,参见图9,图9为本申请实施例提供的一种数据处理装置的结构框图,该装置900包括第三获取单元901和第三生成单元902:Based on the data processing method applied to the second device provided in the above embodiment, the present application further provides a data processing device. See FIG. 9, which is a structural block diagram of a data processing device provided in an embodiment of the present application. The device 900 includes a third acquisition unit 901 and a third generation unit 902:

所述第三获取单元901,用于获取第一设备发送的第二数据特征信息,所述第二数据特征信息为所述第一设备通过在第一数据特征信息中添加噪声信息得到的,所述第一数据特征信息为所述第一设备通过第一模型根据第一数据生成的;The third acquiring unit 901 is configured to acquire second data feature information sent by the first device, where the second data feature information is obtained by the first device by adding noise information to the first data feature information, and the first data feature information is generated by the first device based on the first data using the first model;

所述第三生成单元902,用于通过所述第二模型,根据所述第二数据特征信息生成第一数据处理结果,所述第一模型和所述第二模型用于构成数据处理模型,所述第一数据处理结果用于表征通过所述数据处理模型处理所述第一数据得到的数据处理结果。The third generation unit 902 is used to generate a first data processing result based on the second data feature information through the second model. The first model and the second model are used to constitute a data processing model. The first data processing result is used to represent the data processing result obtained by processing the first data through the data processing model.
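The overall flow across the two devices (feature extraction, noise addition, completion by the second model) can be sketched as follows. The two tiny linear models, the bounded uniform noise, and all names are assumptions chosen so the effect of the noise is easy to bound; the text does not specify model structure or noise distribution.

```python
import random

def first_model(first_data):
    """Hypothetical front segment: two hand-written linear features."""
    a, b = first_data
    return [a + b, a - b]

def second_model(features):
    """Hypothetical back segment: a weighted sum of the features."""
    return 0.5 * features[0] + 0.25 * features[1]

def add_noise(features, bound, rng):
    """Add bounded uniform noise (an assumed noise distribution)."""
    return [f + rng.uniform(-bound, bound) for f in features]

rng = random.Random(0)
first_data = [2.0, 1.0]
bound = 0.01

first_features = first_model(first_data)                  # first device
second_features = add_noise(first_features, bound, rng)   # first device, before sending
first_result = second_model(second_features)              # second device
second_result = second_model(first_features)              # noise-free reference

# With these weights the deviation cannot exceed (0.5 + 0.25) * bound.
difference = abs(first_result - second_result)
```

The second device only ever sees `second_features`, yet its output stays within a known distance of the noise-free result, which is the property the degree thresholds in the text describe.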

在一种可能的实现方式中,所述第一数据处理结果与第二数据处理结果之间的差异程度小于第一程度阈值,所述第二数据处理结果为通过所述数据处理模型处理所述第一数据得到的数据处理结果。In a possible implementation, the degree of difference between the first data processing result and the second data processing result is less than a first degree threshold, and the second data processing result is a data processing result obtained by processing the first data through the data processing model.

在一种可能的实现方式中,所述装置还包括第四获取单元、第一分段单元、第四生成单元、第五生成单元、调节单元和第二分段单元:In a possible implementation, the apparatus further includes a fourth acquiring unit, a first segmenting unit, a fourth generating unit, a fifth generating unit, an adjusting unit, and a second segmenting unit:

所述第四获取单元,用于获取第二数据,所述第二数据具有对应的样本数据处理结果,所述样本数据处理结果为通过所述数据处理模型对所述第二数据进行处理的结果;The fourth acquiring unit is configured to acquire second data, where the second data has a corresponding sample data processing result, where the sample data processing result is a result of processing the second data using the data processing model;

所述第一分段单元,用于基于初始结构分段方式,对所述数据处理模型进行分段,得到第一初始模型和第二初始模型;The first segmentation unit is configured to segment the data processing model based on the initial structure segmentation method to obtain a first initial model and a second initial model;

所述第四生成单元,用于通过所述第一初始模型,根据所述第二数据生成第三数据特征信息,所述第三数据特征信息用于表征所述第二数据对应的数据特征;The fourth generating unit is configured to generate third data feature information based on the second data using the first initial model, where the third data feature information is used to represent data features corresponding to the second data;

所述第五生成单元,用于通过所述第二初始模型,根据第四数据特征信息生成待定数据处理结果,所述第四数据特征信息为通过在所述第三数据特征信息中添加所述噪声信息得到的;the fifth generating unit is configured to generate a pending data processing result according to fourth data feature information using the second initial model, where the fourth data feature information is obtained by adding the noise information to the third data feature information;

所述调节单元,用于根据所述待定数据处理结果与所述样本数据处理结果之间的差异,调节所述初始结构分段方式,以得到结构分段方式,通过所述结构分段方式确定出的所述待定数据处理结果与所述样本数据处理结果之间的差异程度小于第一程度阈值;the adjusting unit is configured to adjust the initial structural segmentation mode according to a difference between the pending data processing result and the sample data processing result, so as to obtain a structural segmentation mode, wherein a degree of difference between the pending data processing result and the sample data processing result determined by the structural segmentation mode is less than a first degree threshold;

所述第二分段单元,用于基于所述结构分段方式,对所述数据处理模型进行分段,得到所述第一模型和所述第二模型。The second segmentation unit is used to segment the data processing model based on the structural segmentation method to obtain the first model and the second model.
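One way to read the adjustment of the structural segmentation mode is as a search over split points. The sketch below assumes the data processing model is a chain of scalar layers, uses a fixed worst-case noise value so the outcome is deterministic, and simply moves the split later until the threshold is met. The search strategy and the names `run` and `choose_split` are assumptions, not the procedure in the application.

```python
def run(layers, value):
    """Apply a sequence of layers to a value."""
    for layer in layers:
        value = layer(value)
    return value

def choose_split(layers, sample_input, sample_result, noise, threshold, initial_split=1):
    """Move the split point later until the noisy output is close enough (assumed strategy)."""
    for split in range(initial_split, len(layers)):
        features = run(layers[:split], sample_input)      # first initial model
        pending = run(layers[split:], features + noise)   # second initial model on noisy features
        if abs(pending - sample_result) < threshold:
            return split
    return None  # no split point satisfies the threshold

# Hypothetical four-layer model; early noise is amplified by the later layers.
layers = [lambda x: 3 * x, lambda x: 3 * x, lambda x: x + 1, lambda x: 0.5 * x]
sample_input = 1.0
sample_result = run(layers, sample_input)  # noise-free sample data processing result
split = choose_split(layers, sample_input, sample_result, noise=0.1, threshold=0.1)
```

Here the initial split after the first layer lets the noise grow to about 0.15 at the output, so the search settles on splitting after the second layer, where the deviation is about 0.05.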

在一种可能的实现方式中,所述噪声信息满足所述第一数据处理结果与第二数据处理结果之间的差异程度小于第二程度阈值,所述第二数据处理结果为通过所述数据处理模型处理所述第一数据得到的数据处理结果。In a possible implementation, the noise information satisfies that the degree of difference between the first data processing result and the second data processing result is less than a second degree threshold, and the second data processing result is a data processing result obtained by processing the first data through the data processing model.

在一种可能的实现方式中,所述第一设备还用于对所述第一数据进行分片处理,得到第一数据分片和第二数据分片,所述第一数据分片和所述第二数据分片用于构成所述第一数据,所述第二设备还包括第三模型,所述第三模型与所述第一模型的架构相同,所述第三模型具有第二模型参数,所述装置还包括第五获取单元、第六生成单元和第二发送单元:In a possible implementation, the first device is further configured to slice the first data to obtain a first data slice and a second data slice, where the first data slice and the second data slice constitute the first data; the second device further includes a third model that has the same architecture as the first model and has second model parameters; and the apparatus further includes a fifth acquiring unit, a sixth generating unit, and a second sending unit:

所述第五获取单元,用于获取所述第一设备发送的第二数据分片;The fifth acquiring unit is configured to acquire the second data slice sent by the first device;

所述第六生成单元,用于通过所述第三模型,根据所述第二数据分片生成第二子特征信息;The sixth generating unit is configured to generate second sub-feature information according to the second data slice using the third model;

所述第二发送单元,用于向所述第一设备发送所述第二子特征信息,以指示所述第一设备根据所述第一子特征信息和所述第二子特征信息确定所述第一数据特征信息,所述第一子特征信息为所述第一设备通过所述第一模型,根据所述第一数据分片生成的特征信息,所述第一模型具有第一模型参数,所述第一模型参数和所述第二模型参数用于构成目标模型参数,所述目标模型参数为所述数据处理模型中的所述第一模型的模型参数。The second sending unit is used to send the second sub-feature information to the first device to instruct the first device to determine the first data feature information based on the first sub-feature information and the second sub-feature information, where the first sub-feature information is feature information generated by the first device according to the first data slice through the first model, and the first model has first model parameters. The first model parameters and the second model parameters are used to constitute target model parameters, and the target model parameters are model parameters of the first model in the data processing model.

在一种可能的实现方式中,所述第三获取单元901具体用于:In a possible implementation, the third acquiring unit 901 is specifically configured to:

获取所述第一设备发送的第二数据特征信息和加密信息,所述加密信息用于表征对所述第一数据对应的数据处理结果的加密方式;Obtaining second data characteristic information and encryption information sent by the first device, where the encryption information is used to represent an encryption method for a data processing result corresponding to the first data;

所述第三生成单元902具体用于:通过所述第二模型,根据所述第二数据特征信息生成初始数据处理结果,以及根据所述加密信息对所述初始数据处理结果进行加密,输出所述第一数据处理结果;The third generating unit 902 is specifically configured to: generate an initial data processing result according to the second data feature information using the second model, encrypt the initial data processing result according to the encryption information, and output the first data processing result;

所述装置还包括第三发送单元:The device further includes a third sending unit:

所述第三发送单元,用于向所述第一设备发送所述第一数据处理结果,以指示所述第一设备通过所述解密信息,对所述第一数据处理结果进行解密,得到所述初始数据处理结果,所述解密信息用于解密通过所述加密方式加密得到的数据。The third sending unit is used to send the first data processing result to the first device to instruct the first device to decrypt the first data processing result through the decryption information to obtain the initial data processing result, and the decryption information is used to decrypt the data encrypted by the encryption method.

本申请实施例还提供了一种计算机设备,请参见图10所示,该计算机设备可以是终端设备,以终端设备为手机为例:The embodiment of the present application further provides a computer device, as shown in FIG. 10. The computer device may be a terminal device; the following takes a mobile phone as an example of the terminal device:

图10示出的是与本申请实施例提供的终端设备相关的手机的部分结构的框图。参考图10,手机包括:射频(Radio Frequency,简称RF)电路710、存储器720、输入单元730、显示单元740、传感器750、音频电路760、无线保真(Wireless Fidelity,简称WiFi)模块770、处理器780、以及电源790等部件。本领域技术人员可以理解,图10中示出的手机结构并不构成对手机的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。FIG. 10 is a block diagram showing a partial structure of a mobile phone related to a terminal device provided in an embodiment of the present application. Referring to FIG. 10, the mobile phone includes components such as a radio frequency (RF) circuit 710, a memory 720, an input unit 730, a display unit 740, a sensor 750, an audio circuit 760, a wireless fidelity (WiFi) module 770, a processor 780, and a power supply 790. Those skilled in the art will appreciate that the mobile phone structure shown in FIG. 10 does not limit the mobile phone and may include more or fewer components than shown, or combine certain components, or arrange the components differently.

下面结合图10对手机的各个构成部件进行具体的介绍:The following is a detailed introduction to the various components of the mobile phone in conjunction with Figure 10:

RF电路710可用于收发信息或通话过程中,信号的接收和发送,特别地,将基站的下行信息接收后,给处理器780处理;另外,将涉及上行的数据发送给基站。通常,RF电路710包括但不限于天线、至少一个放大器、收发信机、耦合器、低噪声放大器(Low Noise Amplifier,简称LNA)、双工器等。此外,RF电路710还可以通过无线通信与网络和其他设备通信。上述无线通信可以使用任一通信标准或协议,包括但不限于全球移动通讯系统(Global System of Mobile communication,简称GSM)、通用分组无线服务(General Packet Radio Service,简称GPRS)、码分多址(Code Division Multiple Access,简称CDMA)、宽带码分多址(Wideband Code Division Multiple Access,简称WCDMA)、长期演进(Long Term Evolution,简称LTE)、电子邮件、短消息服务(Short Messaging Service,简称SMS)等。The RF circuit 710 can be used to receive and transmit signals while sending and receiving information or during calls. Specifically, it receives downlink information from the base station and passes it to the processor 780 for processing; it also transmits uplink-related data to the base station. Typically, the RF circuit 710 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier (LNA), a duplexer, and the like. Furthermore, the RF circuit 710 can communicate with the network and other devices via wireless communications. Such wireless communications can utilize any communication standard or protocol, including but not limited to Global System of Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, and Short Messaging Service (SMS).

存储器720可用于存储软件程序以及模块,处理器780通过运行存储在存储器720的软件程序以及模块,从而执行手机的各种功能应用以及数据处理。存储器720可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序(比如声音播放功能、图像播放功能等)等;存储数据区可存储根据手机的使用所创建的数据(比如音频数据、电话本等)等。此外,存储器720可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他非易失性固态存储器件。The memory 720 can be used to store software programs and modules. The processor 780 executes the various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 720. The memory 720 can mainly include a program storage area and a data storage area. The program storage area can store an operating system and at least one application required for a function (such as a sound playback function, an image playback function, etc.); the data storage area can store data created based on the use of the mobile phone (such as audio data, a phone book, etc.). In addition, the memory 720 can include high-speed random access memory and can also include non-volatile memory, such as at least one disk storage device, a flash memory device, or another non-volatile solid-state storage device.

输入单元730可用于接收输入的数字或字符信息,以及产生与手机的用户设置以及功能控制有关的键信号输入。具体地,输入单元730可包括触控面板731以及其他输入设备732。触控面板731,也称为触摸屏,可收集用户在其上或附近的触摸操作(比如用户使用手指、触笔等任何适合的物体或附件在触控面板731上或在触控面板731附近的操作),并根据预先设定的程式驱动相应的连接装置。可选的,触控面板731可包括触摸检测装置和触摸控制器两个部分。其中,触摸检测装置检测用户的触摸方位,并检测触摸操作带来的信号,将信号传送给触摸控制器;触摸控制器从触摸检测装置上接收触摸信息,并将它转换成触点坐标,再送给处理器780,并能接收处理器780发来的命令并加以执行。此外,可以采用电阻式、电容式、红外线以及表面声波等多种类型实现触控面板731。除了触控面板731,输入单元730还可以包括其他输入设备732。具体地,其他输入设备732可以包括但不限于物理键盘、功能键(比如音量控制按键、开关按键等)、轨迹球、鼠标、操作杆等中的一种或多种。The input unit 730 can be used to receive input digital or character information, and to generate key signal input related to the user settings and function control of the mobile phone. Specifically, the input unit 730 may include a touch panel 731 and other input devices 732. The touch panel 731, also known as a touch screen, can collect user touch operations on or near it (such as operations performed by the user using any suitable object or accessory such as a finger, stylus, etc. on or near the touch panel 731) and drive the corresponding connection device according to a pre-set program. Optionally, the touch panel 731 may include two parts: a touch detection device and a touch controller. Among them, the touch detection device detects the user's touch direction and detects the signal caused by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device and converts it into touch point coordinates, which are then sent to the processor 780. It can also receive commands sent by the processor 780 and execute them. In addition, the touch panel 731 can be implemented using various types such as resistive, capacitive, infrared and surface acoustic wave. In addition to the touch panel 731, the input unit 730 may further include other input devices 732. 
Specifically, the other input devices 732 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick.

显示单元740可用于显示由用户输入的信息或提供给用户的信息以及手机的各种菜单。显示单元740可包括显示面板741,可选的,可以采用液晶显示器(Liquid Crystal Display,简称LCD)、有机发光二极管(Organic Light-Emitting Diode,简称OLED)等形式来配置显示面板741。进一步的,触控面板731可覆盖显示面板741,当触控面板731检测到在其上或附近的触摸操作后,传送给处理器780以确定触摸事件的类型,随后处理器780根据触摸事件的类型在显示面板741上提供相应的视觉输出。虽然在图10中,触控面板731与显示面板741是作为两个独立的部件来实现手机的输入和输出功能,但是在某些实施例中,可以将触控面板731与显示面板741集成而实现手机的输入和输出功能。The display unit 740 can be used to display information input by the user or information provided to the user, as well as various menus of the mobile phone. The display unit 740 may include a display panel 741. Optionally, the display panel 741 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like. Furthermore, the touch panel 731 may overlay the display panel 741. When the touch panel 731 detects a touch operation on or near it, it transmits the operation to the processor 780 to determine the type of touch event. The processor 780 then provides a corresponding visual output on the display panel 741 based on the type of touch event. Although in FIG. 10 the touch panel 731 and the display panel 741 are shown as two separate components that implement the input and output functions of the mobile phone, in some embodiments the touch panel 731 and the display panel 741 can be integrated to implement the input and output functions of the mobile phone.

手机还可包括至少一种传感器750,比如光传感器、运动传感器以及其他传感器。具体地,光传感器可包括环境光传感器及接近传感器,其中,环境光传感器可根据环境光线的明暗来调节显示面板741的亮度,接近传感器可在手机移动到耳边时,关闭显示面板741和/或背光。作为运动传感器的一种,加速计传感器可检测各个方向上(一般为三轴)加速度的大小,静止时可检测出重力的大小及方向,可用于识别手机姿态的应用(比如横竖屏切换、相关游戏、磁力计姿态校准)、振动识别相关功能(比如计步器、敲击)等;至于手机还可配置的陀螺仪、气压计、湿度计、温度计、红外线传感器等其他传感器,在此不再赘述。The mobile phone may also include at least one sensor 750, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display panel 741 according to the brightness of the ambient light, and the proximity sensor may turn off the display panel 741 and/or the backlight when the mobile phone is moved to the ear. As a type of motion sensor, the accelerometer sensor can detect the magnitude of acceleration in all directions (generally three axes), and can detect the magnitude and direction of gravity when stationary. It can be used for applications that identify the posture of the mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer, tapping), etc.; as for other sensors that the mobile phone can also be configured with, such as gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc., they will not be described here.

音频电路760、扬声器761,传声器762可提供用户与手机之间的音频接口。音频电路760可将接收到的音频数据转换后的电信号,传输到扬声器761,由扬声器761转换为声音信号输出;另一方面,传声器762将收集的声音信号转换为电信号,由音频电路760接收后转换为音频数据,再将音频数据输出处理器780处理后,经RF电路710以发送给比如另一手机,或者将音频数据输出至存储器720以便进一步处理。Audio circuit 760, speaker 761, and microphone 762 provide an audio interface between the user and the phone. Audio circuit 760 converts received audio data into electrical signals and transmits them to speaker 761, which then converts them into sound signals for output. Microphone 762, on the other hand, converts collected sound signals into electrical signals, which are then received by audio circuit 760 and converted into audio data. The audio data is then processed by processor 780 and transmitted to, for example, another phone via RF circuit 710, or stored in memory 720 for further processing.

WiFi属于短距离无线传输技术,手机通过WiFi模块770可以帮助用户收发电子邮件、浏览网页和访问流式媒体等,它为用户提供了无线的宽带互联网访问。虽然图10示出了WiFi模块770,但是可以理解的是,其并不属于手机的必须构成,完全可以根据需要在不改变发明的本质的范围内而省略。WiFi is a short-range wireless transmission technology. A mobile phone uses WiFi module 770 to help users send and receive emails, browse the web, and access streaming media, providing wireless broadband internet access. Although FIG10 illustrates WiFi module 770, it is understood that it is not a required component of the mobile phone and can be omitted as needed without changing the essence of the invention.

处理器780是手机的控制中心,利用各种接口和线路连接整个手机的各个部分,通过运行或执行存储在存储器720内的软件程序和/或模块,以及调用存储在存储器720内的数据,执行手机的各种功能和处理数据,从而对手机进行整体检测。可选的,处理器780可包括一个或多个处理单元;优选的,处理器780可集成应用处理器和调制解调处理器,其中,应用处理器主要处理操作系统、用户界面和应用程序等,调制解调处理器主要处理无线通信。可以理解的是,上述调制解调处理器也可以不集成到处理器780中。The processor 780 is the control center of the mobile phone, connecting all parts of the phone using various interfaces and circuits. By running or executing software programs and/or modules stored in the memory 720 and calling data stored in the memory 720, it performs the phone's various functions and processes data, thereby monitoring the phone as a whole. Optionally, the processor 780 may include one or more processing units; preferably, the processor 780 may integrate an application processor and a modem processor, where the application processor primarily handles the operating system, user interface, and application programs, while the modem processor primarily handles wireless communications. It is understood that the modem processor may also not be integrated into the processor 780.

手机还包括给各个部件供电的电源790(比如电池),优选的,电源可以通过电源管理系统与处理器780逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。The mobile phone also includes a power supply 790 (such as a battery) for supplying power to various components. Preferably, the power supply can be logically connected to the processor 780 through a power management system, thereby managing charging, discharging, and power consumption management functions through the power management system.

尽管未示出,手机还可以包括摄像头、蓝牙模块等,在此不再赘述。Although not shown, the mobile phone may also include a camera, a Bluetooth module, etc., which will not be described in detail here.

在本实施例中,该终端设备所包括的处理器780还用于执行上述第一设备侧或者第二设备侧的数据处理方法。In this embodiment, the processor 780 included in the terminal device is also used to execute the above-mentioned data processing method on the first device side or the second device side.

本申请实施例还提供一种服务器,请参见图11所示,图11为本申请实施例提供的服务器800的结构图,服务器800可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(Central Processing Units,简称CPU)822(例如,一个或一个以上处理器)和存储器832,一个或一个以上存储应用程序842或数据844的存储介质830(例如一个或一个以上海量存储设备)。其中,存储器832和存储介质830可以是短暂存储或持久存储。存储在存储介质830的程序可以包括一个或一个以上模块(图示没标出),每个模块可以包括对服务器中的一系列指令操作。更进一步地,中央处理器822可以设置为与存储介质830通信,在服务器800上执行存储介质830中的一系列指令操作。The embodiment of the present application also provides a server, as shown in Figure 11. Figure 11 is a structural diagram of the server 800 provided in the embodiment of the present application. The server 800 may have relatively large differences due to different configurations or performances, and may include one or more central processing units (CPUs) 822 (for example, one or more processors) and memories 832, and one or more storage media 830 (for example, one or more mass storage devices) for storing application programs 842 or data 844. Among them, the memories 832 and the storage media 830 can be temporary storage or permanent storage. The program stored in the storage medium 830 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server. Furthermore, the central processing unit 822 can be configured to communicate with the storage medium 830 to execute a series of instruction operations in the storage medium 830 on the server 800.

服务器800还可以包括一个或一个以上电源826,一个或一个以上有线或无线网络接口850,一个或一个以上输入输出接口858,和/或,一个或一个以上操作系统841。The server 800 may also include one or more power supplies 826, one or more wired or wireless network interfaces 850, one or more input and output interfaces 858, and/or one or more operating systems 841.

上述实施例中由服务器所执行的步骤可以基于图11所示的服务器结构。The steps executed by the server in the above embodiment may be based on the server structure shown in FIG. 11.

本申请实施例还提供一种计算机可读存储介质,用于存储计算机程序,该计算机程序用于执行前述各个实施例所述的数据处理方法中的任意一种实施方式。An embodiment of the present application further provides a computer-readable storage medium for storing a computer program, which is used to execute any one of the data processing methods described in the aforementioned embodiments.

本申请实施例还提供了一种包括计算机程序的计算机程序产品,当其在计算机设备上运行时,使得所述计算机设备执行上述实施例中任意一项所述的数据处理方法。An embodiment of the present application further provides a computer program product including a computer program, which, when executed on a computer device, enables the computer device to execute the data processing method described in any one of the above embodiments.

可以理解的是,在本申请的具体实施方式中,涉及到用户信息(如数据提供方提供的数据)等相关的数据,当本申请以上实施例运用到具体产品或技术中时,需要获得用户许可或者同意,且相关数据的收集、使用和处理需要遵守相关国家和地区的相关法律法规和标准。It is understandable that in the specific implementation of this application, when it comes to user information (such as data provided by the data provider) and other related data, when the above embodiments of this application are applied to specific products or technologies, user permission or consent is required, and the collection, use and processing of relevant data must comply with relevant laws, regulations and standards of relevant countries and regions.

本领域普通技术人员可以理解:实现上述方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成,前述程序可以存储于一计算机可读取存储介质中,该程序在执行时,执行包括上述方法实施例的步骤;而前述的存储介质可以是下述介质中的至少一种:只读存储器(英文:read-only memory,缩写:ROM)、RAM、磁碟或者光盘等各种可以存储程序代码的介质。Those skilled in the art will understand that all or part of the steps of implementing the above-mentioned method embodiment can be completed by hardware related to program instructions, and the aforementioned program can be stored in a computer-readable storage medium. When the program is executed, it executes the steps of the above-mentioned method embodiment; and the aforementioned storage medium can be at least one of the following media: read-only memory (English: read-only memory, abbreviated as: ROM), RAM, magnetic disk or optical disk, and other media that can store program codes.

需要说明的是,本说明书中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于设备及系统实施例而言,由于其基本相似于方法实施例,所以描述得比较简单,相关之处参见方法实施例的部分说明即可。以上所描述的设备及系统实施例仅仅是示意性的,其中作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。本领域普通技术人员在不付出创造性劳动的情况下,即可以理解并实施。It should be noted that the various embodiments in this specification are described in a progressive manner, and the same or similar parts between the various embodiments can be referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the device and system embodiments, since they are basically similar to the method embodiments, the description is relatively simple, and the relevant parts can be referred to the partial description of the method embodiments. The device and system embodiments described above are merely schematic, wherein the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of this embodiment. A person of ordinary skill in the art can understand and implement it without expending creative work.

以上所述,仅为本申请的一种具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到的变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应该以权利要求的保护范围为准。The above is merely one specific embodiment of the present application, but the scope of protection of the present application is not limited thereto. Any changes or substitutions that can be easily conceived by a person skilled in the art within the technical scope disclosed in this application should be included in the scope of protection of the present application. Therefore, the scope of protection of the present application should be based on the scope of protection of the claims.

Claims (19)

一种数据处理方法,所述方法由第一设备执行,所述第一设备中包括第一模型,所述方法包括:A data processing method is performed by a first device, wherein the first device includes a first model, and the method includes: 通过所述第一模型,根据第一数据生成第一数据特征信息,所述第一数据特征信息用于表征所述第一数据的数据特征;generating, using the first model, first data feature information according to the first data, where the first data feature information is used to characterize data features of the first data; 在所述第一数据特征信息中添加噪声信息,得到第二数据特征信息;adding noise information to the first data feature information to obtain second data feature information; 向第二设备发送所述第二数据特征信息,以指示所述第二设备通过第二模型,根据所述第二数据特征信息生成第一数据处理结果,所述第一模型和所述第二模型用于构成数据处理模型,所述第一数据处理结果用于表征通过所述数据处理模型处理所述第一数据得到的数据处理结果。The second data characteristic information is sent to the second device to instruct the second device to generate a first data processing result according to the second data characteristic information through a second model, the first model and the second model are used to constitute a data processing model, and the first data processing result is used to represent the data processing result obtained by processing the first data through the data processing model. 
根据权利要求1所述的方法,所述通过所述第一模型,根据第一数据生成第一数据特征信息,包括:According to the method of claim 1, generating first data feature information based on the first data using the first model includes: 对所述第一数据进行分片处理,得到第一数据分片和第二数据分片,所述第一数据分片和所述第二数据分片用于构成所述第一数据;Slice the first data to obtain a first data slice and a second data slice, wherein the first data slice and the second data slice are used to constitute the first data; 通过所述第一模型,根据所述第一数据分片生成第一子特征信息,所述第一子特征信息用于表征所述第一数据分片的数据特征,所述第一模型具有第一模型参数;generating, using the first model, first sub-feature information according to the first data slice, where the first sub-feature information is used to characterize data features of the first data slice, and the first model has first model parameters; 向所述第二设备发送所述第二数据分片,以指示所述第二设备通过第三模型,根据所述第二数据分片生成第二子特征信息,所述第二子特征信息用于表征所述第二数据分片的数据特征,所述第三模型与所述第一模型的架构相同,所述第三模型具有第二模型参数,所述第一模型参数和所述第二模型参数用于构成目标模型参数,所述目标模型参数为所述数据处理模型中的所述第一模型的模型参数;Sending the second data slice to the second device to instruct the second device to generate second sub-feature information based on the second data slice using a third model, where the second sub-feature information is used to characterize data features of the second data slice, wherein the third model has the same architecture as the first model, the third model has second model parameters, and the first model parameters and the second model parameters are used to constitute target model parameters, which are model parameters of the first model in the data processing model; 获取所述第二设备发送的所述第二子特征信息;Obtaining the second sub-feature information sent by the second device; 根据所述第一子特征信息和所述第二子特征信息,确定所述第一数据特征信息。The first data feature information is determined according to the first sub-feature information and the second sub-feature information.
根据权利要求2所述的方法,所述第一数据分片与所述第二数据分片的数据大小相同。According to the method of claim 2, the data size of the first data slice and the second data slice is the same.
4. The method according to any one of claims 1 to 3, further comprising:
obtaining text information to be processed;
performing word segmentation on the text information to be processed to obtain a plurality of word segments included in the text information to be processed;
determining word segment codes respectively corresponding to the plurality of word segments; and
generating the first data according to the word segment codes respectively corresponding to the plurality of word segments.

5. The method according to claim 4, wherein the text information to be processed includes text information respectively corresponding to a plurality of units, the method further comprising:
for each of the plurality of word segments, determining a segment code corresponding to the word segment, the segment code identifying the unit in which the word segment is located;
wherein generating the first data according to the word segment codes respectively corresponding to the plurality of word segments comprises:
generating the first data according to the word segment codes and the segment codes respectively corresponding to the plurality of word segments.
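The encoding steps above can be illustrated with a short, non-limiting sketch. Whitespace splitting stands in for word segmentation (real text, Chinese in particular, would need a proper segmenter), and the vocabulary, the function name `encode`, and the pairing of the two codes per word segment are assumptions made for the example.

```python
def encode(units, vocab):
    """Return word segment codes and segment codes for a list of text units."""
    token_codes, segment_codes = [], []
    for unit_index, unit_text in enumerate(units):
        for token in unit_text.split():       # assumption: whitespace segmentation
            token_codes.append(vocab[token])  # word segment code from the vocabulary
            segment_codes.append(unit_index)  # segment code = index of the containing unit
    return token_codes, segment_codes

# Hypothetical vocabulary and two text units (for example, two sentences).
vocab = {"send": 0, "the": 1, "report": 2, "today": 3}
units = ["send the report", "the report today"]

token_codes, segment_codes = encode(units, vocab)
# One possible form of the first data: a (word segment code, segment code) pair per word segment.
first_data = list(zip(token_codes, segment_codes))
```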
6. The method according to claim 4 or 5, further comprising:
for each of the plurality of word segments, determining a position code corresponding to the word segment, the position code representing the position of the word segment in the text information to be processed;
wherein generating the first data according to the word segment codes respectively corresponding to the plurality of word segments comprises:
generating the first data according to the word segment codes and the position codes respectively corresponding to the plurality of word segments.

7. The method according to any one of claims 4 to 6, wherein determining the word segment codes respectively corresponding to the plurality of word segments comprises:
determining the word segment codes respectively corresponding to the plurality of word segments according to a code mapping relationship, the code mapping relationship recording mappings between word segments and word segment codes;
the method further comprising:
when the code mapping relationship does not contain a word segment, adding the mapping between that word segment and its word segment code to the code mapping relationship.
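As a non-limiting example, position codes and an updatable code mapping relationship might be handled as sketched below. Assigning the next unused integer to an unseen word segment, using the zero-based index as the position code, and splitting on whitespace are all assumptions; the claims do not say how a new code is chosen.

```python
def token_code(token, mapping):
    # If the mapping has no entry for this word segment, add one.
    # Assumption: the new code is the next unused integer.
    if token not in mapping:
        mapping[token] = len(mapping)
    return mapping[token]

def encode_with_positions(text, mapping):
    tokens = text.split()   # assumption: whitespace segmentation
    # Pair each word segment code with its position in the text.
    return [(token_code(t, mapping), pos) for pos, t in enumerate(tokens)]

mapping = {"hello": 0}      # hypothetical initial code mapping relationship
first_data = encode_with_positions("hello new world hello", mapping)
```

After the call, `mapping` also contains entries for "new" and "world", so later texts reuse the same codes.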
8. The method according to any one of claims 1 to 7, further comprising:
determining encryption information and decryption information corresponding to the first data, the encryption information representing an encryption manner for a data processing result corresponding to the first data, and the decryption information being used to decrypt data encrypted in the encryption manner;
wherein sending the second data feature information to the second device to instruct the second device to generate, by using the second model, the first data processing result according to the second data feature information comprises:
sending the second data feature information and the encryption information to the second device to instruct the second device to generate, by using the second model, the first data processing result according to the second data feature information and the encryption information, wherein the second model is configured to generate an initial data processing result according to the second data feature information, encrypt the initial data processing result according to the encryption information, and output the first data processing result;
the method further comprising:
obtaining the first data processing result sent by the second device; and
decrypting the first data processing result by using the decryption information to obtain the initial data processing result.
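The exchange of encryption information and decryption information can be illustrated with a deliberately tiny, insecure sketch. It assumes an asymmetric scheme and uses textbook RSA with small primes purely as a stand-in; the claims do not name any cipher, and a real system would rely on a vetted cryptographic library. The function names and the integer result are hypothetical. The modular inverse via `pow(e, -1, phi)` requires Python 3.8 or later.

```python
def make_keys():
    # Toy parameters only: real keys use primes hundreds of digits long.
    p, q, e = 61, 53, 17
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)        # modular inverse of e (Python 3.8+)
    return (e, n), (d, n)      # (encryption information, decryption information)

def encrypt(result, encryption_info):
    # Run on the second device after the second model produces its result.
    e, n = encryption_info
    return pow(result, e, n)

def decrypt(ciphertext, decryption_info):
    # Run on the first device, which never shares the decryption information.
    d, n = decryption_info
    return pow(ciphertext, d, n)

encryption_info, decryption_info = make_keys()   # determined by the first device
initial_result = 42                              # hypothetical output of the second model
first_result = encrypt(initial_result, encryption_info)
recovered = decrypt(first_result, decryption_info)
```

Only `encryption_info` is sent along with the second data feature information, so a party that intercepts `first_result` without `decryption_info` cannot read the result directly in this sketch.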
9. A data processing method, performed by a second device that includes a second model, the method comprising:
obtaining second data feature information sent by a first device, the second data feature information being obtained by the first device by adding noise information to first data feature information, and the first data feature information being generated by the first device according to first data by using a first model; and
generating, by using the second model, a first data processing result according to the second data feature information, wherein the first model and the second model constitute a data processing model, and the first data processing result represents a data processing result obtained by processing the first data through the data processing model.

10. The method according to claim 9, wherein a degree of difference between the first data processing result and a second data processing result is less than a first degree threshold, the second data processing result being a data processing result obtained by processing the first data through the data processing model.
11. The method according to claim 9 or 10, further comprising:
obtaining second data, the second data having a corresponding sample data processing result, the sample data processing result being a result of processing the second data through the data processing model;
segmenting the data processing model based on an initial structural segmentation manner to obtain a first initial model and a second initial model;
generating, by using the first initial model, third data feature information according to the second data, the third data feature information characterizing data features of the second data;
generating, by using the second initial model, a pending data processing result according to fourth data feature information, the fourth data feature information being obtained by adding the noise information to the third data feature information;
adjusting the initial structural segmentation manner according to a difference between the pending data processing result and the sample data processing result to obtain a structural segmentation manner, wherein a degree of difference between the pending data processing result determined under the structural segmentation manner and the sample data processing result is less than a first degree threshold; and
segmenting the data processing model based on the structural segmentation manner to obtain the first model and the second model.
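A non-limiting sketch of how a structural segmentation manner might be selected is given below. It assumes the data processing model is a simple sequence of layers, that a segmentation manner is just a split index, that "adjusting" means trying the next index, and that the degree of difference is the largest absolute element-wise gap. The toy layers, the fixed noise vector, and the threshold are hypothetical values chosen for the example.

```python
import math

# Hypothetical data processing model: four layers applied in order.
LAYERS = [
    lambda v: [3.0 * x for x in v],
    lambda v: [2.0 * x for x in v],
    lambda v: [math.tanh(x) for x in v],
    lambda v: [0.1 * x for x in v],
]

def run(layers, v):
    for layer in layers:
        v = layer(v)
    return v

def choose_split(layers, second_data, noise, threshold):
    # Sample result: the whole model applied to the second data, without noise.
    sample_result = run(layers, second_data)
    for k in range(1, len(layers)):                     # candidate split indices
        third_features = run(layers[:k], second_data)   # first initial model
        fourth_features = [f + n for f, n in zip(third_features, noise)]
        pending = run(layers[k:], fourth_features)      # second initial model
        diff = max(abs(a - b) for a, b in zip(pending, sample_result))
        if diff < threshold:
            return k                                    # first split that is accurate enough
    return None                                         # no split meets the threshold

split = choose_split(LAYERS, [0.2, -0.1], [0.05, -0.05], 0.005)
first_model_layers, second_model_layers = LAYERS[:split], LAYERS[split:]
```

With these toy values, splitting after the first layer leaves too large a gap, and the search settles on splitting after the second layer.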
12. The method according to any one of claims 9 to 11, wherein the noise information is such that a degree of difference between the first data processing result and a second data processing result is less than a second degree threshold, the second data processing result being a data processing result obtained by processing the first data through the data processing model.

13. The method according to any one of claims 9 to 12, wherein the first device is further configured to slice the first data to obtain a first data slice and a second data slice, the first data slice and the second data slice together constituting the first data, and the second device further includes a third model, the third model having the same architecture as the first model and having second model parameters, the method further comprising:
obtaining the second data slice sent by the first device;
generating, by using the third model, second sub-feature information according to the second data slice; and
sending the second sub-feature information to the first device to instruct the first device to determine the first data feature information according to first sub-feature information and the second sub-feature information, wherein the first sub-feature information is feature information generated by the first device according to the first data slice by using the first model, the first model has first model parameters, the first model parameters and the second model parameters together constitute target model parameters, and the target model parameters are the model parameters of the first model in the data processing model.

14. The method according to any one of claims 9 to 13, wherein obtaining the second data feature information sent by the first device comprises:
obtaining the second data feature information and encryption information sent by the first device, the encryption information representing an encryption manner for a data processing result corresponding to the first data;
wherein generating, by using the second model, the first data processing result according to the second data feature information comprises:
generating, by using the second model, an initial data processing result according to the second data feature information, encrypting the initial data processing result according to the encryption information, and outputting the first data processing result;
the method further comprising:
sending the first data processing result to the first device to instruct the first device to decrypt the first data processing result by using decryption information to obtain the initial data processing result, the decryption information being used to decrypt data encrypted in the encryption manner.
15. A data processing apparatus, comprising a first generating unit, a first adding unit, and a first sending unit, wherein:
the first generating unit is configured to generate, by using the first model, first data feature information according to first data, the first data feature information characterizing data features of the first data;
the first adding unit is configured to add noise information to the first data feature information to obtain second data feature information; and
the first sending unit is configured to send the second data feature information to a second device to instruct the second device to generate, by using a second model, a first data processing result according to the second data feature information, wherein the first model and the second model constitute a data processing model, and the first data processing result represents a data processing result obtained by processing the first data through the data processing model.
16. A data processing apparatus, comprising a third obtaining unit and a third generating unit, wherein:
the third obtaining unit is configured to obtain second data feature information sent by a first device, the second data feature information being obtained by the first device by adding noise information to first data feature information, and the first data feature information being generated by the first device according to first data by using a first model; and
the third generating unit is configured to generate, by using the second model, a first data processing result according to the second data feature information, wherein the first model and the second model constitute a data processing model, and the first data processing result represents a data processing result obtained by processing the first data through the data processing model.

17. A computer device, comprising a processor and a memory, wherein:
the memory is configured to store a computer program and transmit the computer program to the processor; and
the processor is configured to execute, according to instructions in the computer program, the data processing method according to any one of claims 1 to 8 or the data processing method according to any one of claims 9 to 14.
18. A computer-readable storage medium, configured to store a computer program, the computer program being used to execute the data processing method according to any one of claims 1 to 8 or the data processing method according to any one of claims 9 to 14.

19. A computer program product, comprising a computer program that, when executed by a processor, implements the data processing method according to any one of claims 1 to 8 or the data processing method according to any one of claims 9 to 14.
PCT/CN2025/080397 2024-03-18 2025-03-04 Data processing method and related apparatus Pending WO2025195159A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202410307489.0 2024-03-18
CN202410307489.0A CN117955732B (en) 2024-03-18 2024-03-18 Data processing method and related device

Publications (1)

Publication Number Publication Date
WO2025195159A1 (en) 2025-09-25

Family

ID=90800210

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2025/080397 Pending WO2025195159A1 (en) 2024-03-18 2025-03-04 Data processing method and related apparatus

Country Status (2)

Country Link
CN (1) CN117955732B (en)
WO (1) WO2025195159A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117955732B (en) * 2024-03-18 2024-06-25 腾讯科技(深圳)有限公司 Data processing method and related device

Citations (6)

Publication number Priority date Publication date Assignee Title
US20200244435A1 (en) * 2019-01-28 2020-07-30 The Toronto-Dominion Bank Homomorphic computations on encrypted data within a distributed computing environment
CN112668038A (en) * 2020-06-02 2021-04-16 华控清交信息科技(北京)有限公司 Model training method and device and electronic equipment
CN114519436A (en) * 2022-02-16 2022-05-20 京东科技控股股份有限公司 Model training method, device, equipment and storage medium
CN115801266A (en) * 2022-10-20 2023-03-14 深圳供电局有限公司 Data transmission method, device, computer equipment and storage medium
CN117056962A (en) * 2023-07-21 2023-11-14 厦门大学 Federal learning large model fine tuning method and device
CN117955732A (en) * 2024-03-18 2024-04-30 腾讯科技(深圳)有限公司 Data processing method and related device

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN110929886B (en) * 2019-12-06 2022-03-22 支付宝(杭州)信息技术有限公司 Model training, prediction method and system
CN111125760B (en) * 2019-12-20 2022-02-15 支付宝(杭州)信息技术有限公司 Model training, prediction method and system for protecting data privacy
CN112329073B (en) * 2021-01-05 2021-07-20 腾讯科技(深圳)有限公司 Distributed data processing method, device, computer equipment and storage medium
CN114417394B (en) * 2021-12-08 2025-03-25 海南火链科技有限公司 Blockchain-based data storage method, device, equipment and readable storage medium
CN114553612B (en) * 2022-04-27 2022-07-26 深圳市一航网络信息技术有限公司 Data encryption and decryption method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN117955732A (en) 2024-04-30
CN117955732B (en) 2024-06-25

Similar Documents

Publication Publication Date Title
JP7185014B2 (en) Model training method, machine translation method, computer device and program
US11057216B2 (en) Protection method and protection system of system partition key data and terminal
CN109543200A (en) Text translation method and device
CN110825863B (en) Text pair fusion method and device
CN115589281A (en) Decryption method, related device and storage medium
WO2019148397A1 (en) Storage of decomposed sensitive data in different application environments
WO2025195159A1 (en) Data processing method and related apparatus
CN114511438A (en) Method, device and equipment for controlling load
CN116541865A (en) Password input method, device, equipment and storage medium based on data security
CN109766705B (en) A circuit-based data verification method, device and electronic device
CN114629649A (en) Data processing method and device based on cloud computing and storage medium
CN106295379A (en) Encrypt input method and device, deciphering input method and device and relevant device
CN117592089B (en) Data processing method, device, equipment and storage medium
CN111104566B (en) Feature index encoding method, device, electronic equipment and storage medium
US20240362343A1 (en) Homomorphic operation system and operating method thereof
CN114297693B (en) Model pre-training method and device, electronic equipment and storage medium
CN117118647A (en) A data encryption method, device, computer equipment and storage medium
CN105653534B (en) Data processing method and device
CN115801308A (en) Data processing method, related device and storage medium
CN115567297A (en) Cross-site request data processing method and device
CN116630375A (en) Processing method and related device for key points in image
CN111625278A (en) Generation method of source code file and related equipment
CN119862302B (en) Training methods, devices, equipment, and storage media for video text retrieval models
US20250132913A1 (en) Securing data sent between computing devices
CN110719261B (en) Active page data processing method and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 25772867

Country of ref document: EP

Kind code of ref document: A1