
US20180089587A1 - Systems and Methods for Communication Efficient Distributed Mean Estimation - Google Patents


Info

Publication number
US20180089587A1
US20180089587A1 · US15/676,076 · US201715676076A
Authority
US
United States
Prior art keywords
vector
rotated
computing device
quantized
quantization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/676,076
Inventor
Ananda Theertha Suresh
Sanjiv Kumar
Hugh Brendan McMahan
Xinnan Yu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google LLC filed Critical Google LLC
Priority to US15/676,076 priority Critical patent/US20180089587A1/en
Priority to US15/708,793 priority patent/US11196800B2/en
Assigned to GOOGLE INC. reassignment GOOGLE INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: MCMAHAN, Hugh Brendan, YU, Xinnan, KUMAR, SANJIV, SURESH, ANANDA THEERTHA
Assigned to GOOGLE LLC reassignment GOOGLE LLC CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: GOOGLE INC.
Publication of US20180089587A1 publication Critical patent/US20180089587A1/en
Priority to US17/502,794 priority patent/US11785073B2/en
Priority to US18/240,799 priority patent/US12219004B2/en

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/098Distributed learning, e.g. federated learning
    • G06N99/005
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0495Quantised networks; Sparse networks; Compressed networks
    • G06N7/005
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/58Random or pseudo-random number generators
    • G06F7/582Pseudo-random number generators
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Definitions

  • the present disclosure relates generally to distributed computing. More particularly, the present disclosure relates to systems and methods for communication efficient distributed mean estimation.
  • the goal of distributed mean estimation is to estimate the mean of such vectors.
  • This basic estimation problem is used as a subroutine in several learning and optimization tasks where data is distributed across several clients. For example, in Lloyd's algorithm for k-means clustering, if data is distributed across several clients, the server needs to compute the means of all clusters in each update step. Similarly, for principal components analysis (PCA), if data samples are distributed across several clients, then for the power-iteration method, the server needs to average the output of all clients in each step.
  • each client obtains a copy of a global model.
  • the clients then update the model independently based on their local data.
  • the updates (usually in the form of gradients) are then sent to a server, where they are averaged and used to update the global model.
  • a critical step in all of the above algorithms is to estimate the mean of a set of vectors.
  • communication cost can be significant in example distributed computing systems where each client can be a low-power and/or low-bandwidth device such as, for example, a mobile phone, an embedded computing device, or other connected smart devices such as intelligent speakers, cameras, home appliances, vehicle computing systems, etc.
  • One aspect of the present disclosure is directed to a computing system to facilitate transmission of machine-learned model updates from client devices to a centralized server computing device.
  • the computing system includes one or more client computing devices.
  • Each client computing device includes one or more processors and one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors cause the client computing device to perform operations.
  • the operations include determining an update to a machine-learned model based at least in part on a local dataset stored at the client computing device.
  • the operations include rotating the update by a random rotation matrix to obtain a rotated update.
  • the operations include performing probabilistic quantization of the rotated update to obtain a quantized rotated update.
  • the operations include transmitting the quantized rotated update to the centralized server computing device.
  • the computing system includes one or more client computing devices.
  • Each client computing device includes one or more processors and one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors cause the client computing device to perform operations.
  • the operations include obtaining a vector.
  • the operations include rotating the vector by a random rotation matrix to obtain a rotated vector.
  • the operations include performing probabilistic quantization of the rotated vector to obtain a quantized rotated vector.
  • the operations include transmitting the quantized rotated vector.
  • the computing system includes one or more client computing devices.
  • Each client computing device includes one or more processors and one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors cause the client computing device to perform operations.
  • the operations include obtaining a vector.
  • the operations include performing probabilistic quantization of the vector to obtain a quantized vector.
  • Performing probabilistic quantization of the vector includes determining a value for each of a number of quantization levels based at least in part on a magnitude of the vector and a minimum coordinate value included in the vector.
  • Performing probabilistic quantization of the vector includes quantizing each coordinate of the vector into one of the number of quantization levels.
  • FIG. 1 depicts a graphical diagram of example results of distributed mean estimation on data generated from a Gaussian distribution according to example embodiments of the present disclosure.
  • FIGS. 2A-D depict graphical diagrams of example results for performance of Lloyd's algorithm with different types of quantizations according to example embodiments of the present disclosure.
  • FIGS. 3A-D depict graphical diagrams of example results for performance of power iteration with different types of quantizations according to example embodiments of the present disclosure.
  • FIG. 4 depicts a block diagram of an example computing system according to example embodiments of the present disclosure.
  • FIG. 5 depicts a swim lane flow diagram of an example method to perform a stochastic rotated quantization technique according to example embodiments of the present disclosure.
  • FIG. 6 depicts a swim lane flow diagram of an example method to perform a variable length coding technique according to example embodiments of the present disclosure.
  • the present disclosure provides systems and methods for communication efficient distributed mean estimation.
  • the present disclosure addresses the need for distributed learning and optimization algorithms with low communication cost.
  • the systems and methods of the present disclosure make no probabilistic assumptions on the data.
  • aspects of the present disclosure can be implemented by a system in which a number of vectors reside on a number of different client computing devices, and each client device seeks to transmit its respective vector to a centralized server computing device to enable the server device to estimate the mean of the vectors.
  • the techniques of the present disclosure can enable communication efficient uploads of local machine-learned model gradients from the client devices to the server device, where the server device aggregates the received gradients to update a global machine-learned model.
  • One aspect of the present disclosure is directed to a random rotation technique to improve communication efficiency.
  • a client computing device can rotate a vector by a random rotation matrix and then subsequently perform probabilistic quantization on the rotated vector.
  • the present disclosure provides a stochastic k-level quantization technique. As demonstrated by the present disclosure, performing the random rotation prior to quantization can significantly reduce the quantization error, thereby leading to improved communication efficiency.
  • the client device can transmit the quantized rotated vector to a centralized server computing device.
  • the server computing device can receive multiple vectors from multiple client computing devices.
  • the server computing device can determine a mean of the multiple received vectors and can de-rotate the mean vector using an inverse random rotation matrix.
  • this random rotation technique can reduce the mean square error of the mean estimation significantly, e.g., by a factor on the order of d/log d.
  • Another aspect of the present disclosure is directed to a variable length coding technique to improve communication efficiency.
  • the client computing device can encode the quantized vector according to a variable length coding scheme (e.g., by computing variable length codes).
  • the server computing device can then decode the received vectors using the variable length coding scheme.
  • This variable length coding technique can reduce the mean squared error by a factor of O(d) as compared to a naive approach that includes neither random rotation nor variable length coding.
  • the present disclosure mathematically demonstrates that this variable length coding approach is communication optimal.
  • the present disclosure enables transmission of information (e.g., machine-learned model updates) in a more efficient and lower bandwidth manner.
  • the random rotation technique and the variable length coding technique can reduce the number of bits required to represent the information, thereby enabling faster and lower bandwidth transmission of the information.
  • the present disclosure enables distributed training techniques in which machine-learned models are trained locally on-device and then a global model is generated or updated based on a mean of the updates resulting from the local training.
  • the present disclosure enables and enhances such distributed schemes.
  • the distributed training scheme results in better (e.g., more accurate) global models which have improved performance due to their exposure to additional training data. Furthermore, the distributed training scheme that is enhanced by the present disclosure improves user privacy as training data can be retained on the device and is not required to be sent to a central location. As yet another example technical effect and benefit, the present disclosure also provides enhanced security or encryption of information that is transmitted to a central location. For example, through the use of private rotation matrices the transmitted information can be rotated and only entities with the inverse of the private matrix can de-rotate to extract the information.
  • the present disclosure is structured as follows: first, the present disclosure demonstrates that for d dimensional data with n clients, a naive stochastic rounding approach yields a mean squared error (MSE) of Θ(d/n) and uses a constant number of bits per dimension per client.
  • this naive algorithm is extended in two ways: by demonstrating that application of a structured random rotation before quantization reduces the error to O((log d)/n), and that application of a better coding strategy further reduces the error to O(1/n).
  • the present disclosure also demonstrates that the latter coding strategy is optimal up to a constant in the minimax sense. That is, it achieves the best MSE for a given communication cost.
  • the present disclosure demonstrates the practicality of the algorithms described herein by applying them to distributed Lloyd's algorithm for k-means and power iteration for principal component analysis (PCA).
  • the goal of distributed mean estimation is to estimate the mean of the vectors: $\bar{X} \stackrel{\text{def}}{=} \frac{1}{n} \sum_{i=1}^{n} X_i$.
  • this basic estimation problem is used in several learning and optimization tasks where data is distributed across several clients, including, for example, Lloyd's algorithm for k-means clustering (see, e.g., Lloyd, Stuart. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129-137, 1982); principal components analysis (PCA); and the training of large-scale neural networks and other statistical models.
  • each client obtains a copy of a global model.
  • the clients then update the model independently based on their local data.
  • the updates (usually in the form of gradients) are then sent to a server, where they are averaged and used to update the global model.
  • a critical step in all of the above algorithms is to estimate the mean of a set of vectors as in the equation above.
  • the communication efficient systems, methods, and techniques of the present disclosure can involve or be performed in any configuration or scenario where data (e.g., in the form of vectors) resides on or is produced by one or more different client devices and is communicated to another device such as, for example, a central server device.
  • communication cost can be significant in example distributed computing systems where each client can be a low-power and/or low-bandwidth device such as, for example, a mobile phone or other mobile computing device, an embedded computing device, a worker device in a datacenter, or other connected smart devices such as intelligent speakers, cameras, home appliances, vehicle computing systems, etc.
  • the systems and methods of the present disclosure address the basic problem of achieving the optimal minimax rate in distributed mean estimation with limited communication or other communication constraints (e.g., communication costs).
  • the systems, methods, models, and results of the present disclosure differ from previous works on mean estimation in a number of ways, including the following examples.
  • certain previous works assume that the data is generated independent and identically distributed (i.i.d.) according to some distribution.
  • the present disclosure does not make any distribution assumptions on data.
  • the objective in certain prior works is to estimate the mean of the underlying statistical model, while the goal in at least some implementations of the present disclosure is to estimate the empirical mean of the data.
  • the communication algorithms provided herein are simultaneous and independent. That is, the clients independently send data to the server and they can transmit at the same time.
  • each client transmits a function of X_i (say ƒ(X_i)), and a central server estimates the mean by some function of ƒ(X_1), ƒ(X_2), . . . , ƒ(X_n).
  • Let π be any such protocol and let C_i(π, X_i) be the expected number of transmitted bits by the i-th client during protocol π, where, throughout the present disclosure, expectation is over the randomness in protocol π. The total number of bits transmitted by all clients is denoted by $C(\pi, X^n) = \sum_{i=1}^{n} C_i(\pi, X_i)$.
  • Let the estimated mean be $\hat{\bar{X}}$. The MSE of the estimate is $\mathcal{E}(\pi, X^n) = \mathbb{E}\big[\|\hat{\bar{X}} - \bar{X}\|_2^2\big]$.
  • Private randomness refers to random values that are generated by each machine separately, while public randomness refers to a sequence of random values that are shared among all parties.
  • the server can communicate a random seed that can be used by clients to emulate public randomness. More particularly, in some implementations, both the client and the server can have knowledge of a synchronized random seed.
  • the client and the server can each use the random seed (e.g., along with a pseudo-random number generator) to separately produce random values (e.g., a random rotation matrix as will be discussed further below).
  • the seed can be 32 bits long.
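  • As a minimal illustrative sketch (not the patent's reference implementation), the following Python snippet shows how a shared 32-bit seed lets a client and the server derive identical random values, here the Rademacher sign diagonal later used for the structured rotation; the function names and the use of NumPy's RandomState are assumptions made for illustration only.

```python
# Illustrative sketch (assumption), not the patent's reference implementation.
import numpy as np

def random_signs(seed: int, d: int) -> np.ndarray:
    """Rademacher (+1/-1) entries derived deterministically from a shared seed."""
    rng = np.random.RandomState(seed)          # seeded pseudo-random number generator
    return rng.choice([-1.0, 1.0], size=d)

shared_seed = 12345                            # e.g. a 32-bit seed communicated by the server
d = 8
client_signs = random_signs(shared_seed, d)    # computed on the client
server_signs = random_signs(shared_seed, d)    # computed independently on the server
assert np.array_equal(client_signs, server_signs)  # identical values; only the seed is shared
```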
  • the worst-case MSE of a protocol π over inputs in S^d is defined as $\mathcal{E}(\pi, S^d) \stackrel{\text{def}}{=} \max_{X^n : X_i \in S^d \,\forall i} \mathcal{E}(\pi, X^n)$.
  • Let Π(c) denote the set of all protocols with communication cost at most c.
  • the minimax MSE is $\mathcal{E}(\Pi(c), S^d) \stackrel{\text{def}}{=} \min_{\pi \in \Pi(c)} \mathcal{E}(\pi, S^d)$.
  • In Section 2.1, it is first shown that a naive stochastic binary quantization algorithm (denoted by π_sb) achieves an MSE of Θ(d/n).
  • the present disclosure proposes two approaches.
  • the present disclosure demonstrates that preprocessing the data by a random rotation reduces the mean squared error. Specifically, in Theorem 3, it is shown that this new scheme (denoted by π_srk) achieves an MSE of O((log d)/n).
  • a second approach provided by the present disclosure uses the same quantization as π_sk but encodes levels via variable length coding. Instead of using ⌈log₂ k⌉ bits per dimension, it is shown that using variable length encoding such as arithmetic coding to compress the data reduces the communication cost significantly. In particular, in Theorem 4 it is shown that there is a scheme (denoted by π_svk) that uses only a constant number of bits per dimension per client while achieving an MSE of O(1/n).
  • π_svk has the best MSE for a given communication cost. Note that π_svk uses k quantization levels but still uses O(1) bits per dimension per client for all k ≤ √d.
  • stochastic rotated quantization has several practical advantages: it uses fixed length coding and hence can be combined with encryption schemes for privacy preserving secure aggregation. It can also provide lower quantization error in some scenarios due to better constants (see, e.g., Section 7 for details).
  • Theorem 1: There exists a universal constant t < 1 such that for communication cost c ≤ ndt and n ≥ 1/t, the minimax MSE satisfies $\mathcal{E}(\Pi(c), S^d) = \Theta(1/n)$.
  • the stochastic uniform quantization technique is analyzed in Section 2.
  • in Section 3, a novel stochastic rotated quantization technique is proposed, and in Section 4 arithmetic coding is analyzed.
  • in Section 5, the above algorithms are combined with a sampling technique and the upper bound on the minimax risk is stated.
  • in Section 6, the matching minimax lower bounds are stated.
  • in Section 7, some practical considerations are discussed and the algorithms are applied to distributed power iteration and Lloyd's algorithm.
  • Section 8 provides a proof of Lemma 7.
  • Section 9 describes example computing systems that can implement the techniques described herein.
  • Section 10 describes example methods to implement the techniques described herein, and Section 11 provides some additional disclosure.
  • $$Y_i(j) = \begin{cases} X_i^{\max} & \text{with probability } \dfrac{X_i(j) - X_i^{\min}}{X_i^{\max} - X_i^{\min}}, \\ X_i^{\min} & \text{otherwise.} \end{cases}$$
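  • As a hedged illustration of the stochastic binary quantization rule above (a sketch, not the patent's reference code), each coordinate is rounded to X_i^max or X_i^min with a probability chosen so that the quantized coordinate is unbiased, i.e. E[Y_i(j)] = X_i(j):

```python
# Illustrative sketch (assumption), not the patent's reference implementation.
import numpy as np

def stochastic_binary_quantize(x: np.ndarray, rng=np.random) -> np.ndarray:
    """Round every coordinate of x to x.max() or x.min(), unbiased in expectation."""
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:                        # degenerate case: all coordinates equal
        return np.full_like(x, x_min)
    p_up = (x - x_min) / (x_max - x_min)      # probability of rounding up to x_max
    up = rng.random_sample(x.shape) < p_up
    return np.where(up, x_max, x_min)

y = stochastic_binary_quantize(np.array([0.1, -0.4, 0.9, 0.0]))
# y contains only two distinct values: the minimum and the maximum of the input
```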
  • Lemma 1: There exists an implementation of stochastic binary quantization that uses d + O(1) bits per client and hence C(π_sb, X^n) ≤ n·(d + O(1)).
  • Lemma 2 implies the following upper bound.
  • Lemma 4: There exists a set of vectors X^n for which the MSE of π_sb is Ω(d/n).
  • Hence, together with the upper bound above, the algorithm proposed in this subsection gives MSE Θ(d/n).
  • d can be on the order of millions, yet n can be much smaller than that.
  • the MSE is even larger than the norm of the vector.
  • a natural generalization of binary quantization is k-level quantization. Let k be a positive integer larger than 2.
  • the algorithm quantizes each coordinate into one of the values B_i(r) stochastically, where the k levels are equally spaced between the coordinate extremes, B_i(r) = X_i^min + r·(X_i^max − X_i^min)/(k−1) for r ∈ {0, 1, . . . , k−1}. In π_sk, for the i-th client and j-th coordinate, if X_i(j) ∈ [B_i(r), B_i(r+1)),
  • $$Y_i(j) = \begin{cases} B_i(r+1) & \text{with probability } \dfrac{X_i(j) - B_i(r)}{B_i(r+1) - B_i(r)}, \\ B_i(r) & \text{otherwise.} \end{cases}$$
  • the server estimates $\bar{X}$ by $\hat{\bar{X}} = \frac{1}{n} \sum_{i=1}^{n} Y_i$.
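  • The following sketch (an illustration using the equally spaced levels defined above; not the patent's reference implementation, and all names are assumptions) shows stochastic k-level quantization on the client and the simple averaging estimate on the server:

```python
# Illustrative sketch (assumption), not the patent's reference implementation.
import numpy as np

def stochastic_k_level_quantize(x: np.ndarray, k: int, rng=np.random) -> np.ndarray:
    """Quantize each coordinate of x to one of k equally spaced levels, unbiased."""
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        return x.copy()
    step = (x_max - x_min) / (k - 1)                      # gap between adjacent levels B_i(r)
    r = np.clip(np.floor((x - x_min) / step), 0, k - 2)   # index of the level just below x
    lower = x_min + r * step                              # B_i(r)
    p_up = (x - lower) / step                             # probability of rounding up to B_i(r+1)
    up = rng.random_sample(x.shape) < p_up
    return lower + np.where(up, step, 0.0)

def server_mean_estimate(received) -> np.ndarray:
    """Server-side estimate: the plain average of the received quantized vectors."""
    return np.mean(np.stack(received), axis=0)

clients = [np.random.randn(16) for _ in range(5)]
received = [stochastic_k_level_quantize(x, k=4) for x in clients]
estimate = server_mean_estimate(received)
```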
  • Lemma 5: There exists an implementation of stochastic k-level quantization that uses d⌈log₂ k⌉ + O(1) bits per client and hence C(π_sk, X^n) ≤ n·(d⌈log₂ k⌉ + O(1)).
  • the mean squared loss can be bounded as follows.
  • This method can be denominated as stochastic rotated quantization and is denoted herein by ⁇ srk .
  • the communication cost is the same as that of π_sk and is given by Lemma 5. Next, the MSE is bounded.
  • an orthogonal matrix R that achieves low (Z_i^max)² and (Z_i^min)² is beneficial.
  • a type of orthogonal matrix that permits fast matrix-vector products is also beneficial.
  • Naive orthogonal matrices that support fast multiplication, such as block-diagonal matrices, often result in high values of (Z_i^max)² and (Z_i^min)².
  • H is a Walsh-Hadamard matrix. See Horadam, Kathy J. Hadamard Matrices and Their Applications. Princeton University Press, 2012. The Walsh-Hadamard matrix of dimension 2^m for m ∈ ℕ is given by the recursive formula:
  • $$H(2^1) = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad H(2^m) = \begin{bmatrix} H(2^{m-1}) & H(2^{m-1}) \\ H(2^{m-1}) & -H(2^{m-1}) \end{bmatrix}.$$
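  • A minimal sketch of the structured rotation built from this recursion, Z_i = (1/√d)·H·D·X_i with a Rademacher diagonal D: the in-place butterfly below is a standard fast Walsh-Hadamard transform and is an illustrative implementation rather than code from the patent; it assumes d is a power of two.

```python
# Illustrative sketch (assumption), not the patent's reference implementation.
import numpy as np

def fwht(x: np.ndarray) -> np.ndarray:
    """Unnormalized fast Walsh-Hadamard transform; len(x) must be a power of two."""
    y = x.astype(float).copy()
    h = 1
    while h < len(y):
        for start in range(0, len(y), 2 * h):
            a = y[start:start + h].copy()
            b = y[start + h:start + 2 * h].copy()
            y[start:start + h] = a + b
            y[start + h:start + 2 * h] = a - b
        h *= 2
    return y

def rotate(x: np.ndarray, signs: np.ndarray) -> np.ndarray:
    """Apply Z = (1/sqrt(d)) * H * D * x, an orthogonal (norm-preserving) rotation."""
    return fwht(signs * x) / np.sqrt(len(x))

def de_rotate(z: np.ndarray, signs: np.ndarray) -> np.ndarray:
    """Apply the inverse rotation x = D * H * z / sqrt(d)."""
    return signs * fwht(z) / np.sqrt(len(z))

rng = np.random.RandomState(0)
d = 8
signs = rng.choice([-1.0, 1.0], size=d)     # diagonal of D (Rademacher entries)
x = rng.randn(d)
z = rotate(x, signs)
assert np.allclose(de_rotate(z, signs), x)  # the inverse rotation recovers x
```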
  • the present disclosure provides use of a variable length coding strategy to minimize the number of bits.
  • the present disclosure proposes to, in some implementations, further encode the transmitted values using universal compression schemes. See, e.g., Krichevsky, R and Trofimov, V. The performance of universal encoding. IEEE Transactions on Information Theory, 27(2):199-207, 1981; and Falahatgar, Moein and Jafarpour, Ashkan and Orlitsky, Alon and Pichapati, Venkatadheeraj and Suresh, Ananda Theertha. Universal compression of power-law distributions. ISIT, 2015.
  • for example, the quantized values can be encoded according to a variable length coding scheme (e.g., arithmetic or Huffman coding).
  • This technique is denoted herein by π_svk. Since the vectors are quantized the same way in π_sk and π_svk, the MSE of π_svk is also given by Theorem 2. The communication cost is now bounded.
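  • As a hedged sketch of the variable length coding idea (Huffman coding is used here for concreteness; arithmetic coding, as cited above, behaves similarly, and all names below are illustrative assumptions): level indices that occur frequently receive short codewords, so the average cost per coordinate approaches the empirical entropy rather than ⌈log₂ k⌉ bits.

```python
# Illustrative sketch (assumption), not the patent's reference implementation.
import heapq
from collections import Counter

def huffman_code(symbols):
    """Map each distinct symbol to a prefix-free bit string."""
    freq = Counter(symbols)
    if len(freq) == 1:                               # single level: one bit suffices
        return {next(iter(freq)): "0"}
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)              # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        counter += 1
        heapq.heappush(heap, (f1 + f2, counter, merged))
    return heap[0][2]

# Example: indices of the quantization levels chosen for each coordinate.
levels = [0, 0, 0, 3, 1, 0, 0, 2, 0, 0]
code = huffman_code(levels)
encoded = "".join(code[s] for s in levels)
print(len(encoded), "bits vs", 2 * len(levels), "bits with fixed 2-bit codes")
```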
  • the total number of bits that arithmetic coding uses can be bounded in terms of the empirical distribution of the quantized levels.
  • ⁇ ⁇ ( ⁇ p , X n ) p ⁇ ⁇ ( ⁇ , X n ) .
  • the lower bound relies on the lower bounds on distributed statistical estimation due to Zhang, Yuchen, Duchi, John, Jordan, Michael I, and Wainwright, Martin J. Information-theoretic lower bounds for distributed statistical estimation with communication constraints. NIPS, 2013.
  • Inequality (a) follows from the fact that 2(a−b)² + 2(b−c)² ≥ (a−c)².
  • Inequality (b) follows from Lemma 9 and (c) follows from the fact that C(π, S^d) ≤ ndt/4 and n ≥ 4/t.
  • variable-length coding method provides the lowest quantization error asymptotically when using a constant number of bits.
  • stochastic rotated quantization may be preferred due to (hidden) constant factors and the fact that it uses a fixed amount of bits per dimension. For example, considering quantizing the vector [ ⁇ 1,1,0,0], stochastic rotated quantization can use 1 bit per dimension and gives zero error, whereas the other two protocols do not. To see this, observe that the naive quantization will quantize 0 to either 1 or ⁇ 1 and variable length coding cannot achieve 0 error with 1 bit per dimension due to its constant factors.
  • the rotated quantization is preferred when applied on “unbalanced” data, due to the fact that the rotation can correct the unbalancedness.
  • This is demonstrated by generating a dataset where the value of the last feature dimension entry is much larger than others.
  • 1000 datapoints were generated each with 256 dimensions. The first 255 dimensions are generated i.i.d. from N(0,1), and the last dimension is generated from N(100,1).
  • the rotated stochastic quantization has the best performance on this example dataset. The improvement is especially significant for low bit rate cases.
  • FIG. 1 illustrates distributed mean estimation on data generated from a Gaussian distribution.
  • each client has access to a subset of data points.
  • the server broadcasts the cluster centers to all the clients.
  • Each client updates the centers based on its local data, and sends the centers back to the server.
  • the server then updates the centers by computing the weighted average of the centers sent from all clients.
  • the client compresses the new centers before sending to the server. This saves the uplink communication cost, which is often the bottleneck of distributed learning.
  • the downlink is a broadcast, and therefore its cost can be reduced by a factor of O(n/log n) without quantization, where n is the number of clients.
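  • A simplified, assumption-laden sketch of this distributed Lloyd's update is shown below (toy data, stochastic binary quantization as a stand-in for any of the schemes herein, and weighting by local cluster counts); it is illustrative only and not the experimental code behind the figures.

```python
# Illustrative sketch (assumption), not the patent's reference implementation.
import numpy as np

def quantize(v: np.ndarray, rng) -> np.ndarray:
    """Stochastic binary quantization used as a stand-in compressor."""
    lo, hi = v.min(), v.max()
    if hi == lo:
        return v.copy()
    p = (v - lo) / (hi - lo)
    return np.where(rng.random_sample(v.shape) < p, hi, lo)

def client_update(points, centers, rng):
    """One local Lloyd's step: assign points, recompute centers, compress them."""
    assign = np.argmin(((points[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    new_centers, counts = centers.copy(), np.zeros(len(centers))
    for c in range(len(centers)):
        mask = assign == c
        counts[c] = mask.sum()
        if counts[c] > 0:
            new_centers[c] = points[mask].mean(axis=0)
    return np.stack([quantize(v, rng) for v in new_centers]), counts

def server_aggregate(center_list, count_list, prev_centers):
    """Weighted average of the uploaded centers, weighted by local cluster sizes."""
    counts = np.stack(count_list)                   # shape: clients x centers
    centers = np.stack(center_list)                 # shape: clients x centers x dim
    total = counts.sum(axis=0)
    w = counts / np.maximum(total, 1.0)
    merged = (w[:, :, None] * centers).sum(axis=0)
    return np.where(total[:, None] > 0, merged, prev_centers)  # keep empty clusters

rng = np.random.RandomState(0)
client_data = [rng.randn(50, 2) + i for i in range(10)]   # toy local datasets
centers = rng.randn(10, 2)
for _ in range(5):                                        # a few distributed rounds
    updates = [client_update(p, centers, rng) for p in client_data]
    center_list, count_list = zip(*updates)
    centers = server_aggregate(center_list, count_list, centers)
```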
  • FIGS. 2A-D show example results in which both the number of centers and the number of clients are set to 10.
  • FIGS. 2A-D illustrate Lloyd's algorithm with different types of quantizations. Two settings are illustrated as tested: 16 quantization levels and 32 quantization levels.
  • the x-axis is the average number of bits sent for each data dimension, and the y-axis is the global objective of Lloyd's algorithm.
  • Power iteration is a widely used method to compute the top eigenvector of a matrix.
  • each client has access to a subset of data.
  • the server broadcasts the current estimate of the eigenvector to all clients.
  • Each client updates the eigenvector based on one power iteration on its local data, and sends the updated eigenvector back to the server.
  • the server updates the eigenvector by computing the weighted average of the eigenvectors sent by all clients.
  • Data points A 1 , A 2 , . . . A m are distributed across n clients.
  • each client sends its locally updated estimate of the eigenvector, computed from one power iteration on its local data points.
  • the server can average the received information to compute v t+1 .
  • the client compresses the estimated eigenvector before sending to the server.
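  • The following sketch illustrates distributed power iteration with a quantized uplink on synthetic data; the 1-bit quantizer is a stand-in for the 16- or 32-level schemes used in the experiments, and all names and parameters are illustrative assumptions rather than the patent's experimental setup.

```python
# Illustrative sketch (assumption), not the patent's reference implementation.
import numpy as np

def quantize(v: np.ndarray, rng) -> np.ndarray:
    """Stochastic binary quantization, standing in for the k-level schemes tested."""
    lo, hi = v.min(), v.max()
    if hi == lo:
        return v.copy()
    p = (v - lo) / (hi - lo)
    return np.where(rng.random_sample(v.shape) < p, hi, lo)

rng = np.random.RandomState(0)
d, n_clients = 32, 20
local_data = [rng.randn(100, d) for _ in range(n_clients)]  # rows A_i, sharded by client

v = rng.randn(d)
v /= np.linalg.norm(v)
for _ in range(30):
    uploads = []
    for A in local_data:
        u = A.T @ (A @ v)                  # one power-iteration step on local data
        uploads.append(quantize(u, rng))   # compress the update before the uplink
    v = np.mean(uploads, axis=0)           # server averages the received vectors
    v /= np.linalg.norm(v)                 # and re-normalizes the eigenvector estimate
```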
  • FIGS. 3A-D show the results.
  • the dataset is distributed over 100 clients.
  • FIGS. 3A-D illustrate power iteration with different types of quantizations. Two settings were tested: 16 quantization levels and 32 quantization levels.
  • the x-axis is the average number of bits sent for each data dimension and the y-axis is the ℓ₂ distance between the computed eigenvector and the ground-truth vector.
  • variable-length coding achieves the lowest quantization error in most of the settings. Furthermore, for low-bit rate, stochastic rotated quantization is competitive with variable-length coding.
  • Let D(j) be the j-th diagonal entry of D.
  • Z_i^max is a function of the d independent random variables D(1), D(2), . . . , D(d). Changing D(j) changes Z_i^max by at most 2|X_i(j)|/√d.
  • FIG. 4 depicts an example system 200 for distributed computing.
  • System 200 can include a server computing device 210 and a plurality of client computing devices 230 .
  • the server computing device 210 can be configured to access a global machine-learned model and to provide the global model to the plurality of client devices 230 .
  • the model can be, for instance, a linear regression model, logistic regression model, a support vector machine model, a neural network (e.g. convolutional neural network, recurrent neural network, etc.), or other machine-learned model.
  • the server 210 can be configured to communicate with client devices 230 over one or more networks 280.
  • Client devices 230 can each be configured to determine one or more local updates associated with the model based at least in part on locally stored data 236 .
  • the locally stored data 236 can include audio files, image files, video files, log entries, and/or various other suitable data.
  • the data 236 can be any data derived through a user interaction with a client device 230 .
  • the data 236 across the plurality of devices 230 includes data that is respectively stored at each device 230 .
  • the collective data 236 across all devices 230 is highly unbalanced and not independent and identically distributed.
  • Client devices 230 can be configured to provide the local updates to the server 210 .
  • the data 236 may be privacy sensitive. In this manner, the local updates can be performed and provided to server 210 without compromising the privacy of data 236 . For instance, in such implementations, data 236 is not itself provided to server 210 since the local update does not include the actual data 236 .
  • one or more of encryption techniques, random noise techniques, and/or other security techniques can be added to the training process to obscure any inferable information from the local updates.
  • server 210 can receive each local update from client device 230 , and can aggregate the local updates to determine a global update to the machine-learned model. In some implementations, server 210 can determine a weighted average or other mean of the local updates and determine the global update based at least in part on the average.
  • scaling or other techniques can be applied to the local updates to determine the global update. For instance, a local step size can be applied for each client device 230 , the aggregation can be performed proportionally to various data partition sizes of client devices 230 , and/or one or more scaling factors can be applied to the local and/or aggregated updates. It will be appreciated that various other techniques can be applied without deviating from the scope of the present disclosure.
  • FIG. 4 depicts an example computing system 200 that can be used to implement the methods and systems of the present disclosure.
  • the system 200 can be implemented using a client-server architecture that includes a server 210 that communicates with one or more client devices 230 over a network 280 .
  • the system 200 includes a server 210 , such as a web server.
  • the server 210 can be implemented using any suitable computing device(s).
  • the server 210 can have one or more processors 212 and one or more memory devices 214 .
  • the server 210 can be implemented using one server device or a plurality of server devices. In implementations in which a plurality of devices are used, such plurality of devices can operate according to a parallel computing architecture, a sequential computing architecture, or a combination thereof.
  • the server 210 can also include a network interface 236 used to communicate with the client devices 230 over the network 280 .
  • the network interface can include any suitable components for interfacing with one or more networks, including for example, transmitters, receivers, ports, controllers, antennas, or other suitable components.
  • the one or more processors 212 can include any suitable processing device, such as a microprocessor, microcontroller, integrated circuit, logic device, or other suitable processing device.
  • the one or more memory devices 214 can include one or more computer-readable media, including, but not limited to, non-transitory computer-readable media, RAM, ROM, hard drives, flash drives, or other memory devices.
  • the one or more memory devices 214 can store information accessible by the one or more processors 212 , including computer-readable instructions 218 that can be executed by the one or more processors 212 .
  • the instructions 218 can be any set of instructions that when executed by the one or more processors 212 , cause the one or more processors 212 to perform operations.
  • the server 210 can further include a de-rotator 220 .
  • the de-rotator 220 can de-rotate a vector that has been rotated by a client device 230 .
  • the de-rotator 220 can use an inverse random rotation matrix that is an inverse of a random rotation matrix used by the client device 230 to rotate the vector.
  • a mean of several rotated vectors can be determined prior to de-rotation and the de-rotator 220 can de-rotate the determined mean.
  • the de-rotator 220 can determine one or more inverse random rotation matrices based at least in part on one or more seeds that are shared (e.g., respectively shared or universally shared) with the client devices 230 .
  • the server 210 can further include a decoder 222 .
  • the decoder 222 can decode a vector that has been encoded by a client device 230 (e.g., according to one of the encoding techniques discussed above).
  • the decoder can decode a vector that has been encoded according to variable length coding techniques such as, for example, Huffman coding or arithmetic coding.
  • the server 210 can further include a mean calculator 224 .
  • the mean calculator 224 can be configured to receive a plurality of vectors (e.g., decoded vectors, rotated vectors, and/or de-rotated vectors) and to determine a mean of the vectors (e.g., a mean vector). In some implementations in which the mean is determined prior to de-rotation, the mean vector can subsequently be de-rotated by the de-rotator 220 .
  • the one or more memory devices 214 can also store data 216 that can be retrieved, manipulated, created, or stored by the one or more processors 212 .
  • the data 216 can include, for instance, local updates, global parameters, and other data.
  • the data 216 can be stored in one or more databases.
  • the one or more databases can be connected to the server 210 by a high bandwidth LAN or WAN, or can also be connected to server 210 through network 280 .
  • the one or more databases can be split up so that they are located in multiple locales.
  • the server 210 can exchange data with client devices 230 over the network 280 .
  • client devices 230 can be connected to the server 210 over the network 280 .
  • Each of the client devices 230 can be any suitable type of computing device, such as a general purpose computer, special purpose computer, laptop, desktop, mobile device, navigation system, intelligent speaker or home assistant, home appliance, smartphone, tablet, computing device that is able to be worn, gaming console, worker device in a datacenter, a display with one or more processors, or other suitable computing device.
  • a client device 230 can include one or more processor(s) 232 and a memory 234 .
  • the one or more processor(s) 232 can include, for example, one or more central processing units (CPUs), graphics processing units (GPUs) dedicated to efficiently rendering images or performing other specialized calculations, Tensor processing units (TPUs), and/or other processing devices.
  • the memory 234 can include one or more computer-readable media and can store information accessible by the one or more processors 232 , including instructions 238 that can be executed by the one or more processors 232 and data 236 .
  • the client computing device 230 can include a vector calculator 240 that is implementable to determine one or more vectors (e.g., local updates) according to example aspects of the present disclosure.
  • the vector calculator 240 can perform one or more training techniques such as, for example, backwards propagation of errors to re-train or otherwise update a machine-learned model based on the locally stored data 236 , thereby generating an update vector (e.g., a gradient).
  • the vector calculator 240 can be included in an application or can be included in the operating system of the device 230 . In other implementations, the vector calculator 240 can be any component or system that determines a vector to be transmitted to the server computing device 210 .
  • the client computing device 230 can further include a rotater 242 .
  • the rotater 242 can rotate a vector by a random rotation matrix (e.g., by multiplying the vector by the matrix).
  • the rotater 242 can determine the random rotation matrix based on a seed.
  • the client computing device 230 can further include a quantizer 246 .
  • the quantizer 246 can quantize a vector.
  • the quantizer 246 can perform stochastic binary quantization or stochastic k-level quantization as described above to quantize the vector.
  • the client computing device 230 can further include an encoder 248 .
  • the encoder 248 can perform one or more of the encoding techniques described above (e.g., variable length coding such as, for example, Huffman coding or arithmetic coding).
  • the data 236 can include data examples to be used in solving one or more optimization problems.
  • the data examples can be distributed unevenly among the client devices 230, such that no single client device 230 includes a representative sample of the overall distribution of the training data examples.
  • the data 236 can further include a vector to be communicated to the server 210 .
  • the client device 230 of FIG. 4 can include various input/output devices for providing and receiving information from a user, such as a touch screen, touch pad, data entry keys, speakers, and/or a microphone suitable for voice recognition.
  • the client device 230 can also include a network interface 250 used to communicate with one or more remote computing devices (e.g. server 210 ) over the network 280 .
  • the network interface 250 can include any suitable components for interfacing with one or more networks, including for example, transmitters, receivers, ports, controllers, antennas, or other suitable components.
  • the network 280 can be any type of communications network, such as a local area network (e.g. intranet), wide area network (e.g. Internet), cellular network, or some combination thereof.
  • the network 280 can also include a direct connection between a client device 230 and the server 210 .
  • communication between the server 210 and a client device 230 can be carried via network interface using any type of wired and/or wireless connection, using a variety of communication protocols (e.g. TCP/IP, HTTP, SMTP, FTP), encodings or formats (e.g. HTML, XML), and/or protection schemes (e.g. VPN, secure HTTP, SSL).
  • the vector calculator 240 , the rotater 242 , the quantizer 246 , the encoder 248 , the de-rotator 220 , the decoder 222 , and the mean calculator 224 can include computer logic utilized to provide desired functionality.
  • each of the vector calculator 240 , the rotater 242 , the quantizer 246 , the encoder 248 , the de-rotator 220 , the decoder 222 , and the mean calculator 224 can be implemented in hardware, firmware, and/or software controlling a general purpose processor.
  • each of the vector calculator 240 , the rotater 242 , the quantizer 246 , the encoder 248 , the de-rotator 220 , the decoder 222 , and the mean calculator 224 includes program code files stored on the storage device, loaded into memory and executed by a processor, or can be provided from computer program products, for example, computer executable instructions that are stored in a tangible computer-readable storage medium such as, for example, a RAM disk or card or other computer-readable optical or magnetic media.
  • the techniques of the present disclosure are discussed primarily with reference to vectors to be transmitted or uploaded, the techniques described herein can also be applied to other data structures as well.
  • the client computing device 230 e.g., the vector calculator 240
  • the client computing device 230 can first flatten the data structure to form a vector.
  • the techniques described herein e.g., random rotation, probabilistic quantization, and/or variable length coding
  • the server computing device 210 can re-shape the resulting vector (e.g., a mean vector) back to the original dimension(s).
  • the present disclosure can be generalized to other data structures through conversion (e.g., flattening) of the other data structure to vector format prior to rotation, quantization, encoding, etc. After de-rotation, decoding, etc., the de-rotated or decoded vector can be reshaped back into the original dimension(s).
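  • A tiny sketch of this flatten-then-reshape handling (illustrative only, under the assumption of a NumPy tensor-valued update):

```python
# Illustrative sketch (assumption), not the patent's reference implementation.
import numpy as np

update = np.random.randn(3, 4, 5)        # e.g. a tensor-valued model update
flat = update.ravel()                    # flatten before rotation / quantization / encoding
restored = flat.reshape(update.shape)    # server reshapes the aggregated vector afterwards
assert restored.shape == (3, 4, 5)
```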
  • FIG. 5 depicts a swim lane flow diagram of an example method 500 to perform a stochastic rotated quantization technique according to example embodiments of the present disclosure.
  • the left-hand side of FIG. 5 illustrates operations performed by each of one or more client computing devices while the right-hand side of FIG. 5 illustrates operations performed by a server computing device.
  • a client computing device obtains a vector.
  • the vector can be any vector that is to be transmitted to the server computing device.
  • Example vectors include a machine-learned model update vector that describes one or more parameters of a machine-learned model or one or more updates to the one or more parameters of the machine-learned model; a cluster vector that describes a plurality of cluster centers or a plurality of updates to the plurality of cluster centers; and a power iteration vector that describes an eigenvector.
  • obtaining the vector can include computing the vector based on a local dataset that is stored locally at the client computing device.
  • the client computing device rotates the vector by a random rotation matrix to obtain a rotated vector.
  • the random rotation matrix can be a product of a Walsh-Hadamard matrix with a diagonal matrix.
  • the diagonal matrix can include independent and identically distributed Rademacher entries.
  • the method 500 can further include obtaining a seed that is shared with the server computing device.
  • the method 500 can include generating the random rotation matrix based at least in part on the seed.
  • the client computing device performs probabilistic quantization of the rotated vector to obtain a quantized rotated vector.
  • performing probabilistic quantization at 506 can include performing stochastic binary quantization or performing stochastic k-level quantization.
  • performing probabilistic quantization at 506 can include determining a value for each of a number of quantization levels based at least in part on a magnitude of the rotated vector and a minimum coordinate value included in the rotated vector; and quantizing each coordinate of the rotated vector into one of the number of quantization levels.
  • the client computing device transmits the quantized rotated vector to the server computing device.
  • various additional encodings, cryptography, and/or privacy preserving manipulations can be performed prior to transmission.
  • the server computing device receives the quantized rotated vectors from the client computing devices.
  • the server computing device can decode or unquantize each quantized rotated vector to the extent possible. For example, this may include transforming references or codings into particular data entries to which the references refer.
  • the server computing device determines a mean of all the quantized rotated vectors received from all of the client computing devices to obtain a mean rotated vector.
  • the server computing device de-rotates the mean rotated vector by an inverse random rotation matrix to obtain a mean de-rotated vector.
  • the inverse random rotation matrix can be the inverse of the random rotation matrix used at 504 .
  • method 500 can include obtaining, by the server computing device, a seed, where the seed is shared with at least one of the client computing devices from which a quantized rotated vector is received. For example, each client computing device can have a different seed or seeds can be used by multiple client computing devices.
  • the method 500 can include generating, by the server computing device, the inverse random rotation matrix based at least in part on the seed.
  • the operations illustrated at 512 and 514 can be performed in reverse order, such that the vectors are individually de-rotated prior to taking the mean.
  • the method 500 can further include performing, by the server computing device, a global update based on the mean de-rotated vector.
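  • Putting the pieces together, the following self-contained sketch walks through method 500 end to end: each client rotates its vector with the same seeded sign/Hadamard rotation (as at 504), performs probabilistic quantization (as at 506), and transmits; the server averages the uploads (as at 512) and applies the inverse rotation (as at 514). It is an illustrative composition under simplifying assumptions (1-bit quantization, power-of-two dimension, NumPy), not the patent's reference implementation.

```python
# Illustrative sketch (assumption), not the patent's reference implementation.
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform (power-of-two length)."""
    y = x.astype(float).copy()
    h = 1
    while h < len(y):
        for s in range(0, len(y), 2 * h):
            a, b = y[s:s + h].copy(), y[s + h:s + 2 * h].copy()
            y[s:s + h], y[s + h:s + 2 * h] = a + b, a - b
        h *= 2
    return y

def rotate(x, signs):
    return fwht(signs * x) / np.sqrt(len(x))

def de_rotate(z, signs):
    return signs * fwht(z) / np.sqrt(len(z))

def binary_quantize(x, rng):
    lo, hi = x.min(), x.max()
    if hi == lo:
        return x.copy()
    return np.where(rng.random_sample(x.shape) < (x - lo) / (hi - lo), hi, lo)

seed, d, n = 42, 16, 50
signs = np.random.RandomState(seed).choice([-1.0, 1.0], size=d)  # derived from the shared seed

rng = np.random.RandomState(1)
client_vectors = [rng.randn(d) for _ in range(n)]
uploads = [binary_quantize(rotate(x, signs), rng)   # rotate (504), then quantize (506)
           for x in client_vectors]                 # each client transmits its upload

mean_rotated = np.mean(uploads, axis=0)             # server averages the uploads (512)
mean_estimate = de_rotate(mean_rotated, signs)      # server applies the inverse rotation (514)
true_mean = np.mean(client_vectors, axis=0)
print("empirical MSE:", float(np.mean((mean_estimate - true_mean) ** 2)))
```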
  • FIG. 6 depicts a swim lane flow diagram of an example method 600 to perform a variable length coding technique according to example embodiments of the present disclosure.
  • the left-hand side of FIG. 6 illustrates operations performed by each of one or more client computing devices while the right-hand side of FIG. 6 illustrates operations performed by a server computing device.
  • a client computing device obtains a vector.
  • the vector can be any vector that is to be transmitted to the server computing device.
  • Example vectors include a machine-learned model update vector that describes one or more parameters of a machine-learned model or one or more updates to the one or more parameters of the machine-learned model; a cluster vector that describes a plurality of cluster centers or a plurality of updates to the plurality of cluster centers; and a power iteration vector that describes an eigenvector.
  • obtaining the vector can include computing the vector based on a local dataset that is stored locally at the client computing device.
  • the client computing device performs probabilistic quantization of the vector to obtain a quantized vector.
  • performing probabilistic quantization at 604 can include performing stochastic binary quantization or performing stochastic k-level quantization.
  • performing probabilistic quantization at 604 can include determining a value for each of a number of quantization levels based at least in part on a magnitude of the vector to be quantized and a minimum coordinate value included in the vector; and quantizing each coordinate of the vector into one of the number of quantization levels.
  • the client computing device encodes the quantized vector according to a variable length coding scheme to obtain an encoded quantized vector.
  • the variable length coding scheme can include Huffman coding or arithmetic coding.
  • the client computing device transmits the encoded quantized vector to the server computing device.
  • the server computing device receives the encoded quantized vectors from the client computing devices.
  • the server computing device decodes each encoded quantized vector according to the variable length coding scheme to obtain the quantized vector.
  • the server computing device can decode or unquantize each quantized rotated vector to the extent possible. For example, this may include transforming references or codings into particular data entries to which the references refer.
  • the server computing device determines a mean of all the quantized vectors received from all of the client computing devices to obtain a mean vector.
  • the method 600 can further include performing, by the server computing device, a global update based on the mean vector.
  • the technology discussed herein makes reference to servers, databases, software applications, and other computer-based systems, as well as actions taken and information sent to and from such systems.
  • the inherent flexibility of computer-based systems allows for a great variety of possible configurations, combinations, and divisions of tasks and functionality between and among components.
  • processes discussed herein can be implemented using a single device or component or multiple devices or components working in combination.
  • Databases and applications can be implemented on a single system or distributed across multiple systems. Distributed components can operate sequentially or in parallel.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Algebra (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Operations Research (AREA)
  • Information Transfer Between Computers (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present disclosure provides systems and methods for communication efficient distributed mean estimation. In particular, aspects of the present disclosure can be implemented by a system in which a number of vectors reside on a number of different clients, and a centralized server device seeks to estimate the mean of such vectors. According to one aspect of the present disclosure, a client computing device can rotate a vector by a random rotation matrix and then subsequently perform probabilistic quantization on the rotated vector. According to another aspect of the present disclosure, subsequent to quantization but prior to transmission, the client computing device can encode the quantized vector according to a variable length coding scheme (e.g., by computing variable length codes).

Description

    FIELD
  • The present disclosure relates generally to distributed computing. More particularly, the present disclosure relates to systems and methods for communication efficient distributed mean estimation.
  • BACKGROUND
  • Given a number of vectors that reside on a number of different clients, the goal of distributed mean estimation is to estimate the mean of such vectors. This basic estimation problem is used as a subroutine in several learning and optimization tasks where data is distributed across several clients. For example, in Lloyd's algorithm for k-means clustering, if data is distributed across several clients, the server needs to compute the means of all clusters in each update step. Similarly, for principal components analysis (PCA), if data samples are distributed across several clients, then for the power-iteration method, the server needs to average the output of all clients in each step.
  • Recently, algorithms involving distributed mean estimation have been used extensively in training large-scale neural networks and other statistical models. In an example scenario of synchronized distributed learning, each client obtains a copy of a global model. The clients then update the model independently based on their local data. The updates (usually in the form of gradients) are then sent to a server, where they are averaged and used to update the global model. A critical step in all of the above algorithms is to estimate the mean of a set of vectors.
  • However, one of the main bottlenecks in distributed algorithms is the communication cost, which can be prohibitive for modern applications. For example, communication cost can be significant in example distributed computing systems where each client can be a low-power and/or low-bandwidth device such as, for example, a mobile phone, an embedded computing device, or other connected smart devices such as intelligent speakers, cameras, home appliances, vehicle computing systems, etc.
  • SUMMARY
  • Aspects and advantages of embodiments of the present disclosure will be set forth in part in the following description, or can be learned from the description, or can be learned through practice of the embodiments.
  • One aspect of the present disclosure is directed to a computing system to facilitate transmission of machine-learned model updates from client devices to a centralized server computing device. The computing system includes one or more client computing devices. Each client computing device includes one or more processors and one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors cause the client computing device to perform operations. The operations include determining an update to a machine-learned model based at least in part on a local dataset stored at the client computing device. The operations include rotating the update by a random rotation matrix to obtain a rotated update. The operations include performing probabilistic quantization of the rotated update to obtain a quantized rotated update. The operations include transmitting the quantized rotated update to the centralized server computing device.
  • Another aspect of the present disclosure is directed to a computing system. The computing system includes one or more client computing devices. Each client computing device includes one or more processors and one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors cause the client computing device to perform operations. The operations include obtaining a vector. The operations include rotating the vector by a random rotation matrix to obtain a rotated vector. The operations include performing probabilistic quantization of the rotated vector to obtain a quantized rotated vector. The operations include transmitting the quantized rotated vector.
  • Another aspect of the present disclosure is directed to a computing system. The computing system includes one or more client computing devices. Each client computing device includes one or more processors and one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors cause the client computing device to perform operations. The operations include obtaining a vector. The operations include performing probabilistic quantization of the vector to obtain a quantized vector. Performing probabilistic quantization of the vector includes determining a value for each of a number of quantization levels based at least in part on a magnitude of the vector and a minimum coordinate value included in the vector. Performing probabilistic quantization of the vector includes quantizing each coordinate of the vector into one of the number of quantization levels.
  • Other aspects of the present disclosure are directed to various systems, apparatuses, non-transitory computer-readable media, user interfaces, and electronic devices.
  • These and other features, aspects, and advantages of various embodiments of the present disclosure will become better understood with reference to the following description and appended claims. The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate example embodiments of the present disclosure and, together with the description, serve to explain the related principles.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Detailed discussion of embodiments directed to one of ordinary skill in the art is set forth in the specification, which makes reference to the appended figures, in which:
  • FIG. 1 depicts a graphical diagram of example results of distributed mean estimation on data generated from a Gaussian distribution according to example embodiments of the present disclosure.
  • FIGS. 2A-D depict graphical diagrams of example results for performance of Lloyd's algorithm with different types of quantizations according to example embodiments of the present disclosure.
  • FIGS. 3A-D depict graphical diagrams of example results for performance of power iteration with different types of quantizations according to example embodiments of the present disclosure.
  • FIG. 4 depicts a block diagram of an example computing system according to example embodiments of the present disclosure.
  • FIG. 5 depicts a swim lane flow diagram of an example method to perform a stochastic rotated quantization technique according to example embodiments of the present disclosure.
  • FIG. 6 depicts a swim lane flow diagram of an example method to perform a variable length coding technique according to example embodiments of the present disclosure.
  • Generally, the present disclosure provides systems and methods for communication efficient distributed mean estimation. In particular, the present disclosure addresses the need for distributed learning and optimization algorithms with low communication cost. However, unlike previous works, the systems and methods of the present disclosure make no probabilistic assumptions on the data.
  • Aspects of the present disclosure can be implemented by a system in which a number of vectors reside on a number of different client computing devices, and each client device seeks to transmit its respective vector to a centralized server computing device to enable the server device to estimate the mean of the vectors. As one example application, the techniques of the present disclosure can enable communication efficient uploads of local machine-learned model gradients from the client devices to the server device, where the server device aggregates the received gradients to update a global machine-learned model.
  • One aspect of the present disclosure is directed to a random rotation technique to improve communication efficiency. In particular, a client computing device can rotate a vector by a random rotation matrix and then subsequently perform probabilistic quantization on the rotated vector. For example, the present disclosure provides a stochastic k-level quantization technique. As demonstrated by the present disclosure, performing the random rotation prior to quantization can significantly reduce the quantization error, thereby leading to improved communication efficiency.
  • After quantization, the client device can transmit the quantized rotated vector to a centralized server computing device. The server computing device can receive multiple vectors from multiple client computing devices. The server computing device can determine a mean of the multiple received vectors and can de-rotate the mean vector using an inverse random rotation matrix.
  • In such fashion, the vectors can be communicated from the clients to the server in a highly efficient manner. In particular, in some implementations, this random rotation technique can reduce the mean squared error of the mean estimation significantly, by a factor of $\mathcal{O}(d/\log d)$.
  • Another aspect of the present disclosure is directed to a variable length coding technique to improve communication efficiency. In particular, subsequent to quantization but prior to transmission of the vector, the client computing device can encode the quantized vector according to a variable length coding scheme (e.g., by computing variable length codes). The server computing device can then decode the received vectors using the variable length coding scheme. This variable length coding technique can reduce the mean squared error by a factor of $\mathcal{O}(d)$ as compared to a naive approach that includes neither random rotation nor variable length coding. In fact, the present disclosure mathematically demonstrates that this variable length coding approach is communication optimal.
  • The systems and methods of the present disclosure provide a number of technical effects and benefits. As a first example technical effect and benefit, the present disclosure enables transmission of information (e.g., machine-learned model updates) in a more efficient and lower bandwidth manner. For example, the random rotation technique and the variable length coding technique can reduce the number of bits required to represent the information, thereby enabling faster and lower bandwidth transmission of the information. As another example technical effect and benefit, by improving the ability to transmit vectors for aggregation at a central location, the present disclosure enables distributed training techniques in which machine-learned models are trained locally on-device and then a global model is generated or updated based on a mean of the updates resulting from the local training. Thus, the present disclosure enables and enhances such distributed training schemes. The distributed training scheme results in better (e.g., more accurate) global models which have improved performance due to their exposure to additional training data. Furthermore, the distributed training scheme that is enhanced by the present disclosure improves user privacy, as training data can be retained on the device and is not required to be sent to a central location. As yet another example technical effect and benefit, the present disclosure also provides enhanced security or encryption of information that is transmitted to a central location. For example, through the use of private rotation matrices, the transmitted information can be rotated and only entities with the inverse of the private matrix can de-rotate to extract the information.
  • The present disclosure is structured as follows: first, the present disclosure demonstrates that for d-dimensional data with n clients, a naive stochastic rounding approach yields a mean squared error (MSE) of $\Theta(d/n)$ and uses a constant number of bits per dimension per client. Next, this naive algorithm is extended in two ways: by demonstrating that application of a structured random rotation before quantization reduces the error to $\mathcal{O}((\log d)/n)$, and that application of a better coding strategy further reduces the error to $\mathcal{O}(1/n)$. The present disclosure also demonstrates that the latter coding strategy is optimal up to a constant in the minimax sense. That is, it achieves the best MSE for a given communication cost. In addition, the present disclosure demonstrates the practicality of the algorithms described herein by applying them to distributed Lloyd's algorithm for k-means and power iteration for principal component analysis (PCA).
  • 1.1 Example Discussion of Distributed Mean Aggregation
  • Given n vectors $X^n \stackrel{\text{def}}{=} (X_1, X_2, \ldots, X_n)$ with each $X_i \in \mathbb{R}^d$ residing on one of n clients, the goal of distributed mean estimation is to estimate the mean of the vectors:
  • $$\bar{X} \stackrel{\text{def}}{=} \frac{1}{n} \sum_{i=1}^{n} X_i. \qquad (1)$$
  • As described above, this basic estimation problem is used in several learning and optimization tasks where data is distributed across several clients, including, for example, Lloyd's algorithm for k-means clustering (see, e.g., Lloyd, Stuart. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129-137, 1982); principal components analysis (PCA); and the training of large-scale neural networks and other statistical models. For example, in an example scenario of synchronized distributed learning, each client obtains a copy of a global model. The clients then update the model independently based on their local data. The updates (usually in the form of gradients) are then sent to a server, where they are averaged and used to update the global model. A critical step in all of the above algorithms is to estimate the mean of a set of vectors as in Eq. (1).
  • More generally, however, the communication efficient systems, methods, and techniques of the present disclosure can involve or be performed in any configuration or scenario where data (e.g., in the form of vectors) resides on or is produced by one or more different client devices and is communicated to another device such as, for example, a central server device.
  • In particular, one of the main bottlenecks in distributed algorithms or other scenarios where data is communicated from a client device to a server device is the communication cost, which can be prohibitive for modern applications. For example, communication cost can be significant in example distributed computing systems where each client can be a low-power and/or low-bandwidth device such as, for example, a mobile phone or other mobile computing device, an embedded computing device, a worker device in a datacenter, or other connected smart devices such as intelligent speakers, cameras, home appliances, vehicle computing systems, etc.
  • Given such a wide set of applications, the systems and methods of the present disclosure address the basic problem of achieving the optimal minimax rate in distributed mean estimation with limited communication or other communication constraints (e.g., communication costs).
  • The systems, methods, models, and results of the present disclosure differ from previous works on mean estimation in a number of ways, including the following examples. As a first example, certain previous works assume that the data is generated independent and identically distributed (i.i.d.) according to some distribution. In contrast, the present disclosure does not make any distribution assumptions on data. As a second example difference, the objective in certain prior works is to estimate the mean of the underlying statistical model, while the goal in at least some implementations of the present disclosure is to estimate the empirical mean of the data.
  • 1.2 Example Model
  • In at least some implementations, the communication algorithms provided herein are simultaneous and independent. That is, the clients independently send data to the server and they can transmit at the same time. In any independent communication protocol, each client transmits a function of $X_i$ (say $f(X_i)$), and a central server estimates the mean by some function of $f(X_1), f(X_2), \ldots, f(X_n)$. Let $\pi$ be any such protocol and let $\mathcal{C}_i(\pi, X_i)$ be the expected number of transmitted bits by the i-th client during protocol $\pi$, where throughout the present disclosure, expectation is over the randomness in protocol $\pi$.
  • The total number of bits transmitted by all clients with the protocol $\pi$ is
  • $$\mathcal{C}(\pi, X^n) \stackrel{\text{def}}{=} \sum_{i=1}^{n} \mathcal{C}_i(\pi, X_i).$$
  • Let the estimated mean be $\hat{\bar{X}}$. For a protocol $\pi$, the MSE of the estimate is
  • $$\varepsilon(\pi, X^n) = \mathbb{E}\left[\left\|\hat{\bar{X}} - \bar{X}\right\|_2^2\right].$$
  • The systems and methods of the present disclosure can be implemented with the use of private and/or public randomness. Private randomness refers to random values that are generated by each machine separately, and public randomness refers to a sequence of random values that are shared among all parties.
  • As one example, in the absence of public randomness, the server can communicate a random seed that can be used by clients to emulate public randomness. More particularly, in some implementations, both the client and the server can have knowledge of a synchronized random seed. The client and the server can each use the random seed (e.g., along with a pseudo-random number generator) to separately produce random values (e.g., a random rotation matrix as will be discussed further below). In one example, the seed can be 32 bits long.
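  • As a minimal illustrative sketch of this shared-seed approach (written in Python with NumPy; the function name rademacher_diagonal and the particular seed value are hypothetical and not part of the disclosure), a client and the server can derive the same random Rademacher diagonal entries from a synchronized seed, so that the server can later invert the client's rotation:
```python
import numpy as np

def rademacher_diagonal(seed: int, d: int) -> np.ndarray:
    """Derive the same +/-1 diagonal entries on client and server from a shared seed."""
    rng = np.random.RandomState(seed)       # deterministic pseudo-random generator
    return rng.choice([-1.0, 1.0], size=d)  # i.i.d. Rademacher entries

# Client and server hold the same seed (e.g., a 32-bit value), so they agree on D.
shared_seed = 0x5EED1234
d_client = rademacher_diagonal(shared_seed, d=8)
d_server = rademacher_diagonal(shared_seed, d=8)
assert np.array_equal(d_client, d_server)
```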
  • The algorithms provided by the present disclosure work for any $X^n$. However, to measure the minimax performance, without loss of generality, the following is provided with reference to the scenario where each $X_i \in S^d$, the ball of radius 1 in $\mathbb{R}^d$, i.e., $X \in S^d$ iff
  • $$\|X\|_2 \leq 1,$$
  • where $\|X\|_2$ denotes the $\ell_2$ norm of the vector X. For a protocol $\pi$, the worst case error for all $X^n \in S^d$ is
  • $$\varepsilon(\pi, S^d) \stackrel{\text{def}}{=} \max_{X^n : X_i \in S^d \,\forall i} \varepsilon(\pi, X^n).$$
  • Let $\Pi(c)$ denote the set of all protocols with communication cost at most c. The minimax MSE is
  • $$\varepsilon(\Pi(c), S^d) \stackrel{\text{def}}{=} \min_{\pi \in \Pi(c)} \varepsilon(\pi, S^d).$$
  • 1.3 Example Results and Discussion
  • 1.3.1 Example Algorithms
  • The MSE $\varepsilon(\pi, X^n)$ is first analyzed for three algorithms, when $\mathcal{C}(\pi, X^n) = \Theta(nd)$, i.e., each client sends a constant number of bits per dimension.
  • Stochastic Uniform Quantization.
  • In Section 2.1, it is first shown that a naive stochastic binary quantization algorithm (denoted by $\pi_{sb}$) achieves an MSE of
  • $$\varepsilon(\pi_{sb}, X^n) = \Theta\!\left(\frac{d}{n} \cdot \frac{1}{n} \sum_{i=1}^{n} \|X_i\|_2^2\right),$$
  • and $\mathcal{C}(\pi_{sb}, X^n) = n \cdot (d + \tilde{\mathcal{O}}(1))$, where $\tilde{\mathcal{O}}(1)$ is used to denote $\mathcal{O}(\log(dn))$. That is, each client sends one bit per dimension. It is further shown that this bound is tight. In many practical scenarios, d is much larger than n and the above error is prohibitive.
  • A natural way to decrease the error is to increase the number of levels of quantization. If k levels of quantization are used, in Theorem 2, the error decreases as
  • $$\varepsilon(\pi_{sk}, X^n) = \mathcal{O}\!\left(\frac{d}{n(k-1)^2} \cdot \frac{1}{n} \sum_{i=1}^{n} \|X_i\|_2^2\right). \qquad (2)$$
  • However, the communication cost would increase to $\mathcal{C}(\pi_{sk}, X^n) = n \cdot (d\lceil \log_2 k \rceil + \tilde{\mathcal{O}}(1))$ bits, which can be expensive if the MSE is desired to be o(d/n).
  • In order to reduce the communication cost, the present disclosure proposes two approaches.
  • Stochastic Rotated Quantization:
  • The present disclosure demonstrates that preprocessing the data by a random rotation reduces the mean squared error. Specifically, in Theorem 3, it is shown that this new scheme (denoted by $\pi_{srk}$) achieves an MSE of
  • $$\varepsilon(\pi_{srk}, X^n) = \mathcal{O}\!\left(\frac{\log d}{n(k-1)^2} \cdot \frac{1}{n} \sum_{i=1}^{n} \|X_i\|_2^2\right),$$
  • and has a communication cost of $\mathcal{C}(\pi_{srk}, X^n) = n \cdot (d\lceil \log_2 k \rceil + \tilde{\mathcal{O}}(1))$. Note that all logarithms are to base e, unless stated otherwise. Also note that the new scheme achieves much smaller MSE than naive stochastic quantization for the same communication cost.
  • Variable Length Coding:
  • A second approach provided by the present disclosure uses the same quantization as $\pi_{sk}$ but encodes levels via variable length coding. Instead of using $\lceil \log_2 k \rceil$ bits per dimension, it is shown that using variable length encoding such as arithmetic coding to compress the data reduces the communication cost significantly. In particular, in Theorem 4 it is shown that there is a scheme (denoted by $\pi_{svk}$) such that
  • $$\mathcal{C}(\pi_{svk}, X^n) = \mathcal{O}\!\left(nd\left(1 + \log\left(\frac{k^2}{d} + 1\right)\right) + \tilde{\mathcal{O}}(n)\right), \quad \text{and} \quad \varepsilon(\pi_{svk}, X^n) = \varepsilon(\pi_{sk}, X^n). \qquad (3)$$
  • Hence, setting $k = \sqrt{d}$ in Eqs. (2) and (3) yields
  • $$\varepsilon(\pi_{svk}, X^n) = \mathcal{O}\!\left(\frac{1}{n} \cdot \frac{1}{n} \sum_{i=1}^{n} \|X_i\|_2^2\right),$$
  • with $\Theta(nd)$ bits of communication, i.e., a constant number of bits per dimension per client. Of the three protocols, $\pi_{svk}$ has the best MSE for a given communication cost. Note that $\pi_{svk}$ uses k quantization levels but still uses $\tilde{\mathcal{O}}(1)$ bits per dimension per client for all $k \leq \sqrt{d}$.
  • Theoretically, while variable length coding has better guarantees, stochastic rotated quantization has several practical advantages: it uses fixed length coding and hence can be combined with encryption schemes for privacy preserving secure aggregation. It can also provide lower quantization error in some scenarios due to better constants (see, e.g., Section 7 for details).
  • 1.3.2 Example Minimax MSE
  • In the above protocols, all of the clients transmit the data. According to an aspect of the present disclosure, these protocols can be augmented with a sampling procedure, where only a random fraction of clients transmit data. The present disclosure demonstrates that a combination of k-level quantization, variable length coding, and sampling can be used to achieve information theoretically optimal MSE for a given communication cost. In particular, combining Corollary 1 and Theorem 5 yields the following minimax result:
  • Theorem 1 There exists a universal constant t<1 such that for communication cost c≦ndt and n≧1/t,
  • $$\varepsilon(\Pi(c), S^d) = \Theta\!\left(\min\left(1, \frac{d}{c}\right)\right).$$
  • This result shows that the product of communication cost and MSE scales linearly in the number of dimensions.
  • The remainder of the present disclosure is organized as follows. First, the stochastic uniform quantization technique is analyzed in Section 2. In Section 3, a novel stochastic rotated quantization technique is proposed, and in Section 4 arithmetic coding is analyzed. In Section 5, the above algorithms are combined with a sampling technique and the upper bound on the minimax risk is stated. In Section 6, the matching minimax lower bounds are stated. In Section 7 some practical considerations are discussed and the algorithms are applied on distributed power iteration and Lloyd's algorithm. Section 8 provides a proof of Lemma 7. Section 9 describes example computing systems that can implement the techniques described herein. Section 10 describes example methods to implement the techniques described herein and Section 11 provides some additional disclosure.
  • 2. Example Stochastic Uniform Quantization
  • 2.1 Example Stochastic Binary Quantization
  • For a vector $X_i$, let $X_i^{\max} = \max_{1 \leq j \leq d} X_i(j)$ and similarly let $X_i^{\min} = \min_{1 \leq j \leq d} X_i(j)$. In the stochastic binary quantization protocol $\pi_{sb}$, for each client i, the quantized value for each coordinate j is generated independently with private randomness as
  • $$Y_i(j) = \begin{cases} X_i^{\max} & \text{with probability } \dfrac{X_i(j) - X_i^{\min}}{X_i^{\max} - X_i^{\min}}, \\[4pt] X_i^{\min} & \text{otherwise.} \end{cases}$$
  • Observe that $\mathbb{E}[Y_i(j)] = X_i(j)$. The server estimates $\bar{X}$ by
  • $$\hat{\bar{X}}_{\pi_{sb}} = \frac{1}{n} \sum_{i=1}^{n} Y_i.$$
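  • As a minimal sketch of the stochastic binary quantization described above (Python/NumPy; the function name is illustrative and this is one possible implementation, not the only compliant one), each coordinate is rounded to $X_i^{\max}$ or $X_i^{\min}$ with the stated probabilities, which keeps the quantized vector unbiased:
```python
import numpy as np

def stochastic_binary_quantize(x: np.ndarray, rng=np.random) -> np.ndarray:
    """Round each coordinate of x to x.max() or x.min() with probabilities chosen
    so that E[y] = x (stochastic binary quantization)."""
    x_max, x_min = x.max(), x.min()
    if x_max == x_min:                      # degenerate case: all coordinates equal
        return x.copy()
    p_up = (x - x_min) / (x_max - x_min)    # probability of rounding up to x_max
    up = rng.random_sample(x.shape) < p_up
    return np.where(up, x_max, x_min)

# Unbiasedness check: averaging many independent quantizations approaches x.
x = np.array([0.1, -0.4, 0.7, 0.0])
avg = np.mean([stochastic_binary_quantize(x) for _ in range(5000)], axis=0)
```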
  • First, the communication cost of this protocol will be bounded as described below.
  • Lemma 1 There exists an implementation of stochastic binary quantization that uses $d + \tilde{\mathcal{O}}(1)$ bits per client and hence $\mathcal{C}(\pi_{sb}, X^n) \leq n \cdot (d + \tilde{\mathcal{O}}(1))$.
  • Proof. Instead of sending vectors Yi, clients transmit two real values Xi max and Xi min (to a desired error) and a bit vector Y′i such that Y′i(j)=1 if Yi(j)=Xi max and 0 otherwise. Hence each client transmits d+2r bits, where r is the number of bits to transmit the real value to a desired error.
  • Let B be the maximum norm of the underlying vectors. To bound r, observe that using r bits, one can represent a number between −B and B to an error of $B/2^{r-1}$. Thus using 3 log2(dn)+1 bits one can represent the minimum and maximum to an additive error of $B/(nd)^3$. This error in transmitting minimum and maximum of the vector does not affect the calculations and it is ignored for simplicity. Note that in practice, each dimension of Xi is often stored as a 32 bit or 64 bit float, and r should be set as either 32 or 64. In this case, using an even larger r does not further reduce the error.
  • End proof.
  • Next, the estimation error of this protocol is computed as described below.
  • Lemma 2 For any set of vectors $X^n$,
  • $$\varepsilon(\pi_{sb}, X^n) = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{d} (X_i^{\max} - X_i(j))(X_i(j) - X_i^{\min}).$$
  • Proof.
  • $$\varepsilon(\pi_{sb}, X^n) = \mathbb{E}\left\|\hat{\bar{X}} - \bar{X}\right\|_2^2 = \frac{1}{n^2}\,\mathbb{E}\left\|\sum_{i=1}^{n} (Y_i - X_i)\right\|_2^2 = \frac{1}{n^2} \sum_{i=1}^{n} \mathbb{E}\left\|Y_i - X_i\right\|_2^2,$$
  • where the last equality follows by observing that $Y_i - X_i$, $\forall i$, are independent zero mean random variables. The proof follows by observing that for every i,
  • $$\mathbb{E}\left\|Y_i - X_i\right\|_2^2 = \sum_{j=1}^{d} \mathbb{E}\left[(Y_i(j) - X_i(j))^2\right] = \sum_{j=1}^{d} (X_i^{\max} - X_i(j))(X_i(j) - X_i^{\min}).$$
  • End proof.
  • Lemma 2 implies the following upper bound.
  • Lemma 3 For any set of vectors $X^n$,
  • $$\varepsilon(\pi_{sb}, X^n) \leq \frac{d}{2n} \cdot \frac{1}{n} \sum_{i=1}^{n} \|X_i\|_2^2.$$
  • Proof. The proof follows by Lemma 2 observing that $\forall j$,
  • $$(X_i^{\max} - X_i(j))(X_i(j) - X_i^{\min}) \leq \frac{(X_i^{\max} - X_i^{\min})^2}{4}, \quad \text{and} \quad (X_i^{\max} - X_i^{\min})^2 \leq 2\|X_i\|_2^2. \qquad (4)$$
  • End proof.
  • It can also be shown that the above bound is tight:
  • Lemma 4 There exists a set of vectors $X^n$ such that
  • $$\varepsilon(\pi_{sb}, X^n) \geq \frac{d-2}{2n} \cdot \frac{1}{n} \sum_{i=1}^{n} \|X_i\|_2^2.$$
  • Proof. For every i, let $X_i$ be defined as follows: $X_i(1) = 1/\sqrt{2}$, $X_i(2) = -1/\sqrt{2}$, and for all $j > 2$, $X_i(j) = 0$. For every i,
  • $$X_i^{\max} = \frac{1}{\sqrt{2}} \quad \text{and} \quad X_i^{\min} = -\frac{1}{\sqrt{2}}.$$
  • Substituting these bounds in the conclusion of Lemma 2 (which is an equality) yields the theorem.
  • End proof.
  • Therefore, the algorithm proposed in this subsection gives MSE Θ(d/n). Such an error is too large for real-world use. For example, in the application of neural networks, d can be on the order of millions, yet n can be much smaller than that. In such cases, the MSE is even larger than the norm of the vector.
  • 2.2 Example Stochastic k-Level Quantization
  • A natural generalization of binary quantization is k-level quantization. Let k be a positive integer larger than 2. The present disclosure proposes a k-level stochastic quantization scheme $\pi_{sk}$ to quantize each coordinate. Recall that for a vector $X_i$, $X_i^{\max} = \max_{1 \leq j \leq d} X_i(j)$ and $X_i^{\min} = \min_{1 \leq j \leq d} X_i(j)$. For every integer r in the range [0, k), let
  • $$B_i(r) \stackrel{\text{def}}{=} X_i^{\min} + \frac{r s_i}{k-1},$$
  • where $s_i$ satisfies $X_i^{\min} + s_i \geq X_i^{\max}$. A natural choice for $s_i$ would be $X_i^{\max} - X_i^{\min}$. However, as will be demonstrated in Section 4, a higher value of $s_i$ together with variable length coding has better guarantees. The algorithm quantizes each coordinate into one of the $B_i(r)$'s stochastically. In $\pi_{sk}$, for the i-th client and j-th coordinate, if $X_i(j) \in [B_i(r), B_i(r+1))$,
  • $$Y_i(j) = \begin{cases} B_i(r+1) & \text{with probability } \dfrac{X_i(j) - B_i(r)}{B_i(r+1) - B_i(r)}, \\[4pt] B_i(r) & \text{otherwise.} \end{cases}$$
  • The server estimates $\bar{X}$ by
  • $$\hat{\bar{X}}_{\pi_{sk}} = \frac{1}{n} \sum_{i=1}^{n} Y_i.$$
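  • The following sketch illustrates the stochastic k-level quantization described above (Python/NumPy; function names are illustrative, and the choice $s_i = X_i^{\max} - X_i^{\min}$ is one admissible option). Each coordinate is rounded to one of the k levels $B_i(r)$ so that the result is unbiased, and only the integer bin indices (plus $X_i^{\min}$ and $s_i$) would need to be transmitted:
```python
import numpy as np

def stochastic_k_level_quantize(x: np.ndarray, k: int, s=None, rng=np.random):
    """Quantize x onto k levels B(r) = x_min + r*s/(k-1), r = 0..k-1, rounding each
    coordinate up or down stochastically so that E[y] = x. Returns the integer bin
    indices and the (x_min, level width) needed to decode."""
    x_min = x.min()
    if s is None:
        s = x.max() - x_min                  # natural choice; s = sqrt(2)*||x||_2 is also admissible
    if s == 0:                               # constant vector: a single level suffices
        return np.zeros(x.shape, dtype=int), x_min, 0.0
    width = s / (k - 1)                      # spacing between adjacent levels
    scaled = (x - x_min) / width             # coordinate positions in units of levels
    lower = np.floor(scaled).astype(int)
    frac = scaled - lower                    # probability of rounding up
    up = rng.random_sample(x.shape) < frac
    bins = np.clip(lower + up, 0, k - 1)     # transmitted integer levels
    return bins, x_min, width

def dequantize(bins, x_min, width):
    return x_min + bins * width

x = np.array([0.3, -0.2, 0.9, 0.0])
bins, x_min, width = stochastic_k_level_quantize(x, k=16)
y = dequantize(bins, x_min, width)           # unbiased estimate of x
```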
  • As before, the communication complexity of this protocol is bounded. The proof is similar to that of Lemma 1 and hence omitted.
  • Lemma 5 There exists an implementation of stochastic k-level quantization that uses $d\lceil \log_2 k \rceil + \tilde{\mathcal{O}}(1)$ bits per client and hence $\mathcal{C}(\pi_{sk}, X^n) \leq n \cdot (d\lceil \log_2 k \rceil + \tilde{\mathcal{O}}(1))$.
  • The mean squared loss can be bounded as follows.
  • Theorem 2 If $X_i^{\max} - X_i^{\min} \leq s_i \leq \sqrt{2}\,\|X_i\|_2$ $\forall i$, then for any $X^n$, the $\pi_{sk}$ protocol satisfies,
  • $$\varepsilon(\pi_{sk}, X^n) \leq \frac{d}{2n(k-1)^2} \cdot \frac{1}{n} \sum_{i=1}^{n} \|X_i\|_2^2.$$
  • Proof.
  • $$\varepsilon(\pi_{sk}, X^n) = \mathbb{E}\left\|\hat{\bar{X}} - \bar{X}\right\|_2^2 = \frac{1}{n^2}\,\mathbb{E}\left\|\sum_{i=1}^{n} (Y_i - X_i)\right\|_2^2 = \frac{1}{n^2} \sum_{i=1}^{n} \mathbb{E}\left\|Y_i - X_i\right\|_2^2 \leq \frac{1}{n^2} \sum_{i=1}^{n} \frac{d\, s_i^2}{4(k-1)^2}, \qquad (5)$$
  • where the last equality follows by observing that $Y_i(j) - X_i(j)$ is an independent zero mean random variable with
  • $$\mathbb{E}\left[(Y_i(j) - X_i(j))^2\right] \leq \frac{s_i^2}{4(k-1)^2}.$$
  • Observing that $s_i^2 \leq 2\|X_i\|_2^2$ completes the proof.
  • End proof.
  • This section is concluded by noting that $s_i = X_i^{\max} - X_i^{\min}$ satisfies the conditions for the above theorem by Eq. (4).
  • 3. Example Stochastic Rotated Quantization
  • Next, it is shown that the algorithm of the previous section can be significantly improved by a new protocol. The motivation comes from the fact that the MSE of stochastic binary quantization and stochastic k-level quantization is
  • $$\mathcal{O}\!\left(\frac{d}{n}\,(X_i^{\max} - X_i^{\min})^2\right)$$
  • (see the proof of Lemma 3 and Theorem 2 with $s_i = X_i^{\max} - X_i^{\min}$). Therefore the MSE is smaller when $X_i^{\min}$ and $X_i^{\max}$ are close. For example, when $X_i$ is generated uniformly on the unit sphere, with high probability, $X_i^{\max} - X_i^{\min}$ is $\mathcal{O}(\sqrt{(\log d)/d})$. See, e.g., Dasgupta, Sanjoy and Gupta, Anupam. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures & Algorithms, 22(1):60-65, 2003. In such a case, $\varepsilon(\pi_{sk}, X^n)$ is $\mathcal{O}((\log d)/n)$ instead of $\mathcal{O}(d/n)$.
  • In this section, it is shown that even without any assumptions on the distribution of the data, $X_i^{\max} - X_i^{\min}$ can be "reduced" with a structured random rotation, yielding an $\mathcal{O}((\log d)/n)$ error. This method can be denominated as stochastic rotated quantization and is denoted herein by $\pi_{srk}$.
  • In some implementations of the stochastic rotated quantization technique, using public randomness, all clients and the central server generate a random rotation matrix (random orthogonal matrix) $R \in \mathbb{R}^{d \times d}$ according to some known distribution. Let $Z_i = RX_i$ and $\bar{Z} = R\bar{X}$. In the stochastic rotated quantization protocol $\pi_{srk}(R)$, clients quantize the vectors $Z_i$ instead of $X_i$ and transmit them similar to $\pi_{sk}$. The server estimates $\bar{X}$ by
  • $$\hat{\bar{X}}_{\pi_{srk}} = R^{-1} \hat{\bar{Z}}, \qquad \hat{\bar{Z}} = \frac{1}{n} \sum_{i=1}^{n} Y_i.$$
  • The communication cost is the same as $\pi_{sk}$ and is given by Lemma 5. Next the MSE is bounded.
  • Lemma 6 For any $X^n$, $\varepsilon(\pi_{srk}(R), X^n)$ is at most
  • $$\frac{d}{2n^2(k-1)^2} \sum_{i=1}^{n} \mathbb{E}_R\left[(Z_i^{\max})^2 + (Z_i^{\min})^2\right],$$
  • where $Z_i = RX_i$ and for every i, let $s_i = Z_i^{\max} - Z_i^{\min}$.
  • Proof.
  • $$\varepsilon(\pi_{srk}, X^n) = \mathbb{E}_\pi\left\|\hat{\bar{X}} - \bar{X}\right\|^2 = \mathbb{E}_\pi\left\|R^{-1}\hat{\bar{Z}} - R^{-1}\bar{Z}\right\|^2 \overset{(a)}{=} \mathbb{E}_\pi\left\|\hat{\bar{Z}} - \bar{Z}\right\|^2 \overset{(b)}{=} \mathbb{E}_R\,\mathbb{E}_\pi\left[\left\|\hat{\bar{Z}} - \bar{Z}\right\|^2 \,\middle|\, Z_1^n\right] \leq \frac{d}{4n^2(k-1)^2} \sum_{i=1}^{n} \mathbb{E}_R\left[(Z_i^{\max} - Z_i^{\min})^2\right],$$
  • where the last inequality follows from Eq. (5) and the value of $s_i$. (a) follows from the fact that rotation does not change the norm of the vector, and (b) follows from the tower law of expectation. The lemma follows from observing that
  • $$(Z_i^{\max} - Z_i^{\min})^2 \leq 2(Z_i^{\max})^2 + 2(Z_i^{\min})^2.$$
  • End proof.
  • To obtain strong bounds, an orthogonal matrix R that achieves low $(Z_i^{\max})^2$ and $(Z_i^{\min})^2$ is beneficial. In addition, due to the fact that d can be huge in practice, a type of orthogonal matrix that permits fast matrix-vector products is also beneficial. Naive orthogonal matrices that support fast multiplication such as block-diagonal matrices often result in high values of $(Z_i^{\max})^2$ and $(Z_i^{\min})^2$. As such, the present disclosure provides a special type of orthogonal matrix R=HD, where D is a random diagonal matrix with i.i.d. Rademacher entries (±1 with probability 0.5). H is a Walsh-Hadamard matrix. See Horadam, Kathy J. Hadamard matrices and their applications. Princeton university press, 2012. The Walsh-Hadamard matrix of dimension $2^m$ for $m \in \mathbb{N}$ is given by the recursive formula
  • $$H(2^1) = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad H(2^m) = \begin{bmatrix} H(2^{m-1}) & H(2^{m-1}) \\ H(2^{m-1}) & -H(2^{m-1}) \end{bmatrix}.$$
  • Both applying the rotation and the inverse rotation take $\mathcal{O}(d \log d)$ time and $\mathcal{O}(1)$ additional space (with an in-place algorithm). The next lemma bounds $\mathbb{E}[(Z_i^{\max})^2]$ and $\mathbb{E}[(Z_i^{\min})^2]$ for this choice of R. The lemma is similar to that of Ailon, Nir and Chazelle, Bernard. Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform. STOC, 2006. The proof is provided in Section 8 for completeness.
  • Lemma 7 Let R=HD, where D is a diagonal matrix with independent Rademacher random variables. For every i and every sequence $X^n$,
  • $$\mathbb{E}\left[(Z_i^{\min})^2\right] = \mathbb{E}\left[(Z_i^{\max})^2\right] \leq \frac{\|X_i\|_2^2\,(2\log d + 2)}{d}.$$
  • Combining the above two lemmas yields a significant result.
  • Theorem 3 For any $X^n$, the $\pi_{srk}(HD)$ protocol satisfies,
  • $$\varepsilon(\pi_{srk}(HD), X^n) \leq \frac{2\log d + 2}{n(k-1)^2} \cdot \frac{1}{n} \sum_{i=1}^{n} \|X_i\|_2^2.$$
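  • As a concrete sketch of the rotation R=HD with a fast in-place Walsh-Hadamard transform (Python/NumPy; this assumes d is a power of two and uses the normalization $H/\sqrt{d}$ so that the rotation is orthogonal, consistent with the definition of $Z_i$ in Section 8; function names are illustrative):
```python
import numpy as np

def fwht_inplace(x: np.ndarray) -> np.ndarray:
    """In-place fast Walsh-Hadamard transform: O(d log d) time, O(1) extra space."""
    d = x.shape[0]
    h = 1
    while h < d:
        for start in range(0, d, 2 * h):
            for j in range(start, start + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def rotate(x: np.ndarray, diag: np.ndarray) -> np.ndarray:
    """Client side: Z = (1/sqrt(d)) * H * (D x), an orthogonal rotation."""
    return fwht_inplace(diag * x) / np.sqrt(x.shape[0])

def derotate(z: np.ndarray, diag: np.ndarray) -> np.ndarray:
    """Server side: X = D * (1/sqrt(d)) * H * z, the inverse rotation (D^-1 = D)."""
    return diag * (fwht_inplace(z.copy()) / np.sqrt(z.shape[0]))

d = 8
rng = np.random.RandomState(0)
diag = rng.choice([-1.0, 1.0], size=d)       # shared Rademacher diagonal D
x = rng.randn(d)
z = rotate(x, diag)                          # quantize z (e.g., k-level) before transmitting
assert np.allclose(x, derotate(z, diag))     # de-rotation recovers the original vector
```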
  • 4. Example Variable Length Coding
  • Instead of preprocessing the data via a rotation matrix as in πsrk, in this section the present disclosure provides use of a variable length coding strategy to minimize the number of bits.
  • Consider the stochastic k-level quantization technique. A natural way of transmitting Yi is sending the bin number for each coordinate, thus the total number of bits the algorithm sends would be d┌log2 k┐ (i.e., ┌log2 k┐ bits per transmitted coordinate). This naive implementation is sub-optimal. Instead, the present disclosure proposes to, in some implementations, further encode the transmitted values using universal compression schemes. See, e.g., Krichevsky, R and Trofimov, V. The performance of universal encoding. IEEE Transactions on Information Theory, 27(2):199-207, 1981; and Falahatgar, Moein and Jafarpour, Ashkan and Orlitsky, Alon and Pichapati, Venkatadheeraj and Suresh, Ananda Theertha. Universal compression of power-law distributions. ISIT, 2015.
  • As one example technique that can be performed: encode hr, the number of times each quantized value r has appeared, and then use a variable length coding scheme (e.g., arithmetic or Huffman coding) corresponding to the distribution
  • $$p_r = \frac{h_r}{d}.$$
  • This technique is denoted herein by πsvk. Since the vectors are quantized the same way in πsk and πsvk, the MSE of πsvk is also given by Theorem 2. The communication cost is now bounded.
  • Theorem 4 Let $s_i = \sqrt{2}\,\|X_i\|_2$. There exists an implementation of $\pi_{svk}$ such that $\mathcal{C}(\pi_{svk}, X^n)$ is at most
  • $$n\left(d\left(2 + \log_2\left(\frac{(k-1)^2}{2d} + \frac{5}{4}\right)\right) + k\log_2\frac{(d+k)e}{k} + \tilde{\mathcal{O}}(1)\right).$$
  • Proof. As in Lemma 1, $\tilde{\mathcal{O}}(1)$ bits are used to transmit the $s_i$'s and $X_i^{\min}$. Recall that $h_r$ is the number of coordinates that are quantized into bin r, and r takes k possible values. Furthermore, $\sum_r h_r = d$. Thus the number of bits necessary to represent the $h_r$'s is
  • $$\log_2\binom{d+k-1}{k-1} \leq k\log_2\frac{(d+k)e}{k}.$$
  • Once the hr's have been compressed, variable length coding, such as, for example, arithmetic coding, corresponding to the distribution pr=hr/d can be used to compress and transmit bin values for each coordinate. The total number of bits arithmetic coding uses is
  • $$d\sum_{r=0}^{k-1} \frac{h_r}{d}\log_2\frac{d}{h_r} + 2.$$
  • See, MacKay, David J C. Information theory, inference and learning algorithms. Cambridge university press, 2003.
  • Let $p_r = h_r/d$, $a = (k-1)X_i^{\min}$, $b = s_i$, and $\beta = \sum_{r=0}^{k-1} \frac{1}{(a+br)^2 + \delta}$. Note that
  • $$\sum_r p_r \log_2\frac{1}{p_r} = \sum_r p_r \log_2\frac{1/\left(((a+br)^2+\delta)\beta\right)}{p_r} + \sum_r p_r \log_2\left(((a+br)^2+\delta)\beta\right) \leq \sum_r p_r \log_2\left(((a+br)^2+\delta)\beta\right) \leq \log_2\left(\sum_r p_r\left((a+br)^2+\delta\right)\right) + \log_2\beta,$$
  • where the first inequality follows from the positivity of KL-divergence. Choosing $\delta = s_i^2$ yields $\beta \leq 4/s_i^2$ and hence,
  • $$\sum_r p_r \log_2\frac{1}{p_r} \leq \log_2\left(\sum_r p_r(a+br)^2 + s_i^2\right) + \log_2\frac{4}{s_i^2}.$$
  • Note that if $Y_i(j)$ belongs to bin r, $(a+br)^2 = (k-1)^2\,Y_i^2(j)$. Recall that $h_r$ is the number of coordinates quantized into bin r. Hence $\sum_r h_r(a+br)^2$ is the scaled norm-square of $Y_i$, i.e.,
  • $$\sum_r h_r(a+br)^2 = (k-1)^2\sum_{j=1}^{d} Y_i^2(j) = \sum_{j=1}^{d}\left((X_i(j) + \alpha(j))(k-1)\right)^2,$$
  • where $\alpha(j) = Y_i(j) - X_i(j)$. Taking expectations on both sides and using the fact that the $\alpha(j)$ are independent zero mean random variables over a range of $s_i/(k-1)$, provides
  • $$\mathbb{E}\sum_r h_r(a+br)^2 = \sum_{j=1}^{d}\left(X_i(j)^2 + \mathbb{E}\,\alpha(j)^2\right)(k-1)^2 \leq \|X_i\|_2^2\left((k-1)^2 + \frac{d}{2}\right).$$
  • Using Jensen's inequality yields the result.
  • End proof.
  • Thus if $k = \sqrt{d} + 1$, the communication complexity is $\mathcal{O}(nd)$ and the MSE is $\mathcal{O}(1/n)$.
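  • The following rough sketch illustrates the variable length coding idea (Python; Huffman coding built from the empirical bin distribution $p_r = h_r/d$ is used here as the variable length code, with arithmetic coding being the alternative named above; function names are illustrative). Only the resulting code lengths matter for the communication-cost argument:
```python
import heapq
from collections import Counter

def huffman_code_lengths(bins):
    """Return Huffman code lengths for the empirical distribution p_r = h_r / d of
    quantization bins; frequently used bins receive short codes."""
    counts = Counter(bins)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap items: (count, tie-break, {symbol: depth so far}).
    heap = [(c, i, {sym: 0}) for i, (sym, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]

# Bin indices for one client (e.g., the output of the k-level quantizer sketched above).
bins = [0, 0, 0, 1, 0, 15, 0, 1, 0, 0, 2, 0]
lengths = huffman_code_lengths(bins)
payload_bits = sum(lengths[b] for b in bins)  # bits for the bin values under variable length coding
```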
  • 5. Example Communication MSE Trade-off
  • In the above protocols, all the clients transmit and hence the communication cost scales linearly with n. However, the present disclosure demonstrates that any of the above protocols can be combined with client sampling to obtain trade-offs between the MSE and the communication cost. Note that similar analysis also holds for sampling the coordinates.
  • Let π be a protocol where the mean estimate is of the form:
  • $$\hat{\bar{X}} = R^{-1}\,\frac{1}{n}\sum_{i=1}^{n} Y_i. \qquad (6)$$
  • All three protocols that have been discussed are of this form. Let πp be the protocol where each client participates independently with probability p. The server estimates X by
  • $$\hat{\bar{X}}_{\pi_p} = R^{-1}\cdot\frac{1}{np}\sum_{i \in S} Y_i,$$
  • where the $Y_i$'s are defined in the previous section and S is the set of clients that transmitted.
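  • A minimal sketch of the sampled protocol $\pi_p$ (Python/NumPy; names are illustrative): each client transmits independently with probability p, and the server rescales the sum of the received contributions by $1/(np)$, which keeps the estimate unbiased while cutting the expected communication cost to $p\cdot\mathcal{C}(\pi, X^n)$. Any de-rotation $R^{-1}$ would be applied to the returned average as in Eq. (6):
```python
import numpy as np

def sampled_mean_estimate(client_vectors, p, rng=np.random):
    """Each client participates independently with probability p; the server divides
    the sum of received contributions Y_i by n*p to keep the estimate unbiased."""
    n = len(client_vectors)
    total = np.zeros_like(client_vectors[0])
    for y in client_vectors:                 # y stands in for the decoded contribution Y_i
        if rng.random_sample() < p:          # client i chooses to transmit
            total += y
    return total / (n * p)

clients = [np.random.randn(4) for _ in range(100)]
estimate = sampled_mean_estimate(clients, p=0.25)
```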
  • Lemma 8 For any set of vectors $X^n$ and protocol $\pi$ of the form Equation (6), its sampled version $\pi_p$ satisfies
  • $$\varepsilon(\pi_p, X^n) = \frac{1}{p}\cdot\varepsilon(\pi, X^n) + \frac{1-p}{np}\cdot\frac{1}{n}\sum_{i=1}^{n}\|X_i\|_2^2, \qquad \mathcal{C}(\pi_p, X^n) = p\cdot\mathcal{C}(\pi, X^n).$$
  • Proof. The proof of communication cost follows from Lemma 5 and the fact that in expectation, np clients transmit. The MSE is now bounded. Let S be the set of clients that transmit. The error ε(πp, Xn) is
  • $$\mathbb{E}\left[\left\|\hat{\bar{X}} - \bar{X}\right\|_2^2\right] = \mathbb{E}\left[\left\|\frac{1}{np}\sum_{i\in S} R^{-1}Y_i - \bar{X}\right\|_2^2\right] = \mathbb{E}\left[\left\|\frac{1}{np}\sum_{i\in S} X_i - \bar{X}\right\|_2^2 + \frac{1}{n^2p^2}\left\|\sum_{i\in S}(R^{-1}Y_i - X_i)\right\|_2^2\right],$$
  • where the last equality follows by observing that $R^{-1}Y_i - X_i$ are independent zero mean random variables and hence for any i, $\mathbb{E}\left[(R^{-1}Y_i - X_i)^T\left(\frac{1}{np}\sum_{i\in S}X_i - \bar{X}\right)\right] = 0$. The first term can be bounded as
  • $$\mathbb{E}\left\|\frac{1}{np}\sum_{i\in S}X_i - \bar{X}\right\|_2^2 = \frac{1}{n^2}\sum_{i=1}^{n}\mathbb{E}\left\|\frac{1}{p}X_i 1_{i\in S} - X_i\right\|_2^2 = \frac{1}{n^2}\sum_{i=1}^{n}\left(p\,\frac{(1-p)^2}{p^2}\|X_i\|_2^2 + (1-p)\|X_i\|_2^2\right) = \frac{1-p}{np}\cdot\frac{1}{n}\sum_{i=1}^{n}\|X_i\|_2^2.$$
  • Furthermore, the second term can be bounded as
  • $$\mathbb{E}\left[\frac{1}{n^2p^2}\left\|\sum_{i\in S}(R^{-1}Y_i - X_i)\right\|_2^2\right] \overset{(a)}{=} \frac{1}{n^2p^2}\,\mathbb{E}\left[\sum_{i\in S}\left\|R^{-1}Y_i - X_i\right\|_2^2\right] = \frac{1}{n^2p^2}\sum_{i=1}^{n}\mathbb{E}\left[\left\|R^{-1}Y_i - X_i\right\|_2^2\,1_{i\in S}\right] = \frac{1}{n^2p}\sum_{i=1}^{n}\mathbb{E}\left[\left\|R^{-1}Y_i - X_i\right\|_2^2\right] = \frac{1}{n^2p}\,\mathbb{E}\left[\sum_{i=1}^{n}\left\|R^{-1}Y_i - X_i\right\|_2^2\right] = \frac{1}{p}\,\varepsilon(\pi, X^n),$$
  • where the last equality follows from the assumption that $\pi$'s mean estimate is of the form Eq. (6), and (a) follows from the fact that $R^{-1}Y_i - X_i$ are independent zero mean random variables.
  • End proof.
  • Combining the above lemma with Theorem 4, and choosing $k = \sqrt{d} + 1$ results in the following.
  • Corollary 1 For every $c \leq nd(2 + \log_2(7/4))$, there exists a protocol $\pi$ such that $\mathcal{C}(\pi, S^d) \leq c$ and
  • $$\varepsilon(\pi, S^d) = \mathcal{O}\!\left(\min\left(1, \frac{d}{c}\right)\right).$$
  • 6. Example Lower Bounds
  • The lower bound relies on the lower bounds on distributed statistical estimation due to Zhang, Yuchen, Duchi, John, Jordan, Michael I, and Wainwright, Martin J. Information-theoretic lower bounds for distributed statistical estimation with communication constraints. NIPS, 2013.
  • Lemma 9 (Zhang et al., 2013 Proposition 2) There exists a set of distributions $\mathcal{P}_d$ supported on
  • $$\left[-\frac{1}{\sqrt{d}}, \frac{1}{\sqrt{d}}\right]^d$$
  • such that if any centralized server wishes to estimate the mean of the underlying unknown distribution, then for any independent protocol $\pi$,
  • $$\max_{p_d \in \mathcal{P}_d} \mathbb{E}\left[\left\|\theta(p_d) - \hat{\theta}_\pi\right\|_2^2\right] \geq t\,\min\left(1, \frac{d}{\mathcal{C}(\pi)}\right),$$
  • where $\mathcal{C}(\pi)$ is the communication cost of the protocol, $\theta(p_d)$ is the mean of $p_d$, and t is a positive constant.
  • Theorem 5 Let t be the constant in Lemma 9. For every c≦ndt/4 and n≧4/t,
  • $$\varepsilon(\Pi(c), S^d) \geq \frac{t}{4}\min\left(1, \frac{d}{c}\right).$$
  • Proof. Given n samples from the underlying distribution where each sample belongs to $S^d$, it is easy to see that
  • $$\mathbb{E}\left[\left\|\theta(p_d) - \hat{\theta}(p_d)\right\|_2^2\right] \leq \frac{1}{n},$$
  • where $\hat{\theta}(p_d)$ is the empirical mean of the observed samples. Let $\mathcal{P}_d$ be the set of distributions in Lemma 9. Hence for any protocol $\pi$ there exists a distribution $p_d$ such that
  • $$\mathbb{E}\left\|\hat{\theta}(p_d) - \hat{\theta}_\pi\right\|_2^2 \overset{(a)}{\geq} \frac{1}{2}\,\mathbb{E}\left\|\theta(p_d) - \hat{\theta}_\pi\right\|_2^2 - \mathbb{E}\left\|\theta(p_d) - \hat{\theta}(p_d)\right\|_2^2 \overset{(b)}{\geq} \frac{t}{2}\min\left(1, \frac{d}{\mathcal{C}(\pi)}\right) - \frac{1}{n} \overset{(c)}{\geq} \frac{t}{4}\min\left(1, \frac{d}{\mathcal{C}(\pi)}\right).$$
  • Inequality (a) follows from the fact that $2(a-b)^2 + 2(b-c)^2 \geq (a-c)^2$. Inequality (b) follows from Lemma 9 and (c) follows from the fact that $\mathcal{C}(\pi, S^d) \leq ndt/4$ and $n \geq 4/t$.
  • End proof.
  • Corollary 1 and Theorem 5 yield Theorem 1. Note that the above lower bound holds only for communication cost $c < \mathcal{O}(nd)$.
  • 7. Example Practical Considerations and Applications
  • Based on the theoretical analysis, the variable-length coding method provides the lowest quantization error asymptotically when using a constant number of bits. However, in practice, stochastic rotated quantization may be preferred due to (hidden) constant factors and the fact that it uses a fixed number of bits per dimension. For example, consider quantizing the vector [−1,1,0,0]: stochastic rotated quantization can use 1 bit per dimension and gives zero error, whereas the other two protocols do not. To see this, observe that the naive quantization will quantize 0 to either 1 or −1 and variable length coding cannot achieve 0 error with 1 bit per dimension due to its constant factors.
  • In addition, note that the rotated quantization is preferred when applied to "unbalanced" data, due to the fact that the rotation can correct the unbalancedness. This is demonstrated by generating a dataset where the value of the last feature dimension entry is much larger than the others. As an example of such a dataset, 1000 datapoints were generated each with 256 dimensions. The first 255 dimensions are generated i.i.d. from N(0,1), and the last dimension is generated from N(100,1). As shown in FIG. 1, the rotated stochastic quantization has the best performance on this example dataset. The improvement is especially significant for low bit rate cases. In particular, FIG. 1 illustrates distributed mean estimation on data generated from a Gaussian distribution.
  • Two example applications are demonstrated in the rest of this section. The experiments are performed on the MNIST (d=2304) and CIFAR (d=512) datasets.
  • Distributed Lloyd's Algorithm.
  • In the distributed Lloyd's (k-means) algorithm, each client has access to a subset of data points. In each iteration, the server broadcasts the cluster centers to all the clients. Each client updates the centers based on its local data, and sends the centers back to the server. The server then updates the centers by computing the weighted average of the centers sent from all clients. In the quantized setting proposed by the present disclosure, the client compresses the new centers before sending to the server. This saves the uplink communication cost, which is often the bottleneck of distributed learning. Furthermore, in this setting, the downlink is a broadcast, and therefore its cost can be reduced by a factor of O(n/log n) without quantization, where n is the number of clients.
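  • As a rough sketch of one round of this distributed Lloyd's update with compressed uploads (Python/NumPy; quantize_decode is a stand-in for any of the quantization schemes above and is shown as a pass-through, and weighting the average by per-cluster point counts is an assumption about the weighted average mentioned above):
```python
import numpy as np

def quantize_decode(centers):
    """Placeholder for rotate + quantize on the client and decode on the server."""
    return centers

def local_lloyd_update(points, centers):
    """One client-side step: assign local points to the broadcast centers and
    recompute the centers from the local assignment."""
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    assign = np.argmin(dists, axis=1)
    counts = np.array([np.sum(assign == c) for c in range(len(centers))])
    new_centers = np.array([
        points[assign == c].mean(axis=0) if counts[c] > 0 else centers[c]
        for c in range(len(centers))
    ])
    return new_centers, counts

def server_round(global_centers, client_datasets):
    """Server broadcasts centers, receives (quantized) local centers, and averages
    them weighted by how many local points each client assigned to each center."""
    sums = np.zeros_like(global_centers)
    weights = np.zeros(len(global_centers))
    for points in client_datasets:
        local_centers, counts = local_lloyd_update(points, global_centers)
        local_centers = quantize_decode(local_centers)        # uplink compression
        sums += counts[:, None] * local_centers
        weights += counts
    averaged = sums / np.maximum(weights[:, None], 1)
    return np.where(weights[:, None] > 0, averaged, global_centers)

rng = np.random.RandomState(0)
clients = [rng.randn(50, 2) for _ in range(10)]
centers = rng.randn(3, 2)
centers = server_round(centers, clients)
```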
  • FIGS. 2A-D show example results in which both the number of centers and the number of clients are set to 10. In particular, FIGS. 2A-D illustrate Lloyd's algorithm with different types of quantizations. Two settings are illustrated as tested: 16 quantization levels and 32 quantization levels. The x-axis is the average number of bits sent for each data dimension, and the y-axis is the global objective of Lloyd's algorithm.
  • Distributed Power Iteration.
  • Power iteration is a widely used method to compute the top eigenvector of a matrix. In the distributed setting, each client has access to a subset of data. In each iteration, the server broadcasts the current estimate of the eigenvector to all clients. Each client then updates the eigenvector based on one power iteration on its local data, and sends the updated eigenvector back to the server. The server updates the eigenvector by computing the weighted average of the eigenvectors sent by all clients.
  • As one example of power iteration, given a set of vectors A1, A2, . . . Am, at each round t, compute:
  • $$v_{t+1} = \frac{1}{m}\sum_{i=1}^{m} A_i A_i^T v_t.$$
  • Data points A1, A2, . . . Am are distributed across n clients. At each round of power iteration t, a client sends:
  • $$\sum_{j \in \text{Client } i} A_j A_j^T v_t.$$
  • The server can average the received information to compute $v_{t+1}$.
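  • A minimal sketch of one such round (Python/NumPy; quantize_decode again stands in for the uplink compression, and the renormalization of the averaged vector is a common practice added here rather than something required by the description above):
```python
import numpy as np

def quantize_decode(v):
    """Placeholder for rotate + quantize on the client and decode on the server."""
    return v

def power_iteration_round(v_t, client_data):
    """Each client sends the sum over its local rows A_j of A_j A_j^T v_t; the server
    averages the received (quantized) contributions over all m data points."""
    m = sum(len(A) for A in client_data)
    total = np.zeros_like(v_t)
    for A in client_data:                     # A holds the client's local rows A_j
        contribution = A.T @ (A @ v_t)        # equals sum_j A_j A_j^T v_t
        total += quantize_decode(contribution)
    v_next = total / m
    return v_next / np.linalg.norm(v_next)

rng = np.random.RandomState(0)
clients = [rng.randn(20, 5) for _ in range(100)]
v = rng.randn(5)
v /= np.linalg.norm(v)
for _ in range(10):
    v = power_iteration_round(v, clients)     # v approaches the top eigenvector
```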
  • Similar to the above distributed Lloyd's algorithm, in the quantized setting proposed by the present disclosure, the client compresses the estimated eigenvector before sending to the server. FIGS. 3A-D show the results. The dataset is distributed over 100 clients.
  • In particular, FIGS. 3A-D illustrate power iteration with different types of quantizations. Two settings were tested: 16 quantization levels and 32 quantization levels. The x-axis is the average number of bits sent for each data dimension and the y-axis is the l2 distance between the computed eigenvector and the ground-truth vector.
  • For both distributed Lloyd's and distributed power iteration applications, variable-length coding achieves the lowest quantization error in most of the settings. Furthermore, for low-bit rate, stochastic rotated quantization is competitive with variable-length coding.
  • 8. Proof of Lemma 7
  • The equality follows from the symmetry in HD. To prove the upper bound, observe that:
  • $$\mathbb{E}\left[(Z_i^{\max})^2\right] = \operatorname{Var}(Z_i^{\max}) + \left(\mathbb{E}\left[Z_i^{\max}\right]\right)^2.$$
  • Let D(j) be the jth diagonal entry of D. To bound the first term, observe that $Z_i^{\max}$ is a function of d independent random variables $D(1), D(2), \ldots, D(d)$. Changing D(j) changes $Z_i^{\max}$ by at most
  • $$\frac{2|X_i(j)|}{\sqrt{d}}.$$
  • Hence, applying the Efron-Stein variance bound yields
  • $$\operatorname{Var}(Z_i^{\max}) \leq \sum_{j=1}^{d} \frac{4X_i^2(j)}{2d} = \frac{2\|X_i\|_2^2}{d}.$$
  • To bound the second term, observe that for every β>0,
  • $$\beta Z_i^{\max} = \log\exp(\beta Z_i^{\max}) \leq \log\left(\sum_{j=1}^{d} e^{\beta Z_i(j)}\right).$$
  • Note that
  • $$Z_i(k) = \frac{1}{\sqrt{d}}\sum_{j=1}^{d} D(j)H(k,j)X_i(j).$$
  • Since the D(j)'s are Rademacher random variables and |H(k,j)|=1 for all k, j, the distribution of $Z_i(k)$ is the same for all k. Hence by Jensen's inequality,
  • $$\mathbb{E}\left[Z_i^{\max}\right] \leq \frac{1}{\beta}\,\mathbb{E}\left[\log\left(\sum_{j=1}^{d} e^{\beta Z_i(j)}\right)\right] \leq \frac{1}{\beta}\log\left(\sum_{j=1}^{d}\mathbb{E}\left[e^{\beta Z_i(j)}\right]\right) = \frac{1}{\beta}\log\left(d\,\mathbb{E}\left[e^{\beta Z_i(1)}\right]\right).$$
  • Since $Z_i(1) = \frac{1}{\sqrt{d}}\sum_{j=1}^{d} D(j)X_i(j)$,
  • $$\mathbb{E}\left[e^{\beta Z_i(1)}\right] = \mathbb{E}\left[e^{\frac{\beta}{\sqrt{d}}\sum_j D(j)X_i(j)}\right] \overset{(a)}{=} \prod_{j=1}^{d}\mathbb{E}\left[e^{\frac{\beta D(j)X_i(j)}{\sqrt{d}}}\right] = \prod_{j=1}^{d}\frac{e^{-\beta X_i(j)/\sqrt{d}} + e^{\beta X_i(j)/\sqrt{d}}}{2} \overset{(b)}{\leq} \prod_{j=1}^{d} e^{\beta^2 X_i^2(j)/2d} = e^{\beta^2\|X_i\|_2^2/2d},$$
  • where (a) follows from the fact that the D(j)'s are independent and (b) follows from the fact that $e^a + e^{-a} \leq 2e^{a^2/2}$ for any a. Hence,
  • $$\mathbb{E}\left[Z_i^{\max}\right] \leq \min_{\beta > 0}\left(\frac{\log d}{\beta} + \frac{\beta\|X_i\|_2^2}{2d}\right) \leq \sqrt{\frac{2\|X_i\|_2^2\log d}{d}}.$$
  • 9. Example Computing Systems
  • FIG. 4 depicts an example system 200 for distributed computing. System 200 can include a server computing device 210 and a plurality of client computing devices 230.
  • In some implementations, and as one example application, the server computing device 210 can be configured to access a global machine-learned model and to provide the global model to the plurality of client devices 230. The model can be, for instance, a linear regression model, logistic regression model, a support vector machine model, a neural network (e.g. convolutional neural network, recurrent neural network, etc.), or other machine-learned model. In some implementations, the server 210 can be configured to communicate with client devices 230 over one or more networks 280.
  • Client devices 230 can each be configured to determine one or more local updates associated with the model based at least in part on locally stored data 236. The locally stored data 236 can include audio files, image files, video files, log entries, and/or various other suitable data. In some implementations, the data 236 can be any data derived through a user interaction with a client device 230. Thus, the data 236 across the plurality of devices 230 includes data that is respectively stored at each device 230. Thus, in some implementations, the collective data 236 across all devices 230 is highly unbalanced and not independent and identically distributed.
  • Client devices 230 can be configured to provide the local updates to the server 210. The data 236 may be privacy sensitive. In this manner, the local updates can be performed and provided to server 210 without compromising the privacy of data 236. For instance, in such implementations, data 236 is not itself provided to server 210 since the local update does not include the actual data 236. In some implementations, one or more of encryption techniques, random noise techniques, and/or other security techniques can be added to the training process to obscure any inferable information from the local updates.
  • As indicated above, server 210 can receive each local update from client device 230, and can aggregate the local updates to determine a global update to the machine-learned model. In some implementations, server 210 can determine a weighted average or other mean of the local updates and determine the global update based at least in part on the average.
  • In some implementations, scaling or other techniques can be applied to the local updates to determine the global update. For instance, a local step size can be applied for each client device 230, the aggregation can be performed proportionally to various data partition sizes of client devices 230, and/or one or more scaling factors can be applied to the local and/or aggregated updates. It will be appreciated that various other techniques can be applied without deviating from the scope of the present disclosure.
  • More generally, FIG. 4 depicts an example computing system 200 that can be used to implement the methods and systems of the present disclosure. The system 200 can be implemented using a client-server architecture that includes a server 210 that communicates with one or more client devices 230 over a network 280.
  • The system 200 includes a server 210, such as a web server. The server 210 can be implemented using any suitable computing device(s). The server 210 can have one or more processors 212 and one or more memory devices 214. The server 210 can be implemented using one server device or a plurality of server devices. In implementations in which a plurality of devices are used, such plurality of devices can operate according to a parallel computing architecture, a sequential computing architecture, or a combination thereof.
  • The server 210 can also include a network interface 236 used to communicate with the client devices 230 over the network 280. The network interface can include any suitable components for interfacing with one or more networks, including for example, transmitters, receivers, ports, controllers, antennas, or other suitable components.
  • The one or more processors 212 can include any suitable processing device, such as a microprocessor, microcontroller, integrated circuit, logic device, or other suitable processing device. The one or more memory devices 214 can include one or more computer-readable media, including, but not limited to, non-transitory computer-readable media, RAM, ROM, hard drives, flash drives, or other memory devices. The one or more memory devices 214 can store information accessible by the one or more processors 212, including computer-readable instructions 218 that can be executed by the one or more processors 212.
  • The instructions 218 can be any set of instructions that when executed by the one or more processors 212, cause the one or more processors 212 to perform operations.
  • The server 210 can further include a de-rotator 220. The de-rotator 220 can de-rotate a vector that has been rotated by a client device 230. For example, the de-rotator 220 can use an inverse random rotation matrix that is an inverse of a random rotation matrix used by the client device 230 to rotate the vector. Alternatively, a mean of several rotated vectors can be determined prior to de-rotation and the de-rotator can de-rotate the determined mean. In some implementations, the de-rotator 220 can determine one or more inverse random rotation matrices based at least in part on one or more seeds that are shared (e.g., respectively shared or universally shared) with the client devices 230.
  • The server 210 can further include a decoder 222. The decoder 222 can decode a vector that has been encoded by a client device 230 (e.g., according to one of the encoding techniques discussed above). For example, the decoder can decode a vector that has been encoded according to variable length coding techniques such as, for example, Huffman coding or arithmetic coding.
  • The server 210 can further include a mean calculator 224. The mean calculator 224 can be configured to receive a plurality of vectors (e.g., decoded vectors, rotated vectors, and/or de-rotated vectors) and to determine a mean of the vectors (e.g., a mean vector). In some implementations in which the mean is determined prior to de-rotation, the mean vector can subsequently be de-rotated by the de-rotator 220.
  • As shown in FIG. 4, the one or more memory devices 214 can also store data 216 that can be retrieved, manipulated, created, or stored by the one or more processors 212. The data 216 can include, for instance, local updates, global parameters, and other data. The data 216 can be stored in one or more databases. The one or more databases can be connected to the server 210 by a high bandwidth LAN or WAN, or can also be connected to server 210 through network 280. The one or more databases can be split up so that they are located in multiple locales.
  • The server 210 can exchange data with client devices 230 over the network 280. Any number of client devices 230 can be connected to the server 210 over the network 280. Each of the client devices 230 can be any suitable type of computing device, such as a general purpose computer, special purpose computer, laptop, desktop, mobile device, navigation system, intelligent speaker or home assistant, home appliance, smartphone, tablet, computing device that is able to be worn, gaming console, worker device in a datacenter, a display with one or more processors, or other suitable computing device.
  • Similar to the server 210, a client device 230 can include one or more processor(s) 232 and a memory 234. The one or more processor(s) 232 can include, for example, one or more central processing units (CPUs), graphics processing units (GPUs) dedicated to efficiently rendering images or performing other specialized calculations, Tensor processing units (TPUs), and/or other processing devices. The memory 234 can include one or more computer-readable media and can store information accessible by the one or more processors 232, including instructions 238 that can be executed by the one or more processors 232 and data 236.
  • The client computing device 230 can include a vector calculator 240 that is implementable to determine one or more vectors (e.g., local updates) according to example aspects of the present disclosure. For example, in some example applications, the vector calculator 240 can perform one or more training techniques such as, for example, backwards propagation of errors to re-train or otherwise update a machine-learned model based on the locally stored data 236, thereby generating an update vector (e.g., a gradient). The vector calculator 240 can be included in an application or can be included in the operating system of the device 230. In other implementations, the vector calculator 240 can be any component or system that determines a vector to be transmitted to the server computing device 210.
  • The client computing device 230 can further include a rotater 242. The rotater 242 can rotate a vector by a random rotation matrix (e.g., by multiplying the vector by the matrix). In some implementations, the rotater 242 can determine the random rotation matrix based on a seed.
  • The client computing device 230 can further include a quantizer 246. The quantizer 246 can quantize a vector. For example, the quantizer 246 can perform stochastic binary quantization or stochastic k-level quantization as described above to quantize the vector.
  • The client computing device 230 can further include an encoder 248. For example, the encoder 248 can perform one or more of the encoding techniques described above (e.g., variable length coding such as, for example, Huffman coding or arithmetic coding).
  • The data 236 can include data examples to be used in solving one or more optimization problems. In some applications, the data examples of each client device 230 can be distributed unevenly among the client devices, such that no client device 230 includes a representative sample of the overall distribution of the training data examples. The data 236 can further include a vector to be communicated to the server 210.
  • The client device 230 of FIG. 4 can include various input/output devices for providing and receiving information from a user, such as a touch screen, touch pad, data entry keys, speakers, and/or a microphone suitable for voice recognition.
  • The client device 230 can also include a network interface 250 used to communicate with one or more remote computing devices (e.g. server 210) over the network 280. The network interface 250 can include any suitable components for interfacing with one or more networks, including for example, transmitters, receivers, ports, controllers, antennas, or other suitable components.
  • The network 280 can be any type of communications network, such as a local area network (e.g. intranet), wide area network (e.g. Internet), cellular network, or some combination thereof. The network 280 can also include a direct connection between a client device 230 and the server 210. In general, communication between the server 210 and a client device 230 can be carried via network interface using any type of wired and/or wireless connection, using a variety of communication protocols (e.g. TCP/IP, HTTP, SMTP, FTP), encodings or formats (e.g. HTML, XML), and/or protection schemes (e.g. VPN, secure HTTP, SSL).
  • The vector calculator 240, the rotater 242, the quantizer 246, the encoder 248, the de-rotater 220, the decoder 222, and the mean calculator 224 can include computer logic utilized to provide desired functionality. Thus, each of vector calculator 240, the rotater 242, the quantizer 246, the encoder 248, the de-rotater 220, the decoder 222, and the mean calculator 224 can be implemented in hardware, firmware and/or software controlling a general purpose processor. In some implementations, each of vector calculator 240, the rotater 242, the quantizer 246, the encoder 248, the de-rotater 220, the decoder 222, and the mean calculator 224 includes program code files stored on the storage device, loaded into memory and executed by a processor or can be provided from computer program products, for example, computer executable instructions that are stored in a tangible computer-readable storage medium such as, for example, a RAM disk or card or other computer-readable optical or magnetic media.
  • Furthermore, while the techniques of the present disclosure are discussed primarily with reference to vectors to be transmitted or uploaded, the techniques described herein can also be applied to other data structures as well. As an example, for any other type of data to be transmitted by the client computing device 230 (e.g., a matrix (2D), a tensor (3D and above), or other data types or structures), the client computing device 230 (e.g., the vector calculator 240) can first flatten the data structure to form a vector. The techniques described herein (e.g., random rotation, probabilistic quantization, and/or variable length coding) can then be applied to the vector. After the entire process (e.g., after de-rotation, decoding, and/or mean estimation), the server computing device 210 can re-shape the resulting vector (e.g., a mean vector) back to the original dimension(s).
  • Thus, the present disclosure can be generalized to other data structures through conversion (e.g., flattening) of the other data structure to vector format prior to rotation, quantization, encoding, etc. After de-rotation, decoding, etc., the de-rotated or decoded vector can be reshaped back into the original dimension(s).
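  • A minimal sketch of this flatten-and-reshape step is given below, assuming NumPy arrays; the helper names flatten and unflatten are illustrative, not taken from the disclosure. The client records the original shape before flattening, and the server restores that shape after aggregation.

```python
import numpy as np

def flatten(data):
    """Client side: flatten a matrix/tensor into a 1-D vector, remembering its shape."""
    arr = np.asarray(data, dtype=np.float64)
    return arr.reshape(-1), arr.shape

def unflatten(vector, shape):
    """Server side: reshape the aggregated vector back to the original dimension(s)."""
    return np.asarray(vector).reshape(shape)

# Example: a 2-D update is flattened before rotation/quantization/encoding, and the
# resulting mean vector is reshaped back after de-rotation/decoding/mean estimation.
update = np.random.randn(4, 3)
vec, shape = flatten(update)
restored = unflatten(vec, shape)
assert restored.shape == update.shape
```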
  • 10. Example Methods
  • FIG. 5 depicts a swim lane flow diagram of an example method 500 to perform a stochastic rotated quantization technique according to example embodiments of the present disclosure. In particular, the left-hand side of FIG. 5 illustrates operations performed by each of one or more client computing devices while the right-hand side of FIG. 5 illustrates operations performed by a server computing device.
  • At 502, a client computing device obtains a vector. The vector can be any vector that is to be transmitted to the server computing device. Example vectors include a machine-learned model update vector that describes one or more parameters of a machine-learned model or one or more updates to the one or more parameters of the machine-learned model; a cluster vector that describes a plurality of cluster centers or a plurality of updates to the plurality of cluster centers; and a power iteration vector that describes an eigenvector. In some implementations, obtaining the vector can include computing the vector based on a local dataset that is stored locally at the client computing device.
  • At 504, the client computing device rotates the vector by a random rotation matrix to obtain a rotated vector. As an example, in some implementations, the random rotation matrix can be a product of a Walsh-Hadamard matrix with a diagonal matrix. In some implementations, the diagonal matrix can include independent and identically distributed Rademacher entries.
  • In some implementations, the method 500 can further include obtaining a seed that is shared with the server computing device. The method 500 can include generating the random rotation matrix based at least in part on the seed.
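  • The sketch below illustrates one way such a seeded structured rotation could be implemented, assuming NumPy, a vector length that is a power of two (zero-padded otherwise), and illustrative function names (fwht, rotate, derotate) that are not taken from the disclosure. The shared seed regenerates the same Rademacher diagonal D on both the client and the server, and the fast Walsh-Hadamard transform applies H in O(d log d) time without ever materializing a d-by-d matrix.

```python
import numpy as np

def fwht(x):
    """In-place fast Walsh-Hadamard transform (unnormalized); len(x) must be a power of two."""
    d = len(x)
    h = 1
    while h < d:
        for i in range(0, d, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def rotate(vector, seed):
    """Client side (504): apply R = (1/sqrt(d)) * H * D, where D is a diagonal matrix of
    i.i.d. Rademacher entries generated from the seed shared with the server."""
    x = np.asarray(vector, dtype=np.float64)
    d = len(x)
    diag = np.random.RandomState(seed).choice([-1.0, 1.0], size=d)
    return fwht(x * diag) / np.sqrt(d)

def derotate(rotated, seed):
    """Server side (514): invert the rotation, x = D * (H * y) / sqrt(d), by regenerating
    the same diagonal D from the shared seed."""
    y = np.asarray(rotated, dtype=np.float64).copy()
    d = len(y)
    diag = np.random.RandomState(seed).choice([-1.0, 1.0], size=d)
    return diag * fwht(y) / np.sqrt(d)
```

  • Because only the seed is exchanged, neither the rotation matrix nor its inverse needs to be transmitted or stored explicitly.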
  • At 506, the client computing device performs probabilistic quantization of the rotated vector to obtain a quantized rotated vector. For example, in some implementations, performing probabilistic quantization at 506 can include performing stochastic binary quantization or performing stochastic k-level quantization.
  • In some implementations, performing probabilistic quantization at 506 can include determining a value for each of a number of quantization levels based at least in part on a magnitude of the rotated vector and a minimum coordinate value included in the rotated vector; and quantizing each coordinate of the rotated vector into one of the number of quantization levels.
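  • A minimal sketch of such a stochastic k-level quantizer is shown below (an assumed NumPy implementation, with levels spread uniformly between the minimum and maximum coordinate values; setting k = 2 recovers the stochastic binary case). Each coordinate is rounded to one of its two neighboring levels with probabilities chosen so that the quantized value is an unbiased estimate of the original coordinate, and only the integer level indices plus two scalars need to be transmitted.

```python
import numpy as np

def stochastic_k_level_quantize(x, k, rng=None):
    """Quantize each coordinate of x into one of k levels spanning [min(x), max(x)],
    rounding up or down at random so that the result is unbiased."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x, dtype=np.float64)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:  # degenerate case: all coordinates are equal
        return np.zeros(x.shape, dtype=np.int64), x_min, x_max
    step = (x_max - x_min) / (k - 1)      # spacing between adjacent quantization levels
    scaled = (x - x_min) / step           # position of each coordinate in "level units"
    lower = np.floor(scaled)
    prob_up = scaled - lower              # probability of rounding up to the next level
    levels = lower + (rng.random(x.shape) < prob_up)
    return levels.astype(np.int64), x_min, x_max

def dequantize(levels, x_min, x_max, k):
    """Server side: map level indices back to quantized coordinate values."""
    step = (x_max - x_min) / (k - 1)
    return x_min + levels * step
```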
  • At 508, the client computing device transmits the quantized rotated vector to the server computing device. In some implementations, various additional encodings, cryptography, and/or privacy preserving manipulations can be performed prior to transmission.
  • At 510, the server computing device receives the quantized rotated vectors from the client computing devices. In some implementations, at 510, the server computing device can decode or unquantize each quantized rotated vector to the extent possible. For example, this may include transforming references or codings into particular data entries to which the references refer.
  • At 512, the server computing device determines a mean of all the quantized rotated vectors received from all of the client computing devices to obtain a mean rotated vector.
  • At 514, the server computing device de-rotates the mean rotated vector by an inverse random rotation matrix to obtain a mean de-rotated vector. For example, the inverse random rotation matrix can be the inverse of the random rotation matrix used at 504.
  • In some implementations, method 500 can include obtaining, by the server computing device, a seed, where the seed is shared with at least one of the client computing devices from which a quantized rotated vector is received. For example, each client computing device can have a different seed, or a single seed can be shared by multiple client computing devices. The method 500 can include generating, by the server computing device, the inverse random rotation matrix based at least in part on the seed.
  • In some implementations, the operations illustrated at 512 and 514 can be performed in reverse order, such that the vectors are individually de-rotated prior to taking the mean.
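  • Either order gives the same result because the rotation is linear, as the small self-contained check below illustrates (the orthonormal matrix R here is an arbitrary stand-in for a rotation, not the structured rotation of the disclosure).

```python
import numpy as np

rng = np.random.default_rng(0)
R, _ = np.linalg.qr(rng.standard_normal((8, 8)))   # an arbitrary orthonormal rotation matrix
vectors = rng.standard_normal((5, 8))              # quantized rotated vectors from 5 clients

mean_then_derotate = R.T @ vectors.mean(axis=0)    # steps 512 then 514
derotate_then_mean = (vectors @ R).mean(axis=0)    # de-rotate each vector, then average

assert np.allclose(mean_then_derotate, derotate_then_mean)
```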
  • In some implementations, the method 500 can further include performing, by the server computing device, a global update based on the mean de-rotated vector.
  • FIG. 6 depicts a swim lane flow diagram of an example method 600 to perform a variable length coding technique according to example embodiments of the present disclosure. In particular, the left-hand side of FIG. 6 illustrates operations performed by each of one or more client computing devices while the right-hand side of FIG. 6 illustrates operations performed by a server computing device.
  • At 602, a client computing device obtains a vector. The vector can be any vector that is to be transmitted to the server computing device. Example vectors include a machine-learned model update vector that describes one or more parameters of a machine-learned model or one or more updates to the one or more parameters of the machine-learned model; a cluster vector that describes a plurality of cluster centers or a plurality of updates to the plurality of cluster centers; and a power iteration vector that describes an eigenvector. In some implementations, obtaining the vector can include computing the vector based on a local dataset that is stored locally at the client computing device.
  • At 604, the client computing device performs probabilistic quantization of the vector to obtain a quantized vector. For example, in some implementations, performing probabilistic quantization at 604 can include performing stochastic binary quantization or performing stochastic k-level quantization.
  • In some implementations, performing probabilistic quantization at 604 can include determining a value for each of a number of quantization levels based at least in part on a magnitude of the vector to be quantized and a minimum coordinate value included in the vector; and quantizing each coordinate of the vector into one of the number of quantization levels.
  • At 606, the client computing device encodes the quantized vector according to a variable length coding scheme to obtain an encoded quantized vector. For example, the variable length coding scheme can include Huffman coding or arithmetic coding.
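  • As one concrete possibility for the variable length coding at 606, the sketch below builds a Huffman code over the quantization-level indices using Python's heapq module. Level indices that occur frequently receive shorter codewords, which is where the communication savings come from; the helper names and the choice to code per-symbol are assumptions, not requirements of the disclosure.

```python
import heapq
from collections import Counter

def build_huffman_code(symbols):
    """Build a prefix-free code (symbol -> bit string) from empirical symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                     # edge case: only one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries are (frequency, tie_breaker, tree); a tree is either a symbol (leaf)
    # or a (left, right) pair of subtrees.
    heap = [(f, i, s) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (t1, t2)))
        next_id += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes

def encode(symbols, codes):
    """Concatenate codewords; the code table (or the statistics defining it) accompanies the message."""
    return "".join(codes[s] for s in symbols)

# Example: encode the level indices produced by a k-level quantizer.
levels = [2, 2, 1, 3, 2, 2, 0, 2, 1, 2]
codes = build_huffman_code(levels)
bits = encode(levels, codes)
```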
  • At 608, the client computing device transmits the encoded quantized vector to the server computing device.
  • At 610, the server computing device receives the encoded quantized vectors from the client computing devices.
  • At 612, the server computing device decodes each encoded quantized vector according to the variable length coding scheme to obtain the quantized vector.
  • In addition, in some implementations, at 612 and after decoding with the variable length coding scheme, the server computing device can decode or unquantize each quantized vector to the extent possible. For example, this may include transforming references or codings into particular data entries to which the references refer.
  • At 614, the server computing device determines a mean of all the quantized vectors received from all of the client computing devices to obtain a mean vector.
  • In some implementations, the method 600 can further include performing, by the server computing device, a global update based on the mean vector.
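  • As one illustrative possibility (not the only form a global update can take), a federated-averaging-style step with an assumed server learning rate would add the mean vector to the current global model parameters, as sketched below.

```python
import numpy as np

def apply_global_update(global_params, mean_update, server_lr=1.0):
    """One possible global update: add the (optionally scaled) mean client update to the
    current global parameters. The server_lr parameter is an assumption for illustration."""
    return np.asarray(global_params) + server_lr * np.asarray(mean_update)
```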
  • 11. Additional Disclosure
  • The technology discussed herein makes reference to servers, databases, software applications, and other computer-based systems, as well as actions taken and information sent to and from such systems. The inherent flexibility of computer-based systems allows for a great variety of possible configurations, combinations, and divisions of tasks and functionality between and among components. For instance, processes discussed herein can be implemented using a single device or component or multiple devices or components working in combination. Databases and applications can be implemented on a single system or distributed across multiple systems. Distributed components can operate sequentially or in parallel.
  • While the present subject matter has been described in detail with respect to various specific example embodiments thereof, each example is provided by way of explanation, not limitation of the disclosure. Those skilled in the art, upon attaining an understanding of the foregoing, can readily produce alterations to, variations of, and equivalents to such embodiments. Accordingly, the subject disclosure does not preclude inclusion of such modifications, variations and/or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art. For instance, features illustrated or described as part of one embodiment can be used with another embodiment to yield a still further embodiment. Thus, it is intended that the present disclosure cover such alterations, variations, and equivalents.

Claims (20)

What is claimed is:
1. A computing system to facilitate transmission of machine-learned model updates from client devices to a centralized server computing device, the computing system comprising:
one or more client computing devices, wherein each client computing device comprises one or more processors and one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors, cause the client computing device to perform operations, the operations comprising:
determining an update to a machine-learned model based at least in part on a local dataset stored at the client computing device;
rotating the update by a random rotation matrix to obtain a rotated update;
performing probabilistic quantization of the rotated update to obtain a quantized rotated update; and
transmitting the quantized rotated update to the centralized server computing device.
2. The computing system of claim 1, further comprising:
the centralized server computing device, the centralized server computing device comprising one or more processors and one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors, cause the centralized server computing device to perform operations, the operations comprising:
receiving the one or more quantized rotated updates respectively transmitted by the one or more client computing devices;
determining a mean rotated update of all quantized rotated updates obtained for all of the one or more client computing devices;
de-rotating the mean rotated update by an inverse random rotation matrix to obtain a mean de-rotated update; and
updating a global machine-learned model based at least in part on the mean de-rotated update.
3. The computing system of claim 1, wherein the update to the machine-learned model comprises a gradient associated with training of the machine-learned model.
4. The computing system of claim 1, wherein the random rotation matrix comprises a product of a Walsh-Hadamard matrix with a diagonal matrix.
5. The computing system of claim 4, wherein the diagonal matrix comprises independent and identically distributed Rademacher entries.
6. A computing system, comprising:
one or more client computing devices, wherein each client computing device comprises one or more processors and one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors, cause the client computing device to perform operations, the operations comprising:
obtaining a vector;
rotating the vector by a random rotation matrix to obtain a rotated vector;
performing probabilistic quantization of the rotated vector to obtain a quantized rotated vector; and
transmitting the quantized rotated vector.
7. The computing system of claim 6, further comprising:
a server computing device that comprises one or more processors and one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors, cause the server computing device to perform operations, the operations comprising:
receiving the quantized rotated vectors transmitted by the client computing devices;
determining a mean of all quantized rotated vectors obtained for all of the one or more client computing devices; and
de-rotating the mean by an inverse random rotation matrix.
8. The computing system of claim 6, wherein the random rotation matrix comprises a product of a Walsh-Hadamard matrix with a diagonal matrix.
9. The computing system of claim 8, wherein the diagonal matrix comprises independent and identically distributed Rademacher entries.
10. The computing system of claim 6, wherein the operations further comprise:
obtaining a seed, wherein the seed is shared with a server computing device to which the quantized rotated vector is transmitted; and
generating the random rotation matrix based at least in part on the seed.
11. The computing system of claim 6, wherein performing probabilistic quantization of the rotated vector to obtain a quantized rotated vector comprises:
performing stochastic binary quantization of the rotated vector to obtain the quantized rotated vector; or
performing stochastic k-level quantization of the rotated vector to obtain the quantized rotated vector.
12. The computing system of claim 6, wherein performing probabilistic quantization of the rotated vector comprises:
determining a value for each of a number of quantization levels based at least in part on a magnitude of the rotated vector and a minimum coordinate value included in the rotated vector; and
quantizing each coordinate of the rotated vector into one of the number of quantization levels.
13. The computing system of claim 6, wherein the vector comprises one or more of:
a machine-learned model update vector that describes one or more parameters of a machine-learned model or one or more updates to the one or more parameters of the machine-learned model;
a cluster vector that describes a plurality of cluster centers or a plurality of updates to the plurality of cluster centers; and
a power iteration vector that describes an eigenvector.
14. The computing system of claim 6, wherein obtaining the vector comprises computing the vector based on a local dataset that is stored locally at the client computing device.
15. The computing system of claim 1, wherein each of the client computing devices comprises:
a mobile computing device;
a worker device in a datacenter;
an embedded computing device; or
a connected smart device.
16. A computing system, comprising:
one or more client computing devices, wherein each client computing device comprises one or more processors and one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors, cause the client computing device to perform operations, the operations comprising:
obtaining a vector;
performing probabilistic quantization of the vector to obtain a quantized vector, wherein performing probabilistic quantization of the vector comprises:
determining a value for each of a number of quantization levels based at least in part on a magnitude of the vector and a minimum coordinate value included in the vector; and
quantizing each coordinate of the vector into one of the number of quantization levels.
17. The computing system of claim 16, wherein the operations further comprise:
encoding the quantized vector according to a variable length coding scheme; and
transmitting the encoded quantized vector.
18. The computing system of claim 17, further comprising:
a server computing device that comprises one or more processors and one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors, cause the server computing device to perform operations, the operations comprising:
for each of the one or more client computing devices:
receiving the encoded quantized vector transmitted by the client computing device; and
decoding the encoded quantized vector according to the variable length coding scheme to obtain the quantized vector for such client computing device; and
determining a mean of all quantized vectors obtained for all of the one or more client computing devices.
19. The computing system of claim 17, wherein the variable length coding scheme comprises Huffman coding or arithmetic coding.
20. The computing system of claim 17, wherein the vector comprises one or more of:
a machine-learned model update vector that describes one or more parameters of a machine-learned model or one or more updates to the one or more parameters of the machine-learned model;
a cluster vector that describes a plurality of cluster centers or a plurality of updates to the plurality of cluster centers; and
a power iteration vector that describes an eigenvector.
US15/676,076 2016-09-26 2017-08-14 Systems and Methods for Communication Efficient Distributed Mean Estimation Abandoned US20180089587A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US15/676,076 US20180089587A1 (en) 2016-09-26 2017-08-14 Systems and Methods for Communication Efficient Distributed Mean Estimation
US15/708,793 US11196800B2 (en) 2016-09-26 2017-09-19 Systems and methods for communication efficient distributed mean estimation
US17/502,794 US11785073B2 (en) 2016-09-26 2021-10-15 Systems and methods for communication efficient distributed mean estimation
US18/240,799 US12219004B2 (en) 2016-09-26 2023-08-31 Systems and methods for communication efficient distributed mean estimation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201662400019P 2016-09-26 2016-09-26
US15/676,076 US20180089587A1 (en) 2016-09-26 2017-08-14 Systems and Methods for Communication Efficient Distributed Mean Estimation

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/708,793 Continuation-In-Part US11196800B2 (en) 2016-09-26 2017-09-19 Systems and methods for communication efficient distributed mean estimation

Publications (1)

Publication Number Publication Date
US20180089587A1 true US20180089587A1 (en) 2018-03-29

Family

ID=59982468

Family Applications (4)

Application Number Title Priority Date Filing Date
US15/676,076 Abandoned US20180089587A1 (en) 2016-09-26 2017-08-14 Systems and Methods for Communication Efficient Distributed Mean Estimation
US16/335,695 Active US10657461B2 (en) 2016-09-26 2017-09-07 Communication efficient federated learning
US16/850,053 Active 2039-03-05 US11763197B2 (en) 2016-09-26 2020-04-16 Communication efficient federated learning
US18/365,734 Pending US20230376856A1 (en) 2016-09-26 2023-08-04 Communication Efficient Federated Learning

Family Applications After (3)

Application Number Title Priority Date Filing Date
US16/335,695 Active US10657461B2 (en) 2016-09-26 2017-09-07 Communication efficient federated learning
US16/850,053 Active 2039-03-05 US11763197B2 (en) 2016-09-26 2020-04-16 Communication efficient federated learning
US18/365,734 Pending US20230376856A1 (en) 2016-09-26 2023-08-04 Communication Efficient Federated Learning

Country Status (6)

Country Link
US (4) US20180089587A1 (en)
EP (3) EP4276711A3 (en)
CN (2) CN113837357B (en)
DE (2) DE202017105829U1 (en)
GB (1) GB2556981A (en)
WO (1) WO2018057302A1 (en)

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180336166A1 (en) * 2017-05-22 2018-11-22 Government Of The United States, As Represented By The Secretary Of The Air Force Method for Predicting Stochastic Output Performance or Scaling Stochastic Inputs
CN109034398A (en) * 2018-08-10 2018-12-18 深圳前海微众银行股份有限公司 Feature selection approach, device and storage medium based on federation's training
CN109165683A (en) * 2018-08-10 2019-01-08 深圳前海微众银行股份有限公司 Sample predictions method, apparatus and storage medium based on federation's training
CN110263936A (en) * 2019-06-14 2019-09-20 深圳前海微众银行股份有限公司 Laterally federation's learning method, device, equipment and computer storage medium
CN110472745A (en) * 2019-08-06 2019-11-19 深圳前海微众银行股份有限公司 Information transferring method and device in a kind of federal study
GB202001468D0 (en) 2020-02-04 2020-03-18 Tom Tom Navigation B V Navigation system
CN111126613A (en) * 2018-10-31 2020-05-08 伊姆西Ip控股有限责任公司 Methods, apparatus and computer program products for deep learning
KR20200062014A (en) * 2018-11-26 2020-06-03 삼성전자주식회사 Apparatus for accelerating neural network using weight with dyadic matrix form and operation method thereof
CN111324813A (en) * 2020-02-20 2020-06-23 深圳前海微众银行股份有限公司 Recommended method, apparatus, apparatus, and computer-readable storage medium
CN111352799A (en) * 2020-02-20 2020-06-30 中国银联股份有限公司 Inspection method and device
WO2020134704A1 (en) * 2018-12-28 2020-07-02 深圳前海微众银行股份有限公司 Model parameter training method based on federated learning, terminal, system and medium
WO2020188004A1 (en) * 2019-03-18 2020-09-24 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Methods and apparatuses for compressing parameters of neural networks
WO2020225772A1 (en) * 2019-05-07 2020-11-12 Imagia Cybernetics Inc. Method and system for initializing a neural network
US20210357723A1 (en) * 2018-11-06 2021-11-18 Nippon Telegraph And Telephone Corporation Distributed Processing System and Distributed Processing Method
US20210383197A1 (en) * 2020-06-04 2021-12-09 EMC IP Holding Company LLC Adaptive stochastic learning state compression for federated learning in infrastructure domains
US20210382962A1 (en) * 2015-10-16 2021-12-09 Google Llc Systems and Methods of Distributed Optimization
US11438348B2 (en) 2020-03-27 2022-09-06 Interset Software, Inc. Efficient determination of expected maximum for anomaly detection
US11455572B2 (en) * 2018-08-24 2022-09-27 Servicenow (Canada) Inc. Machine learning model hardware configuration based optimization
US20220391664A1 (en) * 2021-06-03 2022-12-08 Samsung Electronics Co., Ltd. Method and apparatus with neural network quantization
US11551083B2 (en) 2019-12-17 2023-01-10 Soundhound, Inc. Neural network training from private data
US11562046B2 (en) 2018-11-26 2023-01-24 Samsung Electronics Co., Ltd. Neural network processor using dyadic weight matrix and operation method thereof
US11681913B2 (en) 2019-08-02 2023-06-20 Samsung Electronics Co., Ltd. Method and system with neural network model updating
WO2024007156A1 (en) * 2022-07-05 2024-01-11 华为技术有限公司 Communication method, and apparatus
WO2024025444A1 (en) * 2022-07-25 2024-02-01 Telefonaktiebolaget Lm Ericsson (Publ) Iterative learning with adapted transmission and reception
US11948096B2 (en) 2020-03-13 2024-04-02 International Business Machines Corporation Adaptively adjusting influence in federated learning model updates
CN119254240A (en) * 2024-12-04 2025-01-03 江苏意源科技有限公司 A multivariate big data optimization storage method

Families Citing this family (185)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180089587A1 (en) * 2016-09-26 2018-03-29 Google Inc. Systems and Methods for Communication Efficient Distributed Mean Estimation
US10769549B2 (en) * 2016-11-21 2020-09-08 Google Llc Management and evaluation of machine-learned models based on locally logged data
US10305923B2 (en) * 2017-06-30 2019-05-28 SparkCognition, Inc. Server-supported malware detection and protection
EP3518156A1 (en) * 2018-01-29 2019-07-31 Siemens Aktiengesellschaft A method for collaborative machine learning of analytical models
CN108520303A (en) * 2018-03-02 2018-09-11 阿里巴巴集团控股有限公司 Method and device for constructing a recommendation system
EP3794515A1 (en) * 2018-05-17 2021-03-24 FRAUNHOFER-GESELLSCHAFT zur Förderung der angewandten Forschung e.V. Concepts for distributed learning of neural networks and/or transmission of parameterization updates therefor
US11593634B2 (en) * 2018-06-19 2023-02-28 Adobe Inc. Asynchronously training machine learning models across client devices for adaptive intelligence
CN110738323B (en) * 2018-07-03 2022-06-28 百度在线网络技术(北京)有限公司 Method and device for establishing machine learning model based on data sharing
CN109325584B (en) * 2018-08-10 2021-06-25 深圳前海微众银行股份有限公司 Neural network-based federated modeling method, device and readable storage medium
CN109165515A (en) * 2018-08-10 2019-01-08 深圳前海微众银行股份有限公司 Model parameter acquisition methods, system and readable storage medium storing program for executing based on federation's study
CN109255444B (en) * 2018-08-10 2022-03-29 深圳前海微众银行股份有限公司 Federal modeling method and device based on transfer learning and readable storage medium
CN109165725B (en) * 2018-08-10 2022-03-29 深圳前海微众银行股份有限公司 Neural network federal modeling method, equipment and storage medium based on transfer learning
CN109145984B (en) * 2018-08-20 2022-03-25 联想(北京)有限公司 Method and apparatus for machine training
US11244242B2 (en) 2018-09-07 2022-02-08 Intel Corporation Technologies for distributing gradient descent computation in a heterogeneous multi-access edge computing (MEC) networks
CN109460826A (en) * 2018-10-31 2019-03-12 北京字节跳动网络技术有限公司 For distributing the method, apparatus and model modification system of data
CN109657055A (en) * 2018-11-09 2019-04-19 中山大学 Title party article detection method and federal learning strategy based on level hybrid network
US11989634B2 (en) * 2018-11-30 2024-05-21 Apple Inc. Private federated learning with protection against reconstruction
US11610110B2 (en) 2018-12-05 2023-03-21 Bank Of America Corporation De-conflicting data labeling in real time deep learning systems
WO2020115273A1 (en) * 2018-12-07 2020-06-11 Telefonaktiebolaget Lm Ericsson (Publ) Predicting network communication performance using federated learning
CN109711556B (en) * 2018-12-24 2020-11-03 中国南方电网有限责任公司 Machine patrol data processing method and device, network-level server and provincial-level server
US11138327B2 (en) 2018-12-27 2021-10-05 Industrial Technology Research Institute Privacy data integration method and server
KR102247322B1 (en) * 2018-12-28 2021-05-03 연세대학교 산학협력단 Method for Operating Machine Learning Based Federated Distillation, Web Server and Terminal
ES2870706T3 (en) 2019-01-11 2021-10-27 Advanced New Technologies Co Ltd A multi-part distributed security model training framework for privacy protection
WO2020147965A1 (en) * 2019-01-18 2020-07-23 Huawei Technologies Co., Ltd. Enhanced privacy federated learning system
CN111612153B (en) * 2019-02-22 2024-06-14 华为技术有限公司 Method and device for training model
WO2020180424A1 (en) * 2019-03-04 2020-09-10 Iocurrents, Inc. Data compression and communication using machine learning
CN111680798A (en) * 2019-03-11 2020-09-18 人工智能医生股份有限公司 Joint learning model system and method, apparatus, and computer-readable storage medium
FR3094109A1 (en) 2019-03-21 2020-09-25 Roofstreet Process and system for processing digital data from connected equipment while ensuring data security and protection of privacy
US12275146B2 (en) * 2019-04-01 2025-04-15 Nvidia Corporation Simulation of tasks using neural networks
CN112166445B (en) * 2019-04-16 2025-02-25 华为技术有限公司 Joint learning method and joint learning device based on blockchain network
WO2020229684A1 (en) * 2019-05-16 2020-11-19 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Concepts for federated learning, client classification and training data similarity measurement
EP3742669B1 (en) 2019-05-20 2023-11-08 Nokia Technologies Oy Machine learning in radio access networks
US11227187B1 (en) 2019-05-23 2022-01-18 Augustus Intelligence Inc. Generating artificial intelligence solutions using raw data and simulated data
US12265888B1 (en) * 2019-05-31 2025-04-01 United Services Automobile Association (Usaa) Edge computing for machine learning
CN110288094B (en) * 2019-06-10 2020-12-18 深圳前海微众银行股份有限公司 Model parameter training method and device based on federated learning
US11562228B2 (en) 2019-06-12 2023-01-24 International Business Machines Corporation Efficient verification of machine learning applications
US11983608B2 (en) 2019-06-12 2024-05-14 International Business Machines Corporation Efficient verification of machine learning applications
US11694110B2 (en) 2019-06-12 2023-07-04 International Business Machines Corporation Aggregated machine learning verification for database
CN110348241B (en) * 2019-07-12 2021-08-03 之江实验室 A multi-center collaborative prognosis prediction system under a data sharing strategy
US11494671B2 (en) 2019-07-14 2022-11-08 Olivia Karen Grabmaier Precision hygiene using reinforcement learning
EP3767511B1 (en) 2019-07-19 2021-08-25 Siemens Healthcare GmbH Securely performing parameter data updates
CN110443378B (en) * 2019-08-02 2023-11-03 深圳前海微众银行股份有限公司 Feature correlation analysis method, device and readable storage medium in federated learning
US10803184B2 (en) * 2019-08-09 2020-10-13 Alibaba Group Holding Limited Generation of a model parameter
US12093837B2 (en) 2019-08-09 2024-09-17 International Business Machines Corporation Building a federated learning framework
US20220294606A1 (en) * 2019-08-16 2022-09-15 Telefonaktiebolaget Lm Ericsson (Publ) Methods, apparatus and machine-readable media relating to machine-learning in a communication network
US20220292398A1 (en) * 2019-08-16 2022-09-15 Telefonaktiebolaget Lm Ericsson (Publ) Methods, apparatus and machine-readable media relating to machine-learning in a communication network
CN110380917B (en) * 2019-08-26 2022-01-07 深圳前海微众银行股份有限公司 Control method and device of federal learning system, terminal equipment and storage medium
CN110503207A (en) * 2019-08-28 2019-11-26 深圳前海微众银行股份有限公司 Federal learning credit management method, device, equipment and readable storage medium
CN110674528B (en) * 2019-09-20 2024-04-09 深圳前海微众银行股份有限公司 Federal learning privacy data processing method, device, system and storage medium
CN110795477A (en) * 2019-09-20 2020-02-14 平安科技(深圳)有限公司 Data training method, device and system
US11836615B2 (en) * 2019-09-20 2023-12-05 International Business Machines Corporation Bayesian nonparametric learning of neural networks
US12131258B2 (en) * 2019-09-24 2024-10-29 Qualcomm Incorporated Joint pruning and quantization scheme for deep neural networks
CN110601814B (en) * 2019-09-24 2021-08-27 深圳前海微众银行股份有限公司 Federal learning data encryption method, device, equipment and readable storage medium
EP3798934A1 (en) * 2019-09-27 2021-03-31 Siemens Healthcare GmbH Method and system for scalable and decentralized incremental machine learning which protects data privacy
US20220351039A1 (en) * 2019-10-04 2022-11-03 Telefonaktiebolaget Lm Ericsson (Publ) Federated learning using heterogeneous model types and architectures
WO2021070189A1 (en) * 2019-10-07 2021-04-15 Telefonaktiebolaget Lm Ericsson (Publ) Moderator for federated learning
WO2021071399A1 (en) * 2019-10-09 2021-04-15 Telefonaktiebolaget Lm Ericsson (Publ) Developing machine-learning models
US20210117829A1 (en) * 2019-10-16 2021-04-22 International Business Machines Corporation Learning pattern dictionary from noisy numerical data in distributed networks
US20210125105A1 (en) * 2019-10-23 2021-04-29 The United States Of America, As Represented By The Secretary Of The Navy System and Method for Interest-focused Collaborative Machine Learning
CN110837527B (en) * 2019-11-14 2022-03-22 深圳市超算科技开发有限公司 A security application method and system of a machine learning model
CN110909865B (en) * 2019-11-18 2022-08-30 福州大学 Federated learning method based on hierarchical tensor decomposition in edge calculation
CN110995793B (en) * 2019-11-19 2022-07-05 北京奇艺世纪科技有限公司 Information flow control element updating system, method and device
US11461593B2 (en) 2019-11-26 2022-10-04 International Business Machines Corporation Federated learning of clients
CN110990870A (en) * 2019-11-29 2020-04-10 上海能塔智能科技有限公司 Operation and maintenance, processing method, device, equipment and medium using model library
WO2021111456A1 (en) * 2019-12-05 2021-06-10 Telefonaktiebolaget Lm Ericsson (Publ) Moderator for identifying deficient nodes in federated learning
US11588621B2 (en) 2019-12-06 2023-02-21 International Business Machines Corporation Efficient private vertical federated learning
CN111062044B (en) * 2019-12-09 2021-03-23 支付宝(杭州)信息技术有限公司 Model joint training method and device based on block chain
SE545545C2 (en) * 2019-12-12 2023-10-17 Assa Abloy Ab Device and method for processing an input media feed for monitoring a person using an artificial intelligence (AI) engine
EP4073714A1 (en) * 2019-12-13 2022-10-19 Qualcomm Technologies, Inc. Federated mixture models
US20230010095A1 (en) 2019-12-18 2023-01-12 Telefonaktiebolaget Lm Ericsson (Publ) Methods for cascade federated learning for telecommunications network performance and related apparatus
US11941520B2 (en) * 2020-01-09 2024-03-26 International Business Machines Corporation Hyperparameter determination for a differentially private federated learning process
CN110874650B (en) * 2020-01-16 2020-04-24 支付宝(杭州)信息技术有限公司 Alliance learning method, device and system fusing public domain data and private data
WO2021144803A1 (en) * 2020-01-16 2021-07-22 Telefonaktiebolaget Lm Ericsson (Publ) Context-level federated learning
WO2021149845A1 (en) * 2020-01-21 2021-07-29 연세대학교 산학협력단 Federated distillation-based learning driving method, learning driving server, and learning driving terminal
GB2591496A (en) * 2020-01-30 2021-08-04 Vision Semantics Ltd De-centralised learning for re-identification
EP4100892A4 (en) * 2020-02-03 2024-03-13 Intel Corporation DISTRIBUTED LEARNING SYSTEMS AND METHODS FOR WIRELESS EDGE DYNAMICS
CN111310932B (en) * 2020-02-10 2024-09-27 深圳前海微众银行股份有限公司 Method, device, equipment and readable storage medium for optimizing transverse federal learning system
CN111651263B (en) * 2020-02-12 2023-10-13 北京小米移动软件有限公司 Resource processing method and device of mobile terminal, computer equipment and storage medium
WO2021161069A1 (en) * 2020-02-12 2021-08-19 Telefonaktiebolaget Lm Ericsson (Publ) Method and system for privacy preserving information exchange
CN113312543B (en) * 2020-02-27 2024-11-01 华为技术有限公司 Personalized model training method based on joint learning, electronic equipment and medium
US12045695B2 (en) * 2020-02-28 2024-07-23 Shanghai United Imaging Intelligence Co., Ltd. System and methods for privacy preserving cross-site federated learning
CN113326946A (en) * 2020-02-29 2021-08-31 华为技术有限公司 Method, device and storage medium for updating application recognition model
EP4115360A4 (en) * 2020-03-02 2023-06-28 Telefonaktiebolaget Lm Ericsson (Publ) Synthetic data generation in federated learning systems
CN113496291A (en) * 2020-03-18 2021-10-12 索尼公司 Apparatus, method and storage medium for federal learning
CN113497785B (en) * 2020-03-20 2023-05-12 深信服科技股份有限公司 Malicious encryption traffic detection method, system, storage medium and cloud server
US20210312336A1 (en) * 2020-04-03 2021-10-07 International Business Machines Corporation Federated learning of machine learning model features
US11645538B2 (en) * 2020-04-17 2023-05-09 Applied Engineering Concepts, Inc. Physical layer authentication of electronic communication networks
WO2021213667A1 (en) * 2020-04-24 2021-10-28 Huawei Technologies Co., Ltd. Devices, methods, and system for heterogeneous data-adaptive federated learning
KR102544531B1 (en) * 2020-04-27 2023-06-16 한국전자기술연구원 Federated learning system and method
CN111538598B (en) * 2020-04-29 2024-11-08 深圳前海微众银行股份有限公司 Federated learning modeling method, device, equipment and readable storage medium
CN111553484B (en) * 2020-04-30 2023-09-08 同盾控股有限公司 Federal learning method, device and system
CN111553483B (en) * 2020-04-30 2024-03-29 同盾控股有限公司 Federal learning method, device and system based on gradient compression
KR20210138994A (en) * 2020-05-13 2021-11-22 삼성전자주식회사 Electric apparatus control method for Federated learning and Electric apparatus
CN111553486B (en) * 2020-05-14 2025-02-18 深圳前海微众银行股份有限公司 Information transmission method, device, equipment and computer readable storage medium
CN111582503B (en) * 2020-05-14 2025-07-11 深圳前海微众银行股份有限公司 Information transmission method, device, equipment and computer readable storage medium
US11755951B2 (en) * 2020-05-15 2023-09-12 Vmware, Inc. Machine learning with an intelligent continuous learning service in a big data environment
CN112036580B (en) * 2020-05-15 2022-05-20 支付宝(杭州)信息技术有限公司 Method and device for league learning and league learning system
EP4158558A4 (en) * 2020-06-01 2024-06-05 Intel Corporation FEDERATED LEARNING OPTIMIZATIONS
EP4118590A1 (en) 2020-06-05 2023-01-18 Google LLC Server efficient enhancement of privacy in federated learning
US11664033B2 (en) 2020-06-15 2023-05-30 Samsung Electronics Co., Ltd. Electronic apparatus and controlling method thereof
CN111678696A (en) * 2020-06-17 2020-09-18 南昌航空大学 A Machine Intelligence Fault Diagnosis Method Based on Federated Learning
WO2021261611A1 (en) * 2020-06-23 2021-12-30 엘지전자 주식회사 Method and device for performing federated learning in wireless communication system
DE102020117638A1 (en) 2020-07-03 2022-01-05 Bayerische Motoren Werke Aktiengesellschaft Computer-implemented method and system for validating a sensor-based vehicle function
WO2022014732A1 (en) * 2020-07-14 2022-01-20 엘지전자 주식회사 Method and apparatus for performing federated learning in wireless communication system
CN111898484A (en) * 2020-07-14 2020-11-06 华中科技大学 Method, apparatus, readable storage medium, and electronic device for generating a model
CN111856934B (en) * 2020-07-16 2022-11-15 南京大量数控科技有限公司 Federal learning data processing algorithm between isomorphic intelligent workshops
US20230297844A1 (en) * 2020-07-17 2023-09-21 Telefonaktiebolaget Lm Ericsson (Publ) Federated learning using heterogeneous labels
CN113988254B (en) * 2020-07-27 2023-07-14 腾讯科技(深圳)有限公司 Method and device for determining neural network model for multiple environments
US12136034B2 (en) 2020-07-31 2024-11-05 Microsoft Technology Licensing, Llc Dynamic gradient aggregation for training neural networks
CN111882133B (en) * 2020-08-03 2022-02-01 重庆大学 Prediction-based federated learning communication optimization method and system
US12346808B2 (en) * 2020-08-06 2025-07-01 Nec Corporation Federated learning for anomaly detection
CN111967609B (en) * 2020-08-14 2021-08-06 深圳前海微众银行股份有限公司 Model parameter verification method, device and readable storage medium
CN111970277B (en) * 2020-08-18 2022-09-27 中国工商银行股份有限公司 Flow identification method and device based on federal learning
US11909482B2 (en) * 2020-08-18 2024-02-20 Qualcomm Incorporated Federated learning for client-specific neural network parameter generation for wireless communication
US12236370B2 (en) 2020-08-24 2025-02-25 Samsung Electronics Co., Ltd Method and apparatus for federated learning
EP4165828A4 (en) 2020-09-03 2023-11-29 Samsung Electronics Co., Ltd. WIRELESS COMMUNICATION METHODS AND NETWORKS FOR MANAGING A DATA-DRIVEN MODEL
US11620583B2 (en) * 2020-09-08 2023-04-04 International Business Machines Corporation Federated machine learning using locality sensitive hashing
US12131231B2 (en) 2020-09-16 2024-10-29 International Business Machines Corporation Federated learning technique for applied machine learning
WO2022060264A1 (en) * 2020-09-18 2022-03-24 Telefonaktiebolaget Lm Ericsson (Publ) Methods and systems for updating machine learning models
US11914678B2 (en) 2020-09-23 2024-02-27 International Business Machines Corporation Input encoding for classifier generalization
US12393869B2 (en) 2020-09-24 2025-08-19 Electronics And Telecommunications Research Institute Distributed training method between terminal and edge cloud server
US20220101204A1 (en) * 2020-09-25 2022-03-31 Qualcomm Incorporated Machine learning component update reporting in federated learning
CN116324820A (en) * 2020-09-28 2023-06-23 高通股份有限公司 Federated Machine Learning with Induced Sparsity
US12061956B1 (en) 2020-09-29 2024-08-13 Amazon Technologies, Inc. Federated learning service in a provider network and training machine learning models using devices external to the provider network
TWI775170B (en) * 2020-09-30 2022-08-21 新漢股份有限公司 Method for cpu to execute artificial intelligent related processes
JP7647048B2 (en) * 2020-10-02 2025-03-18 住友ゴム工業株式会社 tire
CN112235062A (en) * 2020-10-10 2021-01-15 中国科学技术大学 A Federated Learning Method and System Against Communication Noise
CN112307331B (en) * 2020-10-14 2023-11-24 湖南天河国云科技有限公司 Intelligent recruitment information pushing method, system and terminal equipment for college graduates based on blockchain
CN112232519B (en) * 2020-10-15 2024-01-09 成都数融科技有限公司 Joint modeling method based on federal learning
US12356219B2 (en) * 2020-10-15 2025-07-08 Qualcomm Incorporated Update resolution signaling in federated learning
CN112329947A (en) * 2020-10-28 2021-02-05 广州中国科学院软件应用技术研究所 A Federated Learning Incentive Method and System Based on Differential Evolution
WO2022089748A1 (en) * 2020-10-29 2022-05-05 Telefonaktiebolaget Lm Ericsson (Publ) Energy aware communication identification in telecommunications network
CN112288101A (en) * 2020-10-29 2021-01-29 平安科技(深圳)有限公司 GBDT and LR fusion method, device, equipment and storage medium based on federal learning
CN112418446B (en) * 2020-11-18 2024-04-09 脸萌有限公司 Model processing method, system, device, medium and electronic equipment
CN112464278B (en) * 2020-11-24 2023-07-21 平安科技(深圳)有限公司 Federal modeling method based on non-uniformly distributed data and related equipment
EP4009220A1 (en) * 2020-12-03 2022-06-08 Fujitsu Limited Method and apparatus for decentralized supervised learning in nlp applications
US20220180251A1 (en) * 2020-12-03 2022-06-09 Qualcomm Incorporated Sidelink-assisted update aggregation in federated learning
CN112734050B (en) * 2020-12-11 2025-05-23 平安科技(深圳)有限公司 Text model training method, recognition method, device, equipment and storage medium
US12182771B2 (en) 2020-12-15 2024-12-31 International Business Machines Corporation Federated learning for multi-label classification model for oil pump management
US11847504B2 (en) 2020-12-16 2023-12-19 Nexcom International Co., Ltd. Method for CPU to execute artificial intelligence related processes
EP4248378A2 (en) * 2020-12-21 2023-09-27 Huawei Technologies Co., Ltd. System and method of federated learning with diversified feedback
CA3143855A1 (en) * 2020-12-30 2022-06-30 Atb Financial Systems and methods for federated learning on blockchain
CN112814854B (en) * 2020-12-31 2022-04-29 新智数字科技有限公司 Joint learning-based turbine fan maintenance method and device
CN112819177B (en) * 2021-01-26 2022-07-12 支付宝(杭州)信息技术有限公司 Personalized privacy protection learning method, device and equipment
US11017322B1 (en) * 2021-01-28 2021-05-25 Alipay Labs (singapore) Pte. Ltd. Method and system for federated learning
WO2022162677A1 (en) * 2021-01-29 2022-08-04 Telefonaktiebolaget Lm Ericsson (Publ) Distributed machine learning with new labels using heterogeneous label distribution
CN116830122A (en) * 2021-02-02 2023-09-29 三星电子株式会社 Method, system and apparatus for joint learning
US20220261697A1 (en) * 2021-02-15 2022-08-18 Devron Corporation Federated learning platform and methods for using same
US12147879B2 (en) 2021-02-22 2024-11-19 International Business Machines Corporation Federated learning with dataset sketch commitment based malicious participant identification
US11711348B2 (en) 2021-02-22 2023-07-25 Begin Ai Inc. Method for maintaining trust and credibility in a federated learning environment
US12229280B2 (en) * 2021-03-16 2025-02-18 Accenture Global Solutions Limited Privacy preserving cooperative learning in untrusted environments
CN112966832B (en) * 2021-03-31 2022-10-11 上海嗨普智能信息科技股份有限公司 Federated Learning System Based on Multi-Server
CN114118180B (en) * 2021-04-02 2025-04-18 京东科技控股股份有限公司 Clustering method, device, electronic device and storage medium
US20220335312A1 (en) * 2021-04-15 2022-10-20 EMC IP Holding Company LLC System and method for distributed model adaptation
CN113255928B (en) * 2021-04-29 2022-07-05 支付宝(杭州)信息技术有限公司 Model training method and device and server
EP4083838A1 (en) * 2021-04-30 2022-11-02 Hochschule Karlsruhe Method and system to collaboratively train data analytics model parameters
US20240223407A1 (en) * 2021-05-21 2024-07-04 Lg Electronics Inc. Method and device for performing federated learning in wireless communication system
US12033074B2 (en) * 2021-05-25 2024-07-09 International Business Machines Corporation Vertical federated learning with compressed embeddings
CN113361598B (en) * 2021-06-04 2022-10-11 重庆大学 Model training method based on distributed learning, server and distributed system
US12231490B2 (en) * 2021-06-09 2025-02-18 Intel Corporation Uses of coded data at multi-access edge computing server
CN115526365A (en) * 2021-06-24 2022-12-27 中兴通讯股份有限公司 Index optimization method, server and computer-readable storage medium
KR102620697B1 (en) * 2021-07-12 2024-01-02 주식회사 카카오뱅크 Method and device for determining transfer information in messages using deep learning based natural language processing
CN113537512B (en) * 2021-07-15 2024-03-15 卡奥斯工业智能研究院(青岛)有限公司 Model training method, device, system, equipment and medium based on federal learning
US20230027145A1 (en) * 2021-07-22 2023-01-26 EMC IP Holding Company LLC K-quant gradient compressor for federated learning
US11443245B1 (en) 2021-07-22 2022-09-13 Alipay Labs (singapore) Pte. Ltd. Method and system for federated adversarial domain adaptation
WO2023009803A1 (en) 2021-07-30 2023-02-02 Epiphany Systems, Inc. Graphics processing unit optimization
EP4377855A1 (en) 2021-07-30 2024-06-05 Reveald Holdings, Inc. Systems and methods for applying reinforcement learning to cybersecurity graphs
WO2023022251A1 (en) * 2021-08-18 2023-02-23 엘지전자 주식회사 Method and apparatus for transmitting signal in wireless communication system
US12400104B1 (en) * 2021-09-30 2025-08-26 Amazon Technologies, Inc. Updating machine learning models
KR102358186B1 (en) * 2021-10-05 2022-02-08 주식회사 도어오픈 Apparatus and method for generating virtual human interaction based on artificial intelligence
US11829239B2 (en) 2021-11-17 2023-11-28 Adobe Inc. Managing machine learning model reconstruction
US12361332B2 (en) * 2021-12-07 2025-07-15 Capital One Services, Llc Systems and methods for federated learning optimization via cluster feedback
WO2023113401A1 (en) * 2021-12-17 2023-06-22 주식회사 하렉스인포텍 Training method using user-centered artificial intelligence
JP7605097B2 (en) 2021-12-23 2024-12-24 トヨタ自動車株式会社 Information processing method, information processing device, and server device
KR102803211B1 (en) * 2021-12-28 2025-04-30 고려대학교 산학협력단 Method and apparatus for scalable training deep neural network
CN114492837A (en) * 2022-01-18 2022-05-13 京东科技信息技术有限公司 Federated model training method and device
DE102022000473A1 (en) 2022-02-08 2023-08-10 Mercedes-Benz Group AG Method for training neural networks set up to control a vehicle function and a vehicle and system for carrying out the method
US20230325652A1 (en) * 2022-04-06 2023-10-12 Qualcomm Incorporated Gradient grouping for compression in federated learning for machine learning models
US20250086474A1 (en) * 2022-12-12 2025-03-13 Rakuten Mobile, Inc. Collaborative training with buffered activations
US20240265269A1 (en) * 2023-02-06 2024-08-08 Google Llc System(s) and method(s) to reduce a transferable size of language model(s) to enable decentralized learning thereof
US12067041B1 (en) 2023-10-06 2024-08-20 Armada Systems, Inc. Time series data to statistical natural language interaction
US12086557B1 (en) 2023-10-06 2024-09-10 Armada Systems, Inc. Natural language statistical model with alerts
US12141541B1 (en) 2023-10-06 2024-11-12 Armada Systems, Inc. Video to narration
US11960515B1 (en) 2023-10-06 2024-04-16 Armada Systems, Inc. Edge computing units for operating conversational tools at local sites
US11995412B1 (en) 2023-10-06 2024-05-28 Armada Systems, Inc. Video based question and answer

Family Cites Families (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6708163B1 (en) 1999-02-24 2004-03-16 Hillol Kargupta Collective data mining from distributed, vertically partitioned feature space
US6879944B1 (en) 2000-03-07 2005-04-12 Microsoft Corporation Variational relevance vector machine
EP1326189A3 (en) * 2001-12-12 2005-08-17 Microsoft Corporation Controls and displays for acquiring preferences, inspecting behaviour, and guiding the learning and decision policies of an adaptive communications prioritization and routing systems
US7016529B2 (en) * 2002-03-15 2006-03-21 Microsoft Corporation System and method facilitating pattern recognition
US7069256B1 (en) 2002-05-23 2006-06-27 Oracle International Corporation Neural network module for data mining
US7107207B2 (en) * 2002-06-19 2006-09-12 Microsoft Corporation Training machine learning by sequential conditional generalized iterative scaling
US6687653B1 (en) 2002-08-13 2004-02-03 Xerox Corporation Systems and methods for distributed algorithm for optimization-based diagnosis
US20050138571A1 (en) 2003-12-18 2005-06-23 Keskar Dhananjay V. Dynamic detection of device characteristics
US7664249B2 (en) 2004-06-30 2010-02-16 Microsoft Corporation Methods and interfaces for probing and understanding behaviors of alerting and filtering systems based on models and simulation from logs
US20060224579A1 (en) 2005-03-31 2006-10-05 Microsoft Corporation Data mining techniques for improving search engine relevance
US20080209031A1 (en) 2007-02-22 2008-08-28 Inventec Corporation Method of collecting and managing computer device information
EP2281364B1 (en) 2008-05-30 2012-08-08 Telecom Italia S.p.A. Method and devices for multicast distribution optimization
US20100132044A1 (en) 2008-11-25 2010-05-27 International Business Machines Corporation Computer Method and Apparatus Providing Brokered Privacy of User Data During Searches
US8239396B2 (en) 2009-03-20 2012-08-07 Oracle International Corporation View mechanism for data security, privacy and utilization
US8018874B1 (en) 2009-05-06 2011-09-13 Hrl Laboratories, Llc Network optimization system implementing distributed particle swarm optimization
TWI396105B (en) 2009-07-21 2013-05-11 Univ Nat Taiwan Digital data processing method for personalized information retrieval and computer readable storage medium and information retrieval system thereof
JP5584914B2 (en) 2010-07-15 2014-09-10 株式会社日立製作所 Distributed computing system
US8612368B2 (en) 2011-03-01 2013-12-17 International Business Machines Corporation Systems and methods for processing machine learning algorithms in a MapReduce environment
US8954357B2 (en) 2011-05-12 2015-02-10 Xerox Corporation Multi-task machine learning using features bagging and local relatedness in the instance space
US8898096B2 (en) 2011-05-31 2014-11-25 Oracle International Corporation Application configuration generation
US8429103B1 (en) 2012-06-22 2013-04-23 Google Inc. Native machine learning service for user adaptation on a mobile platform
US9390370B2 (en) 2012-08-28 2016-07-12 International Business Machines Corporation Training deep neural network acoustic models using distributed hessian-free optimization
US9093069B2 (en) 2012-11-05 2015-07-28 Nuance Communications, Inc. Privacy-sensitive speech model creation via aggregation of multiple user models
US9275398B1 (en) 2012-12-10 2016-03-01 A9.Com, Inc. Obtaining metrics for client-side display of content
CN103093445B (en) * 2013-01-17 2015-04-08 西安电子科技大学 Unified feature space image super-resolution reconstruction method based on joint sparse constraint
US9390383B2 (en) 2013-01-28 2016-07-12 Georges Harik Method for an optimizing predictive model using gradient descent and conjugate residuals
US9190055B1 (en) 2013-03-14 2015-11-17 Amazon Technologies, Inc. Named entity recognition with personalized models
US9400955B2 (en) * 2013-12-13 2016-07-26 Amazon Technologies, Inc. Reducing dynamic range of low-rank decomposition matrices
US9734457B2 (en) 2013-12-31 2017-08-15 Cisco Technology, Inc. Learning data processor for distributing learning machines across large-scale network infrastructures
US9426040B2 (en) 2014-01-06 2016-08-23 Cisco Technology, Inc. Mixed distributed/centralized routing techniques based on closed-loop feedback from a learning machine to avoid dark zones
US9563854B2 (en) 2014-01-06 2017-02-07 Cisco Technology, Inc. Distributed model training
US20150242760A1 (en) 2014-02-21 2015-08-27 Microsoft Corporation Personalized Machine Learning System
US20150324690A1 (en) * 2014-05-08 2015-11-12 Microsoft Corporation Deep Learning Training System
US9336483B1 (en) 2015-04-03 2016-05-10 Pearson Education, Inc. Dynamically updated neural network structures for content distribution networks
CN105528620B (en) * 2015-12-11 2019-12-06 苏州大学 method and system for combined robust principal component feature learning and visual classification
US11922313B2 (en) * 2016-02-11 2024-03-05 William Marsh Rice University Partitioned machine learning architecture
US10649794B2 (en) * 2016-08-11 2020-05-12 Twitter, Inc. Aggregate features for machine learning
US11188821B1 (en) * 2016-09-15 2021-11-30 X Development Llc Control policies for collective robot learning
US20180075347A1 (en) * 2016-09-15 2018-03-15 Microsoft Technology Licensing, Llc Efficient training of neural networks
US20180089587A1 (en) * 2016-09-26 2018-03-29 Google Inc. Systems and Methods for Communication Efficient Distributed Mean Estimation

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210382962A1 (en) * 2015-10-16 2021-12-09 Google Llc Systems and Methods of Distributed Optimization
US10902089B2 (en) * 2017-05-22 2021-01-26 United States Of America As Represented By The Secretary Of The Air Force Method for predicting stochastic output performance or scaling stochastic inputs
US20180336166A1 (en) * 2017-05-22 2018-11-22 Government Of The United States, As Represented By The Secretary Of The Air Force Method for Predicting Stochastic Output Performance or Scaling Stochastic Inputs
CN109034398A (en) * 2018-08-10 2018-12-18 深圳前海微众银行股份有限公司 Feature selection approach, device and storage medium based on federation's training
CN109165683A (en) * 2018-08-10 2019-01-08 深圳前海微众银行股份有限公司 Sample predictions method, apparatus and storage medium based on federation's training
US11455572B2 (en) * 2018-08-24 2022-09-27 Servicenow (Canada) Inc. Machine learning model hardware configuration based optimization
US11651221B2 (en) 2018-10-31 2023-05-16 EMC IP Holding Company LLC Method, device, and computer program product for deep learning
CN111126613A (en) * 2018-10-31 2020-05-08 伊姆西Ip控股有限责任公司 Methods, apparatus and computer program products for deep learning
US20210357723A1 (en) * 2018-11-06 2021-11-18 Nippon Telegraph And Telephone Corporation Distributed Processing System and Distributed Processing Method
US11562046B2 (en) 2018-11-26 2023-01-24 Samsung Electronics Co., Ltd. Neural network processor using dyadic weight matrix and operation method thereof
KR20200062014A (en) * 2018-11-26 2020-06-03 삼성전자주식회사 Apparatus for accelerating neural network using weight with dyadic matrix form and operation method thereof
KR102811042B1 (en) 2018-11-26 2025-05-23 삼성전자주식회사 Apparatus for accelerating neural network using weight with dyadic matrix form and operation method thereof
US11947680B2 (en) * 2018-12-28 2024-04-02 Webank Co., Ltd Model parameter training method, terminal, and system based on federation learning, and medium
US20210248244A1 (en) * 2018-12-28 2021-08-12 Webank Co., Ltd Model parameter training method, terminal, and system based on federation learning, and medium
WO2020134704A1 (en) * 2018-12-28 2020-07-02 深圳前海微众银行股份有限公司 Model parameter training method based on federated learning, terminal, system and medium
WO2020188004A1 (en) * 2019-03-18 2020-09-24 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Methods and apparatuses for compressing parameters of neural networks
EP3942700A1 (en) * 2019-03-18 2022-01-26 FRAUNHOFER-GESELLSCHAFT zur Förderung der angewandten Forschung e.V. Methods and apparatuses for compressing parameters of neural networks
WO2020225772A1 (en) * 2019-05-07 2020-11-12 Imagia Cybernetics Inc. Method and system for initializing a neural network
CN110263936A (en) * 2019-06-14 2019-09-20 深圳前海微众银行股份有限公司 Laterally federation's learning method, device, equipment and computer storage medium
US11681913B2 (en) 2019-08-02 2023-06-20 Samsung Electronics Co., Ltd. Method and system with neural network model updating
CN110472745A (en) * 2019-08-06 2019-11-19 深圳前海微众银行股份有限公司 Information transferring method and device in a kind of federal study
US11551083B2 (en) 2019-12-17 2023-01-10 Soundhound, Inc. Neural network training from private data
GB202001468D0 (en) 2020-02-04 2020-03-18 TomTom Navigation B.V. Navigation system
WO2021156363A1 (en) 2020-02-04 2021-08-12 Tomtom Navigation B.V. Navigation system
CN111324813A (en) * 2020-02-20 2020-06-23 深圳前海微众银行股份有限公司 Recommendation method, apparatus, device, and computer-readable storage medium
CN111352799A (en) * 2020-02-20 2020-06-30 中国银联股份有限公司 Inspection method and device
US11948096B2 (en) 2020-03-13 2024-04-02 International Business Machines Corporation Adaptively adjusting influence in federated learning model updates
US11438348B2 (en) 2020-03-27 2022-09-06 Interset Software, Inc. Efficient determination of expected maximum for anomaly detection
US20210383197A1 (en) * 2020-06-04 2021-12-09 EMC IP Holding Company LLC Adaptive stochastic learning state compression for federated learning in infrastructure domains
US20220391664A1 (en) * 2021-06-03 2022-12-08 Samsung Electronics Co., Ltd. Method and apparatus with neural network quantization
WO2024007156A1 (en) * 2022-07-05 2024-01-11 华为技术有限公司 Communication method and apparatus
WO2024025444A1 (en) * 2022-07-25 2024-02-01 Telefonaktiebolaget Lm Ericsson (Publ) Iterative learning with adapted transmission and reception
CN119254240A (en) * 2024-12-04 2025-01-03 江苏意源科技有限公司 A multivariate big data optimization storage method

Also Published As

Publication number Publication date
EP3660754B1 (en) 2023-11-01
CN113837357B (en) 2025-04-29
CN113837357A (en) 2021-12-24
GB2556981A (en) 2018-06-13
GB201715517D0 (en) 2017-11-08
WO2018057302A1 (en) 2018-03-29
DE202017105829U1 (en) 2018-01-02
US10657461B2 (en) 2020-05-19
US20230376856A1 (en) 2023-11-23
EP3660754A1 (en) 2020-06-03
US20200242514A1 (en) 2020-07-30
US20190340534A1 (en) 2019-11-07
US11763197B2 (en) 2023-09-19
DE102017122240A1 (en) 2018-03-29
EP4276711A3 (en) 2024-01-17
CN107871160B (en) 2021-09-10
CN107871160A (en) 2018-04-03
EP4276711A2 (en) 2023-11-15
EP3494522A1 (en) 2019-06-12
EP3494522B1 (en) 2020-01-08

Similar Documents

Publication Publication Date Title
US20180089587A1 (en) Systems and Methods for Communication Efficient Distributed Mean Estimation
US12219004B2 (en) Systems and methods for communication efficient distributed mean estimation
Amiri et al. Over-the-air machine learning at the wireless edge
US11023561B2 (en) Systems and methods of distributed optimization
Amiri et al. Federated learning over wireless fading channels
Mills et al. Communication-efficient federated learning for wireless edge intelligence in IoT
Wang et al. Spatiotemporal modeling and prediction in cellular networks: A big data enabled deep learning approach
US20230196205A1 (en) Distributed artificial intelligence system using transmission of compressed gradients and model parameter, and learning apparatus and method therefor
Oh et al. FedVQCS: Federated learning via vector quantized compressed sensing
CN116128070B (en) Federated learning method based on wireless over-the-air computing and multi-bit quantized compressed sensing
Yao et al. A color image compression and encryption algorithm combining compressed sensing, sudoku matrix, and hyperchaotic map
Wang et al. A data augmentation-based channel estimation scheme for intelligent reflecting surface assisted wireless communication system
Prakash et al. Squafl: Sketch-quantization inspired communication efficient federated learning
Hu et al. Communication-efficient federated learning in channel constrained Internet of Things
Lin et al. Channel-adaptive quantization for wireless federated learning
Yang et al. Lightweight decentralized federated learning framework for heterogeneous edge systems
Zhen et al. A Secure and Effective Energy-Aware Fixed-Point Quantization Scheme for Asynchronous Federated Learning.
Ye et al. End-to-end Physical Layer Optimization Scheme Based on Deep Learning Autoencoder
Bo et al. Distributed Image Semantic Communication via Nonlinear Transform Coding
US20240281564A1 (en) Private Federated Learning with Reduced Communication Cost
US12072987B1 (en) Private counting from anonymous messages: near-optimal accuracy with vanishing communication overhead
CN116527217A (en) A data decoding method, system and related equipment
Chen et al. Quantization for Federated Learning
WO2025168212A1 (en) Apparatus, system and method for compressed communication of distributed machine learning
Gez et al. A Masked Pruning-Based Algorithm for Dimensionality Reduction in Federated Learning Systems

Legal Events

Date Code Title Description
AS Assignment

Owner name: GOOGLE INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SURESH, ANANDA THEERTHA;KUMAR, SANJIV;MCMAHAN, HUGH BRENDAN;AND OTHERS;SIGNING DATES FROM 20170909 TO 20170911;REEL/FRAME:043791/0125

AS Assignment

Owner name: GOOGLE LLC, CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:GOOGLE INC.;REEL/FRAME:044567/0001

Effective date: 20170929

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION