
WO2021262139A1 - Distributed machine learning models - Google Patents


Info

Publication number
WO2021262139A1
WO2021262139A1 · PCT/US2020/038978
Authority
WO
WIPO (PCT)
Prior art keywords
machine learning
training
devices
learning model
examples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2020/038978
Other languages
English (en)
Inventor
Christian Makaya
Madhu Sudan ATHREYA
Carlos HAAS COSTA
David Murphy
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hewlett Packard Development Co LP
Original Assignee
Hewlett Packard Development Co LP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Development Co LP filed Critical Hewlett Packard Development Co LP
Priority to PCT/US2020/038978
Publication of WO2021262139A1
Anticipated expiration
Legal status: Ceased

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/0495: Quantised networks; Sparse networks; Compressed networks
    • G06N3/08: Learning methods
    • G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N3/098: Distributed learning, e.g. federated learning

Definitions

  • Computing devices are a kind of electronic device that include electronic circuitry for performing processing. As processing capabilities have expanded, computing devices have been utilized to perform more functions. For example, a variety of computing devices are used for work, communication, and entertainment. Computing devices may be linked to a network to facilitate communication between computing devices.
  • Figure 1 is a flow diagram illustrating an example of a method for distributing a machine learning model.
  • Figure 2 is a flow diagram illustrating an example of a method for distributing a machine learning model.
  • Figure 3 is a block diagram of an example of an apparatus and training devices that may be used in machine learning model distribution and/or training.
  • Figure 4 is a block diagram illustrating an example of a computer- readable medium for distributing a machine learning model.
  • Machine learning is a technique where a machine learning model is trained to perform a task or tasks based on a set of examples (e.g., data).
  • training machine learning models may be computationally demanding for processors, such as central processing units (CPUs) and graphics processing units (GPUs).
  • Training a machine learning model may include determining weights corresponding to structures of the machine learning model.
  • Artificial neural networks are a kind of machine learning model that are structured with nodes, layers, and/or connections. Deep learning is a kind of machine learning that utilizes multiple layers.
  • a deep neural network is a neural network that utilizes deep learning.
  • Some examples of artificial intelligence may be implemented with machine learning.
  • Machine learning may be utilized in various products, devices, services, and/or applications.
  • Some examples of machine learning models may perform image classification, image captioning, object detection, object locating, object segmentation, audio classification, text classification, regression, sentiment analysis, recommendations, and/or predictive maintenance, etc.
  • Some examples of machine learning may be implemented using multiple devices. For instance, portions of machine learning models may be distributed and/or trained by devices that are linked to a network or networks. In some examples, distributing portions of machine learning models may spread computational loads for training and/or executing machine learning models.
  • Communicating large amounts of data over a network for machine learning model training may be inefficient. For example, moving collected data to a centralized location (e.g., a data center or cloud server) to perform machine learning training and/or inferencing may be cost-ineffective in terms of bandwidth usage and/or may present security and privacy risks. In distributed machine learning approaches, some aspects of machine learning may be performed more efficiently when performed closer to a data source or sources.
  • Some aspects of machine learning may be performed by edge devices.
  • An edge device is a non-central device in a network topology. Examples of edge devices may include smartphones, desktop computers, tablet devices, Internet of Things (IoT) devices, routers, gateways, etc. Processing data by edge devices may enhance privacy and reduce latency.
  • Some examples of distributed machine learning may include federated learning, which may provide distributed machine learning on edge devices while preserving privacy of the data.
  • Some implementations of distributed machine learning may include a network of edge devices and a central device or devices (e.g., server(s)). In some examples, the edge devices may be diverse and heterogeneous, with different connectivity to the central device(s).
  • a central device is a device that is central in a network topology.
  • a central device may be a device that coordinates training operations of the edge devices.
  • a central device may be a device that combines and/or aggregates portions of a machine learning model that are trained by the edge devices.
  • More edge devices on a network may increase source data and computational capabilities, and may present scaling and efficiency challenges for machine learning models. Coordination between distributed devices may mitigate the scaling and/or efficiency challenges. For example, selecting devices for machine learning model training (e.g., reducing and/or minimizing a global loss function of the machine learning model) may be useful. Some examples of the techniques described herein may spread and/or reduce computation, communication, and/or storage loads for machine learning models. For instance, some examples of the techniques described herein may enable selecting devices for training a machine learning model based on an eligibility metric. An eligibility metric is a value indicating favorability for performing training. Some examples of the techniques described herein may use pruning to enhance machine learning model training and efficiency in terms of communication, computation, and/or storage. For instance, machine learning model pruning may be utilized during training procedures, thereby providing joint training and pruning of machine learning models in a distributed fashion.
  • Figure 1 is a flow diagram illustrating an example of a method 100 for distributing a machine learning model.
  • the method 100 and/or a method 100 element or elements may be performed by an apparatus (e.g., electronic device, computing device, server, etc.).
  • the method 100 may be performed by the apparatus 302 described in connection with Figure 3.
  • the apparatus may prune 102 first weights from a machine learning model to produce a pruned model.
  • a weight is a value that scales a contribution corresponding to a component (e.g., node, connection, etc.). For instance, a weight may scale an input value to a node.
  • the term “weight” may refer to a gradient. A gradient may indicate an adjustment to a weight.
  • pruning weights from a machine learning model may include removing a weight or weights and/or a corresponding component or components (e.g., node(s), connection(s), layer(s), etc.) from the machine learning model.
  • pruning weights from a machine learning model may include removing weights and/or corresponding components with weights that meet a criterion or criteria. For instance, a proportion (e.g., rate, percentage) of the smallest weights and/or corresponding components may be removed. For example, weights and/or components with the smallest 20% (or smallest, 3%, 5%, 10%, 15%, 25%, etc., for instance) of weight amplitude or magnitude may be removed. In some approaches, weights and/or components (e.g., connections) that have a weight amplitude or magnitude that is less than a threshold may be removed.
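To make the magnitude-based criterion above concrete, the sketch below zeroes the smallest fraction of weights by magnitude. This is a minimal illustration: the function name, the mask-based representation, NumPy, and the example rate are assumptions for the sketch, not details taken from the description.

```python
import numpy as np

def prune_by_magnitude(weights: np.ndarray, rate: float = 0.20) -> np.ndarray:
    """Return a binary mask that removes the `rate` fraction of weights
    with the smallest amplitude or magnitude (pruned positions are 0)."""
    flat = np.abs(weights).ravel()
    k = int(rate * flat.size)  # number of weights to remove
    if k == 0:
        return np.ones_like(weights)
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return (np.abs(weights) > threshold).astype(weights.dtype)

w = np.array([[0.5, -0.01], [0.002, 1.2]])
mask = prune_by_magnitude(w, rate=0.25)  # drop the smallest 25% (1 of 4)
pruned = w * mask                        # the 0.002 weight is removed
```

The same helper covers the threshold-based approach mentioned above: comparing `np.abs(weights)` against a fixed threshold instead of a percentile yields the alternative criterion.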
  • the apparatus may prune 102 the first weights from weights determined by the apparatus and/or weights received from a remote device or remote devices. For instance, the apparatus may perform a training with a relatively small amount of data that is stored locally. For instance, the apparatus may store data to perform an initial training and/or may request data (e.g., non-sensitive data) from a remote device or remote devices.
  • the apparatus may prune 102 (e.g., initially prune) weights and/or corresponding components randomly. For instance, the apparatus may randomly select weights to be removed regardless of amplitude or magnitude.
  • pruning 102 may be performed by the apparatus after receiving trained portions (e.g., weights, gradients, etc.) of a machine learning model from a remote device or devices.
  • pruning weights from a machine learning model may include determining relevance values.
  • a relevance value is a value that indicates a relevance of a weight and/or component.
  • a relevance value may indicate a degree of relevance of a weight and/or component to machine learning model inference and/or prediction.
  • the apparatus may determine relevance values by learning a relative relevance of connections in a neural network.
  • a relevance value may be determined based on multiple weights.
  • a relevance value may be determined as an aggregated sum of weights (e.g., weights based on a structure, weights related to a connection or node, a subset of machine learning model weights, etc.).
  • a relevance value may be determined based on a sparsity of the machine learning model (e.g., network).
  • pruning the machine learning model may be based on the relevance values. For instance, less relevant weights and/or components (e.g., weights and/or components with a relevance value less than a threshold and/or a proportion of weights and/or components with the smallest relevance values) may be removed. Pruning weights and/or components (e.g., connections) from the machine learning model may convert a dense neural network to a sparse neural network.
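As one possible reading of relevance-based pruning, the sketch below computes a relevance value per node as the aggregated magnitude of its incoming connection weights (one of the aggregations the description permits) and removes nodes whose relevance falls below a threshold, converting a dense weight matrix to a sparse one. The specific aggregation and threshold are illustrative assumptions.

```python
import numpy as np

def node_relevance(weights: np.ndarray) -> np.ndarray:
    """Relevance of each output node: aggregated sum of the magnitudes of
    its incoming connection weights (one possible relevance measure)."""
    return np.abs(weights).sum(axis=0)

def prune_by_relevance(weights: np.ndarray, threshold: float) -> np.ndarray:
    """Zero all connections into nodes whose relevance is below
    `threshold`, yielding a sparse weight matrix."""
    keep = node_relevance(weights) >= threshold
    return weights * keep  # broadcasting zeroes whole columns

w = np.array([[0.9, 0.01], [0.8, 0.02]])
sparse_w = prune_by_relevance(w, threshold=0.1)  # second node removed
```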
  • the pruned (e.g., sparse) machine learning model may be retrained to compensate for the removed weights and/or components, and/or may be retrained to determine refined weights.
  • the pruned machine learning model may be retrained by the apparatus and/or by a remote device or devices.
  • the apparatus may distribute 104 the pruned model and relevance values corresponding to second weights of the pruned model to remote devices.
  • the apparatus may send portions (e.g., nodes, connections, layers, weights, etc.) of the pruned model to remote devices.
  • the apparatus may transmit the nodes, connections, layers, weights, and/or relevance values to a remote device or remote devices using a wired link, a wireless link, and/or a network or networks.
  • the second weights may be weights remaining in the machine learning model after pruning 102 the first weights from the machine learning model.
  • the apparatus may send portions (e.g., subsets) of the second weights to different remote devices in some examples.
  • the relevance values corresponding to the second weights may indicate a relevance for each respective second weight.
  • the remote devices may receive the distributed pruned model.
  • the remote devices may train the distributed pruned model. For example, each remote device may train a portion of the pruned model based on data (e.g., sensor data) that is local to the remote device.
  • a remote device may utilize the relevance values. For instance, a remote device may utilize the relevance values to train a portion of the pruned model.
  • the apparatus may select the remote devices. For instance, the apparatus may select the remote devices for distributing 104 the pruned model from a set of candidate remote devices.
  • the apparatus may select the remote devices based on historical usage patterns of the remote devices.
  • a historical usage pattern is data that indicates a usage pattern for a device.
  • a historical usage pattern may indicate time periods in which a remote device is likely to be in use by an end user and/or in which a remote device is likely to have a threshold processing load (e.g., > 30% processing load, > 50% processing load, > 65% processing load, etc.).
  • the apparatus may receive usage data from a remote device or remote devices that indicate times and/or processing load. For instance, the apparatus may request and/or receive data from a remote device indicating that the remote device is likely to be in use by an end user and/or to have a threshold processing load.
  • the data may indicate that the remote device is likely to be in use and/or to have a threshold processing load from 7 am to 1 pm and from 4 pm to 10 pm on weekdays.
  • the apparatus may determine a historical usage pattern based on previous training requests and/or pruned model distributions. For example, the apparatus may record times at which previous training requests and/or model distributions were rejected by a remote device and/or times at which previous training requests were not returned within a threshold amount of time.
  • the apparatus may select a remote device or devices that are not likely to be in use or to have threshold processing load based on the historical usage pattern. For instance, if the apparatus will send a training request and/or distribute the pruned model at 6 pm, the apparatus may select remote devices that are unlikely to be in use or to have a threshold processing load at 6 pm based on the corresponding historical usage patterns.
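The historical-usage check above can be sketched as follows. The device names, the busy-window records, and the representation of a usage pattern as weekday time windows are hypothetical; the description only requires that likely-in-use periods be derivable from usage data.

```python
from datetime import time

# Hypothetical usage records: per device, weekday windows during which the
# device is likely in use by an end user or above a processing-load threshold.
busy_windows = {
    "device-a": [(time(7, 0), time(13, 0)), (time(16, 0), time(22, 0))],
    "device-b": [(time(1, 0), time(5, 0))],
}

def likely_available(device: str, at: time) -> bool:
    """True if `at` falls outside every recorded busy window."""
    return all(not (start <= at < end) for start, end in busy_windows[device])

# Distributing at 6 pm: device-a is likely busy, device-b is not.
selected = [d for d in busy_windows if likely_available(d, time(18, 0))]
```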
  • the apparatus may select the remote devices based on environmental contexts.
  • An environmental context is data that indicates an environmental state for a device.
  • the environmental contexts may be based on local sensor data of the remote devices.
  • a remote device may include an environmental sensor or sensors that capture data representing the physical environment of a remote device and/or a physical relationship between the remote device and the local environment. Examples of environmental sensors may include a motion sensor, accelerometer, tilt sensor, microphone, image sensor, light sensor, pressure sensor, contact sensor, etc.
  • an environmental context may indicate the local environment of a remote device.
  • an environmental context may indicate whether a remote device is in motion, is exposed to light, is capturing audio, is being held, etc.
  • an environmental context may not include direct sensor data.
  • the environmental context may not include captured audio, images, motion patterns, etc., which may preserve privacy.
  • the environmental context may include discrete information.
  • the environmental context may include binary indicators indicating whether a remote device is in motion, is exposed to light, is capturing audio, and/or is being carried or touched, etc.
  • the apparatus may receive an environmental context from a remote device or remote devices.
  • the apparatus may request and/or receive data from a remote device indicating the environmental context of the remote device.
  • the apparatus may select a remote device or devices that meet an environmental context criterion or criteria. For instance, the apparatus may select remote devices that are not in motion, that are not capturing audio, that are not capturing images, that are not exposed to light, and/or that are not being carried or touched, etc.
  • the apparatus may select the remote devices based on training success values.
  • a training success value is data that indicates whether a device has successfully performed training. Successful training may be indicated based on a criterion or criteria. For instance, training may be successful if trained data (e.g., trained weight(s), trained gradient(s), trained machine learning model portion(s)) has been received from a remote device, if the trained data has been received from the remote device within a threshold period after a training request and/or model distribution, and/or if trained data provides a threshold change to a machine learning model (e.g., weight change, gradient size, etc.).
  • a training success value may indicate whether a remote device has successfully performed training and/or a proportion of successful training for a training request or requests and/or model distribution(s).
  • the apparatus may determine training success values corresponding to candidate remote devices. For example, the apparatus may determine whether a candidate remote device has successfully performed training and/or a proportion of successful training by determining whether trained data (e.g., trained weight(s), trained gradient(s), trained machine learning model portion(s)) has been received from a remote device, if the trained data has been received from the remote device within a threshold period after a training request and/or model distribution, and/or if trained data has provided a threshold change to a machine learning model (e.g., weight change, gradient size, etc.).
  • the apparatus may select a remote device or devices based on the training success values. For instance, the apparatus may select a remote device with a positive training success value and/or with a training success value that satisfies a success threshold (e.g., 50% success, 60% success, 75% success, etc.).
  • the apparatus may select the remote devices based on capability metrics.
  • a capability metric is data that indicates a capability of a device.
  • a capability metric may include data indicating processor speed, memory size, storage size, processor type(s) (e.g., CPU, GPU, and/or tensor processing unit (TPU)), etc.
  • the apparatus may request and/or receive the capability metrics and/or elements of the capability metrics from remote devices. For instance, the apparatus may request and/or receive data from a remote device indicating processor speed, memory size, storage size, processor type(s), etc.
  • the apparatus may select the remote devices based on operational statuses.
  • An operational status is data that indicates a state of operation of a device.
  • an operational status may include data indicating current processor load, memory load, battery charge, charging status (e.g., whether the remote device is being currently charged), connectivity (e.g., communication bandwidth), stored data (e.g., whether the remote device is storing data for training), etc.
  • the apparatus may request and/or receive the operational statuses and/or elements of the operational statuses from remote devices. For instance, the apparatus may request and/or receive data from a remote device indicating current processor load, memory load, battery charge, connectivity (e.g., communication bandwidth), etc.
  • the apparatus may determine an eligibility metric or metrics corresponding to a remote device or devices.
  • an eligibility metric may be determined based on a historical usage pattern, an environmental context, a training success value, a capability metric, and/or an operational status corresponding to a remote device.
  • an eligibility metric may be the historical usage pattern, the environmental context, the training success value, the capability metric, the operational status, or any combination thereof.
  • the apparatus may determine a historical usage pattern score, an environmental context score, a training success value score, a capability metric score, and/or an operational status score.
  • the historical usage pattern score may indicate a value in a range (e.g., a probability) of remote device availability based on the historical usage pattern (and/or a time for training).
  • the environmental context score may indicate a value in a range (e.g., a probability) of remote device availability based on the environmental context.
  • the training success value score may indicate a value in a range (e.g., a probability) of training success based on the training success value.
  • the capability metric score may indicate a value in a range (e.g., a probability) of remote device training capability based on the capability metric.
  • a capability metric score may be determined based on a training load for a portion of the machine learning model. For example, the apparatus may evaluate the capabilities of a remote device relative to a training load. For instance, if a remote device has capabilities that meet or exceed the training load (e.g., that can process the training load within a threshold period), the capability metric score may be increased. If a remote device has capabilities that do not meet the training load, the capability metric score may be decreased.
  • the operational status score may indicate a value in a range (e.g., a probability) of available resources of a remote device based on the operational status metric.
  • an operational status score may be determined based on a training load for a portion of the machine learning model. For example, the apparatus may evaluate the operational status of a remote device relative to a training load. For instance, if a remote device has an operational status (e.g., currently available resources) that meets or exceeds the training load (e.g., that can process the training load within a threshold period), the operational status score may be increased. If a remote device has an operational status that does not meet the training load, the operational status score may be decreased.
  • the apparatus may determine an average or weighted average of the historical usage pattern score, environmental context score, training success value score, capability metric score, and/or operational status score to determine the eligibility metric in some examples.
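The averaging step above can be sketched directly. The criterion names, score values, and equal weighting are illustrative assumptions; the description allows a plain or weighted average of any subset of the scores.

```python
def eligibility_metric(scores, weights=None):
    """Weighted average of per-criterion scores in [0, 1]; equal weights
    are used when none are supplied."""
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total

scores = {
    "historical_usage": 0.9,   # likely idle at the training time
    "environment": 1.0,        # stationary, not being carried or touched
    "training_success": 0.75,  # 75% of past training requests succeeded
    "capability": 0.8,
    "operational_status": 0.6,
}
metric = eligibility_metric(scores)
eligible = metric >= 0.75  # example eligibility threshold
```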
  • the apparatus may select a remote device or devices based on the eligibility metrics. For instance, the apparatus may select a remote device with an eligibility metric that satisfies an eligibility threshold (e.g., 50%, 60%, 75%, etc.).
  • the apparatus may receive trained model portions (e.g., weights, gradients, etc.) from the remote devices. For example, the apparatus may receive third weights (e.g., trained weights and/or gradients) from the remote devices. In some examples, the apparatus may receive trained model portions from the selected remote devices.
  • the apparatus may update 106 the pruned model based on the third weights received from the remote devices. For example, the apparatus may update (e.g., adjust and/or replace) weights of the pruned model in accordance with the third weights. In some examples, the apparatus may aggregate the received trained model portions (e.g., weights, gradients, etc.) to update the pruned model.
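One simple way to realize the update step is element-wise averaging of the third weights returned by the remote devices, keeping previously pruned positions at zero. Plain averaging is an assumption for this sketch; the description leaves the exact combination open.

```python
import numpy as np

def update_pruned_model(updates, mask):
    """Aggregate returned weights by element-wise averaging, with
    previously pruned positions (mask == 0) kept at zero."""
    return np.mean(np.stack(updates), axis=0) * mask

received = [np.array([0.4, 0.0, 1.0]),   # third weights from one device
            np.array([0.6, 0.0, 1.4])]   # third weights from another device
mask = np.array([1.0, 0.0, 1.0])         # middle weight was pruned earlier
new_weights = update_pruned_model(received, mask)  # -> [0.5, 0.0, 1.2]
```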
  • the apparatus may repeat and/or iterate the method 100. For instance, the apparatus may prune weights from the updated model, may distribute the pruned updated model, etc. For example, the apparatus may perform pruning after receiving trained model portions (e.g., third weights) from the remote device and/or updating the model. For instance, the apparatus may remove weights with an amplitude or magnitude below a threshold, may determine updated relevance values, and/or may select a subset of the remote devices, which may be repeated until a threshold size of the machine learning model is reached.
  • the apparatus may select a subset of the remote devices based on the third weights. For example, the apparatus may select remote devices corresponding to third weights that satisfy a criterion (e.g., that have returned third weights that satisfy a criterion). In some examples, selecting the subset of remote devices may include selecting remote devices corresponding to weights that are greater than or at least a threshold. For instance, if a third weight or weights returned by a remote device have an amplitude or amplitudes (or a magnitude or magnitudes) at or above a threshold, the remote device may be selected for the subset of remote devices.
  • Otherwise, if the weights returned by a remote device have amplitudes or magnitudes below the threshold, the remote device may not be selected.
  • portions of the machine learning model with low weights may be less significant, and remote devices assigned to train portions with low weights may be ignored.
  • the apparatus may distribute the updated model to the subset of remote devices. In some examples, the apparatus may distribute updated relevance values to the subset of remote devices.
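The subset selection above reduces to a threshold test on the returned weights. The device names, returned values, and the choice of the maximum magnitude as the per-device statistic are hypothetical details for the sketch.

```python
import numpy as np

# Hypothetical third weights returned per device; a device stays in the
# subset only if its largest returned weight magnitude meets the threshold.
returned = {
    "device-a": np.array([0.30, -0.45]),
    "device-b": np.array([0.01, 0.02]),  # low weights: less significant portion
}
THRESHOLD = 0.1
subset = [d for d, w in returned.items() if np.abs(w).max() >= THRESHOLD]
```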
  • Figure 2 is a flow diagram illustrating an example of a method 200 for distributing a machine learning model.
  • the method 200 and/or a method 200 element or elements may be performed by an apparatus (e.g., electronic device, computing device, server, etc.).
  • the method 200 may be performed by the apparatus 302 described in connection with Figure 3.
  • the method 200 or element(s) thereof described in connection with Figure 2 may be an example of the method 100 or element(s) thereof described in connection with Figure 1.
  • the apparatus may select 202 remote devices based on historical usage patterns, environmental contexts, training success values, capability metrics, operational statuses, and/or received weights. In some examples, selecting 202 the remote devices may be performed as described in relation to Figure 1. For example, the apparatus may utilize an eligibility metric to determine whether to select a remote device for training. In some examples, a set of remote devices may be selected from a set of candidate devices to perform training. In some examples, not all candidate remote devices may be selected. For instance, selection of some remote devices may be performed to address a high number of remote devices and convergence latency targets for distributed machine learning models. In some examples, eligibility metrics may be used to select the remote devices.
  • an eligibility metric may be based on historical usage patterns, environmental contexts, training success values, capability metrics, and/or operational statuses. For example, the eligibility metric may be based on whether a remote device is idle, network connectivity of a remote device, remote device charging status, data stored on the remote device, time of day, etc. For instance, the eligibility metric may be utilized to opportunistically select remote device resources to perform distributed machine learning model training. In some examples, some remote devices may be executing other tasks. By profiling remote device capabilities, the apparatus may determine what type of machine learning tasks to assign and when.
  • the apparatus may select 202 remote devices based on the relevance (e.g., relevance values) of machine learning model components (e.g., connections) and/or weights (when aggregated by the apparatus, for example). For instance, weights that impact the aggregation less may be utilized for the selection of remote devices (e.g., remote devices with weights or a combination of weights less than a threshold may not be selected or may be de-selected). In some examples, an overall weights matrix may indicate the relevance of a remote device or devices.
  • the apparatus may prune 204 weights from a machine learning model to produce a pruned model.
  • pruning 204 the weights may be performed as described in relation to Figure 1.
  • the apparatus may remove weights from the machine learning model and/or may determine relevance values corresponding to weights of the machine learning model.
  • the apparatus may distribute 206 the pruned model and relevance values corresponding to remaining weights of the pruned model to the remote devices.
  • distributing 206 the pruned model and the relevance values may be performed as described in relation to Figure 1.
  • the apparatus may update 208 the pruned model based on weights received from the remote devices.
  • updating 208 the pruned model may be performed as described in relation to Figure 1.
  • the apparatus may determine 210 whether training is complete. In some examples, determining 210 whether training is complete may be performed as described in relation to Figure 1. In some examples, the apparatus may determine whether the machine learning model has reached a threshold size (e.g., has less than a threshold number of components, weights, connections, nodes, etc.) to determine whether training is complete. In some examples, the apparatus may determine whether a threshold amount of iterations (e.g., iterations of pruning and/or distribution, etc.) has been performed. For instance, the threshold amount of iterations may be 50, 100, 500, 1000, 2000, etc.
  • the apparatus may return to select 202 remote devices, prune 204 weights, distribute 206 the pruned model, and/or update 208 the pruned model, etc.
  • operation may end 212.
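The prune, distribute, and update loop of method 200, with completion checked by iteration count or model size, might be sketched as follows. The helper callables, thresholds, and toy pruning rule are illustrative stand-ins, not the disclosed implementation.

```python
# Hypothetical sketch of the coordination loop: prune, distribute, update,
# then check whether training is complete by iteration count or model size.
MAX_ITERATIONS = 100    # e.g., 50, 100, 500, 1000, 2000, ...
MIN_WEIGHT_COUNT = 4    # threshold model size

def training_complete(model, iteration):
    return iteration >= MAX_ITERATIONS or len(model) <= MIN_WEIGHT_COUNT

def run(model, prune, distribute, update):
    iteration = 0
    while not training_complete(model, iteration):
        model = prune(model)      # remove less relevant weights
        distribute(model)         # send pruned model to remote devices
        model = update(model)     # fold in weights received from devices
        iteration += 1
    return model, iteration

# Toy run: each iteration prunes the single smallest-magnitude weight.
model = {"w%d" % i: 0.1 * (i + 1) for i in range(8)}   # 8 weights
prune = lambda m: dict(sorted(m.items(), key=lambda kv: abs(kv[1]))[1:])
distribute = lambda m: None
update = lambda m: m
final_model, iterations = run(model, prune, distribute, update)
```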
  • operation(s), function(s), and/or element(s) of the method 200 may be omitted and/or combined.
  • Figure 3 is a block diagram of an example of an apparatus 302 and training devices 328 that may be used in machine learning model distribution and/or training.
  • the apparatus 302 may be an electronic device, such as a central device, a server computer, a personal computer, a laptop computer, etc.
  • the apparatus 302 may include and/or may be coupled to a processor 304 and/or a memory 306.
  • the apparatus 302 may include additional components (not shown) and/or some of the components described herein may be removed and/or modified without departing from the scope of this disclosure.
  • the processor 304 may be any of a central processing unit (CPU), a digital signal processor (DSP), a semiconductor-based microprocessor, a graphics processing unit (GPU), a tensor processing unit (TPU), a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), and/or other hardware device suitable for retrieval and execution of instructions stored in the memory 306.
  • the processor 304 may fetch, decode, and/or execute instructions stored in the memory 306.
  • the processor 304 may include an electronic circuit or circuits that include electronic components for performing a function or functions of the instructions.
  • the processor 304 may be implemented to perform one, some, or all of the operations, elements, etc., described in connection with one, some, or all of Figures 1-4.
  • the memory 306 may be any electronic, magnetic, optical, or other physical storage device that contains or stores electronic information (e.g., instructions and/or data).
  • the memory 306 may be, for example, Random Access Memory (RAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), a storage device, an optical disc, and/or the like.
  • the memory 306 may be volatile and/or non-volatile memory, such as Dynamic Random Access Memory (DRAM), EEPROM, magnetoresistive random-access memory (MRAM), phase change RAM (PCRAM), memristor, flash memory, and/or the like.
  • the memory 306 may be a non-transitory tangible machine-readable storage medium, where the term “non-transitory” does not encompass transitory propagating signals.
  • the memory 306 may include multiple devices (e.g., a RAM card and a solid-state drive (SSD)).
  • the apparatus 302 may include a communication interface 324 through which the processor 304 may communicate with an external device or devices (e.g., training devices 328).
  • a training device 328 is an electronic device for training a machine learning model or models and/or a portion or portions of a machine learning model(s).
  • the apparatus 302 may be in communication with (e.g., coupled to, have a communication link with) a training device or devices 328 via a network 326.
  • Examples of the training devices 328 may include computing devices, server computers, desktop computers, laptop computers, smartphones, tablet devices, game consoles, smart appliances, vehicles, autonomous vehicles, aircraft, drones, virtual reality devices, augmented reality devices, etc.
  • Examples of the network 326 may include a local area network (LAN), wide area network (WAN), the Internet, cellular network, Long Term Evolution (LTE) network, 5G network, and/or combinations thereof, etc.
  • the apparatus 302 may be a central device or cloud device and the training device(s) 328 may be edge devices.
  • the communication interface 324 may include hardware and/or machine-readable instructions to enable the processor 304 to communicate with the training devices 328.
  • the communication interface 324 may enable a wired and/or wireless connection to the training devices 328.
  • the communication interface 324 may include a network interface card and/or may also include hardware and/or machine-readable instructions to enable the processor 304 to communicate with the training devices 328.
  • the communication interface 324 may include hardware (e.g., circuitry, ports, connectors, antennas, etc.) and/or machine-readable instructions to enable the processor 304 to communicate with various input and/or output devices, such as a keyboard, a mouse, a display, another apparatus, electronic device, computing device, etc., through which a user may input instructions and/or data into the apparatus 302.
  • the apparatus 302 (e.g., processor 304) may utilize the communication interface 324 to send and/or receive information.
  • the apparatus 302 may utilize the communication interface 324 to distribute a machine learning model (e.g., to send components and/or portions of a machine learning model or models) and/or may utilize the communication interface 324 to receive a machine learning model (e.g., to receive trained components and/or portions of a machine learning model or models).
  • the apparatus 302 may utilize the communication interface 324 to receive a result or results.
  • a result is an output or determination of a machine learning model.
  • a result may be an inference, a prediction, a value, etc., produced by a machine learning model or a portion of a machine learning model on a remote device.
  • each training device 328 may include a processor, memory, and/or communication interface (not shown in Figure 3).
  • each of the memories of the training devices 328 may be any electronic, magnetic, optical, or other physical storage device that contains or stores electronic information (e.g., instructions and/or data), such as, for example, RAM, EEPROM, a storage device, an optical disc, and/or the like.
  • each of the processors of the training devices 328 may be any of a CPU, a DSP, a TPU, a semiconductor-based microprocessor, GPU, FPGA, an ASIC, and/or other hardware device suitable for retrieval and execution of instructions stored in corresponding memory.
  • each communication interface of the training devices 328 may include hardware and/or machine-readable instructions to enable the respective training device 328 to communicate with the apparatus 302.
  • Each of the training devices 328 may have similar or different processing capabilities, memory capacities, and/or communication capabilities relative to each other and/or relative to the apparatus 302.
  • the memory 306 of the apparatus 302 may store eligibility metric determination instructions 312, selection instructions 314, distribution instructions 318, model structuring instructions 316, eligibility metric data 322, environmental context data 308, and/or model data 310.
  • the model data 310 may include and/or represent a machine learning model or models, portions of a machine learning model or models, and/or components (e.g., nodes, connections, layers, weights, activation functions, etc.) of a machine learning model or models.
  • the processor 304 may execute the eligibility metric determination instructions 312 to determine eligibility metrics for the training devices 328. In some examples, determining the eligibility metrics may be performed as described in relation to Figure 1 and/or Figure 2. For example, the apparatus 302 may receive environmental contexts from the training devices 328, which may be stored as environmental context data 308. In some examples, the processor 304 may determine the eligibility metrics based on environmental contexts of the training devices 328. In some examples, the eligibility metrics may be stored as eligibility metric data 322 and/or as a portion of eligibility metric data 322.
  • the processor 304 may execute the selection instructions 314 to determine selected training devices based on the eligibility metrics. In some examples, the processor 304 may determine the selected training devices as described in relation to Figure 1 and/or Figure 2. For example, the training devices 328 may be examples of the remote devices described in relation to Figure 1 and/or Figure 2.
  • the processor 304 may execute the distribution instructions 318 to send portions of a machine learning model, weights, and/or relevance values to the selected training devices.
  • sending portions of a machine learning model may be performed as described in relation to Figure 1 and/or Figure 2.
  • the apparatus 302 may send portions of a machine learning model (e.g., nodes, connections, layers, etc.), weights, and/or relevance values to the selected training devices (e.g., a subset of) of the training devices 328.
  • the selected training devices may train the respective portions of the machine learning model sent from the apparatus 302.
  • each of the training devices 328 may include training instructions 320, which may be executed to train a machine learning model or a portion of a machine learning model.
  • a loss function may be utilized to determine and/or refine weights.
  • the selected training devices may send trained portions of the machine learning model to the apparatus 302.
  • the apparatus 302 may receive the trained portions of the machine learning model.
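As a toy illustration of the local training that a training device might perform, the following sketch takes gradient-descent steps on a squared-error loss over a one-parameter linear model. The data, loss, and learning rate are assumptions for illustration.

```python
# Hypothetical sketch of local training on a training device: gradient
# descent on a mean squared-error loss for a one-parameter linear model.
def local_train_step(w, data, lr=0.1):
    """One gradient step on loss = mean of (w*x - y)^2."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def local_loss(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

data = [(1.0, 2.0), (2.0, 4.0)]   # consistent with y = 2x
w = 0.0
for _ in range(50):               # refine the weight over several steps
    w = local_train_step(w, data)
```

After the loop, the weight has converged near 2, the slope that minimizes the loss; a real training device would run comparable steps over its local data before returning the trained weights.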
  • the processor 304 may execute the model structuring instructions 316 to combine trained portions of the machine learning model received from the selected training devices.
  • combining trained portions of the machine learning model may be performed as described in relation to Figure 1 and/or Figure 2.
  • the apparatus 302 may adjust weights based on the trained portions and/or aggregate the trained portions of the machine learning model.
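Aggregating the trained portions could, for example, take the form of (optionally sample-weighted) averaging of per-weight values, in the spirit of federated averaging. The device names, weight values, and averaging rule below are illustrative assumptions.

```python
# Hypothetical sketch: aggregate trained weights returned by the selected
# training devices by averaging each weight across device updates.
def aggregate(updates, sample_counts=None):
    """Average per-weight values across device updates."""
    devices = list(updates)
    if sample_counts is None:
        sample_counts = {d: 1 for d in devices}   # unweighted average
    total = sum(sample_counts[d] for d in devices)
    names = updates[devices[0]].keys()
    return {
        name: sum(updates[d][name] * sample_counts[d] for d in devices) / total
        for name in names
    }

updates = {
    "edge-a": {"w0": 0.4, "w1": 0.2},
    "edge-b": {"w0": 0.6, "w1": 0.0},
}
combined = aggregate(updates)
```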
  • the processor 304 may execute the model structuring instructions 316 to prune the machine learning model to determine the relevance values. In some examples, determining the relevance values may be performed as described in relation to Figure 1 and/or Figure 2.
  • the apparatus 302 may execute the trained machine learning model to predict and/or infer a result or results based on input data. In some examples, the apparatus 302 may send the trained machine learning model to another device or devices to predict and/or infer a result or results based on input data. In some examples, the apparatus 302 may receive a result or results (e.g., inference(s), prediction(s), etc.). In some examples, the apparatus 302 (e.g., processor 304) may utilize the communication interface 324 to receive the result(s).
  • training may be performed by training devices 328 (e.g., selected training devices) concurrently.
  • a first training device and a second training device may train portions of the machine learning model (e.g., different portions of the machine learning model) in an overlapping time period.
  • different portions of the machine learning model may be trained at different times.
  • the apparatus 302 may present the results.
  • the apparatus 302 may present an indication of a result (e.g., text indicating an image classification, an image showing bounding boxes of detected objects, text indicating filtered emails, text indicating a transcription of audio, etc.) on a display.
  • the apparatus 302 may send the results to another device (e.g., server, smartphone, tablet, computer, game console, etc.).
  • Figure 4 is a block diagram illustrating an example of a computer-readable medium 440 for distributing a machine learning model.
  • the computer-readable medium is a non-transitory, tangible computer-readable medium 440.
  • the computer-readable medium 440 may be, for example, RAM, EEPROM, a storage device, an optical disc, and the like.
  • the computer-readable medium 440 may be volatile and/or non-volatile memory, such as DRAM, EEPROM, MRAM, PCRAM, memristor, flash memory, and the like.
  • the memory 306 described in connection with Figure 3 may be an example of the computer-readable medium 440 described in connection with Figure 4.
  • the computer-readable medium 440 may include code (e.g., data and/or instructions or executable code).
  • the computer-readable medium 440 may include initial training instructions 442, pruning instructions 444, selection instructions 446, distribution instructions 448, and/or aggregation instructions 450.
  • the initial training instructions 442 may include code to cause a processor to perform an initial training of a machine learning model. In some examples, performing the initial training may be accomplished as described in connection with Figure 1, Figure 2, and/or Figure 3.
  • the pruning instructions 444 may include code to cause a processor to prune first parameters from the machine learning model and determine relevance values corresponding to second parameters of the machine learning model. This may be accomplished as described in connection with Figure 1, Figure 2, and/or Figure 3. For example, the first parameters (e.g., first weights) may be pruned from the machine learning model, and relevance values corresponding to second parameters (e.g., second weights) may be determined.
  • the selection instructions 446 may include code to cause a processor to select remote devices. This may be accomplished as described in connection with Figure 1, Figure 2, and/or Figure 3. For example, the selection instructions 446 may be executed to determine a set of selected remote devices.
  • the distribution instructions 448 may include code to cause a processor to distribute the machine learning model, the second parameters, and the relevance values to the set of selected remote devices. This may be accomplished as described in connection with Figure 1, Figure 2, and/or Figure 3.
  • the aggregation instructions 450 may include code to cause a processor to aggregate trained parameters from the set of selected remote devices to produce a trained machine learning model. This may be accomplished as described in connection with Figure 1, Figure 2, and/or Figure 3.
  • the selection instructions 446 may include code to cause a processor to remove a remote device from the set of selected remote devices based on the trained parameters. This may be accomplished as described in connection with Figure 1, Figure 2, and/or Figure 3. For example, if the trained parameters (e.g., weights) corresponding to a selected remote device are below a threshold, the remote device may be removed from the set of selected remote devices. In some examples, a remote device may be removed based on an iteration threshold. For example, if the trained parameters corresponding to a selected remote device have been below a threshold for a threshold number of iterations, the remote device may be removed from the set of selected remote devices.
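The iteration-based removal of remote devices might be sketched as follows. The weight threshold, iteration threshold, and per-device histories are illustrative assumptions.

```python
# Hypothetical sketch: drop a remote device from the selected set once its
# trained weights have stayed below a magnitude threshold for a number of
# consecutive iterations. Both thresholds are illustrative assumptions.
WEIGHT_THRESHOLD = 0.05
ITERATION_THRESHOLD = 3

def update_selection(selected, histories):
    """histories maps device -> list of per-iteration mean |weight| values."""
    kept = []
    for device in selected:
        recent = histories[device][-ITERATION_THRESHOLD:]
        below_for_all = (len(recent) >= ITERATION_THRESHOLD
                         and all(v < WEIGHT_THRESHOLD for v in recent))
        if not below_for_all:
            kept.append(device)   # still contributing relevant weights
    return kept

histories = {
    "edge-a": [0.20, 0.18, 0.15],   # stays relevant
    "edge-b": [0.04, 0.03, 0.02],   # below threshold for 3 iterations
}
selected = update_selection(["edge-a", "edge-b"], histories)
```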
  • the pruning instructions 444 may include code to cause a processor to prune the trained machine learning model. This may be accomplished as described in connection with Figure 1, Figure 2, and/or Figure 3. For example, a trained weight or weights may be pruned from the trained machine learning model.
  • the distribution instructions 448 may include code to cause a processor to distribute the trained machine learning model to the set of selected remote devices. This may be accomplished as described in connection with Figure 1, Figure 2, and/or Figure 3. For example, the trained machine learning model may be distributed to the set of selected remote devices for further training and/or inferencing.
  • There is a demand for efficient learning on edge devices due to large data volumes. There is also a demand for inferencing on edge devices for real-time applications. For example, embedding artificial intelligence accelerators in cameras, consumer electronics devices, autonomous systems, and robots is gaining momentum. Data privacy is also a concern in various areas such as healthcare, finance, enterprises, defense, digital manufacturing, etc.
  • Coordination of training on distributed devices may be useful. For example, coordination may be utilized to determine when machine learning training should be executed and on which devices. The device selection may have an impact on the performance of the training, such as the training latency (e.g., average job completion time) and accuracy of the machine learning models.
  • Some examples of the techniques described herein may utilize eligibility metrics to select a set of devices. In some examples, the eligibility metrics may be used to opportunistically utilize device resources for training. Since some devices might be executing other tasks, by profiling device capabilities, insights may be determined about what type of machine learning tasks can be assigned and when, and how device involvement enhances the machine learning models.
  • Machine learning models such as deep neural networks (DNNs) may be large (e.g., having hundreds of millions of parameters). Updating models during the learning process may consume a significant amount of network bandwidth. The size of the models may also have an impact on device power consumption and compute efficiency.
  • some examples of the techniques described herein may perform model pruning to increase the efficiency of distributed machine learning. For example, model pruning may remove redundant parameters according to parameter relevance. With model pruning, the size of the machine learning models may be reduced and/or training may be accelerated. Model pruning may reduce the number of devices performing training and may reduce power consumption by the devices, which may enhance efficiency.
  • devices that have not been relevant may be removed from the set of devices for training. This may reduce the communication cost and overhead induced by the devices training less relevant portions of a machine learning model.
  • Some examples of the techniques described herein may provide coordination of machine learning workloads (e.g., training and/or inference) on heterogeneous edge devices. Some examples of the coordination may be aware of environmental contexts for coordination of training based on an eligibility metric. Some examples of the techniques described herein may provide joint machine learning model training and pruning for distributed machine learning to enhance computation, communication, and/or storage of machine learning models.
  • the techniques described herein may save bandwidth by not sending original data (e.g., entire images, audio, text, etc.) to a central device. Some examples may be useful in privacy-sensitive scenarios, since original data (e.g., user data) may not be transmitted over the network.
  • the term “and/or” may mean an item or items.
  • the phrase “A, B, and/or C” may mean any of: A (without B and C), B (without A and C), C (without A and B), A and B (but not C), B and C (but not A), A and C (but not B), or all of A, B, and C.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Examples of machine learning models are described. In some examples, a method may include pruning first weights from a machine learning model to produce a pruned model. In some examples, the method may include distributing the pruned model and relevance values corresponding to second weights of the pruned model to remote devices. In some examples, the method may include updating the pruned model based on third weights received from the remote devices.
PCT/US2020/038978 2020-06-22 2020-06-22 Modèles d'apprentissage automatique distribués Ceased WO2021262139A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2020/038978 WO2021262139A1 (fr) 2020-06-22 2020-06-22 Modèles d'apprentissage automatique distribués

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2020/038978 WO2021262139A1 (fr) 2020-06-22 2020-06-22 Modèles d'apprentissage automatique distribués

Publications (1)

Publication Number Publication Date
WO2021262139A1 true WO2021262139A1 (fr) 2021-12-30

Family

ID=79281646

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/038978 Ceased WO2021262139A1 (fr) 2020-06-22 2020-06-22 Modèles d'apprentissage automatique distribués

Country Status (1)

Country Link
WO (1) WO2021262139A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023220848A1 (fr) * 2022-05-16 2023-11-23 Nvidia Corporation Détection de robustesse d'un réseau de neurones
US20240028235A1 (en) * 2022-07-19 2024-01-25 ECS Partners Limited Neural network memory configuration
US12367395B2 (en) * 2020-08-21 2025-07-22 Huawei Technologies Co., Ltd. System and methods for supporting artificial intelligence service in a network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150193695A1 (en) * 2014-01-06 2015-07-09 Cisco Technology, Inc. Distributed model training
WO2018017467A1 (fr) * 2016-07-18 2018-01-25 NantOmics, Inc. Systèmes, appareils et procédés d'apprentissage automatique distribué
WO2019209059A1 (fr) * 2018-04-25 2019-10-31 Samsung Electronics Co., Ltd. Apprentissage machine sur une chaîne de blocs
WO2020081399A1 (fr) * 2018-10-15 2020-04-23 Nam Sung Kim Architecture centrée sur le réseau et algorithmes pour accélérer l'apprentissage distribué de réseaux neuronaux

Similar Documents

Publication Publication Date Title
US12430702B2 (en) Learning robotic tasks using one or more neural networks
US11307864B2 (en) Data processing apparatus and method
US11392829B1 (en) Managing data sparsity for neural networks
US11307865B2 (en) Data processing apparatus and method
US20220083389A1 (en) Ai inference hardware resource scheduling
US12055995B2 (en) Automatic error prediction in data centers
US20230409876A1 (en) Automatic error prediction for processing nodes of data centers using neural networks
CN106502799A (zh) 一种基于长短时记忆网络的主机负载预测方法
US20220343146A1 (en) Method and system for temporal graph neural network acceleration
WO2021262139A1 (fr) Modèles d'apprentissage automatique distribués
US20230229963A1 (en) Machine learning model training
Aminiyeganeh et al. IoT video analytics for surveillance-based systems in smart cities
US20240062042A1 (en) Hardening a deep neural network against adversarial attacks using a stochastic ensemble
US12412108B2 (en) System and method for inference generation via optimization of inference model portions
US12346789B2 (en) System and method for execution of inference models across multiple data processing systems
US11704562B1 (en) Architecture for virtual instructions
CN116932218A (zh) 内存信息确定方法、装置及电子设备
US11086634B2 (en) Data processing apparatus and method
US20230206113A1 (en) Feature management for machine learning system
US20230051713A1 (en) Multiple-task neural networks
CN114118341A (zh) 量化方法、计算装置和计算机可读存储介质
CN117094031B (zh) 工业数字孪生数据隐私保护方法及相关介质
CN117522718B (zh) 基于深度学习的水下图像增强方法
US11816134B1 (en) System and method for reduction of data transmission in dynamic systems using causal graphs
CN111815658A (zh) 一种图像识别方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20942390

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20942390

Country of ref document: EP

Kind code of ref document: A1