CN109978149B - Scheduling method and related device
- Publication number: CN109978149B (application CN201711467783.4A)
- Authority: CN (China)
- Prior art keywords: computing device, target, processing circuit, neural network, data
- Legal status: Active
Classifications
- G—PHYSICS
  - G06—COMPUTING OR CALCULATING; COUNTING
    - G06F—ELECTRIC DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/10—Complex mathematical operations
          - G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- G—PHYSICS
  - G06—COMPUTING OR CALCULATING; COUNTING
    - G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computing arrangements based on biological models
        - G06N3/02—Neural networks
          - G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
Abstract
Embodiments of the present application disclose a scheduling method and a related device. The method is based on a server comprising a plurality of computing devices and includes the following steps: receiving M operation requests; selecting at least one target computing device from the plurality of computing devices according to the operation task of each of the M operation requests, and determining the operation instruction corresponding to each of the at least one target computing device; computing the operation data corresponding to the M operation requests according to the operation instruction corresponding to each target computing device to obtain M final operation results; and sending each of the M final operation results to the corresponding electronic device. According to the embodiments of the present application, a computing device matching each operation request received by the server can be selected to perform the operation, which improves the operational efficiency of the server.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a scheduling method and a related apparatus.
Background
Neural networks underlie many current artificial-intelligence applications. As their range of application expands further, various neural network models are stored on servers or cloud computing services and operated on according to operation requests submitted by users. Faced with numerous neural network models and large batches of requests, how to improve the operational efficiency of the server is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
Embodiments of the present application provide a scheduling method and a related device, which can select the computing device corresponding to each operation request received by the server to perform the operation, thereby improving the operational efficiency of the server.
In a first aspect, an embodiment of the present application provides a scheduling method applied to a server comprising a plurality of computing devices, the method including:
receiving M operation requests, wherein M is a positive integer;
selecting at least one target computing device corresponding to the M operation requests from the plurality of computing devices;
obtaining M final operation results based on the operation of each target computing device in the at least one target computing device executing the corresponding operation request;
and sending each final operation result in the M final operation results to corresponding electronic equipment.
In a second aspect, an embodiment of the present application provides a server, which includes a plurality of computing devices, wherein:
a receiving unit for receiving M operation requests;
a scheduling unit, configured to select at least one target computing device corresponding to the M operation requests from the plurality of computing devices;
the operation unit is used for executing the operation of the corresponding operation request based on each target calculation device in the at least one target calculation device to obtain M final operation results;
and the sending unit is used for sending each final operation result in the M final operation results to the corresponding electronic equipment.
In a third aspect, an embodiment of the present application provides another server comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for performing some or all of the steps described in the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the method of the first aspect.
After the scheduling method and the related device are adopted, the target computing devices for executing the M operation requests are selected from the plurality of computing devices in the server based on the received operation requests, each target computing device performs the operation of its corresponding operation request, and the final operation result corresponding to each operation request is sent to the corresponding electronic device. In other words, computing resources are allocated uniformly according to the operation requests, so that the multiple computing devices in the server cooperate effectively, which improves the operational efficiency of the server.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings in the following description are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Wherein:
fig. 1 is a schematic structural diagram of a server provided in an embodiment of the present application;
fig. 1a is a schematic structural diagram of a computing unit provided in an embodiment of the present application;
fig. 1b is a schematic structural diagram of a main processing circuit according to an embodiment of the present disclosure;
FIG. 1c is a schematic data distribution diagram of a computing unit according to an embodiment of the present application;
FIG. 1d is a schematic diagram of a data return of a computing unit according to an embodiment of the present application;
fig. 1e is an operational diagram of a neural network structure according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of a scheduling method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of another server provided in the embodiment of the present application;
fig. 4 is a schematic structural diagram of another server provided in the embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
The embodiment of the application provides a scheduling method and a related device, which can select a computing device corresponding to an operation request received in a server to perform operation, so that the operation efficiency of the server is improved. The present application is described in further detail below with reference to specific embodiments and with reference to the attached drawings.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a server according to an embodiment of the present disclosure. As shown in fig. 1, the server includes a plurality of computing devices, and the computing devices include, but are not limited to, server computers, and may be Personal Computers (PCs), network PCs, minicomputers, mainframe computers, and the like.
In the present application, the computing devices included in the server are connected to one another by wire or wirelessly and transfer data between them. Each computing device includes at least one computing carrier, such as a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a processor board card. The server according to the present application may also be a cloud server providing cloud computing services for electronic devices.
Wherein, each calculation carrier can comprise at least one calculation unit for neural network operation, such as: a processing chip, etc. The specific structure of the computing unit is not limited, please refer to fig. 1a, and fig. 1a is a schematic structural diagram of the computing unit. As shown in fig. 1a, the calculation unit includes: a main processing circuit, a basic processing circuit and a branch processing circuit. Specifically, the main processing circuit is connected with the branch processing circuit, and the branch processing circuit is connected with at least one basic processing circuit.
The branch processing circuit is used for receiving and transmitting data of the main processing circuit or the basic processing circuit.
Referring to fig. 1b, fig. 1b is a schematic structural diagram of a main processing circuit, as shown in fig. 1b, the main processing circuit may include a register and/or an on-chip cache circuit, and the main processing circuit may further include a control circuit, a vector operator circuit, an ALU (arithmetic and logic unit) circuit, an accumulator circuit, a DMA (Direct memory access) circuit, and other circuits.
The main processing circuit further includes a data transmitting circuit and a data receiving circuit or interface; the data transmitting circuit may integrate a data distributing circuit and a data broadcasting circuit, although in practical applications the two may also be configured separately, and the data transmitting and receiving circuits may likewise be integrated into a single transceiving circuit. Broadcast data is data that needs to be sent to every basic processing circuit; distribution data is data that needs to be selectively sent to part of the basic processing circuits, with the specific selection determined by the main processing circuit according to the load and the calculation mode. In the broadcast transmission mode, the broadcast data is transmitted to each basic processing circuit in broadcast form, either in a single broadcast or in multiple broadcasts; the embodiments of the present invention do not limit the number of broadcasts. In the distribution transmission mode, the distribution data is selectively transmitted to part of the basic processing circuits.

When distributing data, the control circuit of the main processing circuit transmits data to some or all of the basic processing circuits; the data received by each basic processing circuit may differ, although some basic processing circuits may receive the same data.

Specifically, when broadcasting data, the control circuit of the main processing circuit transmits data to some or all of the basic processing circuits, and every basic processing circuit receiving data receives the same data; that is, the broadcast data includes the data that all basic processing circuits need to receive, while the distribution data includes the data that only part of the basic processing circuits need to receive. The main processing circuit may send the broadcast data to all of the branch processing circuits via one or more broadcasts, and the branch processing circuits forward the broadcast data to all of the basic processing circuits.
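To make the two transfer modes concrete, the following Python sketch contrasts them; the `receive` interface, the `rounds` parameter, and the `selector` policy are illustrative assumptions, not structures defined by this application.

```python
# A minimal sketch (names are illustrative assumptions) of the two transfer modes.
def broadcast(basic_circuits, data, rounds=1):
    # Broadcast data must reach every basic processing circuit; the text
    # allows it to be sent in one broadcast or over several rounds.
    for _ in range(rounds):
        for circuit in basic_circuits:
            circuit.receive(data)          # every circuit gets identical data

def distribute(basic_circuits, blocks, selector):
    # Distribution data is selectively sent to part of the basic circuits;
    # `selector` stands in for the main circuit's load/calculation-mode policy.
    for i, block in enumerate(blocks):
        for circuit in selector(i, basic_circuits):
            circuit.receive(block)         # different circuits may get different blocks
```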
Optionally, the vector operator circuit of the main processing circuit may perform vector operations, including but not limited to: addition, subtraction, multiplication, and division of two vectors; addition, subtraction, multiplication, and division of a vector and a constant; or any operation applied to each element of a vector. A continuous operation may be, for example, addition, subtraction, multiplication, division, activation, or accumulation of a vector with a constant.
Each base processing circuit may include a base register and/or a base on-chip cache circuit; each base processing circuit may further include: an inner product operator circuit, a vector operator circuit, an accumulator circuit, or the like, in any combination. The inner product operator circuit, the vector operator circuit, and the accumulator circuit may be integrated circuits, or the inner product operator circuit, the vector operator circuit, and the accumulator circuit may be circuits provided separately.
The connection structure between the branch processing circuits and the basic circuits may be arbitrary and is not limited to the H-type structure of fig. 1b. Optionally, the direction from the main processing circuit to the basic circuits is a broadcast or distribution structure, and the direction from the basic circuits to the main processing circuit is a gather structure. Broadcast, distribution, and gather are defined as follows:
the data transfer manner from the main processing circuit to the basic circuits may include the following:

The main processing circuit is connected to a plurality of branch processing circuits, and each branch processing circuit is connected to a plurality of basic circuits.

The main processing circuit is connected to one branch processing circuit, which is connected to the next branch processing circuit, and so on, so that a plurality of branch processing circuits are connected in series; each branch processing circuit is then connected to a plurality of basic circuits.

The main processing circuit is connected to a plurality of branch processing circuits, and each branch processing circuit is connected in series with a plurality of basic circuits.

The main processing circuit is connected to one branch processing circuit, which is connected to the next branch processing circuit, and so on, so that a plurality of branch processing circuits are connected in series; each branch processing circuit is then connected in series with a plurality of basic circuits.
When distributing data, the main processing circuit transmits data to part or all of the basic circuits, and the data received by each basic circuit for receiving data can be different;
when broadcasting data, the main processing circuit transmits data to part or all of the basic circuits, and each basic circuit receiving data receives the same data.
When collecting data, some or all of the basic circuits transmit data to the main processing circuit. It should be noted that the computing unit shown in fig. 1a may be a single physical chip; in practical applications, it may also be integrated into another chip (e.g., a CPU or GPU).
Referring to fig. 1c, fig. 1c is a schematic diagram of data distribution of a computing unit, as shown by an arrow in fig. 1c, the arrow is a distribution direction of data, as shown in fig. 1c, after receiving external data, a main processing circuit splits the external data and distributes the split data to a plurality of branch processing circuits, and the branch processing circuits send the split data to a basic processing circuit.
Referring to fig. 1d, fig. 1d is a schematic diagram of data return of a computing unit, as shown by an arrow in fig. 1d, the arrow is a return direction of the data, as shown in fig. 1d, a basic processing circuit returns the data (e.g., inner product calculation result) to a branch processing circuit, and the branch processing circuit returns the data to a main processing circuit.
The input data may specifically be a vector, a matrix, or multidimensional (three-dimensional or higher) data; an individual value within the input data may be referred to as an element of the input data.
The embodiment of the present disclosure further provides a computing method of a computing unit as shown in fig. 1a, where the computing method is applied to neural network computing, and specifically, the computing unit may be used to perform operations on input data and weight data of one or more layers in a multi-layer neural network.
Specifically, the computing unit is configured to perform an operation on input data and weight data of one or more layers of the trained multi-layer neural network;
or the computing unit is used for executing operation on the input data and the weight data of one or more layers in the multilayer neural network of forward operation.
The above operations include, but are not limited to, one or any combination of: a convolution operation, a matrix-multiply-matrix operation, a matrix-multiply-vector operation, a bias operation, a fully connected operation, a GEMM operation, a GEMV operation, and an activation operation.
A GEMM calculation means an operation of matrix-matrix multiplication in the BLAS library, generally expressed as C = alpha * op(S) * op(P) + beta * C, where S and P are the two input matrices, C is the output matrix, alpha and beta are scalars, op denotes some operation on matrix S or P (e.g., transposition), and additional integer parameters describe the width and height of matrices S and P;

a GEMV calculation means an operation of matrix-vector multiplication in the BLAS library, generally expressed as C = alpha * op(S) * P + beta * C, where S is the input matrix, P is the input vector, C is the output vector, alpha and beta are scalars, and op denotes some operation on matrix S.
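As a worked illustration of the two formulas (not part of the claimed method), the following NumPy sketch renders GEMM and GEMV directly, taking op to be an optional transposition as in conventional BLAS usage:

```python
import numpy as np

def gemm(alpha, S, P, beta, C, op_s=lambda x: x, op_p=lambda x: x):
    # C = alpha * op(S) @ op(P) + beta * C, per the GEMM definition above.
    return alpha * op_s(S) @ op_p(P) + beta * C

def gemv(alpha, S, p, beta, c, op_s=lambda x: x):
    # c = alpha * op(S) @ p + beta * c, per the GEMV definition above.
    return alpha * op_s(S) @ p + beta * c

S = np.arange(6.0).reshape(2, 3)
print(gemm(1.0, S, S, 0.0, np.zeros((2, 2)), op_p=np.transpose))  # op(P) = P^T
print(gemv(2.0, S, np.ones(3), 1.0, np.ones(2)))                  # 2*S@1 + c
```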
The connection relationship between the computing carriers in a computing device is not limited; the computing carriers may be homogeneous or heterogeneous. Likewise, the connection relationship between the computing units in a computing carrier is not limited. Computing efficiency can be improved by having heterogeneous computing carriers or computing units execute tasks in parallel.

The computing apparatus shown in fig. 1 includes at least one computing carrier, and the computing carrier includes at least one computing unit. That is, the target computing device selected in this application depends on the connection relationships between the computing devices, on specific physical hardware conditions such as the neural network models and network resources deployed in each computing device, and on the attribute information of the operation request. Computing carriers of the same type may therefore be deployed in the same computing device; for example, deploying the computing carriers for forward propagation in the same computing device rather than across different computing devices effectively reduces the overhead of inter-device communication and helps improve operational efficiency. A specific neural network model may also be deployed on a specific computing carrier: when the server receives an operation request for that specific neural network, it calls the computing carrier corresponding to that network to execute the request, which saves the time of determining a processing task and improves operational efficiency.
In the present application, the designated neural network models are neural network models that are publicly disclosed and widely used, such as the Convolutional Neural Network (CNN) models LeNet, AlexNet, ZFNet, GoogLeNet, VGG, and ResNet.
Optionally, the operation requirement of each designated neural network model in a designated neural network model set and the hardware attributes of each computing device in the plurality of computing devices are obtained, yielding a plurality of operation requirements and a plurality of hardware attributes; the corresponding designated neural network model is then deployed on the designated computing device corresponding to each designated neural network model in the set according to the plurality of operation requirements and the plurality of hardware attributes.

The designated neural network model set includes a plurality of designated neural network models, and the hardware attributes of a computing device include its network bandwidth, storage capacity, processor clock rate, and the like, as well as the hardware attributes of the computing carriers or computing units within it. That is, the computing device matching the operation requirement of each designated neural network model is selected according to the hardware attributes of each computing device, which avoids server failures caused by untimely processing and improves the operational support capability of the server.
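One possible way to realize such deployment is sketched below; the attribute fields and the first-fit matching rule are assumptions for illustration, since the application does not fix a particular matching algorithm.

```python
from dataclasses import dataclass

@dataclass
class HardwareAttributes:            # per computing device (fields assumed)
    bandwidth_gbps: float
    storage_gb: float
    clock_ghz: float

@dataclass
class ModelRequirements:             # per designated neural network model
    name: str
    min_bandwidth_gbps: float
    min_storage_gb: float

def deploy(models, devices):
    """First-fit placement: map each designated model to a device whose
    hardware attributes satisfy the model's operation requirements."""
    placement = {}
    for model in models:
        for device_id, hw in devices.items():
            if (hw.bandwidth_gbps >= model.min_bandwidth_gbps
                    and hw.storage_gb >= model.min_storage_gb):
                placement[model.name] = device_id
                break
    return placement
```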
The input neurons and output neurons mentioned in this application do not refer to the neurons in the input layer and output layer of the whole neural network. For any two adjacent layers in the network, the neurons in the lower layer of the feedforward operation are the input neurons, and the neurons in the upper layer are the output neurons. Taking a convolutional neural network with L layers as an example, for K = 1, 2, ..., L-1, the K-th layer is referred to as the input layer (its neurons are the input neurons) and the (K+1)-th layer is referred to as the output layer (its neurons are the output neurons). That is, every layer except the topmost layer can serve as an input layer, and the next layer is the corresponding output layer.
The operations mentioned above are operations within one layer of the neural network. For a multi-layer neural network, the implementation process is shown in fig. 1e, where dotted arrows indicate the backward operation and solid arrows indicate the forward operation. In the forward operation, after the previous layer of the artificial neural network finishes executing, the output neurons obtained from that layer serve as the input neurons of the next layer (or some operation is first applied to those output neurons), and the weights are likewise replaced by the weights of the next layer. In the backward operation, after the backward pass of the previous layer finishes, the input-neuron gradients obtained from that layer serve as the output-neuron gradients of the next layer (or some operation is first applied to them), and the weights are likewise replaced by the weights of the next layer.

The forward operation of a neural network is the process of computing from the input data to the final output data. The backward operation propagates in the direction opposite to the forward operation: it passes backward, through the layers traversed by the forward operation, the loss between the final output data and the expected output data, or the gradient of the corresponding loss function. Through repeated alternation of forward and backward passes, the weights of all layers are corrected by gradient descent on the loss or loss function; this is the learning and training process of the neural network, and it reduces the loss of the network output.
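The following self-contained NumPy sketch illustrates this alternation on a toy two-layer network; the network shape, learning rate, and mean-squared-error loss are arbitrary choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 1))
x, y = rng.normal(size=(16, 4)), rng.normal(size=(16, 1))
lr = 0.01

for step in range(100):
    h = np.maximum(x @ W1, 0.0)          # forward: layer-1 output feeds layer 2
    y_hat = h @ W2                       # forward: final output data
    loss = ((y_hat - y) ** 2).mean()     # loss vs. expected output data
    g_out = 2.0 * (y_hat - y) / len(x)   # backward: output-neuron gradient
    g_h = (g_out @ W2.T) * (h > 0)       # backward: becomes layer-1's gradient
    W2 -= lr * (h.T @ g_out)             # weights corrected by gradient descent
    W1 -= lr * (x.T @ g_h)
```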
Referring to fig. 2, fig. 2 is a flowchart of a scheduling method according to an embodiment of the present application. As shown in fig. 2, the method is applied to the server shown in fig. 1 and involves the electronic devices allowed to access that server. The electronic devices may include various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to a wireless modem, as well as various forms of User Equipment (UE), Mobile Stations (MS), terminal devices, and so on.
201: m operation requests are received.
In the present application, M is a positive integer. The server receives M operation requests transmitted by the electronic devices allowed to access it; neither the number of electronic devices nor the number of operation requests transmitted by each electronic device is limited, that is, the M operation requests may be transmitted by one electronic device or by several.
The operation request includes attribute information such as the operation task (a training task or a test task) and the target neural network model involved in the operation. A training task trains the target neural network model, i.e., performs forward and backward operations on the model until training is finished; a test task performs a single forward operation with the target neural network model.

The target neural network model may be a neural network model uploaded when the user sends the operation request through the electronic device, a neural network model stored in the server, or the like.
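For concreteness, an operation request carrying this attribute information might be modeled as follows; the field names are hypothetical and only mirror the attributes described above.

```python
from dataclasses import dataclass
from enum import Enum

class Task(Enum):
    TRAINING = "training"    # forward plus backward operations until training ends
    TEST = "test"            # a single forward operation

@dataclass
class OperationRequest:
    task: Task
    target_model: str        # uploaded by the user or already stored on the server
    operation_data: bytes    # e.g. image data or voice data to operate on
    client_id: str           # electronic device that receives the final result
```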
202: selecting at least one target computing device corresponding to the M operation requests from a plurality of computing devices.
The present application does not limit how the target computing devices are selected; they may be selected according to the number of operation requests and the number of target neural network models, as sketched below. If there is one operation request corresponding to one target neural network model, the operation instructions of that request can be classified into parallel instructions and serial instructions: the parallel instructions are distributed to different target computing devices for operation, while the serial instructions are assigned to the target computing device best suited to process them, which improves the operational efficiency of each instruction. If there are multiple operation requests corresponding to one target neural network model, a target computing device on which that model is deployed can batch-process the operation data of those requests, avoiding the time wasted on repeated operations and the extra overhead of communication between different computing devices. If there are multiple operation requests corresponding to multiple target neural network models, a computing device adept at processing each target neural network model, or one on which that model was previously deployed, can be found to complete the corresponding requests, saving the time of initializing the network and improving operational efficiency.
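The three cases can be summarized in a short dispatch sketch; the classification and placement policies are passed in as functions because the application leaves them open, so every helper here is an assumption:

```python
def select_targets(requests, hosting, classify, place_parallel, place_serial):
    """hosting: model name -> device already deployed with that model;
    classify: request -> (parallel_instructions, serial_instructions);
    place_*: placement policies. All five inputs are assumed helpers."""
    models = {r.target_model for r in requests}
    if len(requests) == 1:                     # case 1: split one request
        parallel, serial = classify(requests[0])
        return place_parallel(parallel) + [place_serial(serial)]
    if len(models) == 1:                       # case 2: batch on one hosting device
        return [hosting[models.pop()]]
    return [hosting[m] for m in models]        # case 3: one device per model
```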
Optionally, if the operation task of a target operation request is a test task, a computing device deployed with the forward operation of the target neural network model corresponding to that task is selected from the plurality of computing devices as a first target computing device; if the operation task of the target operation request is a training task, a computing device deployed with both the forward operation and the backward training of the corresponding target neural network model is selected from the plurality of computing devices as the first target computing device.

The target operation request is any one of the M operation requests, and the first target computing device is the target computing device, among the at least one target computing device, that corresponds to the target operation request.
That is, if the operation task of the target operation request is a test task, the first target computing device is a computing device that can be used to perform the forward operation of the target neural network model; if it is a training task, the first target computing device is a computing device that can be used to perform both the forward operation and the backward training of the target neural network model. Processing each operation request on a dedicated computing device improves the accuracy and efficiency of the operation.

For example, suppose the server includes a first computing device deployed only with the forward operation of a designated neural network model and a second computing device that can perform both the forward operation and the backward training of that model. When the target neural network model in a received target operation request is the designated neural network model and the operation task is a test task, the first computing device is determined to execute the target operation request.
Optionally, selecting an auxiliary scheduling algorithm from an auxiliary scheduling algorithm set according to attribute information of each operation request in the M operation requests; selecting the at least one target computing device from the plurality of computing devices according to the secondary scheduling algorithm.
The set of auxiliary scheduling algorithms includes, but is not limited to, at least one of: a Round-Robin Scheduling algorithm, a Weighted Round-Robin algorithm, a Least-Connections algorithm, a Weighted Least-Connections algorithm, a Locality-Based Least-Connections algorithm, a Locality-Based Least-Connections with Replication algorithm, a Destination-address Hashing algorithm, and a Source-address Hashing algorithm.
The present application does not limit how the auxiliary scheduling algorithm is selected according to the attribute information. For example, if several target computing devices can process the same operation request, the auxiliary scheduling algorithm may be the round-robin scheduling algorithm. If different target computing devices have different load capacities, so that more operation requests should be allocated to highly configured, lightly loaded target computing devices, the auxiliary scheduling algorithm may be the weighted round-robin algorithm. If the workload allocated to each target computing device differs, the auxiliary scheduling algorithm may be the least-connections scheduling algorithm, which dynamically selects the target computing device with the fewest currently backlogged connections to process the current request so as to use the target computing devices as efficiently as possible, or the weighted least-connections scheduling algorithm.
That is, on the basis of the scheduling method in the above embodiment, the computing device that finally executes the operation request is selected in combination with the auxiliary scheduling algorithm, thereby further improving the operation efficiency of the server.
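Minimal sketches of two of the named auxiliary algorithms follow, assuming integer weights and a live count of backlogged connections per device:

```python
import itertools

def weighted_round_robin(devices, weights):
    """Yield devices in proportion to integer weights: highly configured,
    lightly loaded devices appear more often in the rotation."""
    expanded = [d for d, w in zip(devices, weights) for _ in range(w)]
    return itertools.cycle(expanded)

def least_connections(backlog):
    """Pick the device with the fewest currently backlogged connections."""
    return min(backlog, key=backlog.get)

picker = weighted_round_robin(["dev0", "dev1"], [3, 1])
print([next(picker) for _ in range(8)])           # dev0 appears 3x per dev1
print(least_connections({"dev0": 7, "dev1": 2}))  # -> dev1
```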
203: m final operation results are obtained based on the operation of each target computing device in the at least one target computing device executing the corresponding operation request.
The present application does not limit the operation data corresponding to each operation request; it may be image data used for image recognition, voice data used for voice recognition, or the like. When the operation task is a test task, the operation data is data uploaded by the user; when the operation task is a training task, the operation data may be a training set uploaded by the user or a training set stored in the server that corresponds to the target neural network model.
The calculation process of the operation instruction can generate a plurality of intermediate operation results, and the final operation result corresponding to each operation request can be obtained according to the intermediate operation results. The calculation method of the target calculation device in the embodiment of the present application is not limited, and the calculation method of the calculation unit shown in fig. 1a to 1d may be adopted.
204: and sending each final operation result in the M final operation results to corresponding electronic equipment.
It can be understood that, a target computing device executing M operation requests is selected from M computing devices included in the server based on the received operation requests, and the operation is performed according to the corresponding operation request by the target computing device, and a final operation result corresponding to each operation request is sent to the corresponding electronic device, that is, the computing resources are uniformly allocated according to the operation requests, so that the computing devices in the server effectively cooperate, thereby improving the operation efficiency of the server.
Optionally, the method further includes: after waiting for a first preset duration, detecting whether each target computing device in the at least one target computing device has obtained its corresponding final operation result, and if not, taking each target computing device that has not obtained a final operation result as a delay computing device; selecting, from the idle computing devices among the plurality of computing devices, a standby computing device for the operation request corresponding to the delay computing device; and executing the operation of that operation request on the standby computing device.

That is, when the first preset duration elapses, any computing device that has not completed its operation instruction is treated as a delay computing device; a standby computing device is selected from the idle computing devices in the server according to the operation request being executed by the delay computing device, and the operation of that request is completed on the standby computing device, which improves operational efficiency.
Optionally, after the operation of the operation request corresponding to the delay computing device is executed on the standby computing device, the method further includes: obtaining the final operation result produced first by either the delay computing device or the standby computing device; and sending a pause instruction to whichever of the two has not obtained the final operation result.

The pause instruction instructs whichever of the delay computing device and the standby computing device has not returned the final operation result to pause execution of the corresponding operation instruction. That is, the standby computing device executes the operation of the operation request corresponding to the delay computing device, the result obtained first by either device is taken as the final operation result of the request, and a pause instruction is sent to the device that has not obtained it; the device that has not completed the request stops computing, thereby saving power consumption.
Optionally, the method further includes: after waiting for a second preset duration, detecting whether the delay computing device has obtained its corresponding final operation result, and if not, taking the delay computing device as a fault computing device and sending a fault instruction.

The fault instruction informs the operation and maintenance personnel that the fault computing device has failed; the second preset duration is longer than the first preset duration. That is, if the final operation result of the delay computing device has not been received when the second preset duration is reached, the delay computing device is determined to be faulty and the corresponding operation and maintenance personnel are notified, which improves fault-handling capability.
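Putting the two preset durations, the standby device, the pause instruction, and the fault instruction together, a supervision loop might look like the following sketch; the device interface (`result`, `execute`, `pause`) is an assumed abstraction, not an API defined by this application.

```python
import time

def supervise(delay_dev, standby_dev, request, t_first, t_second, poll=0.1):
    """Wait t_first for the original device; then re-run the request on a
    standby device, keep the first result and pause the slower device; if
    nothing has arrived by t_second, raise a fault for O&M attention."""
    deadline_1, deadline_2 = time.time() + t_first, time.time() + t_second
    while time.time() < deadline_1:
        if delay_dev.result(request) is not None:
            return delay_dev.result(request)
        time.sleep(poll)
    standby_dev.execute(request)                   # spare device re-runs the request
    while time.time() < deadline_2:
        for winner, loser in ((delay_dev, standby_dev), (standby_dev, delay_dev)):
            if winner.result(request) is not None:
                loser.pause(request)               # pause instruction: saves power
                return winner.result(request)
        time.sleep(poll)
    raise RuntimeError(f"fault instruction: {delay_dev} has failed")  # notify O&M
```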
Optionally, the method further includes: updating the hash table of the plurality of computing devices at every target time threshold.

A hash table (also called a hash map) is a data structure accessed directly according to a key value. In the present application, the IP addresses of the plurality of computing devices serve as key values and are mapped to positions in the hash table through a hash function (mapping function); that is, once a target computing device is determined, the physical resources allocated to it can be found quickly. The specific form of the hash table is not limited: it may be a static hash table set manually, or record hardware resources allocated according to IP address. Updating the hash tables of the plurality of computing devices at every target time threshold improves lookup accuracy and efficiency.
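A hedged sketch of such a key-value lookup and its periodic refresh follows; the table size, the MD5-based hash function, and the `publish` callback are illustrative assumptions, and collisions are ignored for brevity.

```python
import hashlib
import time

def slot_for(ip: str, table_size: int) -> int:
    # Hash function mapping a key (a device IP address) to a table position.
    return int(hashlib.md5(ip.encode()).hexdigest(), 16) % table_size

def rebuild_table(device_ips, resources, table_size=64):
    # Each slot records the physical resources allocated to one device.
    table = [None] * table_size
    for ip in device_ips:
        table[slot_for(ip, table_size)] = (ip, resources[ip])
    return table

def refresh(device_ips, read_resources, publish, target_threshold_s):
    while True:                                   # update every target time threshold
        publish(rebuild_table(device_ips, read_resources()))
        time.sleep(target_threshold_s)
```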
Referring to fig. 3, fig. 3 is a schematic structural diagram of another server provided in the present application, consistent with the embodiment in fig. 2, where the server includes a plurality of computing devices. As shown in fig. 3, the server 300 includes:
a receiving unit 301, configured to receive M operation requests, where M is a positive integer;
a scheduling unit 302, configured to select at least one target computing device corresponding to the M operation requests from the plurality of computing devices;
an operation unit 303, configured to perform an operation of a corresponding operation request based on each target computing device in the at least one target computing device, to obtain M final operation results;
a sending unit 304, configured to send each final operation result of the M final operation results to a corresponding electronic device.
Optionally, the scheduling unit 302 is specifically configured to, if an operation task of a target operation request is a test task, select a forward operation computation device including a target neural network model corresponding to the operation task from the multiple computation devices to obtain a first target computation device, where the target operation request is any one of the M operation requests, and the first target computation device is a target computation device corresponding to the target operation request in the at least one target computation device; and if the operation task of the target operation request is a training task, selecting a calculation device comprising forward operation and backward training of a target neural network model corresponding to the operation task from the plurality of calculation devices to obtain the first target calculation device.
Optionally, the scheduling unit 302 is specifically configured to select an auxiliary scheduling algorithm from an auxiliary scheduling algorithm set according to attribute information of each operation request in the M operation requests, where the auxiliary scheduling algorithm set includes at least one of the following: a polling scheduling algorithm, a weighted polling algorithm, a minimum link algorithm, a weighted minimum link algorithm, a locality-based minimum link algorithm with replication, a target address hashing algorithm, and a source address hashing algorithm; selecting the at least one target computing device from the plurality of computing devices according to the secondary scheduling algorithm.
Optionally, the server further includes a detecting unit 306, configured to wait for a first preset duration, detect whether each target computing device in the at least one target computing device obtains a corresponding final operation result, and if not, use the target computing device that does not obtain the final operation result as a delay computing device; selecting, by the scheduling unit 302, a spare computing device corresponding to the operation request corresponding to the delayed computing device from among the idle computing devices of the plurality of computing devices; the arithmetic unit 303 executes an arithmetic operation of an arithmetic request corresponding to the delay calculation means based on the spare calculation means.
Optionally, the obtaining unit 305 is further configured to obtain a final operation result obtained first between the delay calculating device and the standby calculating device; a pause instruction is sent by the sending unit 304 to a computing device between the delay computing device and the standby computing device that does not obtain the final operation result.
Optionally, the detecting unit 306 is further configured to wait for a second preset duration, detect whether the delay calculating device obtains a corresponding final operation result, and if not, take the delay calculating device that does not return the final operation result as a fault calculating device; and sending a fault instruction by the sending unit 304, wherein the fault instruction is used for informing operation and maintenance personnel that the fault calculation device has a fault, and the second preset time length is longer than the first preset time length.
Optionally, the server further includes an updating unit 307, configured to update the hash table of the server every target time threshold.
Optionally, the obtaining unit 305 is further configured to obtain an operation requirement of each specified neural network model in the specified neural network model set and a hardware attribute of each computing device in the plurality of computing devices to obtain a plurality of operation requirements and a plurality of hardware attributes;
the server further includes a deployment unit 308 configured to deploy a corresponding designated neural network model on a designated computing device corresponding to each designated neural network model in the designated neural network model set according to the plurality of operation requirements and the plurality of hardware attributes.
Optionally, the computing device comprises at least one computing carrier comprising at least one computing unit.
Optionally, the computing device includes at least one computing carrier, where the computing carrier includes at least one computing unit, and the computing unit performs an operation on input data and weight data of one or more layers in the trained multi-layer neural network, or performs an operation on input data and weight data of one or more layers in the multi-layer neural network that performs a forward operation, where the operation includes: one or any combination of convolution operation, matrix multiplication matrix operation, matrix multiplication vector operation, bias operation, full connection operation, GEMM operation, GEMV operation and activation operation.
Optionally, the computing unit includes: main processing circuit, branch processing circuit and basic processing circuit, main processing circuit is connected with branch processing circuit, basic processing circuit is connected with branch processing circuit, wherein:
the main processing circuit is used for acquiring data except the computing unit, dividing the data into broadcast data and distribution data, sending the broadcast data to all branch processing circuits in a broadcast mode, and selectively distributing the distribution data to different branch processing circuits;
the branch processing circuitry to forward data between the main processing circuitry and the base processing circuitry;
the basic processing circuit is used for receiving the broadcast data and the distribution data forwarded by the branch processing circuit, performing operation on the broadcast data and the distribution data to obtain an operation result, and sending the operation result to the branch processing circuit;
the main processing circuit is further configured to receive the operation result of the basic processing circuit forwarded by the branch processing circuit, and process the operation result to obtain a calculation result.
Optionally, the main processing circuit is specifically configured to send the broadcast data to all branch processing circuits in one broadcast or multiple broadcasts.
Optionally, the basic processing circuit is specifically configured to perform inner product operation, or vector operation on the broadcast data and the distribution data to obtain an operation result.
It can be understood that, a target computing device executing M operation requests is selected from M computing devices included in the server based on the received operation requests, and the operation is performed according to the corresponding operation request by the target computing device, and a final operation result corresponding to each operation request is sent to the corresponding electronic device, that is, the computing resources are uniformly allocated according to the operation requests, so that the computing devices in the server effectively cooperate, thereby improving the operation efficiency of the server.
In one embodiment, as shown in fig. 4, the present application discloses another server 400 comprising a processor 401, a memory 402, a communication interface 403, and one or more programs 404, wherein the one or more programs 404 are stored in the memory 402 and configured to be executed by the processor, and the program 404 comprises instructions for performing some or all of the steps described in the scheduling method.
In another embodiment of the present invention, a computer-readable storage medium is provided, which stores a computer program comprising program instructions, which when executed by a processor, cause the processor to perform the implementation described in the scheduling method.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the components and steps of the examples have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints of the implementation. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be considered beyond the scope of the present invention.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the terminal and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed terminal and method can be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the above-described division of units is only one type of division of logical functions, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electric, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit may be stored in a computer-readable storage medium if it is implemented in the form of a software functional unit and sold or used as a separate product. Based on such understanding, the technical solution of the present invention essentially or partially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the above method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It should be noted that implementations not shown or described in the drawings or the description are forms known to those of ordinary skill in the art and are not described in detail. Further, the above definitions of the various elements and methods are not limited to the specific structures, shapes, or arrangements mentioned in the embodiments, which may be easily modified or substituted by those of ordinary skill in the art.
The above embodiments are further described in detail for the purpose of illustrating the invention, and it should be understood that the above embodiments are only for illustrative purposes and are not to be construed as limiting the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present application.
Claims (13)
1. A method of scheduling, the method being based on a server comprising a plurality of computing devices, the method comprising:
receiving M operation requests, wherein the M operation requests comprise at least one serial instruction, and M is a positive integer;
selecting at least one target computing device corresponding to the M operation requests from the plurality of computing devices based on the corresponding relationship between the operation requests and the target computing devices, wherein the target computing devices comprise computing devices corresponding to the serial instructions, and the corresponding relationship between the operation requests and the target computing devices is the corresponding relationship between a neural network model deployed in the target computing devices and the operation requests or the corresponding relationship between the target computing devices and attribute information of the operation requests;
obtaining a plurality of intermediate operation results of each operation request based on the operation of each target computing device executing the corresponding operation request in the at least one target computing device, and obtaining M final operation results corresponding to the M operation requests according to the plurality of intermediate operation results;
sending each final operation result in the M final operation results to corresponding electronic equipment;
the selecting, from the plurality of computing devices, at least one target computing device corresponding to the M operation requests based on a correspondence between the operation requests and the target computing devices, includes:
if M is larger than 1 and the M operation requests correspond to a plurality of target neural network models, selecting, from the plurality of computing devices, a plurality of target computing devices corresponding to each target neural network model in the plurality of target neural network models, wherein the serial instruction executed for each target neural network model in the plurality of target neural network models is the operation request corresponding to the target neural network model deployed on the corresponding target computing device.
2. The method of claim 1, further comprising:
waiting for a first preset duration and detecting whether each target computing device in the at least one target computing device has obtained its corresponding final operation result; if not, taking each target computing device that has not obtained a final operation result as a delayed computing device;
selecting, from idle computing devices among the plurality of computing devices, a standby computing device for the operation request corresponding to the delayed computing device;
and executing, on the standby computing device, the operation of the operation request corresponding to the delayed computing device.
3. The method of claim 2, wherein after the executing, on the standby computing device, of the operation of the operation request corresponding to the delayed computing device, the method further comprises:
obtaining the final operation result that is produced first by either the delayed computing device or the standby computing device;
and sending a pause instruction to whichever of the delayed computing device and the standby computing device has not obtained the final operation result.
4. The method of claim 3, further comprising:
waiting for a second preset duration and detecting whether the delayed computing device has obtained its corresponding final operation result; if not, taking the delayed computing device as a faulty computing device and sending a fault instruction, wherein the fault instruction is used to notify operation and maintenance personnel that the faulty computing device has failed, and the second preset duration is longer than the first preset duration.
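Claims 2 to 4 together describe a two-timeout supervision loop. The sketch below is one plausible reading, assuming hypothetical device objects with `is_done`, `result`, `pause`, and `run` methods; none of these names come from the patent.

```python
import time

def notify_operations(message):
    # stand-in for the fault instruction sent to operation and maintenance staff
    print("FAULT:", message)

def supervise(device, request, idle_devices, first_preset, second_preset):
    """Two-timeout flow: standby device after first_preset, fault after second_preset."""
    time.sleep(first_preset)
    if device.is_done(request):
        return device.result(request)

    # claim 2: the device is now delayed; run the request on an idle standby device
    standby = idle_devices.pop()
    standby.run(request)

    deadline = time.monotonic() + (second_preset - first_preset)
    while time.monotonic() < deadline:
        for winner, loser in ((device, standby), (standby, device)):
            if winner.is_done(request):
                loser.pause(request)          # claim 3: pause the slower device
                return winner.result(request)
        time.sleep(0.01)

    # claim 4: still no result from the delayed device after the second duration
    notify_operations(f"computing device {device!r} marked as faulty")
    return standby.result(request) if standby.is_done(request) else None
```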
5. The method of claim 1, further comprising:
and updating a hash table of the server at intervals of a target time threshold.
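A minimal sketch of the periodic refresh in claim 5, assuming the correspondence lives in the `model_to_devices` mapping of the earlier Scheduler sketch and that `rebuild_hash_table` is a caller-supplied placeholder for however the server rescans its devices:

```python
import threading

def start_periodic_refresh(scheduler, target_threshold_s, rebuild_hash_table):
    """Rebuild the scheduler's hash table every target time threshold (seconds)."""
    def tick():
        scheduler.model_to_devices = rebuild_hash_table()
        threading.Timer(target_threshold_s, tick).start()

    threading.Timer(target_threshold_s, tick).start()
```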
6. The method of claim 1, further comprising:
acquiring the operation requirement of each designated neural network model in a designated neural network model set and the hardware attributes of each computing device in the plurality of computing devices, to obtain a plurality of operation requirements and a plurality of hardware attributes;
and deploying, according to the plurality of operation requirements and the plurality of hardware attributes, each designated neural network model in the designated neural network model set on its corresponding designated computing device.
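Claim 6 amounts to a placement problem: match each designated model's operation requirements against device hardware attributes. A greedy first-fit sketch, with all field names invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class DesignatedModel:
    name: str
    required_flops: float   # operation requirement: compute throughput
    required_memory: int    # operation requirement: memory in bytes

@dataclass
class ComputingDevice:
    flops: float            # hardware attribute: available throughput
    memory: int             # hardware attribute: available memory
    deployed: list = field(default_factory=list)

def deploy_models(models, devices):
    """Deploy each designated model on the first device whose attributes suffice."""
    placement = {}
    for model in models:
        for dev in devices:
            if dev.flops >= model.required_flops and dev.memory >= model.required_memory:
                dev.deployed.append(model.name)
                placement[model.name] = dev
                break
        else:
            raise RuntimeError(f"no device satisfies {model.name}")
    return placement
```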
7. The method according to any one of claims 3 to 6, wherein each computing device comprises at least one computing carrier, each computing carrier comprises at least one computing unit, and the computing unit performs operations on input data and weight data of one or more layers of a trained multi-layer neural network, or of a multi-layer neural network performing forward operations, the operations comprising one or any combination of: a convolution operation, a matrix-by-matrix multiplication operation, a matrix-by-vector multiplication operation, a bias operation, a fully connected operation, a GEMM operation, a GEMV operation, and an activation operation.
8. The method of claim 7, wherein the computing unit comprises a main processing circuit, branch processing circuits, and basic processing circuits, the main processing circuit being connected to the branch processing circuits and the basic processing circuits being connected to the branch processing circuits, wherein:
the main processing circuit acquires data from outside the computing unit, divides the data into broadcast data and distribution data, sends the broadcast data to all the branch processing circuits in a broadcast manner, and selectively distributes the distribution data to different branch processing circuits;
the branch processing circuits forward data between the main processing circuit and the basic processing circuits;
the basic processing circuits receive the broadcast data and the distribution data forwarded by the branch processing circuits, perform operations on the broadcast data and the distribution data to obtain operation results, and send the operation results to the branch processing circuits;
and the main processing circuit receives the operation results of the basic processing circuits forwarded by the branch processing circuits, and processes the operation results to obtain a computation result.
9. The method of claim 8, wherein the sending of the broadcast data by the main processing circuit to all the branch processing circuits in a broadcast manner comprises:
the main processing circuit transmitting the broadcast data to all the branch processing circuits through a single broadcast or multiple broadcasts.
10. The method of claim 8, wherein the performing of operations by the basic processing circuits on the broadcast data and the distribution data to obtain operation results comprises:
the basic processing circuits performing an inner product operation, a multiplication operation, or a vector operation on the broadcast data and the distribution data to obtain operation results.
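Claims 8 to 10 describe a three-tier dataflow. As a toy software model, take a matrix-vector product in which the vector is the broadcast data, the matrix rows are the distribution data, and each basic processing circuit computes inner products; the function names mirror the claim language, but the decomposition itself is illustrative, not the claimed hardware.

```python
import numpy as np

def basic_processing_circuit(distributed_rows, broadcast_vector):
    # claim 10: inner product of each distributed row with the broadcast data
    return distributed_rows @ broadcast_vector

def branch_processing_circuit(distributed_rows, broadcast_vector):
    # claim 8: a branch merely forwards data between main and basic circuits
    return basic_processing_circuit(distributed_rows, broadcast_vector)

def main_processing_circuit(matrix, vector, n_branches=4):
    # divide external data into broadcast data (the vector, sent to every
    # branch) and distribution data (matrix rows, split across branches)
    chunks = np.array_split(matrix, n_branches, axis=0)
    partials = [branch_processing_circuit(chunk, vector) for chunk in chunks]
    # process the forwarded operation results into the final computation result
    return np.concatenate(partials)

# usage: the decomposition reproduces an ordinary matrix-vector product
A, x = np.arange(12.0).reshape(4, 3), np.ones(3)
assert np.allclose(main_processing_circuit(A, x), A @ x)
```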
11. A server, comprising a plurality of computing devices, the server further comprising: means for performing the method of any of claims 1-10.
12. A server, comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps of the method of any of claims 1-10.
13. A computer-readable storage medium, having stored thereon a computer program comprising program instructions, which, when executed by a processor, cause the processor to carry out the method according to any one of claims 1-10.
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201711467783.4A CN109978149B (en) | 2017-12-28 | 2017-12-28 | Scheduling method and related device |
| PCT/CN2018/098324 WO2019128230A1 (en) | 2017-12-28 | 2018-08-02 | Scheduling method and related apparatus |
| EP18895350.9A EP3731089B1 (en) | 2017-12-28 | 2018-08-02 | Scheduling method and related apparatus |
| US16/767,415 US11568269B2 (en) | 2017-12-28 | 2018-08-02 | Scheduling method and related apparatus |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201711467783.4A CN109978149B (en) | 2017-12-28 | 2017-12-28 | Scheduling method and related device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN109978149A CN109978149A (en) | 2019-07-05 |
| CN109978149B (en) | 2020-10-09 |
Family
ID=67075511
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201711467783.4A (CN109978149B, Active) | Scheduling method and related device | 2017-12-28 | 2017-12-28 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109978149B (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110659134A (en) * | 2019-09-04 | 2020-01-07 | Tencent Cloud Computing (Beijing) Co., Ltd. | A data processing method and device applied to an artificial intelligence platform |
| CN112712456A (en) * | 2021-02-23 | 2021-04-27 | Zhongtian Hengxing (Shanghai) Technology Co., Ltd. | GPU processing circuit structure |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102360309A (en) * | 2011-09-29 | 2012-02-22 | Suzhou Institute for Advanced Study, University of Science and Technology of China | Scheduling system and scheduling execution method of multi-core heterogeneous system-on-chip |
| CN107018184A (en) * | 2017-03-28 | 2017-08-04 | Huazhong University of Science and Technology | Distributed deep neural network cluster grouping synchronization optimization method and system |
| CN107239829A (en) * | 2016-08-12 | 2017-10-10 | Beijing DeePhi Technology Co., Ltd. | A method of optimizing an artificial neural network |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP6083687B2 (en) * | 2012-01-06 | 2017-02-22 | International Business Machines Corporation | Distributed computing method, program, host computer, and distributed computing system (distributed parallel computing using accelerator devices) |
| US9082078B2 (en) * | 2012-07-27 | 2015-07-14 | The Intellisis Corporation | Neural processing engine and architecture using the same |
2017-12-28: CN application CN201711467783.4A filed, published as CN109978149B, status Active
Similar Documents
| Publication | Title |
|---|---|
| US11568269B2 (en) | Scheduling method and related apparatus |
| US12423153B2 (en) | Data sharing system and data sharing method therefor |
| CN109978129B (en) | Scheduling method and related device |
| CN110968532B (en) | Data transmission method and related product |
| CN106302780B (en) | Method, device and system for batch data transmission of cluster equipment and server |
| CN109976809B (en) | Scheduling method and related device |
| CN116450355A (en) | Multi-cluster model training method, device, equipment and medium |
| CN109978149B (en) | Scheduling method and related device |
| TWI768167B (en) | Integrated circuit chip device and related products |
| CN110503179A (en) | Calculation method and related product |
| CN119829898A (en) | Computing task scheduling method, electronic device, storage medium and product |
| US10802879B2 (en) | Method and device for dynamically assigning task and providing resources and system thereof |
| CN109976887B (en) | Scheduling method and related device |
| CN115048179B (en) | Migration optimization method, source terminal device and virtual machine migration management system |
| CN114035906A (en) | Virtual machine migration method and device, electronic device and storage medium |
| CN109993292B (en) | Integrated circuit chip devices and related products |
| CN111767998B (en) | Integrated circuit chip device and related products |
| CN110197272B (en) | Integrated circuit chip device and related product |
| CN119938245B (en) | Task processing method, device, equipment and storage medium based on intelligent computing cluster |
| CN110197266B (en) | Integrated circuit chip device and related product |
| CN120892176A (en) | Expert deployment methods, devices, equipment, and storage media in hybrid expert systems |
| CN118590388A (en) | SFCR deployment method, device and readable storage medium |
| CN114691593A (en) | Data processing method based on multi-core processor and related product |
| CN114662690A (en) | Mobile device collaborative inference system for deep learning Transformer class models |
| CN112650570A (en) | Dynamically expandable distributed crawler system, data processing method and device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | CB02 | Change of applicant information | Address after: Room 644, No. 6, Academy of Sciences South Road, Beijing 100000; Applicant after: Zhongke Cambrian Technology Co., Ltd.; Address before: Room 644, No. 6, Academy of Sciences South Road, Beijing 100000; Applicant before: Beijing Zhongke Cambrian Technology Co., Ltd. |
| | GR01 | Patent grant | |