
WO2025055495A1 - Memory management method and apparatus for neural network model, device, medium and product - Google Patents


Info

Publication number
WO2025055495A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory block
size
tensor
memory
allocated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2024/103342
Other languages
French (fr)
Chinese (zh)
Inventor
周刘成
蒋荣琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Publication of WO2025055495A1


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • For example, a 100M memory block can be allocated to the neural network model. When the model subsequently applies for a 10M memory block, it is determined whether the 10M application can reuse the allocated 100M memory block. If it can, no new memory block is allocated for the 10M application, which instead reuses the 100M memory block.
  • Similarly, when the neural network model applies for a 50M memory block, it is determined whether the 50M application can reuse the allocated 100M memory block. If it can, the 50M application reuses the 100M memory block; otherwise, a new 50M memory block is allocated for it.
  • The present application provides a memory management method, apparatus, device, medium and product for a neural network model; the technical solution is described as follows.
  • a memory management method for a neural network model is provided, the method being executed by a computer device and comprising the following steps.
  • a computation graph corresponding to a neural network model is obtained, wherein the computation graph includes at least two network layer operators, and the network layer operators are used to represent network layers in the neural network model.
  • a memory size to be allocated to the network layer operator is determined, where the memory size is used to represent the memory size that the network layer operator needs to occupy when the neural network model is running.
  • An allocated memory block matching the memory size is obtained from a free memory block list, and the allocated memory block is allocated to the network layer operator.
  • the free memory block list is used to store free memory blocks that have been allocated but released, and the allocated memory blocks refer to memory blocks allocated to the network layer operator for storing data.
  • A memory management apparatus for a neural network model includes the following modules.
  • An acquisition module is used to obtain a computation graph corresponding to the neural network model, wherein the computation graph includes at least two network layer operators, and the network layer operators are used to represent network layers in the neural network model.
  • a determination module is used to determine the memory size to be allocated to the network layer operator based on the calculation graph, and the memory size is used to represent the memory size that the network layer operator needs to occupy when the neural network model is running.
  • the allocation module is used to obtain an allocated memory block matching the memory size from a free memory block list, and allocate the allocated memory block to the network layer operator.
  • the free memory block list is used to store free memory blocks that have been allocated but released, and the allocated memory blocks refer to memory blocks allocated to the network layer operator for storing data.
  • a computer device which includes: a processor and a memory, wherein at least one computer program is stored in the memory, and the at least one computer program is loaded and executed by the processor to implement the memory management method of the neural network model as described above.
  • a computer storage medium in which at least one computer program is stored.
  • the at least one computer program is loaded and executed by a processor to implement the memory management method of the neural network model as described above.
  • a computer program product which includes a computer program stored in a computer-readable storage medium; the computer program is read and executed from the computer-readable storage medium by a processor of a computer device, so that the computer device executes the memory management method of the neural network model as described above.
  • This application uses the free memory blocks in the free memory block list to allocate the allocated memory block matching the memory size to the network layer operator, thereby reducing the memory allocated during the operation of the neural network model and improving the memory utilization.
  • With a smaller memory configuration, the computer device can also achieve a neural network model running effect that is similar or identical to that with a larger memory configuration, which helps to reduce the hardware requirements that running the neural network model places on the computer device.
  • FIG1 is a schematic diagram of a memory management method for a neural network model provided by an exemplary embodiment of the present application
  • FIG2 is a schematic diagram of the architecture of a computer system provided by an exemplary embodiment of the present application.
  • FIG3 is a flow chart of a memory management method for a neural network model provided by an exemplary embodiment of the present application
  • FIG4 is a flow chart of another memory management method of a neural network model provided by an exemplary embodiment of the present application.
  • FIG5 is a schematic diagram of a calculation graph provided by an exemplary embodiment of the present application.
  • FIG6 is a schematic diagram of a method for determining an allocated memory block provided by an exemplary embodiment of the present application.
  • FIG7 is a schematic diagram of another method for determining an allocated memory block provided by an exemplary embodiment of the present application.
  • FIG8 is a schematic diagram of unallocated memory provided by an exemplary embodiment of the present application.
  • FIG9 is a schematic diagram of releasing a memory block provided by an exemplary embodiment of the present application.
  • FIG10 is a schematic diagram of reshaping performed by a shape reshaping operator provided by an exemplary embodiment of the present application.
  • FIG11 is a schematic diagram of splicing performed by a splicing operator provided by an exemplary embodiment of the present application.
  • FIG12 is a schematic diagram of splitting performed by a splitting operator provided by an exemplary embodiment of the present application.
  • FIG13 is a schematic diagram of another splicing operator for splicing provided by an exemplary embodiment of the present application.
  • FIG14 is a schematic diagram of another splitting operator performing splitting provided by an exemplary embodiment of the present application.
  • FIG15 is a flowchart of a memory management method of another neural network model provided by an exemplary embodiment of the present application.
  • FIG16 is a structural diagram of an AI chip provided by an exemplary embodiment of the present application.
  • FIG17 is a block diagram of a memory management device for a neural network model provided by an exemplary embodiment of the present application.
  • FIG18 is a schematic diagram of the structure of a computer device provided by an exemplary embodiment of the present application.
  • a computational graph in a broad sense is a directed graph used to represent the computational relationship between input, output, and intermediate variables, where each node in the graph represents a mathematical operation.
  • the computational graph of the neural network model in the embodiment of the present application is used to characterize the execution order between different network layers and the data flow between network layers during the calculation process of the neural network model.
  • the computational graph also includes the life cycle of the input and output tensors of each network layer.
  • the computational graph of a neural network model consists of network layer operators and the edges between network layer operators.
  • the network layer operators correspond to the network layers in the neural network model and contain the input and output tensor sizes of the network layer operators; the edges between network layer operators are used to represent the data flow between network layers.
  • the life cycle in the embodiments of the present application refers to the life cycle of a tensor, which can be an input or output tensor of a network layer.
  • During its life cycle, the tensor is used by the current network layer, and accordingly the memory block storing the tensor is occupied; at the end of the life cycle, the tensor will no longer be used by other network layers, and the memory block storing the tensor can be released, as sketched below.
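As a concrete illustration of this bookkeeping, the following Python sketch models a tensor together with its life cycle expressed as [first user, last user] operator indices. The record layout and names (Tensor, is_dead_after) are assumptions for illustration, not the patent's data structures.

```python
from dataclasses import dataclass

@dataclass
class Tensor:
    name: str
    size_bytes: int
    first_user: int  # index of the first network layer operator using the tensor
    last_user: int   # index of the last network layer operator using the tensor

    def is_dead_after(self, op_index: int) -> bool:
        # The memory block holding this tensor can be released once the last
        # operator in its life cycle has finished executing.
        return op_index >= self.last_user

t1 = Tensor("T1", size_bytes=96, first_user=0, last_user=3)  # life cycle [G0, G3]
print(t1.is_dead_after(1))  # False: G1 has run, but G3 still needs T1
print(t1.is_dead_after(3))  # True: the block can be released after G3 completes
```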
  • Free memory block list & allocated memory block list: the free memory block list records the free memory blocks that have been allocated but released; the allocated memory block list records the memory blocks that have been allocated and are occupied. In some embodiments, both lists are maintained by the memory management unit.
  • Data processing layer operator refers to the operator corresponding to the network layer in the computational graph that is used to adjust the data format in the neural network model but does not change the data content.
  • the data format adjustment includes adjusting the dimension of the tensor, such as splitting a multi-channel three-dimensional tensor into multiple single-channel two-dimensional tensors, or splicing multiple single-channel two-dimensional tensors into a single-channel two-dimensional tensor, or combining multiple single-channel two-dimensional tensors into a multi-channel three-dimensional tensor.
  • An embodiment of the present application provides a schematic diagram of a memory management method for a neural network model, as shown in Figure 1.
  • the method can be executed by a computer device, which can be a terminal or a server.
  • the method can be executed by a memory management unit (MMU) in the computer device.
  • the computer device obtains a computational graph 10 corresponding to the neural network model; based on the computational graph 10, the computer device determines the memory size to be allocated to the network layer operator 40; the computer device obtains an allocated memory block matching the memory size from the free memory block list 20, and allocates the allocated memory block to the network layer operator 40.
  • the memory management unit obtains a calculation graph 10 corresponding to the neural network model; based on the calculation graph 10, the memory management unit determines the memory size to be allocated to the network layer operator 40; the memory management unit obtains an allocated memory block that matches the memory size from the free memory block list 20, and allocates the allocated memory block to the network layer operator 40.
  • the computational graph 10 is used to represent the computational process of the neural network model.
  • the computation graph 10 includes at least two network layer operators 40 and edges 50 between at least two network layer operators, the network layer operators 40 are used to represent network layers in the neural network model, and the edges 50 are used to represent data flow between network layers.
  • the computer device obtains the computational graph 10 corresponding to the neural network model.
  • the computational graph 10 includes at least three network layer operators 40, namely: network layer operator G0, network layer operator G1 and network layer operator G2.
  • the operation order of the network layer operators 40 is network layer operator G0 → network layer operator G1 → network layer operator G2.
  • the memory size to be allocated to the network layer operator G0 is 16M, that is, the network layer operator G0 needs to occupy 16M when running.
  • the memory size to be allocated to the network layer operator G1 is 10M, that is, the network layer operator G1 needs to occupy 10M of memory during runtime (used to store the input and output tensors of the network layer corresponding to the network layer operator G1); the memory size to be allocated to the network layer operator G2 is 5M, that is, the network layer operator G2 needs to occupy 5M of memory during runtime (used to store the input and output tensors of the network layer corresponding to the network layer operator G2).
  • the memory size is used to indicate the memory size that the network layer operator 40 needs to occupy when the neural network model is running.
  • the free memory block list 20 is used to store free memory blocks that have been allocated but released.
  • the allocated memory block refers to a memory block allocated to the network layer operator 40 for storing data.
  • the input tensor is used to represent the multi-dimensional array input to the network layer operator 40.
  • After the current network layer operator is executed and the tensor is not called by other network layer operators, the tensor's life cycle ends and the memory block occupied by the tensor can be released.
  • the method for determining the allocated memory block includes at least one of the following methods, but is not limited thereto.
  • unallocated memory refers to the memory in the storage space that has not been allocated or occupied.
  • the tensor size threshold refers to the size of the smallest memory block that can be reused.
  • the tensor size threshold may adopt at least one of a custom value and a default value, but is not limited to this, and the embodiments of the present application do not make specific limitations on this.
  • an allocated memory block matching the size of the input tensor is directly obtained from the unallocated memory, thereby avoiding memory fragmentation.
  • When the free memory block list 20 includes a first free memory block whose size is the same as the size of the input tensor, the first free memory block is directly allocated as an allocated memory block to the corresponding network layer operator 40 for storing the input tensor.
  • When the free memory block list 20 includes a second free memory block that is larger than the size of the input tensor, a third memory block matching the size of the input tensor is divided from the second free memory block, and the third memory block is allocated as an allocated memory block to the corresponding network layer operator 40 for storing the input tensor.
  • An allocated memory block matching the size (10M) of the input tensor corresponding to the network layer operator G1 is obtained from the free memory block list 20; that is, the allocated memory block obtained for the network layer operator G1 is 10M.
  • Likewise, an allocated memory block matching the size (5M) of the input tensor corresponding to the network layer operator G2 is obtained from the free memory block list 20; that is, the allocated memory block obtained for the network layer operator G2 is 5M.
  • The remaining 1M of the reused memory block is recorded in the free memory block list 20, and the allocated memory block (10M) of the network layer operator G1 and the allocated memory block (5M) of the network layer operator G2 are recorded in the allocated memory block list 30.
  • the fourth free memory block is merged with the merged memory block to obtain an allocated memory block.
  • the merged memory block is a memory block divided from the unallocated memory, and the size of the merged memory block is the difference between the size of the input tensor and the size of the fourth free memory block.
  • For example, the size of the current input tensor is 10MB, and the free memory block list contains two free memory blocks of 2MB and 4MB, with the 4MB free memory block at the end of the list. The 4MB free memory block is taken out, a 6MB memory block is divided from the unallocated memory, and the two are merged to generate a 10MB memory block, which is allocated as an allocated memory block to the corresponding network layer operator 40 for storing the input tensor.
  • the method provided in this embodiment obtains the calculation graph corresponding to the neural network model; based on the calculation graph, determines the memory size to be allocated to the network layer operator; based on the memory size, obtains the allocated memory block that matches the memory size from the free memory blocks in the free memory block list, and allocates the allocated memory block to the network layer operator.
  • This application uses the free memory blocks in the free memory block list to allocate the allocated memory blocks that match the memory size to the network layer operator, thereby reducing the memory allocated by the neural network model during operation and improving memory utilization.
  • Fig. 2 shows a schematic diagram of the architecture of a computer system provided by an embodiment of the present application.
  • the computer system may include: a terminal 100 and a server 200.
  • the terminal 100 may be an electronic device such as a mobile phone, a tablet computer, a vehicle terminal (vehicle computer), a wearable device, a personal computer (PC), an aircraft, an unmanned vending terminal, etc.
  • the terminal 100 may be installed with a client that runs a target application, and the target application may be an application for memory management of a neural network model, or another application that provides a neural network model memory management function; this application does not limit this.
  • this application does not limit the form of the target application, including but not limited to an application (Application, App) installed in the terminal 100, a mini-program, etc., and may also be in the form of a web page.
  • Server 200 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), big data and artificial intelligence platforms.
  • Server 200 may be a backend server of the above-mentioned target application, used to provide backend services for the client of the target application.
  • Cloud technology refers to a hosting technology that unifies hardware, software, network and other resources within a wide area network or local area network to achieve data computing, storage, processing and sharing.
  • Cloud computing technology will become an important support.
  • The background services of technical network systems require a large amount of computing and storage resources, such as video websites, picture websites and other portal websites.
  • Each item may have its own identification mark, which needs to be transmitted to the background system for logical processing; data of different levels is processed separately, and all kinds of industry data require strong system support, which can be achieved through cloud computing.
  • the above-mentioned server can also be implemented as a node in a blockchain system.
  • Blockchain is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanism, encryption algorithm, etc.
  • Blockchain is essentially a decentralized database, a string of data blocks generated by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of its information (anti-counterfeiting) and generate the next block.
  • Blockchain can include the underlying blockchain platform, the platform product service layer, and the application service layer.
  • the terminal 100 and the server 200 may communicate with each other via a network, such as a wired or wireless network.
  • the executor of each step can be a computer device, and a computer device refers to an electronic device with data calculation, processing and storage capabilities.
  • the computer device is a device having the requirements for running a neural network model.
  • the computer device is provided with an AI chip, which includes an AI processor, a memory, and a memory management unit, wherein the memory management unit is used to allocate memory blocks to the network layers in the running neural network model.
  • the memory management unit is used to implement the memory management method of the neural network model described in various embodiments of the present application.
  • the memory management method of the neural network model can be executed by the terminal 100 (such as the client of the target application installed and running in the terminal 100 executes the memory management method of the neural network model), or the memory management method of the neural network model can be executed by the server 200, or the terminal 100 and the server 200 can interact and cooperate to execute it, and this application does not limit this.
  • FIG3 is a flow chart of a memory management method for a neural network model provided by an exemplary embodiment of the present application.
  • the method can be executed by a computer device, which can be a terminal or a server.
  • the method includes the following steps.
  • Step 302 Obtain a computational graph corresponding to the neural network model.
  • the computational graph is used to represent the computational process of the neural network model.
  • the computational graph includes at least two network layer operators and edges between at least two network layer operators, the network layer operators correspond to network layers in the neural network model, one network layer operator corresponds to one network layer, and the edges are used to indicate the data flow between network layers.
  • the neural network model includes at least one of a deep learning neural network model (Deep Neural Network, DNN), a convolutional neural network model (Convolutional Neural Network, CNN), an extreme learning machine model (Extreme Learning Machine, ELM) or other neural network models, but is not limited to this, and the embodiments of the present application do not make specific limitations on this.
  • the computation graph is determined by a computer device by analyzing the structure of the neural network model.
  • the computation graph is constructed in real time during the forward propagation of the neural network model.
  • the calculation graph is generated in advance and input into the computer device together with the neural network model.
  • the calculation graph can be generated manually or by a program.
  • Step 304 Based on the computation graph, determine the memory size to be allocated to the network layer operator.
  • Memory size is used to indicate the memory size that the network layer operator needs to occupy when the neural network model is running.
  • the computer device determines the memory size to be allocated to the network layer operator based on the computation graph.
  • the memory size includes the size of the input tensor and/or the size of the output tensor.
  • Input tensors are used to represent multidimensional arrays that are input to network layer operators.
  • Output tensors are multidimensional arrays representing the output from network layer operators.
  • Step 306 Obtain an allocated memory block that matches the memory size from the free memory block list, and allocate the allocated memory block to the network layer operator.
  • a free memory block is a memory block that has been allocated but released.
  • the free memory block list is used to store free memory blocks that have been allocated but released. That is, the free memory blocks are not currently occupied by the network layer.
  • Allocated memory blocks refer to memory blocks allocated to network layer operators for storing data.
  • the computer device reuses the free memory blocks in the free memory block list, adjusting them as needed, to obtain an allocated memory block that matches the memory size.
  • the method provided in this embodiment obtains the calculation graph corresponding to the neural network model; based on the calculation graph, determines the memory size to be allocated to the network layer operator; based on the memory size, obtains the allocated memory block that matches the memory size from the free memory blocks in the free memory block list, and allocates the allocated memory block to the network layer operator.
  • This application allocates the allocated memory block that matches the memory size to the network layer operator by reusing the free memory blocks in the free memory block list, thereby reducing the memory allocated during the operation of the neural network model and improving memory utilization. Since memory utilization is improved, with a smaller memory configuration the computer device can also achieve a neural network model running effect that is similar or identical to that with a larger memory configuration, which helps to reduce the hardware requirements that running the neural network model places on the computer device.
  • FIG4 is a flow chart of a memory management method for a neural network model provided by an exemplary embodiment of the present application.
  • the method can be executed by a computer device, which can be a terminal or a server.
  • the method includes the following steps.
  • Step 401 Obtain a computational graph corresponding to the neural network model.
  • a deep learning framework or a graph compilation device parses the input neural network model and generates a corresponding computational graph.
  • Step 402 Based on the computation graph, determine the memory size to be allocated to the network layer operator.
  • a tensor in a neural network model is a multidimensional array.
  • a tensor is represented by a (rank, shape, data type) triple.
  • the triple of the tensor is shown in Table 1.
  • For example, tensor [9, 10] represents a two-dimensional matrix with 9 rows and 10 columns.
  • In the triple, the rank is used to indicate the dimension of the tensor, the shape describes the number of elements in each dimension of the tensor, and the data type is used to indicate the type of the element data in the tensor.
  • a tensor can be represented in the form of an array or a shape; in the embodiments of the present application, a tensor is represented by its shape.
  • the shape of the tensor is [], which means a scalar with dimension 0; the shape of the tensor is [10], which means a vector with dimension 1 and 10 elements; the shape of the tensor is [9, 10], which means a matrix with dimension 2, with 9 elements in the first dimension and 10 elements in the second dimension, represented as a two-dimensional matrix with 9 rows and 10 columns.
  • the number of numbers in the shape indicates the dimension of the tensor. For example, if there are 4 numbers in [D0, D1, D2, D3], it means that the tensor is a 4-dimensional tensor. The number in the shape of a tensor indicates the number of elements in that dimension.
  • the size of the tensor is finally obtained by multiplying the total number of elements by the storage size occupied by a single element, as computed in the sketch below.
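As a worked check of this computation, the short Python sketch below multiplies the element counts in the shape by the bytes per element; the dtype table is an illustrative assumption, since the patent does not fix element widths.

```python
from math import prod

DTYPE_BYTES = {"float32": 4, "float16": 2, "int8": 1}  # illustrative subset

def tensor_size_bytes(shape: list[int], dtype: str) -> int:
    # A shape [D0, D1, ..., Dn] holds prod(shape) elements; multiply by the
    # storage size of a single element to get the tensor size.
    return prod(shape) * DTYPE_BYTES[dtype]

print(tensor_size_bytes([9, 10], "float32"))        # 360 bytes for a 9x10 matrix
print(tensor_size_bytes([512, 32, 32], "float32"))  # 2097152 bytes, i.e. 2MB
```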
  • the computer device may also determine the arrangement order corresponding to the network layer operators based on the computation graph.
  • the arrangement order is used to indicate the execution order of network layer operators when the neural network model is running.
  • the computer device obtains the arrangement order of the network layer operators; when the size of the input tensor corresponding to the network layer operator is greater than the tensor size threshold, the computer device obtains the allocated memory block matching the input tensor from the free memory block list, and allocates the allocated memory block to the network layer operator in the arrangement order for storing the input tensor.
  • the computation graph also includes the life cycle of the tensor (input or output tensor) corresponding to the network layer operator.
  • the life cycle of the tensor can be expressed in the following form: [the first network layer operator using the tensor, the last network layer operator using the tensor].
  • the computation graph includes five network layer operators, namely G0, G1, G2, G3, and G4.
  • T1 is used as an input tensor by G1 and G3, so the life cycle of T1 ends after G3 is executed. It can be seen that the life cycle of T1 is [G0, G3]; the sizes and life cycles of the other input tensors are determined in the same way.
  • the result of sorting the network layer operators in the order of execution is: [G0, G1, G2, G3, G4]. According to the sorting result, the allocated memory block is allocated to the network layer operator for storing the input tensor.
  • the computer device allocates memory blocks according to the sorting result of G0-G1-G2-G3-G4.
  • a 96B memory block one is allocated to the G0 operator to store T0. After the execution of the G0 operator is completed, the memory block one is released.
  • a 96B memory block two is allocated to the G1 operator to store T1. Since the life cycle of T1 is [G0, G3], memory block two is not released after the execution of the G1 operator is completed.
  • a 96B memory block three is allocated to the G2 operator to store T2. After the execution of the G2 operator is completed, the memory block three is released.
  • the G3 operator starts to execute. After the execution of the G3 operator is completed, the memory block two is released.
  • a 96B memory block four is allocated to the G4 operator to store T3. After the execution of the G4 operator is completed, memory block four is released. This allocation pattern is replayed in the sketch below.
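The walkthrough above can be replayed with a small simulation. The life-cycle tuples below are inferred from the example and are illustrative only; the point is that releasing each block at the end of its tensor's life cycle keeps peak usage well below the 4 × 96 = 384 bytes needed without reuse.

```python
tensors = {  # tensor -> (size in bytes, (first user, last user))
    "T0": (96, (0, 0)),
    "T1": (96, (0, 3)),
    "T2": (96, (2, 2)),
    "T3": (96, (4, 4)),
}

live, peak = {}, 0
for op in range(5):  # execute G0..G4 in arrangement order
    for name, (size, (first, _)) in tensors.items():
        if first == op:
            live[name] = size  # allocate a block before the operator runs
    peak = max(peak, sum(live.values()))
    for name, (_, (_, last)) in tensors.items():
        if last == op and name in live:
            del live[name]  # release the block at the end of the life cycle

print(peak)  # 192 bytes: half of what four never-released 96B blocks would need
```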
  • Step 403 Determine whether the size of the input tensor is less than the tensor size threshold.
  • the tensor size threshold refers to the size of the smallest memory block that can be reused.
  • the tensor size threshold may adopt at least one of a custom value and a default value, but is not limited to this, and the embodiments of the present application do not make specific limitations on this.
  • When the size of the input tensor is less than the tensor size threshold, step 409 is performed; when the size of the input tensor is greater than or equal to the tensor size threshold, step 404 is performed.
  • Step 404 Determine whether there is a free memory block in the free memory block list.
  • When the free memory block list includes free memory blocks, step 405 is executed; when the free memory block list does not include free memory blocks, step 408 is executed.
  • Step 405 Determine whether the size of the free memory block is smaller than the size of the input tensor.
  • When the sizes of the free memory blocks are all smaller than the size of the input tensor, step 407 is executed; when there is at least one free memory block whose size is greater than or equal to the size of the input tensor, step 406 is executed.
  • Step 406 Get an allocated memory block that matches the size of the input tensor from the free memory block list.
  • an allocated memory block that matches the size of the input tensor is obtained from the free memory block list.
  • the first free memory block in the free memory block list is used as the allocated memory block, wherein the size of the first free memory block is the same as the size of the input tensor.
  • a third memory block matching the size of the input tensor is divided from the second free memory block in the free memory block list, and the third memory block is used as the allocated memory block; the size of the second free memory block is greater than the size of the input tensor.
  • FIG6 a schematic diagram of a method for determining an allocated memory block is shown.
  • the input tensor 601 is 2MB, and there is a 10MB free memory block in the free memory block list 602.
  • the computer device divides the 10MB free memory block into two memory blocks of 2MB and 8MB, and allocates the 2MB memory block as an allocated memory block to the corresponding network layer operator for storing the input tensor 601.
  • the 2MB memory block is placed in the allocated memory block list 603, and the 8MB memory block is placed back in the free memory block list 602.
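A hedged, size-only sketch of this split-and-return behaviour follows; the function name and bookkeeping are assumptions. Because the free list is kept sorted in ascending order, the first block that fits is also the smallest sufficient one.

```python
MB = 1024 * 1024

def allocate_by_split(free_list: list[int], request: int) -> int:
    # Scan the ascending free list for the first (smallest) block that fits.
    for i, block in enumerate(free_list):
        if block >= request:
            free_list.pop(i)
            remainder = block - request
            if remainder:
                free_list.append(remainder)  # return the split-off piece
                free_list.sort()
            return request  # the allocated memory block, by size
    raise MemoryError("no free memory block is large enough")

free_list = [10 * MB]
allocated = allocate_by_split(free_list, 2 * MB)
print(allocated // MB, [b // MB for b in free_list])  # 2 [8]
```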
  • Step 407 Get an allocated memory block that matches the size of the input tensor from the free memory block list and the unallocated memory.
  • Unallocated memory refers to the memory in the storage space that has not been allocated or occupied.
  • an allocated memory block matching the size of the input tensor is obtained from the free memory block list and the unallocated memory.
  • the fourth free memory block is merged with the merged memory block to obtain the allocated memory block.
  • the size of the fourth free memory block is smaller than the size of the input tensor.
  • a merged memory block is a memory block divided from unallocated memory.
  • the size of the merged memory block is the difference between the size of the input tensor and the size of the fourth free memory block.
  • the fourth free memory block in the free memory list is merged with the merged memory block in the unallocated memory to obtain an allocated memory block.
  • FIG7 a schematic diagram of a method for determining an allocated memory block is shown.
  • an input tensor 701 is 10MB, and there is a 2MB free memory block and a 4MB free memory block in a free memory block list 702.
  • the computer device takes the 4MB free memory block out of the free memory block list 702 and divides a 6MB merged memory block from the unallocated memory.
  • the computer device merges the 4MB free memory block from the free memory block list 702 and the 6MB merged memory block from the unallocated memory, and uses the merged memory block as an allocated memory block for storing the input tensor 701. That is, as shown in (b) of FIG7 , a 10MB allocated memory block obtained by merging the 4MB memory block and the 6MB merged memory block is put into the allocated memory block list 703.
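Continuing the size-only sketch (names are illustrative, and real blocks would also need adjacent addresses), the merge path of step 407 takes the largest free block from the tail of the list and tops it up from unallocated memory:

```python
MB = 1024 * 1024

def allocate_by_merge(free_list: list[int], unallocated: int, request: int):
    tail = free_list.pop()      # the largest free block sits at the tail
    shortfall = request - tail  # e.g. 10MB - 4MB = 6MB still needed
    if shortfall > unallocated:
        raise MemoryError("unallocated memory exhausted")
    # Carve the shortfall from unallocated memory and merge it with the tail.
    return tail + shortfall, unallocated - shortfall

free_list = [2 * MB, 4 * MB]  # sorted ascending, as in FIG7
allocated, remaining = allocate_by_merge(free_list, 64 * MB, 10 * MB)
print(allocated // MB)        # 10: handed to the operator as one block
```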
  • Step 408 A memory block matching the size of the input tensor is divided from the unallocated memory as an allocated memory block.
  • the unallocated memory includes first-level unallocated memory and second-level unallocated memory, and the allocation priority of the first-level unallocated memory is higher than the allocation priority of the second-level unallocated memory.
  • the first-level unallocated memory belongs to the first memory
  • the second-level unallocated memory belongs to the second memory
  • the storage priority of the first memory is higher than the storage priority of the second memory
  • a memory access speed of the first memory is greater than a memory access speed of the second memory.
  • the capacity of the first memory is smaller than the capacity of the second memory, and the hardware cost per unit storage space in the first memory is higher than the hardware cost per unit storage space in the second memory.
  • the first memory is an L2 cache and the second memory is an L3 cache.
  • a memory block matching the size of the input tensor is allocated from the unallocated memory as an allocated memory block.
  • the neural network model runs on an AI chip, which includes multiple processor core clusters, such as a first processor core cluster 801 and a second processor core cluster 802.
  • a multi-level storage architecture is usually adopted.
  • the storage level close to the processor has a larger data transmission bandwidth, but the hardware cost is higher, so the storage space is relatively limited, which is called the first-level unallocated memory 803 or L2 cache in the embodiment of the present application.
  • the storage level far from the processor has a smaller data transmission bandwidth, but the hardware cost is low and the storage space is larger, which is called the second-level unallocated memory 804 or L3 cache in the embodiment of the present application.
  • The first-level unallocated memory and the second-level unallocated memory each have independent allocated memory block lists and free memory block lists, and all of these lists are empty in the initial state.
  • the memory blocks in the free memory block list are sorted from small to large, with the largest memory block at the end. Accordingly, the computer device can determine, from the size of the free memory block at the end of the list, whether the list contains a free memory block that is greater than or equal to the size of the input tensor.
  • a memory block matching the size of the input tensor is allocated from the first-level unallocated memory or the second-level unallocated memory as an allocated memory block.
  • a memory block matching the size of the input tensor is divided from the first-level unallocated memory as an allocated memory block.
  • a memory block matching the size of the input tensor is divided from the second-level unallocated memory as an allocated memory block.
  • Step 409 Get an allocated memory block matching the size of the input tensor from the unallocated memory.
  • an allocated memory block matching the size of the input tensor is obtained from the unallocated memory.
  • In this case the free memory blocks are not reused, and an allocated memory block matching the size of the input tensor is obtained directly from the unallocated memory. If a memory block matching the size of the input tensor can be divided from the first-level unallocated memory, it is allocated from the first-level unallocated memory first; if not, it is allocated from the second-level unallocated memory, as sketched below.
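A minimal sketch of the first-level-then-second-level fallback; the Pool class, capacities, and names are assumptions for illustration.

```python
MB = 1024 * 1024

class Pool:
    def __init__(self, name: str, capacity: int):
        self.name, self.remaining = name, capacity

    def try_allocate(self, size: int) -> bool:
        if size <= self.remaining:
            self.remaining -= size
            return True
        return False

l2, l3 = Pool("L2", 8 * MB), Pool("L3", 64 * MB)

def allocate(size: int) -> str:
    for pool in (l2, l3):  # the first-level pool has higher allocation priority
        if pool.try_allocate(size):
            return pool.name
    raise MemoryError("out of memory at both levels")

print(allocate(6 * MB))  # L2: fits in the first-level unallocated memory
print(allocate(6 * MB))  # L3: only 2MB left in L2, so fall back to L3
```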
  • the setting of the tensor size threshold is used to reduce the generation of memory blocks corresponding to smaller input tensors.
  • the life cycles corresponding to the input tensors and output tensors of the network layer operator are determined; in response to a memory block in the allocated memory block list reaching its life cycle, the computer device releases the memory block and places the memory block in the free memory block list.
  • the allocated memory block list is used to store occupied memory blocks.
  • the life cycle is used to indicate the time that a tensor occupies a memory block; that is, the input tensor and output tensor are no longer used by other network layer operators after their life cycles end.
  • the computer device in response to a memory block in a list of allocated memory blocks reaching its life cycle, releases a memory block; if there is a released memory block in an adjacent position of the currently released memory block, the currently released memory block is merged with the released memory block to obtain a merged released memory block; and the merged released memory block is placed in a list of free memory blocks.
  • the merged released memory block refers to a memory block obtained by merging the currently released memory block and the released memory block.
  • the life cycle of a tensor can be expressed in the following form: [the first network layer operator that uses the tensor, the last network layer operator that uses the tensor].
  • the computer device determines whether the currently running network layer operator is the last network layer operator that uses the tensor in the life cycle. If so, it is determined that the end of the life cycle has been reached; if not, it is determined that the end of the life cycle has not been reached.
  • As shown in FIG9, the memory blocks of size 2M and 6M (located at the end of the list) in the first-level/second-level free memory block list 902 have already been released.
  • The currently released 2M memory block is merged with the released 2M and 6M memory blocks to obtain a 10M merged released memory block; the 10M merged released memory block is put into the first-level/second-level free memory block list 902, and only two memory blocks, of 2M and 4M, remain in the first-level/second-level allocated memory block list.
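The release-and-merge behaviour can be sketched with (start, size) pairs; the helper below is illustrative, not the patent's implementation. Blocks whose end address equals the next block's start address are coalesced into one larger free block.

```python
M = 1024 * 1024

def release(free_blocks: list[tuple[int, int]], start: int, size: int):
    # free_blocks holds (start, size) pairs; insert the newly released block
    # and coalesce any physically adjacent neighbours.
    blocks = sorted(free_blocks + [(start, size)])
    merged = [blocks[0]]
    for s, sz in blocks[1:]:
        last_s, last_sz = merged[-1]
        if last_s + last_sz == s:  # adjacent: merge into one block
            merged[-1] = (last_s, last_sz + sz)
        else:
            merged.append((s, sz))
    return merged

free = [(0, 2 * M), (2 * M, 6 * M)]  # already-released 2M and 6M neighbours
print([(s // M, sz // M) for s, sz in release(free, 8 * M, 2 * M)])
# [(0, 10)]: releasing the adjacent 2M block yields a single 10M free block
```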
  • the network layer operators include data processing layer operators.
  • the data processing layer operator is used to adjust the data format in the neural network model.
  • the network layer corresponding to the data processing layer operator is called the data transformation layer.
  • the data processing layer operators include at least one of a reshape operator, a concatenation operator, and a split operator, but are not limited thereto, and the embodiments of the present application do not make specific limitations on this.
  • the reshape operator is used to reshape the input tensor into a target shape; in the process of reshaping the data, neither the number of elements contained in the data nor the arrangement of the elements in the data is changed.
  • the input tensor of the input reshape operator is represented in the form of a matrix.
  • the size of the matrix of the input reshape operator is [2, 3, 4] (a 3-row 4-column matrix with 2 channels), that is, the matrix of the input reshape operator is a 2 ⁇ 3 ⁇ 4 tensor
  • the size of the matrix output by the reshape operator is [6, 4] (a 6-row 4-column matrix with a single channel), that is, the reshape operator is used to transform a matrix of size [2, 3, 4] into a matrix of size [6, 4].
  • the splicing operator is used to splice at least two input tensors.
  • the split operator is used to split the input tensor according to the split dimension, splitting it into at least two sub-input tensors.
  • the computer device obtains the input tensor and output tensor corresponding to the data processing layer operator; for example, the computer device configures the output tensor to reuse the allocated memory block occupied by the input tensor.
  • the reshape operator is used to adjust the shape of the input tensor without changing the data in the input tensor.
  • the reshape tensor refers to the tensor output by the reshape operator.
  • the reshape operator reshapes the input tensor and does not change the data of the input tensor in the memory block. Conventionally, therefore, when the neural network model is running, the reshape operator copies the memory data of the input tensor to the memory where the output tensor is located (that is, the memory where the input tensor of the next network layer is located).
  • the computer device can eliminate the data copy operation corresponding to the reshape operator when the neural network model is running by making the output tensor of the reshape operator reuse the memory block occupied by the input tensor of the reshape operator.
  • the reshape operator in this scenario does not need to perform data transfer operations, thus avoiding the overhead caused by frequent memory accesses, as illustrated below.
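As an analogy rather than the patent's implementation, NumPy's reshape exhibits the same zero-copy idea: the reshaped tensor is a view over the input tensor's memory block, so no data is transferred.

```python
import numpy as np

buf = np.arange(24, dtype=np.float32)  # the underlying memory block
x = buf.reshape(2, 3, 4)               # input tensor of the reshape operator
y = x.reshape(6, 4)                    # output tensor: same block, new shape

print(x.base is buf and y.base is buf)  # True: both are views over buf
print(np.shares_memory(x, y))           # True: no data copy was performed
```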
  • the data processing layer operator includes a concatenation operator.
  • the computer device determines an allocated memory block occupied by the output tensor; and the computer device configures at least two input tensors to offset-reuse the allocated memory block occupied by the output tensor.
  • the concatenation operator concatenates two or more input tensors according to the concatenation dimension.
  • the splicing dimension is the highest dimension, or the first dimension whose number of elements is not 1.
  • the output tensor and input tensor of the splicing operator can reuse the same memory block.
  • the computer device can eliminate the data copy operation corresponding to the splicing operator when the neural network model is running by making the multiple input tensors of the splicing operator reuse, according to their offsets, the memory block occupied by the output tensor of the splicing operator.
  • For example, input tensor A [512, 32, 32] and input tensor B [256, 32, 32] are spliced into output tensor C [768, 32, 32] (assuming that the data types of tensor A, tensor B and tensor C are all Float32).
  • the memory block C is divided into sub-memory block A and sub-memory block B, with sizes of 2MB and 1MB respectively, according to the sizes of the input tensors A and B; sub-memory block A and sub-memory block B are used to store tensors A and B respectively.
  • the splicing operator therefore does not need to perform data transfer operations, avoiding the overhead caused by frequent memory accesses; the offsets are worked out in the sketch below.
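A worked sketch of the offsets in this example; the names are assumed, and the sizes follow from 512·32·32·4 = 2MB and 256·32·32·4 = 1MB for Float32 elements.

```python
from math import prod

FLOAT32_BYTES = 4
inputs = {"A": [512, 32, 32], "B": [256, 32, 32]}  # spliced along dimension 0

offset = 0
for name, shape in inputs.items():
    size = prod(shape) * FLOAT32_BYTES
    print(f"tensor {name}: offset {offset // 2**20}MB, size {size // 2**20}MB")
    offset += size  # the next input tensor starts where this one ends

print(f"output tensor C occupies a single {offset // 2**20}MB block")  # 3MB
```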
  • the data processing layer operator includes a splitting operator, and the output tensor includes at least two sub-output tensors.
  • the computer device divides the allocated memory block occupied by the input tensor to obtain sub-memory blocks corresponding to the at least two sub-output tensors respectively, and the computer device allocates the sub-memory blocks to the at least two sub-output tensors.
  • the splitting operator is used to divide the input tensor into at least two sub-output tensors, and a sub-output tensor refers to a tensor output by the data processing layer operator.
  • the splitting operator can be understood as the inverse operation of the splicing operator.
  • the splitting operator splits the input tensor according to the splitting dimension to generate multiple output tensors. If the splitting dimension specified by the splitting operator is the highest dimension, or the first dimension whose number of elements is not 1, the output tensors of the splitting operator reuse the memory block occupied by the input tensor.
  • the computer device can eliminate the data copy operation corresponding to the splitting operator when the neural network model is running by making the multiple output tensors of the splitting operator reuse, according to their offsets, the memory block occupied by the input tensor of the splitting operator, as sketched below.
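Again as a NumPy analogy (not the patent's API), splitting along the highest dimension yields offset views into the input tensor's block, so the sub-output tensors need no copies either.

```python
import numpy as np

c = np.zeros((768, 32, 32), dtype=np.float32)  # the input tensor's block
a, b = np.split(c, [512], axis=0)              # sub-tensors [512,...] and [256,...]
print(np.shares_memory(a, c) and np.shares_memory(b, c))  # True: offset views
```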
  • the method provided in this embodiment obtains the calculation graph corresponding to the neural network model; based on the calculation graph, determines the memory size to be allocated to the network layer operator; based on the memory size, obtains the allocated memory block that matches the memory size from the free memory blocks in the free memory block list, and allocates the allocated memory block to the network layer operator.
  • This application allocates the allocated memory block that matches the memory size to the network layer operator by reusing the free memory blocks in the free memory block list, thereby reducing the memory allocated by the neural network model during operation and improving memory utilization.
  • the method provided in this embodiment determines different acquisition methods by comparing the size of the input tensor with the tensor size threshold; based on the different acquisition methods, the allocated memory block matching the size of the input tensor is obtained from the free memory block list, thereby reducing the memory allocated during the operation of the neural network model and improving memory utilization.
  • the method provided in this embodiment reduces the memory allocated during the operation of the neural network model and improves memory utilization by combining a free memory block list and unallocated memory to obtain an allocated memory block that matches the size of the input tensor.
  • the method provided in this embodiment directly obtains an allocated memory block that matches the size of the input tensor from unallocated memory when the size of the input tensor is less than or equal to the tensor size threshold, thereby avoiding the generation of small memory blocks and improving memory utilization.
  • the method provided in this embodiment when releasing a memory block, merges the currently released memory block with the released memory block to obtain a large merged released memory block, and puts the merged released memory block into a free memory block list.
  • the free memory blocks in the free memory block list can be applied to a variety of allocation scenarios, thereby improving the allocation efficiency of the memory block.
  • the method provided in this embodiment enables the input and output of the data processing layer operators in the neural network model to reuse the same memory block, thereby reducing the data transfer overhead when the neural network model is running and improving memory utilization.
  • Figure 15 is a flow chart of a memory management method for a neural network model provided by an exemplary embodiment of the present application.
  • the method can be executed by a computer device, which can be a terminal or a server.
  • the method includes the following steps.
  • Step 1501 Get the size of the input tensor to be allocated to the current network layer operator.
  • Input tensors are used to represent multidimensional arrays that are input to network layer operators.
  • a tensor is represented by a triple of (rank, shape, data type).
  • the size of the input tensor is used to indicate the memory size that the input data of the network layer operator needs to occupy when the neural network model is running.
  • the computer device determines the size of the input tensor to be allocated to the network layer operator based on the computation graph.
  • Step 1502 Determine whether the size of the input tensor is less than the tensor size threshold.
  • the tensor size threshold refers to the size of the smallest memory block that can be reused.
  • the computer device determines whether the size of the input tensor is less than the tensor size threshold. If the size of the input tensor is less than the tensor size threshold, execute step 1508; if the size of the input tensor is greater than or equal to the tensor size threshold, execute step 1503.
  • Step 1503 Get the largest free memory block in the free memory block list.
  • the free memory block list is used to store free memory blocks that have been allocated but released.
  • the computer device obtains the largest free memory block in the free memory block list.
  • Step 1504 Determine whether the largest free memory block is greater than or equal to the size of the input tensor.
  • the computer device determines whether the largest free memory block is greater than or equal to the size of the input tensor. If the largest free memory block is greater than or equal to the size of the input tensor, step 1505 is executed; if the largest free memory block is smaller than the size of the input tensor, step 1506 is executed.
  • Step 1505 Divide the largest free memory block into two memory blocks, one matching the size of the input tensor and allocated to the network layer operator, and the other put back into the free memory block list.
  • the computer device divides the largest free memory block into two memory blocks, one of which matches the size of the input tensor to obtain an allocated memory block that matches the size of the input tensor, and allocates the allocated memory block to the network layer operator; the other is put back into the free memory block list for next use.
  • Step 1506 Determine whether the largest free memory block is at the end.
  • the free memory blocks in the free memory block list are arranged in order from small to large.
  • When the largest free memory block is smaller than the size of the input tensor, determine whether the largest free memory block is at the end of the list, that is, determine whether there are any free memory blocks left in the free memory block list; when the largest free memory block is at the end, execute step 1507; when the largest free memory block is not at the end, execute step 1508.
  • Step 1507 Take out the largest free memory block, divide it from the unallocated memory to obtain a merged memory block, merge the largest free memory block and the merged memory block, and allocate them to the network layer operator.
  • the largest free memory block is taken out and divided from the unallocated memory to obtain a merged memory block, and the largest free memory block and the merged memory block are merged and allocated to the network layer operator.
  • Step 1508 Determine whether the remaining memory in the first-level unallocated memory is greater than or equal to the size of the input tensor.
  • When the remaining memory in the first-level unallocated memory is greater than or equal to the size of the input tensor, step 1509 is executed; when the remaining memory in the first-level unallocated memory is less than the size of the input tensor, step 1510 is executed.
  • Step 1509 Divide a memory block that matches the size of the input tensor from the first-level unallocated memory as an allocated memory block, and allocate it to the network layer operator.
  • a memory block matching the size of the input tensor is directly divided from the first-level unallocated memory as an allocated memory block and allocated to the network layer operator.
  • Step 1510 Divide a memory block that matches the size of the input tensor from the secondary unallocated memory as an allocated memory block, and allocate it to the network layer operator.
  • a memory block matching the size of the input tensor is divided from the second-level unallocated memory as an allocated memory block and allocated to the network layer operator.
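Putting steps 1501 to 1510 together, the consolidated, size-only sketch below follows the flow under assumed names and an assumed 1MB tensor size threshold; it is an illustration, not the patent's implementation (address adjacency and free-block coalescing are omitted).

```python
MB = 1024 * 1024
THRESHOLD = 1 * MB  # tensor size threshold (assumed value)

class Allocator:
    def __init__(self, l2: int, l3: int):
        self.pools = {"L2": l2, "L3": l3}  # first-/second-level unallocated memory
        self.free_list: list[int] = []     # sorted ascending, largest at the tail

    def _carve(self, size: int) -> bool:
        for level in ("L2", "L3"):         # steps 1508-1510: L2 first, then L3
            if self.pools[level] >= size:
                self.pools[level] -= size
                return True
        return False

    def allocate(self, size: int) -> int:
        if size < THRESHOLD:               # step 1502: small tensors skip reuse
            if self._carve(size):
                return size
            raise MemoryError("unallocated memory exhausted")
        if self.free_list:
            largest = self.free_list[-1]   # steps 1503-1504: inspect the tail
            if largest >= size:            # step 1505: split the largest block
                self.free_list.pop()
                if largest > size:
                    self.free_list.append(largest - size)
                    self.free_list.sort()
                return size
            if self._carve(size - largest):  # step 1507: merge with unallocated
                self.free_list.pop()
                return size
        if self._carve(size):              # steps 1508-1510: no reusable block
            return size
        raise MemoryError("unallocated memory exhausted")

    def release(self, size: int):
        self.free_list.append(size)        # coalescing omitted for brevity
        self.free_list.sort()

alloc = Allocator(l2=8 * MB, l3=64 * MB)
a = alloc.allocate(2 * MB)   # step 1509: carved from the first-level pool
alloc.release(a)             # the 2MB block becomes a free memory block
b = alloc.allocate(10 * MB)  # step 1507: 2MB free block + 8MB unallocated
print(a // MB, b // MB)      # 2 10
```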
  • FIG16 is a structural diagram of an AI chip provided by an exemplary embodiment of the present application.
  • the AI chip includes an AI processor 1601 , a memory 1602 , and a memory management unit 1603 .
  • the AI processor 1601 is used to run the neural network model.
  • the AI processor 1601 may include multiple processor core clusters, each of which includes multiple processor cores.
  • the memory 1602 may be composed of multiple levels of cache.
  • the memory 1602 may be composed of L2 and L3 caches.
  • the memory management unit 1603 is used to allocate memory blocks to the network layers in the running neural network model.
  • the memory management unit 1603 is used to implement the memory management method of the neural network model described in various embodiments of the present application.
  • FIG17 shows a schematic diagram of the structure of a memory management device for a neural network model provided by an exemplary embodiment of the present application.
  • the device includes the following modules.
  • the acquisition module 1701 is used to obtain a computation graph corresponding to the neural network model, wherein the computation graph includes at least two network layer operators, and the network layer operators are used to represent the network layers in the neural network model.
  • the determination module 1702 is used to determine, based on the computation graph, the memory size to be allocated to the network layer operator, where the memory size is used to represent the memory size that the network layer operator needs to occupy when the neural network model is running.
  • the allocation module 1703 is used to obtain an allocated memory block matching the memory size from the free memory block list, and allocate the allocated memory block to the network layer operator.
  • the free memory block list is used to store free memory blocks that have been allocated but released, and the allocated memory blocks refer to memory blocks allocated to the network layer operator for storing data.
  • the acquisition module 1701 is used to obtain the arrangement order of the network layer operators, and the arrangement order is used to represent the execution order of the network layer operators when the neural network model is running.
  • the allocation module 1703 is used to obtain the allocated memory block matching the size of the input tensor from the free memory block list when the size of the input tensor of the network layer operator is greater than a tensor size threshold, and the tensor size threshold is the minimum memory block size for memory reuse.
  • the allocation module 1703 is used to allocate the allocated memory blocks to the network layer operators for storing the input tensors in accordance with the arrangement order.
  • the input tensor refers to a multidimensional array input into the network layer operator.
  • the allocation module 1703 is used to obtain the allocated memory block that matches the size of the input tensor from the free memory block list when the size of the input tensor is greater than the tensor size threshold and there is at least one free memory block in the free memory block list whose size is greater than or equal to the size of the input tensor.
  • the allocation module 1703 is used to use the first free memory block in the free memory block list as the allocated memory block when the size of the input tensor is greater than the tensor size threshold.
  • the allocation module 1703 is used to divide a third memory block matching the size of the input tensor from the second free memory block in the free memory block list when the size of the input tensor is greater than the tensor size threshold, and use the third memory block as the allocated memory block.
  • the size of the first free memory block is the same as the size of the input tensor, and the size of the second free memory block is larger than the size of the input tensor.
  • the allocation module 1703 is used to obtain the allocated memory block matching the size of the input tensor from the free memory block list and the unallocated memory when the size of the input tensor is greater than the tensor size threshold and the size of the free memory block in the free memory block list is smaller than the size of the input tensor.
  • the unallocated memory refers to the memory in the storage space that has not been allocated or occupied.
  • the allocation module 1703 is used to merge the fourth free memory block in the free memory block list with the merged memory block in the unallocated memory to obtain the allocated memory block when the size of the input tensor is greater than the tensor size threshold.
  • the size of the fourth free memory block is smaller than the size of the input tensor;
  • the merged memory block is a memory block obtained by dividing the unallocated memory;
  • the size of the merged memory block is the difference between the size of the input tensor and the size of the fourth free memory block.
  • the allocation module 1703 is used to allocate a memory block matching the size of the input tensor from the unallocated memory as the allocated memory block when the size of the input tensor is greater than the tensor size threshold and there is no free memory block in the free memory block list.
  • the unallocated memory includes first-level unallocated memory and second-level unallocated memory, and the allocation priority of the first-level unallocated memory is higher than the allocation priority of the second-level unallocated memory.
  • the allocation module 1703 is used to allocate a memory block matching the size of the input tensor from the first-level unallocated memory or the second-level unallocated memory as the allocated memory block when the size of the input tensor is greater than the tensor size threshold and there is no free memory block in the free memory block list.
  • the allocation module 1703 is used to allocate a memory block matching the size of the input tensor from the first-level unallocated memory as the allocated memory block when the size of the input tensor is greater than the tensor size threshold, there are no free memory blocks in the free memory block list, and the remaining memory in the first-level unallocated memory is greater than or equal to the size of the input tensor.
  • the allocation module 1703 is used to allocate a memory block matching the size of the input tensor from the second-level unallocated memory as the allocated memory block when the size of the input tensor is greater than the tensor size threshold, there is no free memory block in the free memory block list, and the remaining memory in the first-level unallocated memory is less than the size of the input tensor.
  • the allocation module 1703 is used to obtain the allocated memory block matching the size of the input tensor from the unallocated memory when the size of the input tensor of the network layer operator is less than or equal to the tensor size threshold, wherein the unallocated memory refers to the memory in the storage space that has not been allocated or occupied.
  • the determination module 1702 is used to determine, based on the computation graph, the life cycles corresponding to the input tensor and the output tensor of the network layer operator, wherein the input tensor and the output tensor are no longer used by other network layer operators after the end of the life cycle is reached.
  • the apparatus further comprises a release module 1704, and the release module 1704 is used for releasing a memory block in the allocated memory block list in response to the memory block reaching the end of its storage period, and putting the memory block into the free memory block list.
  • the allocated memory block list is used to store occupied memory blocks.
  • the release module 1704 is used to release a memory block in the allocated memory block list in response to the memory block reaching the end of its life cycle.
  • the device further includes a merging module 1705, which is used to merge the currently released memory block with the released memory block to obtain a merged released memory block when there is a released memory block adjacent to the currently released memory block.
  • the merging module 1705 is used to put the merged released memory block into the free memory block list.
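  • A minimal sketch of this release-and-merge behavior is given below, assuming each memory block is tracked as an (offset, size) pair; the function and variable names are hypothetical and introduced only for illustration.

```python
def release_block(block, allocated_list, free_list):
    """Release a block whose tensor has reached the end of its life cycle and
    merge it with any adjacent released blocks (illustrative sketch)."""
    allocated_list.remove(block)
    offset, size = block
    # Scan a snapshot of the free list in address order and absorb neighbours.
    for other in sorted(free_list):
        o_offset, o_size = other
        if o_offset + o_size == offset:      # neighbour ends where we begin
            offset, size = o_offset, o_size + size
            free_list.remove(other)
        elif offset + size == o_offset:      # neighbour begins where we end
            size += o_size
            free_list.remove(other)
    free_list.append((offset, size))
    free_list.sort(key=lambda b: b[1])       # keep the list ordered by size

allocated, free = [(0, 16)], [(16, 4)]
release_block((0, 16), allocated, free)
print(free)  # [(0, 20)] -- the released block was merged with its neighbour
```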
  • the acquisition module 1701 is used to obtain the input tensor and output tensor corresponding to the data processing layer operator.
  • the apparatus further comprises a multiplexing module 1706, the multiplexing module 1706 being configured to configure the output tensor to reuse the allocated memory block occupied by the input tensor.
  • the multiplexing module 1706 is configured to allocate the allocated memory block occupied by the input tensor to the reshaped tensor, so that the reshaped tensor reuses that memory block.
  • the reshape operator is used to adjust the shape of the input tensor without changing the data in the input tensor, and the reshaped tensor refers to the tensor output by the reshape operator.
  • the multiplexing module 1706 is used to divide the allocated memory block occupied by the input tensor to obtain sub-memory blocks corresponding to each of the at least two sub-output tensors, and to allocate the sub-memory blocks to the at least two sub-output tensors.
  • the splitting operator is used to split the input tensor into at least two sub-tensors, and the sub-output tensors refer to the tensors output by the splitting operator.
  • the multiplexing module 1706 is used to determine the allocated memory block occupied by the output tensor, and to configure at least two of the input tensors to reuse, at their respective offsets, the allocated memory block occupied by the output tensor.
  • the concatenation operator is used to concatenate at least two of the input tensors.
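  • The reuse behavior of the multiplexing module 1706 for the reshape, splitting and concatenation operators can be sketched as follows. Block and the three helper functions are hypothetical names introduced for illustration, with offsets counted in bytes.

```python
from dataclasses import dataclass

@dataclass
class Block:
    offset: int  # start address of the allocated memory block
    size: int    # size in bytes

def reuse_for_reshape(input_block: Block) -> Block:
    # A reshape adjusts the shape but not the data, so the reshaped tensor
    # simply reuses the allocated memory block occupied by the input tensor.
    return input_block

def reuse_for_split(input_block: Block, sub_sizes):
    # A split divides the input tensor's block into consecutive sub-memory
    # blocks, one per sub-output tensor, at increasing offsets.
    blocks, offset = [], input_block.offset
    for size in sub_sizes:
        blocks.append(Block(offset, size))
        offset += size
    return blocks

def reuse_for_concat(output_block: Block, input_sizes):
    # A concatenation lets each input tensor reuse, at its own offset, the
    # allocated memory block occupied by the output tensor.
    return reuse_for_split(output_block, input_sizes)
```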
  • FIG18 shows a block diagram of a computer device 1800 provided by an exemplary embodiment of the present application.
  • the computer device can be implemented as a server in the above-mentioned solution of the present application.
  • the computer device 1800 includes a central processing unit (CPU) 1801, a system memory 1804 including a random access memory (RAM) 1802 and a read-only memory (ROM) 1803, and a system bus 1805 connecting the system memory 1804 and the central processing unit 1801.
  • the computer device 1800 also includes a mass storage device 1806 for storing an operating system 1809, an application program 1810, and other program modules 1811.
  • the mass storage device 1806 is connected to the central processing unit 1801 through a mass storage controller (not shown) connected to the system bus 1805.
  • the mass storage device 1806 and its associated computer readable medium provide non-volatile storage for the computer device 1800. That is, the mass storage device 1806 may include a computer readable medium (not shown) such as a hard disk or a Compact Disc Read-Only Memory (CD-ROM) drive.
  • the computer-readable medium may include computer storage media and communication media.
  • Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules or other data.
  • Computer storage media include RAM, Erasable Programmable Read-Only Memory (EPROM), Electrically-Erasable Programmable Read-Only Memory (EEPROM), flash memory or other solid-state storage technology, CD-ROM, Digital Versatile Disc (DVD) or other optical storage, tape cassettes, magnetic tape, disk storage or other magnetic storage devices.
  • the memory includes a first-level unallocated memory (not shown) and a second-level unallocated memory (not shown).
  • the central processing unit 1801 usually adopts a multi-level storage architecture.
  • the first-level unallocated memory is a storage layer close to the processor and has a larger data transmission bandwidth, but the hardware cost is higher, so the storage space is relatively limited;
  • the second-level unallocated memory is a storage layer far from the processor and has a smaller data transmission bandwidth, but the hardware cost is low and the storage space is larger.
  • the computer device 1800 can also be connected to a remote computer on the network through a network such as the Internet. That is, the computer device 1800 can be connected to the network 1808 through the network interface unit 1807 connected to the system bus 1805, or the network interface unit 1807 can be used to connect to other types of networks or remote computer systems (not shown).
  • the memory also stores at least one computer program.
  • the central processing unit 1801 implements all or part of the steps in the memory management method of the neural network model shown in the above-mentioned embodiments by executing the at least one program.
  • An embodiment of the present application also provides a computer device, which includes a processor and a memory, wherein the memory stores at least one program, and the at least one program is loaded and executed by the processor to implement the memory management method of the neural network model provided by the above-mentioned method embodiments.
  • An embodiment of the present application also provides a computer-readable storage medium, which stores at least one computer program.
  • the at least one computer program is loaded and executed by a processor to implement the memory management method of the neural network model provided by the above-mentioned method embodiments.
  • An embodiment of the present application also provides a computer program product, which includes a computer program stored in a computer-readable storage medium; a processor of a computer device reads the computer program from the computer-readable storage medium and executes it, so that the computer device implements the memory management method of the neural network model provided in the above-mentioned method embodiments.

Abstract

The present application relates to the technical field of memory management, and discloses a memory management method and apparatus for a neural network model, a device, a medium and a product. The method comprises: acquiring a computational graph corresponding to a neural network model; on the basis of the computational graph, determining the memory sizes to be allocated to network layer operators; and acquiring, from a free memory block list, allocated memory blocks matching the memory sizes, and allocating them to the network layer operators, wherein the free memory block list is used for storing free memory blocks that have been allocated but subsequently released, and the allocated memory blocks are memory blocks allocated to the network layer operators for storing data. By means of the free memory blocks in the free memory block list, allocated memory blocks matching the required memory sizes are provided to the network layer operators, reducing the memory allocated to the neural network model during operation and improving memory utilization.

Description

Memory management method, device, equipment, medium and product for neural network model

This application claims priority to the Chinese patent application filed on September 11, 2023, with application number 202311165933.1 and entitled "Memory management method, device, equipment, medium and product for neural network model", the entire contents of which are incorporated by reference into this application.

Technical Field

The embodiments of the present application relate to the field of memory management technology, and in particular to a memory management method, device, equipment, medium and product for a neural network model.

Background Art

With the development of artificial intelligence (AI) technology, neural network models are being used more and more frequently. In order to achieve better algorithm accuracy, neural network models are becoming more and more complex, and hardware capabilities limit the development of neural networks in a deeper direction.

In the related art, the memory that each network layer in the neural network model needs to occupy is obtained, and memory is then allocated to the entire neural network model according to its running order. For example, during operation the neural network model needs to occupy a 100M memory block, a 10M memory block and a 50M memory block in sequence, and the storage periods of the 10M memory block and the 50M memory block intersect, that is, the times during which the 10M memory block and the 50M memory block are occupied overlap. When the neural network model applies for the 100M memory block, a 100M memory block can be allocated to the neural network. When the neural network model then applies for the 10M memory block, it can be determined whether this 10M request can reuse the allocated 100M memory block; if it can, no new memory block is allocated for the requested 10M memory block, and the 10M request reuses the 100M memory block. Similarly, when the neural network model applies for the 50M memory block, it is also determined whether the 50M request can reuse the allocated 100M memory block; if it can, the 50M request is made to reuse the allocated 100M memory block; otherwise, a new 50M memory block is allocated for the 50M request.

It can be seen from the above related art that, because the storage periods of the requested 10M memory block and the requested 50M memory block intersect, once the 10M request has reused the allocated 100M memory block, the 50M request can no longer reuse that 100M memory block and a new 50M memory block must be allocated, so the entire neural network model occupies 150M of memory in total, which is a large memory footprint. How to reasonably manage the memory occupied by a neural network model and improve memory utilization is therefore an important problem to be solved urgently.

Summary of the Invention

The present application provides a memory management method, device, equipment, medium and product for a neural network model. The technical solution is as follows.

According to one aspect of the present application, a memory management method for a neural network model is provided. The method is executed by a computer device and includes the following steps.

A computation graph corresponding to the neural network model is obtained, wherein the computation graph includes at least two network layer operators, and the network layer operators are used to represent the network layers in the neural network model.

Based on the computation graph, a memory size to be allocated to the network layer operator is determined, where the memory size is used to represent the memory size that the network layer operator needs to occupy when the neural network model is running.

An allocated memory block matching the memory size is obtained from a free memory block list, and the allocated memory block is allocated to the network layer operator.

The free memory block list is used to store free memory blocks that have been allocated but released, and the allocated memory block refers to a memory block allocated to the network layer operator for storing data.

According to one aspect of the present application, a memory management device for a neural network model is provided. The device includes the following modules.

An acquisition module is used to obtain a computation graph corresponding to the neural network model, wherein the computation graph includes at least two network layer operators, and the network layer operators are used to represent the network layers in the neural network model.

A determination module is used to determine, based on the computation graph, the memory size to be allocated to the network layer operator, where the memory size is used to represent the memory size that the network layer operator needs to occupy when the neural network model is running.

An allocation module is used to obtain an allocated memory block matching the memory size from a free memory block list, and allocate the allocated memory block to the network layer operator.

The free memory block list is used to store free memory blocks that have been allocated but released, and the allocated memory block refers to a memory block allocated to the network layer operator for storing data.

According to another aspect of the present application, a computer device is provided. The computer device includes a processor and a memory, the memory stores at least one computer program, and the at least one computer program is loaded and executed by the processor to implement the memory management method of the neural network model described above.

According to another aspect of the present application, a computer-readable storage medium is provided, in which at least one computer program is stored. The at least one computer program is loaded and executed by a processor to implement the memory management method of the neural network model described above.

According to another aspect of the present application, a computer program product is provided. The computer program product includes a computer program stored in a computer-readable storage medium; a processor of a computer device reads the computer program from the computer-readable storage medium and executes it, so that the computer device performs the memory management method of the neural network model described above.

A computation graph corresponding to the neural network model is obtained; based on the computation graph, the memory size to be allocated to a network layer operator is determined; and, based on that memory size, an allocated memory block matching it is obtained from the free memory block list and allocated to the network layer operator. By using the free memory blocks in the free memory block list to provide the network layer operators with allocated memory blocks matching the required memory sizes, the present application reduces the memory allocated while the neural network model is running and improves memory utilization.

Since memory utilization is improved, a computer device configured with a smaller memory can achieve a running effect of the neural network model similar or identical to that of a device configured with a larger memory, which helps to reduce the hardware requirements that running the neural network model places on the computer device.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG1 is a schematic diagram of a memory management method for a neural network model provided by an exemplary embodiment of the present application;

FIG2 is a schematic diagram of the architecture of a computer system provided by an exemplary embodiment of the present application;

FIG3 is a flowchart of a memory management method for a neural network model provided by an exemplary embodiment of the present application;

FIG4 is a flowchart of another memory management method for a neural network model provided by an exemplary embodiment of the present application;

FIG5 is a schematic diagram of a computation graph provided by an exemplary embodiment of the present application;

FIG6 is a schematic diagram of a method for determining an allocated memory block provided by an exemplary embodiment of the present application;

FIG7 is a schematic diagram of another method for determining an allocated memory block provided by an exemplary embodiment of the present application;

FIG8 is a schematic diagram of unallocated memory provided by an exemplary embodiment of the present application;

FIG9 is a schematic diagram of releasing a memory block provided by an exemplary embodiment of the present application;

FIG10 is a schematic diagram of reshaping performed by a shape reshaping operator provided by an exemplary embodiment of the present application;

FIG11 is a schematic diagram of splicing performed by a splicing operator provided by an exemplary embodiment of the present application;

FIG12 is a schematic diagram of splitting performed by a splitting operator provided by an exemplary embodiment of the present application;

FIG13 is a schematic diagram of splicing performed by another splicing operator provided by an exemplary embodiment of the present application;

FIG14 is a schematic diagram of splitting performed by another splitting operator provided by an exemplary embodiment of the present application;

FIG15 is a flowchart of yet another memory management method for a neural network model provided by an exemplary embodiment of the present application;

FIG16 is a structural diagram of an AI chip provided by an exemplary embodiment of the present application;

FIG17 is a block diagram of a memory management device for a neural network model provided by an exemplary embodiment of the present application;

FIG18 is a schematic diagram of the structure of a computer device provided by an exemplary embodiment of the present application.

DETAILED DESCRIPTION

To make the purpose, technical solution and advantages of the present application clearer, the implementations of the present application are described in further detail below with reference to the accompanying drawings. Exemplary embodiments are described in detail here, and examples thereof are shown in the accompanying drawings. Where the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings represent the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of devices and methods consistent with some aspects of the present application as detailed in the appended claims.

The terms used in this disclosure are for the purpose of describing specific embodiments only and are not intended to limit the disclosure. The singular forms "a", "said" and "the" used in this disclosure and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in the present disclosure to describe various information, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other.

For ease of understanding, the terms involved in the embodiments of the present application are explained below.

Computation graph: In a broad sense, a computation graph is a directed graph used to represent the computational relationships between inputs, outputs and intermediate variables, where each node in the graph represents a mathematical operation. The computation graph of the neural network model in the embodiments of the present application is used to characterize the execution order between different network layers and the data flow between network layers during the computation of the neural network model. In some embodiments, the computation graph also includes the life cycles of the input and output tensors of each network layer.

The computation graph of a neural network model consists of network layer operators and the edges between network layer operators. The network layer operators correspond to the network layers in the neural network model and contain the sizes of the input and output tensors of the network layer operators; the edges between network layer operators are used to characterize the data flow between network layers.

Life cycle: The life cycle in the embodiments of the present application refers to the life cycle of a tensor, which may be an input or output tensor of a network layer. During the life cycle, the tensor is used by the current network layer and, accordingly, the memory block storing the tensor is occupied; at the end of the life cycle, the tensor will no longer be used by other network layers, and the memory block storing the tensor is released.
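Because the computation graph records both the execution order of the network layer operators and the tensors each operator consumes and produces, a tensor's life cycle can be read off as the last step at which the tensor is used. The following Python sketch illustrates this; the dictionary-based operator records are an assumption made for the example, not a format prescribed by the embodiments.

```python
def tensor_life_cycles(ops):
    """Return, for each tensor name, the index of the last operator that uses
    it; after that step the tensor's memory block can be released (sketch)."""
    last_use = {}
    for step, op in enumerate(ops):
        for name in op["inputs"] + op["outputs"]:
            last_use[name] = step  # later uses overwrite earlier ones
    return last_use

ops = [
    {"name": "G0", "inputs": ["x"],  "outputs": ["t0"]},
    {"name": "G1", "inputs": ["t0"], "outputs": ["t1"]},
    {"name": "G2", "inputs": ["t1"], "outputs": ["y"]},
]
print(tensor_life_cycles(ops))  # {'x': 0, 't0': 1, 't1': 2, 'y': 2}
```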

Free memory block list & allocated memory block list: The free memory block list records free memory blocks that have been allocated but released. The allocated memory block list records memory blocks that have been allocated and are occupied. In some embodiments, both lists are maintained by the memory management unit.

After a free memory block in the free memory block list is reallocated, a memory block is added to the allocated memory block list; after a memory block in the allocated memory block list is released, a free memory block is added to the free memory block list.

Data processing layer operator: In the computation graph, this is the operator corresponding to a network layer in the neural network model that adjusts the data format without changing the data content. The data format adjustment includes adjusting the dimensions of tensors, for example splitting a multi-channel three-dimensional tensor into multiple single-channel two-dimensional tensors, splicing multiple single-channel two-dimensional tensors into one single-channel two-dimensional tensor, or combining multiple single-channel two-dimensional tensors into a multi-channel three-dimensional tensor.

An embodiment of the present application provides a schematic diagram of a memory management method for a neural network model, as shown in FIG1. The method may be executed by a computer device, which may be a terminal or a server; specifically, the method may be executed by a memory management unit (MMU) in the computer device.

Exemplarily, the computer device obtains a computation graph 10 corresponding to the neural network model; based on the computation graph 10, the computer device determines the memory size to be allocated to a network layer operator 40; the computer device obtains an allocated memory block matching the memory size from the free memory block list 20, and allocates the allocated memory block to the network layer operator 40.

Exemplarily, the memory management unit obtains the computation graph 10 corresponding to the neural network model; based on the computation graph 10, the memory management unit determines the memory size to be allocated to the network layer operator 40; the memory management unit obtains an allocated memory block matching the memory size from the free memory block list 20, and allocates the allocated memory block to the network layer operator 40.

The computation graph 10 is used to represent the computation process of the neural network model.

Optionally, the computation graph 10 includes at least two network layer operators 40 and edges 50 between the network layer operators; the network layer operators 40 are used to represent the network layers in the neural network model, and the edges 50 are used to represent the data flow between network layers.

As shown in FIG1, the computer device obtains the computation graph 10 corresponding to the neural network model. The computation graph 10 includes at least three network layer operators 40, namely network layer operator G0, network layer operator G1 and network layer operator G2. When the neural network model is running, the operation order of the network layer operators 40 is network layer operator G0 → network layer operator G1 → network layer operator G2. The memory size to be allocated to network layer operator G0 is 16M, that is, network layer operator G0 needs to occupy 16M of memory at runtime (used to store the input and output tensors of the network layer corresponding to network layer operator G0); the memory size to be allocated to network layer operator G1 is 10M, that is, network layer operator G1 needs to occupy 10M of memory at runtime (used to store the input and output tensors of the network layer corresponding to network layer operator G1); the memory size to be allocated to network layer operator G2 is 5M, that is, network layer operator G2 needs to occupy 5M of memory at runtime (used to store the input and output tensors of the network layer corresponding to network layer operator G2).

The memory size is used to represent the memory size that a network layer operator 40 needs to occupy when the neural network model is running.

The free memory block list 20 is used to store free memory blocks that have been allocated but released.

An allocated memory block refers to a memory block allocated to a network layer operator 40 for storing data.

The memory size includes the size of the input tensor and/or the size of the output tensor.

The input tensor represents a multidimensional array input into the network layer operator 40.

The output tensor represents a multidimensional array output from the network layer operator 40.

In some embodiments, based on the computation graph 10, the computer device may also determine the life cycles corresponding to the input tensors and the output tensors of the network layer operators 40.

After the current network layer operator has been executed and a tensor will not be called by any other network layer operator, the tensor's life cycle ends and the memory block occupied by the tensor can be released.

Exemplarily, the computer device obtains the arrangement order of the network layer operators 40; when the size of the input tensor corresponding to a network layer operator 40 is greater than the tensor size threshold, the computer device obtains an allocated memory block matching the input tensor from the free memory block list 20, and allocates the allocated memory block, in the arrangement order, to the network layer operator for storing the input tensor.

The arrangement order is used to represent the execution order of the network layer operators 40 when the neural network model is running.

In some embodiments, the allocated memory block is determined in at least one of the following ways, but is not limited thereto (a code sketch of these cases follows the definitions below).

(1) When the size of the input tensor corresponding to the network layer operator 40 is less than or equal to the tensor size threshold, an allocated memory block matching the size of the input tensor is obtained directly from the unallocated memory.

(2) When the size of the input tensor is greater than the tensor size threshold, and the free memory block list 20 contains at least one free memory block whose size is greater than or equal to the size of the input tensor, an allocated memory block matching the size of the input tensor is obtained from the free memory block list 20.

(3) When the size of the input tensor is greater than the tensor size threshold, and the sizes of the free memory blocks in the free memory block list 20 are all smaller than the size of the input tensor, an allocated memory block matching the size of the input tensor is obtained from the free memory block list 20 and the unallocated memory.

(4) When the size of the input tensor is greater than the tensor size threshold, and there is no free memory block in the free memory block list 20, a memory block matching the size of the input tensor is divided from the unallocated memory as the allocated memory block.

The unallocated memory refers to memory in the storage space that has not been allocated or occupied.

The tensor size threshold is the size of the smallest memory block for which memory reuse is performed.

Optionally, the tensor size threshold may be a custom value or a default value, but is not limited thereto; this is not specifically limited in the embodiments of the present application.
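The four cases can be condensed into a single dispatch rule. The sketch below is illustrative only; the string labels stand in for the allocation paths described above.

```python
def allocation_source(tensor_size, threshold, free_blocks):
    """Select the allocation path according to cases (1) to (4) (sketch)."""
    if tensor_size <= threshold:
        return "unallocated memory"                    # case (1)
    if any(block >= tensor_size for block in free_blocks):
        return "free memory block list"                # case (2)
    if free_blocks:
        return "free memory block list + unallocated"  # case (3)
    return "unallocated memory"                        # case (4)
```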

Exemplarily, when the size of the input tensor corresponding to the network layer operator 40 is less than or equal to the tensor size threshold, an allocated memory block matching the size of the input tensor is obtained directly from the unallocated memory, thereby avoiding memory fragmentation.

Exemplarily, when the size of the input tensor is greater than the tensor size threshold and the free memory block list includes a first free memory block whose size is the same as the size of the input tensor, the first free memory block is directly allocated, as the allocated memory block, to the corresponding network layer operator 40 for storing the input tensor.

When the size of the input tensor is greater than the tensor size threshold and the free memory block list 20 includes a second free memory block larger than the size of the input tensor, a third memory block matching the size of the input tensor is divided from the second free memory block, and the third memory block is allocated, as the allocated memory block, to the corresponding network layer operator 40 for storing the input tensor.

For example, in the method for determining an allocated memory block shown in FIG1, as shown in (a) of FIG1, the shaded part in the figure is the running network layer operator 40, that is, the currently running network layer operator 40 is network layer operator G0. The input tensor of network layer operator G0 needs to occupy 16M of memory, so the computer device allocates a 16M memory block to network layer operator G0. After the allocation, the allocated 16M memory block is recorded in the allocated memory block list 30, and at this time the free memory block list 20 is empty. As shown in (b) of FIG1, after network layer operator G0 finishes running, the 16M memory block occupied by network layer operator G0 is released; at this time, the released 16M memory block is recorded in the free memory block list 20.

As shown in (c) of FIG1, since network layer operator G1 runs before network layer operator G2, a memory block is first allocated to network layer operator G1 and then to network layer operator G2, in the arrangement order. Here it is assumed that the tensor size threshold is 4M. Since the sizes of the input tensors of network layer operator G1 and network layer operator G2 are both greater than the tensor size threshold (4M), it is further determined whether an allocated memory block can be obtained from the free memory block list 20.

Since the size of the input tensor of network layer operator G1 (10M) is greater than the tensor size threshold (4M), and the size of the free memory block in the free memory block list 20 (16M) is greater than the size of the input tensor of network layer operator G1 (10M), an allocated memory block matching the size of the input tensor of network layer operator G1 (10M) is obtained from the free memory block list 20; that is, the allocated memory block obtained for network layer operator G1 is 10M.

Since the size of the input tensor of network layer operator G2 (5M) is greater than the tensor size threshold (4M), and the size of the remaining free memory block in the free memory block list 20 (6M) is greater than the size of the input tensor of network layer operator G2 (5M), an allocated memory block matching the size of the input tensor of network layer operator G2 (5M) is obtained from the free memory block list 20; that is, the allocated memory block obtained for network layer operator G2 is 5M. At this time, the remaining 1M memory block left after reuse is recorded in the free memory block list 20, and the memory block allocated to network layer operator G1 (10M) and the memory block allocated to network layer operator G2 (5M) are recorded in the allocated memory block list 30.
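The reuse in this example can be traced numerically. The short sketch below merely reproduces the arithmetic of the example (sizes in MB) and is not part of the disclosed method.

```python
free = [16]                         # the 16M block released after G0 finishes
for tensor in (10, 5):              # G1 then G2, in the arrangement order
    largest = free.pop()            # 16M for G1, then the 6M remainder for G2
    free.append(largest - tensor)   # split: the remainder goes back to the list
print(free)  # [1] -- the footprint stays 16M instead of 16 + 10 + 5 = 31M
```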

In some embodiments, when the size of the input tensor is greater than the tensor size threshold and the free memory block list 20 includes a fourth free memory block smaller than the size of the input tensor, the fourth free memory block is merged with a merged memory block to obtain the allocated memory block.

The merged memory block is a memory block divided from the unallocated memory, and the size of the merged memory block is the difference between the size of the input tensor and the size of the fourth free memory block.

For example, the size of the current input tensor is 10MB, and there are two free memory blocks of 2MB and 4MB in the free memory block list, with the 4MB free memory block at the end of the list. The 4MB free memory block is taken out, a 6MB memory block is divided from the unallocated memory, and the two are merged to generate a 10MB memory block, which is allocated, as the allocated memory block, to the corresponding network layer operator 40 for storing the input tensor.
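The sizes in this merge example can be checked with a few lines of Python; again, this is purely illustrative.

```python
tensor = 10                           # size of the current input tensor, in MB
free_list = [2, 4]                    # sorted small to large; 4MB is at the end
fourth_block = free_list.pop()        # take out the 4MB free memory block
merged_block = tensor - fourth_block  # 6MB divided from the unallocated memory
print(merged_block, fourth_block + merged_block)  # 6 10
```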

In summary, in the method provided by this embodiment, a computation graph corresponding to the neural network model is obtained; based on the computation graph, the memory size to be allocated to a network layer operator is determined; and, based on that memory size, an allocated memory block matching it is obtained from the free memory blocks in the free memory block list and allocated to the network layer operator. By using the free memory blocks in the free memory block list to provide the network layer operators with allocated memory blocks matching the required memory sizes, the present application reduces the memory allocated while the neural network model is running and improves memory utilization.

FIG2 shows a schematic diagram of the architecture of a computer system provided by an embodiment of the present application. The computer system may include a terminal 100 and a server 200.

The terminal 100 may be an electronic device such as a mobile phone, a tablet computer, a vehicle-mounted terminal (in-vehicle computer), a wearable device, a personal computer (PC), an aircraft or an unmanned vending terminal. A client running a target application may be installed in the terminal 100; the target application may be an application concerned with the memory management of neural network models, or another application provided with a memory management function for neural network models, which is not limited in this application. In addition, this application does not limit the form of the target application, which includes but is not limited to an application (App) installed in the terminal 100, a mini-program, etc., and may also take the form of a web page.

The server 200 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms. The server 200 may be the backend server of the above-mentioned target application, used to provide backend services for the client of the target application.

Cloud technology refers to a hosting technology that unifies a series of resources such as hardware, software and networks within a wide area network or a local area network to realize the computing, storage, processing and sharing of data. Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology and application technology applied on the basis of the cloud computing business model; these resources can form a resource pool and be used on demand, flexibly and conveniently. Cloud computing technology will become an important support. The background services of technical network systems require a large amount of computing and storage resources, such as video websites, picture websites and other portal websites. With the rapid development and application of the Internet industry, in the future each item may have its own identification mark, which will need to be transmitted to a background system for logical processing; data of different levels will be processed separately, and all kinds of industry data require strong system support, which can only be achieved through cloud computing.

In some embodiments, the above-mentioned server may also be implemented as a node in a blockchain system. Blockchain is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated and linked by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer and an application service layer.

The terminal 100 and the server 200 may communicate with each other via a network, such as a wired or wireless network.

In the memory management method of the neural network model provided by the embodiments of the present application, each step may be executed by a computer device, which refers to an electronic device with data computing, processing and storage capabilities.

In some embodiments, the computer device is a device that needs to run a neural network model.

In some embodiments, an AI chip is provided in the computer device. The AI chip includes an AI processor, a memory and a memory management unit, where the memory management unit is used to allocate memory blocks to the network layers in the running neural network model. The memory management unit is used to implement the memory management method of the neural network model described in the various embodiments of the present application.

Taking the solution implementation environment shown in FIG2 as an example, the memory management method of the neural network model may be executed by the terminal 100 (for example, by the client of the target application installed and running in the terminal 100), by the server 200, or by the terminal 100 and the server 200 in interactive cooperation, which is not limited in this application.

图3是本申请一个示例性实施例提供的神经网络模型的内存管理方法的流程图。该方法可以由计算机设备执行,计算机设备可以是终端或服务器。该方法包括以下步骤。FIG3 is a flow chart of a memory management method for a neural network model provided by an exemplary embodiment of the present application. The method can be executed by a computer device, which can be a terminal or a server. The method includes the following steps.

步骤302:获取神经网络模型对应的计算图。Step 302: Obtain a computational graph corresponding to the neural network model.

计算图用以表示神经网络模型的计算过程。The computational graph is used to represent the computational process of the neural network model.

可选地,计算图中包括至少两个网络层算子和至少两个网络层算子之间的边,网络层算子对应神经网络模型中的网络层,一个网络层算子对应一个网络层,边用于表示网络层之间的数据流向。Optionally, the computational graph includes at least two network layer operators and edges between at least two network layer operators, the network layer operators correspond to network layers in the neural network model, one network layer operator corresponds to one network layer, and the edges are used to indicate the data flow between network layers.

可选地,神经网络模型包括深度学习神经网络模型(Deep Neural Network,DNN)、卷积神经网络模型(Convolutional Neural Network,CNN)、极限学习机模型(Extreme Learning Machine,ELM)或其他的神经网络模型中的至少一种,但不限于此,本申请实施例对此不作具体限定。Optionally, the neural network model includes at least one of a deep learning neural network model (Deep Neural Network, DNN), a convolutional neural network model (Convolutional Neural Network, CNN), an extreme learning machine model (Extreme Learning Machine, ELM) or other neural network models, but is not limited to this, and the embodiments of the present application do not make specific limitations on this.

在一种可能的实施方式中,该计算图由计算机设备通过对神经网络模型的结构进行分析确定得到。可选的,该计算图在神经网络模型前向传播时即时构建。In a possible implementation, the computation graph is determined by a computer device by analyzing the structure of the neural network model. Optionally, the computation graph is constructed in real time during the forward propagation of the neural network model.

在另一种可能的实施方式中,该计算图预先生成,并与神经网络模型共同输入计算机设备。其中,该计算图可以人为生成或者通过程序生成。In another possible implementation, the calculation graph is generated in advance and input into the computer device together with the neural network model. The calculation graph can be generated manually or by a program.

步骤304:基于计算图,确定待分配至网络层算子的内存大小。Step 304: Based on the computation graph, determine the memory size to be allocated to the network layer operator.

内存大小用于表示网络层算子在神经网络模型运行时需要占用的内存大小。Memory size is used to indicate the memory size that the network layer operator needs to occupy when the neural network model is running.

示例性地,计算机设备基于计算图,确定待分配至网络层算子的内存大小。Exemplarily, the computer device determines the memory size to be allocated to the network layer operator based on the computation graph.

可选地,内存大小包括输入张量的大小和/或输出张量的大小。Optionally, the memory size includes the size of the input tensor and/or the size of the output tensor.

输入张量用以表示输入至网络层算子中的多维数组。Input tensors are used to represent multidimensional arrays that are input to network layer operators.

输出张量用以表示从网络层算子输出的多维数组。Output tensors are multidimensional arrays representing the output from network layer operators.

在一个示意性的例子中,输入张量[2,3,4]表示该输入张量为具有2个通道的3×4矩阵,即包含2*3*4=24个元素,若每个元素均为fp32(即每个元素占用4Bytes),则该输入张量占用的内存大小为24*4=96Bytes。 In an illustrative example, the input tensor [2, 3, 4] indicates that the input tensor is a 3×4 matrix with 2 channels, i.e., it contains 2*3*4=24 elements. If each element is fp32 (i.e., each element occupies 4 bytes), the memory size occupied by the input tensor is 24*4=96 bytes.

步骤306:从空闲内存块列表中获取与内存大小匹配的分配内存块,将分配内存块分配给网络层算子。Step 306: Obtain an allocated memory block that matches the memory size from the free memory block list, and allocate the allocated memory block to the network layer operator.

空闲内存块是指已被分配但被解除占用后的内存块。A free memory block is a memory block that has been allocated but released.

空闲内存块列表用于存放已被分配但被解除占用后的空闲内存块。即该空闲内存块当前未被网络层占用。The free memory block list is used to store free memory blocks that have been allocated but released. That is, the free memory blocks are not currently occupied by the network layer.

分配内存块是指被分配给网络层算子用于存储数据的内存块。Allocated memory blocks refer to memory blocks allocated to network layer operators for storing data.

示例性地,计算机设备通过重新利用空闲内存块列表中的空闲内存块,将空闲内存块列表中的空闲内存块进行调整,得到与内存大小匹配的分配内存块。Exemplarily, the computer device adjusts the free memory blocks in the free memory block list by reusing the free memory blocks in the free memory block list to obtain allocated memory blocks that match the memory size.

综上所述,本实施例提供的方法,通过获取神经网络模型对应的计算图;基于计算图,确定待分配至网络层算子的内存大小;基于内存大小,从空闲内存块列表中的空闲内存块中获取与内存大小匹配的分配内存块,将分配内存块分配给网络层算子。本申请通过复用空闲内存块列表中的空闲内存块,为网络层算子分配到与内存大小匹配的分配内存块,从而减少神经网络模型在运行过程中分配的内存,提高了内存的利用率。由于内存利用率得到提升,因此在配置更小内存的情况下,计算机设备也能够达到与配置更大内存相近或相同的神经网络模型运行效果,有助于降低运行神经网络模型对计算机设备的硬件需求。In summary, in the method provided by this embodiment, the computation graph corresponding to the neural network model is obtained; based on the computation graph, the memory size to be allocated to the network layer operator is determined; and based on the memory size, an allocated memory block matching the memory size is obtained from the free memory blocks in the free memory block list and allocated to the network layer operator. By reusing the free memory blocks in the free memory block list, the present application allocates to the network layer operator an allocated memory block matching the memory size, thereby reducing the memory allocated while the neural network model is running and improving memory utilization. Since memory utilization is improved, a computer device configured with less memory can achieve a neural network model running effect similar or identical to that of a device configured with more memory, which helps to reduce the hardware requirements that running the neural network model places on the computer device.

图4是本申请一个示例性实施例提供的神经网络模型的内存管理方法的流程图。该方法可以由计算机设备执行,计算机设备可以是终端或服务器。该方法包括以下步骤。FIG4 is a flow chart of a memory management method for a neural network model provided by an exemplary embodiment of the present application. The method can be executed by a computer device, which can be a terminal or a server. The method includes the following steps.

步骤401:获取神经网络模型对应的计算图。Step 401: Obtain a computational graph corresponding to the neural network model.

示例性的,深度学习框架或者图编译设备对输入的神经网络模型进行解析并生成对应的计算图。Exemplarily, a deep learning framework or a graph compilation device parses the input neural network model and generates a corresponding computational graph.

步骤402:基于计算图,确定待分配至网络层算子的内存大小。Step 402: Based on the computation graph, determine the memory size to be allocated to the network layer operator.

以输入张量为例,比如,输入张量A对应的多维数组为[512,32,32],假设输入张量A中各个元素的数据类型都为Float 32,即每个元素占用4Bytes,则输入张量A的大小为:512*32*32*4=2MB。Take the input tensor as an example. For example, the multidimensional array corresponding to the input tensor A is [512, 32, 32]. Assuming that the data type of each element in the input tensor A is Float 32, that is, each element occupies 4 bytes, the size of the input tensor A is: 512*32*32*4=2MB.

在本申请实施例中,神经网络模型中的张量(Tensor)是一个多维数组,以开放神经网络交换格式(Open Neural Network Exchange,ONNX)标准为例,张量以(秩(rank)、形状(shape)、数据类型(data type))三元组来表示。例如,张量以三元组来表示时,张量的三元组如表1所示。In the embodiments of the present application, a tensor in a neural network model is a multidimensional array. Taking the Open Neural Network Exchange (ONNX) standard as an example, a tensor is represented by a (rank, shape, data type) triple. For example, when a tensor is represented by a triple, the triple of the tensor is shown in Table 1.

表1 张量的三元组 Table 1: The tensor triple

如表1中第3行中的例子,张量=[9,10],表示一个9行10列的二维矩阵。As shown in the example in the third row of Table 1, tensor = [9, 10], which represents a two-dimensional matrix with 9 rows and 10 columns.

如表1所示,张量是一个多维数组,在张量的三元组中,秩用以表示张量的维度,形状是张量的一种表现样式,数据类型用以表示张量形状中的元素数据的类型。As shown in Table 1, a tensor is a multidimensional array. In the triplet of a tensor, the rank is used to indicate the dimension of the tensor, the shape is a representation of the tensor, and the data type is used to indicate the type of element data in the tensor shape.

以张量=[9,10]为例,该张量共有9*10=90个元素,该张量是Float 32类型,即每个元素占用4Bytes,则该张量的大小为9*10*4=360B=0.35KB。Taking tensor = [9, 10] as an example, the tensor has 9*10=90 elements in total. The tensor is of Float 32 type, that is, each element occupies 4Bytes, so the size of the tensor is 9*10*4=360B=0.35KB.

张量可以以数组的形式表示,也可以以形状来表示,本申请实施例中以形状表示张量。A tensor can be represented in the form of an array or in the form of a shape. In the embodiments of the present application, tensors are represented by shapes.

张量的形状为[],意为维度为0的标量;张量的形状为[10],意为维度为1的向量,且向量包含10个元素;张量的形状为[9,10],意为维度为2的矩阵,第一维有9个元素,第二维有10个元素,表示为9行10列的二维矩阵。 The shape of the tensor is [], which means a scalar with dimension 0; the shape of the tensor is [10], which means a vector with dimension 1 and 10 elements; the shape of the tensor is [9, 10], which means a matrix with dimension 2, with 9 elements in the first dimension and 10 elements in the second dimension, represented as a two-dimensional matrix with 9 rows and 10 columns.

形状中的数字的个数用以表示张量的维度,比如,[D0,D1,D2,D3]中有4个数字,则表示该张量为4维张量。张量的形状中的数字用以表示张量在该维度的元素个数。The number of numbers in the shape indicates the dimension of the tensor. For example, if there are 4 numbers in [D0, D1, D2, D3], it means that the tensor is a 4-dimensional tensor. The number in the shape of a tensor indicates the number of elements in that dimension.

进一步地,通过元素的个数与单个元素占用的字节数的乘积,最终得到张量的大小。Furthermore, the size of the tensor is finally obtained by multiplying the number of elements by the number of bytes occupied by a single element.
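For illustration only, this size computation can be sketched in a few lines of Python; the helper name and the byte-width table below are assumptions introduced here, not part of the disclosed embodiments.

```python
# Non-limiting sketch: compute a tensor's size in bytes from its
# (rank, shape, data type) triple. The helper name and the dtype
# table are illustrative assumptions.
from math import prod

DTYPE_BYTES = {"float32": 4, "float16": 2, "int8": 1}  # assumed byte widths

def tensor_size_bytes(shape, dtype="float32"):
    # prod([]) == 1, so an empty shape (a scalar) yields one element
    return prod(shape) * DTYPE_BYTES[dtype]

print(tensor_size_bytes([9, 10]))        # 360 bytes, as in the Table 1 example
print(tensor_size_bytes([512, 32, 32]))  # 2097152 bytes = 2MB, as for tensor A above
```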

在一些实施例中,计算机设备基于计算图,还可确定网络层算子对应的排列顺序。In some embodiments, the computer device may also determine the arrangement order corresponding to the network layer operators based on the computation graph.

排列顺序用于表示网络层算子在神经网络模型运行时的执行顺序。The arrangement order is used to indicate the execution order of network layer operators when the neural network model is running.

示例性地,计算机设备获取网络层算子的排列顺序;计算机设备在网络层算子对应的输入张量的大小大于张量大小阈值的情况下,从空闲内存块列表中获取与输入张量匹配的分配内存块,并按照排列顺序将分配内存块分配给网络层算子用于存储输入张量。Exemplarily, the computer device obtains the arrangement order of the network layer operators; when the size of the input tensor corresponding to the network layer operator is greater than the tensor size threshold, the computer device obtains the allocated memory block matching the input tensor from the free memory block list, and allocates the allocated memory block to the network layer operator in the arrangement order for storing the input tensor.

在一些实施例中,该计算图中还包括网络层算子对应的张量(输入或输出张量)的生命周期。可选的,张量的生命周期可以采用如下形式表示:[使用张量的第一个网络层算子,使用张量的最后一个网络层算子]。In some embodiments, the computation graph also includes the life cycle of the tensor (input or output tensor) corresponding to the network layer operator. Optionally, the life cycle of a tensor can be expressed in the following form: [the first network layer operator using the tensor, the last network layer operator using the tensor].

例如,如图5所示出的计算图的示意图,计算图中包括G0,G1,G2,G3,G4共5个网络层算子,以G0算子的输出T1为例,T1共2*3*4=24个元素(假设T1是Float 32类型,即T1每个元素占用4Bytes),可知T1需要24*4=96Bytes的存储空间,此外T1被G1和G3作为输入张量,所以T1在G3执行完之后生命周期结束,可知T1的生命周期为[G0,G3],其它张量的大小和生命周期的获取方式与T1相同。按照执行顺序对网络层算子进行排序的结果为:[G0,G1,G2,G3,G4],按照该排序结果将分配内存块分配给网络层算子用于存储输入张量。For example, as shown in the schematic diagram of the computation graph in FIG. 5, the computation graph includes five network layer operators: G0, G1, G2, G3 and G4. Taking the output T1 of the G0 operator as an example, T1 has 2*3*4=24 elements in total (assuming that T1 is of the Float 32 type, i.e., each element of T1 occupies 4 Bytes), so T1 requires 24*4=96 Bytes of storage space. In addition, T1 is used as an input tensor by G1 and G3, so the life cycle of T1 ends after G3 finishes executing, and the life cycle of T1 is [G0, G3]. The sizes and life cycles of the other tensors are obtained in the same way as for T1. The result of sorting the network layer operators in execution order is [G0, G1, G2, G3, G4], and according to this sorting result, allocated memory blocks are allocated to the network layer operators for storing input tensors.

比如,计算机设备按照G0-G1-G2-G3-G4的排序结果进行内存块的分配,第一,为G0算子分配96B的内存块一用以存储T0,在G0算子执行完成后,释放内存块一;第二,为G1算子分配96B的内存块二用以存储T1,由于T1的生命周期为[G0,G3],因此在G1算子执行完成后,不释放内存块二;第三,为G2算子分配96B的内存块三用以存储T2,在G2算子执行完成后,释放内存块三;第四,G3算子开始执行,在G3算子执行完成后,释放内存块二;第五,为G4算子分配96B的内存块四用以存储T3,在G4算子执行完成后,释放内存块四。For example, the computer device allocates memory blocks in the order G0-G1-G2-G3-G4. First, a 96B memory block one is allocated to the G0 operator to store T0, and after the G0 operator finishes executing, memory block one is released. Second, a 96B memory block two is allocated to the G1 operator to store T1; since the life cycle of T1 is [G0, G3], memory block two is not released after the G1 operator finishes executing. Third, a 96B memory block three is allocated to the G2 operator to store T2, and after the G2 operator finishes executing, memory block three is released. Fourth, the G3 operator starts executing, and after the G3 operator finishes executing, memory block two is released. Fifth, a 96B memory block four is allocated to the G4 operator to store T3, and after the G4 operator finishes executing, memory block four is released.
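The release discipline of this walkthrough can be sketched as follows. This is a non-limiting Python illustration: the operator order and T1's life cycle are taken from the example above, while the life cycles of T0, T2 and T3 are assumptions inferred from the walkthrough.

```python
# Non-limiting sketch of lifetime-driven release: each tensor carries a
# life cycle (first_user, last_user); its 96B block is returned to the
# free list right after the last consumer has executed.
order = ["G0", "G1", "G2", "G3", "G4"]
lifetimes = {"T0": ("G0", "G0"), "T1": ("G0", "G3"),   # T1 per the text
             "T2": ("G2", "G2"), "T3": ("G4", "G4")}   # assumed lifetimes

live = set()
for op in order:
    for t, (first, last) in lifetimes.items():
        if first == op:
            live.add(t)        # allocate a 96B block for t before op runs
    print("while", op, "runs, live tensors:", sorted(live))
    for t, (first, last) in lifetimes.items():
        if last == op:
            live.discard(t)    # release t's block once op has executed
```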

步骤403:判断输入张量的大小是否小于张量大小阈值。Step 403: Determine whether the size of the input tensor is less than the tensor size threshold.

张量大小阈值是指能够进行内存复用的最小的内存块的大小。The tensor size threshold refers to the size of the smallest memory block that can be reused.

可选地,张量大小阈值可采用自定义值、默认值中的至少一种,但不限于此,本申请实施例对此不作具体限定。Optionally, the tensor size threshold may adopt at least one of a custom value and a default value, but is not limited to this, and the embodiments of the present application do not make specific limitations on this.

示例性地,在输入张量的大小小于张量大小阈值的情况下,执行步骤409;在输入张量的大小大于或等于张量大小阈值的情况下,执行步骤404。Exemplarily, when the size of the input tensor is smaller than the tensor size threshold, step 409 is performed; when the size of the input tensor is greater than or equal to the tensor size threshold, step 404 is performed.

步骤404:判断空闲内存块列表中是否有空闲内存块。Step 404: Determine whether there is a free memory block in the free memory block list.

示例性地,在空闲内存块列表中包括空闲内存块的情况下,执行步骤405;在空闲内存块列表中没有空闲内存块的情况下,执行步骤408。Exemplarily, when the free memory block list includes free memory blocks, step 405 is executed; when the free memory block list does not include free memory blocks, step 408 is executed.

步骤405:判断空闲内存块的大小是否小于输入张量的大小。Step 405: Determine whether the size of the free memory block is smaller than the size of the input tensor.

示例性地,在存在至少一个空闲内存块的大小小于输入张量的大小的情况下,执行步骤407;在空闲内存块的大小均大于或等于输入张量的大小的情况下,执行步骤406。Exemplarily, when there is at least one free memory block whose size is smaller than the size of the input tensor, step 407 is executed; when the sizes of the free memory blocks are all greater than or equal to the size of the input tensor, step 406 is executed.

步骤406:从空闲内存块列表中获取与输入张量的大小匹配的分配内存块。Step 406: Get an allocated memory block that matches the size of the input tensor from the free memory block list.

在一些实施例中,在输入张量的大小大于张量大小阈值,且空闲内存块列表中的空闲内存块的大小大于或等于输入张量的大小的情况下,从空闲内存块列表中获取与输入张量的大小匹配的分配内存块。In some embodiments, when the size of the input tensor is greater than a tensor size threshold and the size of a free memory block in a free memory block list is greater than or equal to the size of the input tensor, an allocated memory block that matches the size of the input tensor is obtained from the free memory block list.

示例性地,在输入张量的大小大于张量大小阈值的情况下,将空闲内存块列表中的第一空闲内存块作为分配内存块;其中,第一空闲内存块的大小与输入张量的大小相同。Exemplarily, when the size of the input tensor is greater than the tensor size threshold, the first free memory block in the free memory block list is used as the allocated memory block, wherein the size of the first free memory block is the same as the size of the input tensor.

或,在输入张量的大小大于张量大小阈值的情况下,从空闲内存块列表中的第二空闲内存块中划分出与输入张量的大小匹配的第三内存块,将第三内存块作为分配内存块;第二空闲内存块的大小大于输入张量的大小。Or, when the size of the input tensor is greater than the tensor size threshold, a third memory block matching the size of the input tensor is divided from the second free memory block in the free memory block list, and the third memory block is used as the allocated memory block; the size of the second free memory block is greater than the size of the input tensor.

例如,如图6所示出的分配内存块的确定方法的示意图,如图6中的(a)所示,输入张量601为2MB,空闲内存块列表602中存在一个10MB的空闲内存块,则,计算机设备将10MB的空闲内存块分割成2MB和8MB的两个内存块,并将2MB大小的内存块作为分配内存块分配给对应的网络层算子用于存储输入张量601。如图6中的(b)所示,将2MB大小的内存块放于已分配内存块列表603中,并将8MB的内存块放回空闲内存块列表602。For example, as shown in FIG6 , a schematic diagram of a method for determining an allocated memory block is shown. As shown in FIG6 (a), the input tensor 601 is 2MB, and there is a 10MB free memory block in the free memory block list 602. Then, the computer device divides the 10MB free memory block into two memory blocks of 2MB and 8MB, and allocates the 2MB memory block as an allocated memory block to the corresponding network layer operator for storing the input tensor 601. As shown in FIG6 (b), the 2MB memory block is placed in the allocated memory block list 603, and the 8MB memory block is placed back in the free memory block list 602.
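A minimal sketch of this split-and-reuse step follows, assuming a free list that simply records block sizes in bytes; the function name and list representation are illustrative assumptions, not the disclosed implementation.

```python
# Non-limiting sketch of the split step of FIG. 6: find a free block at
# least as large as the request, carve off the requested size, and put
# the remainder back into the free list.
MB = 2 ** 20

def take_from_free_list(free_list, size):
    free_list.sort()                            # ascending: largest block last
    for i, block in enumerate(free_list):
        if block >= size:
            free_list.pop(i)
            if block > size:
                free_list.append(block - size)  # remainder returns to the list
            return size                         # the allocated memory block
    return None                                 # no block is large enough

free = [10 * MB]
print(take_from_free_list(free, 2 * MB) // MB)  # 2  -> a 2MB allocated block
print([b // MB for b in free])                  # [8] -> 8MB back in the free list
```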

步骤407:从空闲内存块列表和未分配内存中获取与输入张量的大小匹配的分配内存块。Step 407: Get an allocated memory block that matches the size of the input tensor from the free memory block list and the unallocated memory.

未分配内存是指存储空间中未被分配占用过的内存。Unallocated memory refers to the memory in the storage space that has not been allocated or occupied.

在一些实施例中,在输入张量的大小大于张量大小阈值,且空闲内存块列表中的空闲内存块的大小小于输入张量的大小的情况下,从空闲内存块列表和未分配内存中获取与输入张量的大小匹配的分配内存块。In some embodiments, when the size of the input tensor is greater than a tensor size threshold and the size of a free memory block in the free memory block list is smaller than the size of the input tensor, an allocated memory block matching the size of the input tensor is obtained from the free memory block list and the unallocated memory.

示例性地,在输入张量的大小大于张量大小阈值,且空闲内存块列表中包括第四空闲内存块的情况下,将第四空闲内存块与合并内存块进行合并,得到分配内存块。Exemplarily, when the size of the input tensor is greater than the tensor size threshold and the free memory block list includes the fourth free memory block, the fourth free memory block is merged with the merged memory block to obtain the allocated memory block.

第四空闲内存块的大小小于输入张量的大小。The size of the fourth free memory block is smaller than the size of the input tensor.

合并内存块是从未分配内存中划分得到的内存块。A merged memory block is a memory block divided from unallocated memory.

合并内存块的大小为输入张量的大小与第四空闲内存块的大小的差值。The size of the merged memory block is the difference between the size of the input tensor and the size of the fourth free memory block.

在一些实施例中,在输入张量的大小大于张量大小阈值的情况下,将空闲内存块列表中的第四空闲内存块,与未分配内存中的合并内存块合并,得到分配内存块。In some embodiments, when the size of the input tensor is greater than the tensor size threshold, the fourth free memory block in the free memory block list is merged with the merged memory block in the unallocated memory to obtain the allocated memory block.

例如,如图7所示出的分配内存块的确定方法的示意图,如图7中的(a)所示,输入张量701为10MB,空闲内存块列表702中存在一个2MB的空闲内存块和一个4MB的空闲内存块,计算机设备将4MB的空闲内存块从空闲内存块列表702中取出,并从未分配内存中划分得到一个6MB的合并内存块,计算机设备将来自空闲内存块列表702中的4MB的空闲内存块和来自未分配内存中的6MB的合并内存块进行合并,并将合并得到的内存块作为分配内存块用于存储输入张量701,即,如图7中的(b)所示,将4MB大小的内存块和6MB的合并内存块进行合并后得到的10MB的分配内存块放入已分配内存块列表703中。For example, as shown in FIG7 , a schematic diagram of a method for determining an allocated memory block is shown. As shown in (a) of FIG7 , an input tensor 701 is 10MB, and there is a 2MB free memory block and a 4MB free memory block in a free memory block list 702. The computer device takes the 4MB free memory block from the free memory block list 702, and divides it into a 6MB merged memory block from the unallocated memory. The computer device merges the 4MB free memory block from the free memory block list 702 and the 6MB merged memory block from the unallocated memory, and uses the merged memory block as an allocated memory block for storing the input tensor 701. That is, as shown in (b) of FIG7 , a 10MB allocated memory block obtained by merging the 4MB memory block and the 6MB merged memory block is put into the allocated memory block list 703.
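A minimal sketch of this merge step follows. It assumes the taken free block borders the unallocated region, so that the merged result is contiguous; all names are illustrative assumptions.

```python
# Non-limiting sketch of the merge step of FIG. 7: when the largest free
# block is smaller than the request, take it out and extend it with a
# block carved from unallocated memory.
MB = 2 ** 20

def allocate_with_merge(free_list, unallocated, size):
    largest = max(free_list, default=0)
    if largest >= size or unallocated < size - largest:
        return None, unallocated          # handled by the other branches
    if largest:
        free_list.remove(largest)         # take the 4MB block out of the list
    carved = size - largest               # carve 6MB from unallocated memory
    return size, unallocated - carved     # merged 4MB + 6MB = 10MB block

free, pool = [2 * MB, 4 * MB], 64 * MB
block, pool = allocate_with_merge(free, pool, 10 * MB)
print(block // MB, [b // MB for b in free], pool // MB)  # 10 [2] 58
```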

步骤408:从未分配内存中划分出与输入张量的大小匹配的内存块作为分配内存块。Step 408: A memory block matching the size of the input tensor is divided from the unallocated memory as an allocated memory block.

在一些实施例中,未分配内存包括一级未分配内存和二级未分配内存,一级未分配内存的分配优先级高于二级未分配内存的分配优先级。In some embodiments, the unallocated memory includes first-level unallocated memory and second-level unallocated memory, and the allocation priority of the first-level unallocated memory is higher than the allocation priority of the second-level unallocated memory.

在一些实施例中,一级未分配内存属于第一存储器,二级未分配内存属于第二存储器,第一存储器的存储优先级高于第二存储器的存储优先级。In some embodiments, the first-level unallocated memory belongs to the first memory, the second-level unallocated memory belongs to the second memory, and the storage priority of the first memory is higher than the storage priority of the second memory.

可选的,第一存储器的访存速度大于第二存储器的访存速度。Optionally, a memory access speed of the first memory is greater than a memory access speed of the second memory.

可选的,第一存储器的容量小于第二存储器的容量,且第一存储器中单位存储空间的硬件成本高于第二存储器中单位存储空间的硬件成本。Optionally, the capacity of the first memory is smaller than the capacity of the second memory, and the hardware cost per unit storage space in the first memory is higher than the hardware cost per unit storage space in the second memory.

在一些实施例中,该第一存储器为L2缓存,第二存储器为L3缓存。In some embodiments, the first memory is an L2 cache and the second memory is an L3 cache.

在一些实施例中,在输入张量的大小大于张量大小阈值,且空闲内存块列表中没有空闲内存块的情况下,从未分配内存中划分出与输入张量的大小匹配的内存块作为分配内存块。In some embodiments, when the size of the input tensor is greater than a tensor size threshold and there is no free memory block in the free memory block list, a memory block matching the size of the input tensor is allocated from the unallocated memory as an allocated memory block.

例如,如图8所示出的未分配内存的示意图,神经网络模型运行在AI芯片上,该AI芯片包括多个处理器核心簇,比如,第一处理器核心簇801和第二处理器核心簇802。为了加速访存,通常会采用多级存储的架构,距离处理器近的存储层级拥有更大的数据传输带宽,但硬件成本更高,所以存储空间较为有限,本申请实施例中称为一级未分配内存803或L2缓存;距离处理器远的存储层级数据传输带宽较小,但是硬件成本低,存储空间较大,本申请实施例中称为二级未分配内存804或L3缓存。For example, as shown in the schematic diagram of unallocated memory in FIG. 8, the neural network model runs on an AI chip, which includes multiple processor core clusters, such as a first processor core cluster 801 and a second processor core cluster 802. To speed up memory access, a multi-level storage architecture is usually adopted. The storage level close to the processor has a larger data transmission bandwidth but a higher hardware cost, so its storage space is relatively limited; it is referred to as the first-level unallocated memory 803 or the L2 cache in the embodiments of the present application. The storage level far from the processor has a smaller data transmission bandwidth but a low hardware cost and a larger storage space; it is referred to as the second-level unallocated memory 804 or the L3 cache in the embodiments of the present application.

需要说明的是,一级未分配内存和二级未分配内存各自拥有了独立的已分配内存块列表和空闲内存块列表,初始状态下这两个列表均为空。It should be noted that the first-level unallocated memory and the second-level unallocated memory each have independent allocated memory block lists and free memory block lists, and both lists are empty in the initial state.

可选地,空闲内存块列表中的内存块按从小到大依次排序,最大的内存块排列在末尾。相应地,计算机设备可以根据空闲内存块列表末尾的空闲内存块的大小,确定是否存在大于或等于输入张量大小的空闲内存块。Optionally, the memory blocks in the free memory block list are sorted in ascending order of size, with the largest memory block at the end. Accordingly, the computer device can determine, based on the size of the free memory block at the end of the free memory block list, whether there is a free memory block whose size is greater than or equal to the size of the input tensor.

示例性地,在输入张量的大小大于张量大小阈值,且空闲内存块列表中没有空闲内存块的情况下,从一级未分配内存或二级未分配内存中划分与输入张量的大小匹配的内存块作为分配内存块。Exemplarily, when the size of the input tensor is greater than the tensor size threshold and there is no free memory block in the free memory block list, a memory block matching the size of the input tensor is allocated from the first-level unallocated memory or the second-level unallocated memory as an allocated memory block.

在一些实施例中,在输入张量的大小大于张量大小阈值,空闲内存块列表中没有空闲内存块,且一级未分配内存中的剩余内存大于或等于输入张量的大小的情况下,从一级未分配内存中划分与输入张量的大小匹配的内存块作为分配内存块。In some embodiments, when the size of the input tensor is greater than a tensor size threshold, there are no free memory blocks in the free memory block list, and the remaining memory in the first-level unallocated memory is greater than or equal to the size of the input tensor, a memory block matching the size of the input tensor is divided from the first-level unallocated memory as an allocated memory block.

在一些实施例中,在输入张量的大小大于张量大小阈值,空闲内存块列表中没有空闲内存块,且一级未分配内存中的剩余内存小于输入张量的大小的情况下,从二级未分配内存中划分与输入张量的大小匹配的内存块作为分配内存块。In some embodiments, when the size of the input tensor is greater than a tensor size threshold, there are no free memory blocks in the free memory block list, and the remaining memory in the first-level unallocated memory is less than the size of the input tensor, a memory block matching the size of the input tensor is divided from the second-level unallocated memory as an allocated memory block.
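The two-level fallback of the preceding paragraphs can be sketched as follows; the pool capacities, class name and level labels are assumptions introduced for illustration.

```python
# Non-limiting sketch of the two-level fallback: try the faster
# first-level (L2-style) pool first, then the larger second-level
# (L3-style) pool.
MB = 2 ** 20

class TwoLevelPool:
    def __init__(self, level1_bytes, level2_bytes):
        self.remaining = {"level1": level1_bytes, "level2": level2_bytes}

    def allocate(self, size):
        for level in ("level1", "level2"):   # level1 has the higher priority
            if self.remaining[level] >= size:
                self.remaining[level] -= size
                return level
        raise MemoryError("neither level can satisfy the request")

pool = TwoLevelPool(8 * MB, 64 * MB)
print(pool.allocate(6 * MB))   # 'level1'
print(pool.allocate(6 * MB))   # 'level2' (only 2MB remains in level1)
```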

步骤409:从未分配内存中获取与输入张量的大小匹配的分配内存块。Step 409: Get an allocated memory block matching the size of the input tensor from the unallocated memory.

在一些实施例中,在网络层算子对应的输入张量的大小小于或等于张量大小阈值的情况下,从未分配内存中获取与输入张量的大小匹配的分配内存块。In some embodiments, when the size of the input tensor corresponding to the network layer operator is less than or equal to the tensor size threshold, an allocated memory block matching the size of the input tensor is obtained from the unallocated memory.

示例性地,在网络层算子对应的输入张量的大小小于或等于张量大小阈值的情况下,不进行内存块的复用,直接从未分配内存中获取与输入张量的大小匹配的分配内存块。如果能够从一级未分配内存中划分出与输入张量的大小匹配的内存块,则优先从一级未分配内存上分配,若无法从一级未分配内存中划分出与输入张量的大小匹配的内存块,则从二级未分配内存上分配。Exemplarily, when the size of the input tensor corresponding to the network layer operator is less than or equal to the tensor size threshold, the memory block is not reused, and the allocated memory block matching the size of the input tensor is directly obtained from the unallocated memory. If a memory block matching the size of the input tensor can be divided from the first-level unallocated memory, it is allocated from the first-level unallocated memory first. If a memory block matching the size of the input tensor cannot be divided from the first-level unallocated memory, it is allocated from the second-level unallocated memory.

其中,张量大小阈值的设定用于减少较小的输入张量对应的内存块的产生。Among them, the setting of the tensor size threshold is used to reduce the generation of memory blocks corresponding to smaller input tensors.

在一些实施例中,基于计算图,确定网络层算子的输入张量和输出张量对应的生命周期;计算机设备响应于已分配内存块列表中的内存块达到生命周期,释放内存块,并将内存块放入空闲内存块列表。In some embodiments, based on the computational graph, the life cycles corresponding to the input tensors and output tensors of the network layer operator are determined; in response to a memory block in the allocated memory block list reaching its life cycle, the computer device releases the memory block and places the memory block in the free memory block list.

已分配内存块列表用于存放已被占用的内存块。The allocated memory block list is used to store occupied memory blocks.

生命周期用于表示张量在内存块中占用的时间,即,输入张量和输出张量在达到生命周期后不再被其它网络层算子使用。The life cycle is used to indicate the time during which a tensor occupies a memory block; that is, after the input tensor and the output tensor reach their life cycles, they are no longer used by other network layer operators.

示例性地,计算机设备响应于已分配内存块列表中的内存块达到生命周期,释放内存块;在当前释放内存块的相邻位置存在已释放内存块的情况下,将当前释放内存块与已释放内存块进行合并,得到合并释放内存块;将合并释放内存块放入空闲内存块列表。Exemplarily, in response to a memory block in a list of allocated memory blocks reaching its life cycle, the computer device releases a memory block; if there is a released memory block in an adjacent position of the currently released memory block, the currently released memory block is merged with the released memory block to obtain a merged released memory block; and the merged released memory block is placed in a list of free memory blocks.

合并释放内存块是指当前释放内存块和已释放内存块进行合并得到的内存块。The merged released memory block refers to a memory block obtained by merging the currently released memory block and the released memory block.

在一种可能的实施方式中,张量的生命周期可以采用如下形式表示:[使用张量的第一个网络层算子,使用张量的最后一个网络层算子]。计算机设备确定当前运行的网络层算子是否为生命周期中使用该张量的最后一个网络层算子。若是,则确定达到生命周期;若不是,则确定未达到生命周期。In a possible implementation, the life cycle of a tensor can be expressed in the following form: [the first network layer operator using the tensor, the last network layer operator using the tensor]. The computer device determines whether the currently running network layer operator is the last network layer operator that uses the tensor in the life cycle. If so, it is determined that the life cycle has been reached; if not, it is determined that the life cycle has not been reached.

例如,如图9所示出的释放内存块的示意图,如图9中的(a)所示,一级/二级已分配内存块列表中位于列表尾部的一个2M内存块即将释放。如图9中的(b)所示,在该2M当前释放内存块的相邻位置存在2M已释放内存块和6M已释放内存块的情况下,将2M当前释放内存块与2M、6M已释放内存块进行合并,得到10M合并释放内存块;将10M合并释放内存块放入一级/二级空闲内存块列表902,此时一级/二级已分配内存块列表中仅剩下两个内存块,分别为2M和4M。For example, as shown in the schematic diagram of releasing a memory block in FIG. 9, as shown in (a) of FIG. 9, a 2M memory block at the end of the primary/secondary allocated memory block list is about to be released. As shown in (b) of FIG. 9, when a released 2M memory block and a released 6M memory block exist at positions adjacent to the currently released 2M memory block, the currently released 2M memory block is merged with the released 2M and 6M memory blocks to obtain a 10M merged released memory block; the 10M merged released memory block is put into the primary/secondary free memory block list 902, and only two memory blocks, of 2M and 4M, remain in the primary/secondary allocated memory block list.
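A minimal sketch of release with coalescing follows, assuming blocks are tracked as (offset, size) address ranges; the representation and names are assumptions, not the disclosed implementation.

```python
# Non-limiting sketch of release-with-coalescing (FIG. 9): a released
# block absorbs any free neighbour it touches before entering the free
# list, so scattered free blocks merge into one larger block.
MB = 2 ** 20

def release(free_blocks, offset, size):
    for off, sz in list(free_blocks):
        if off + sz == offset:            # free neighbour ends where we begin
            free_blocks.remove((off, sz))
            offset, size = off, size + sz
        elif offset + size == off:        # free neighbour begins where we end
            free_blocks.remove((off, sz))
            size += sz
    free_blocks.append((offset, size))

free = [(0, 2 * MB), (4 * MB, 6 * MB)]    # already-released 2M and 6M blocks
release(free, 2 * MB, 2 * MB)             # release a 2M block between them
print([(o // MB, s // MB) for o, s in free])  # [(0, 10)] -> one 10M block
```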

在一些实施例中,网络层算子包括数据处理层算子。In some embodiments, the network layer operators include data processing layer operators.

数据处理层算子用于调整神经网络模型中的数据格式。数据处理层算子对应的网络层称为数据变换层。The data processing layer operator is used to adjust the data format in the neural network model. The network layer corresponding to the data processing layer operator is called the data transformation layer.

数据处理层算子包括形状重塑(Reshape)算子、拼接(Concat)算子及分裂(Split)算子中的至少一种,但不限于此,本申请实施例对此不作具体限定。The data processing layer operators include at least one of a reshape operator, a concatenation operator, and a split operator, but are not limited thereto, and the embodiments of the present application do not make specific limitations on this.

形状重塑算子用于对输入张量的形状进行重塑,以将输入张量的形状重塑为目标形状,但是在重塑数据的过程中,不改变数据包含的元素个数和元素在数据中的排布情况。 The reshape operator is used to reshape the input tensor to reshape the input tensor into a target shape, but in the process of reshaping the data, the number of elements contained in the data and the arrangement of the elements in the data are not changed.

例如,输入形状重塑算子的输入张量以矩阵的形式表示,如图10所示出的形状重塑算子进行重塑的示意图,输入形状重塑算子的矩阵的尺寸为[2,3,4](具有2个通道的3行4列矩阵),即输入形状重塑算子的矩阵为2×3×4的张量,形状重塑算子输出的矩阵的尺寸为[6,4](单通道的6行4列矩阵),也即是形状重塑算子用于将尺寸为[2,3,4]的矩阵变换为尺寸为[6,4]的矩阵。For example, the input tensor of the input reshape operator is represented in the form of a matrix. As shown in the schematic diagram of the reshape operator performing reshape, in FIG10 , the size of the matrix of the input reshape operator is [2, 3, 4] (a 3-row 4-column matrix with 2 channels), that is, the matrix of the input reshape operator is a 2×3×4 tensor, and the size of the matrix output by the reshape operator is [6, 4] (a 6-row 4-column matrix with a single channel), that is, the reshape operator is used to transform a matrix of size [2, 3, 4] into a matrix of size [6, 4].

拼接算子用于将至少两个输入张量进行拼接。例如,输入拼接算子的输入张量以矩阵的形式表示,如图11所示出的拼接算子进行拼接的示意图,输入拼接算子的矩阵的尺寸为:张量A=[1,3,2](单通道的3行2列矩阵),张量B=[2,3,2](具有2个通道的3行2列矩阵),拼接算子输出的矩阵的尺寸为:张量C=[3,3,2](具有3个通道的3行2列矩阵),也即是拼接算子用于将尺寸为[1,3,2]、[2,3,2]的两个张量拼接为[3,3,2]的矩阵。The concatenation operator is used to concatenate at least two input tensors. For example, the input tensors of the concatenation operator are represented in the form of matrices. As shown in the schematic diagram of concatenation performed by the concatenation operator in FIG. 11, the sizes of the matrices input into the concatenation operator are: tensor A = [1, 3, 2] (a single-channel matrix with 3 rows and 2 columns) and tensor B = [2, 3, 2] (a matrix with 2 channels, 3 rows and 2 columns); the size of the matrix output by the concatenation operator is: tensor C = [3, 3, 2] (a matrix with 3 channels, 3 rows and 2 columns). That is, the concatenation operator is used to concatenate two tensors of sizes [1, 3, 2] and [2, 3, 2] into a matrix of [3, 3, 2].

分裂算子用于将输入张量按照分裂维度进行分裂,分裂为至少两个子输入张量,该分裂算子可以理解为拼接算子的一个逆过程。分裂维度为最高维度或者首个元素个数不为1的维度,则分裂算子的输出张量复用输入张量占用的内存块。比如,以张量A=[1,128,32,32]为例,其最高维度为维度0,即元素个数为1的维度;首个元素个数不为1的维度指的是维度1,即元素个数为128的维度。The split operator is used to split the input tensor according to the split dimension, splitting it into at least two sub-input tensors. The split operator can be understood as an inverse process of the splicing operator. If the split dimension is the highest dimension or the dimension whose first element number is not 1, the output tensor of the split operator reuses the memory block occupied by the input tensor. For example, taking tensor A = [1, 128, 32, 32] as an example, its highest dimension is dimension 0, that is, the dimension with 1 element number; the dimension whose first element number is not 1 refers to dimension 1, that is, the dimension with 128 elements.
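One non-limiting way to read the reuse condition just stated is sketched below; the helper name is an assumption introduced here.

```python
# Non-limiting reading of the reuse condition for concat/split: the axis
# must be dimension 0, or every dimension before it must have one element,
# so that each piece occupies one contiguous range of the shared block.
def axis_allows_inplace_reuse(shape, axis):
    return all(dim == 1 for dim in shape[:axis])

print(axis_allows_inplace_reuse([1, 128, 32, 32], 1))  # True: dims before axis are 1
print(axis_allows_inplace_reuse([768, 32, 32], 0))     # True: highest dimension
print(axis_allows_inplace_reuse([128, 32, 32], 1))     # False: dim 0 has 128 elements
```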

例如,输入分裂算子的输入张量以矩阵的形式表示,如图12所示出的分裂算子进行分裂的示意图,输入分裂算子的矩阵的尺寸为:张量C=[3,3,2](具有3个通道的3行2列矩阵),分裂算子输出的矩阵的尺寸为:张量A=[1,3,2](单通道的3行2列矩阵),张量B=[2,3,2](具有2个通道的3行2列矩阵),也即是分裂算子用于将尺寸为[3,3,2]的张量分裂为[1,3,2]、[2,3,2]的矩阵。For example, the input tensor of the split operator is represented in the form of a matrix. As shown in the schematic diagram of splitting performed by the split operator in FIG. 12, the size of the matrix input into the split operator is: tensor C = [3, 3, 2] (a matrix with 3 channels, 3 rows and 2 columns); the sizes of the matrices output by the split operator are: tensor A = [1, 3, 2] (a single-channel matrix with 3 rows and 2 columns) and tensor B = [2, 3, 2] (a matrix with 2 channels, 3 rows and 2 columns). That is, the split operator is used to split a tensor of size [3, 3, 2] into matrices of [1, 3, 2] and [2, 3, 2].

示例性地,计算机设备获取数据处理层算子对应的输入张量和输出张量,并将输出张量配置为复用所述输入张量占用的分配内存块。Exemplarily, the computer device obtains the input tensor and the output tensor corresponding to the data processing layer operator, and configures the output tensor to reuse the allocated memory block occupied by the input tensor.

可选地,数据处理层算子包括形状重塑算子,输出张量包括形状重塑张量。计算机设备基于输入张量占用的分配内存块,将输入张量占用的分配内存块分配给形状重塑张量。Optionally, the data processing layer operator includes a reshape operator, and the output tensor includes the reshape tensor. The computer device allocates the allocated memory block occupied by the input tensor to the reshape tensor based on the allocated memory block occupied by the input tensor.

形状重塑算子用于调整输入张量的形状,但不改变输入张量中的数据,形状重塑张量是指形状重塑算子输出的张量。The reshape operator is used to adjust the shape of the input tensor without changing the data in the input tensor. The reshape tensor refers to the tensor output by the reshape operator.

形状重塑算子是对输入张量的形状进行重塑,并不会改变输入张量在内存块中的数据,所以形状重塑算子在神经网络模型运行时是将输入张量的内存数据拷贝到输出张量所在的内存(也即是下一层网络层的输入张量所在的内存)。在本申请实施例中,计算机设备可以通过使形状重塑算子的输出张量复用形状重塑算子的输入张量占用的内存块,来消除神经网络模型运行时形状重塑算子对应的数据拷贝操作。The reshape operator reshapes the input tensor and does not change the data of the input tensor in the memory block. Therefore, when the neural network model is running, the reshape operator copies the memory data of the input tensor to the memory where the output tensor is located (that is, the memory where the input tensor of the next network layer is located). In the embodiment of the present application, the computer device can eliminate the data copy operation corresponding to the reshape operator when the neural network model is running by making the output tensor of the reshape operator reuse the memory block occupied by the input tensor of the reshape operator.

例如,张量A=[1,1,512,32,32]经过形状重塑算子,将张量A中的维度3和维度4合并,输出张量B=[1,512,1024](假设张量A、张量B数据类型都为Float 32),如果计算机设备为形状重塑算子的输入张量A分配了512*32*32*4=2MB的内存块A,那么计算机设备为形状重塑算子的输出张量B分配内存是将内存块A分配给张量B,神经网络模型运行时该场景的形状重塑算子就无需执行数据搬运操作,避免频繁访存造成的消耗。For example, tensor A = [1, 1, 512, 32, 32] passes through the reshape operator, and dimensions 3 and 4 in tensor A are merged, and the output tensor B = [1, 512, 1024] (assuming that the data types of tensor A and tensor B are both Float 32). If the computer device allocates a memory block A of 512*32*32*4=2MB for the input tensor A of the reshape operator, then the computer device allocates memory block A to tensor B when allocating memory for the output tensor B of the reshape operator. When the neural network model is running, the reshape operator in this scenario does not need to perform data transfer operations, thus avoiding consumption caused by frequent memory accesses.
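For illustration only, the aliasing behaviour described above can be demonstrated with NumPy views standing in for block reuse; NumPy is an assumption of this sketch, not part of the embodiments, which operate on raw memory blocks.

```python
# Non-limiting illustration: tensor B aliases tensor A's 2MB block, so
# the reshape involves no data copy at run time.
import numpy as np

a = np.zeros((1, 1, 512, 32, 32), dtype=np.float32)  # input tensor A, 2MB
b = a.reshape(1, 512, 1024)                          # merges the trailing dims
print(b.base is a)   # True: B reuses the memory block of A, no copy made
print(a.nbytes)      # 2097152 bytes = 2MB
```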

可选地,数据处理层算子包括拼接算子。计算机设备确定输出张量占用的分配内存块;计算机设备将至少两个输入张量配置为偏移复用输出张量占用的分配内存块。Optionally, the data processing layer operator includes a concatenation operator. The computer device determines an allocated memory block occupied by the output tensor; and the computer device configures at least two input tensors to offset-reuse the allocated memory block occupied by the output tensor.

拼接算子是对2个或者以上的输入张量按照拼接维度进行拼接。The concatenation operator concatenates two or more input tensors according to the concatenation dimension.

拼接维度为最高维度或者首个元素个数不为1的维度。在本申请实施例中,如果拼接算子指定的拼接维度为最高维度或者首个元素个数不为1的维度,则拼接算子的输出张量和输入张量可复用同一内存块。计算机设备可以通过使拼接算子的多个输入张量按偏移复用拼接算子的输出张量占用的内存块,来消除神经网络模型运行时拼接算子对应的数据拷贝操作。The splicing dimension is the highest dimension or the dimension whose first number of elements is not 1. In an embodiment of the present application, if the splicing dimension specified by the splicing operator is the highest dimension or the dimension whose first number of elements is not 1, the output tensor and input tensor of the splicing operator can reuse the same memory block. The computer device can eliminate the data copy operation corresponding to the splicing operator when the neural network model is running by making multiple input tensors of the splicing operator reuse the memory block occupied by the output tensor of the splicing operator according to the offset.

例如,如图13所示出的拼接算子进行拼接的示意图,张量A=[512,32,32],张量B=[256,32,32],假设经过拼接维度为0的拼接算子,将张量A和张量B拼接为张量C=[768,32,32](假设张量A、张量B和张量C的数据类型都为Float 32),如果计算机设备为拼接算子的输出张量C分配了768*32*32*4=3MB的内存块C,那么将内存块C根据输入张量A和张量B的大小划分为大小分别为2MB和1MB的两个子内存块A和子内存块B,并将子内存块A和子内存块B分别用于存储张量A和张量B,那么神经网络模型运行时,拼接算子就无需执行数据搬运操作,避免频繁访存造成的消耗。For example, as shown in the schematic diagram of concatenation performed by the concatenation operator in FIG. 13, tensor A = [512, 32, 32] and tensor B = [256, 32, 32]. Assume that a concatenation operator whose concatenation dimension is 0 concatenates tensor A and tensor B into tensor C = [768, 32, 32] (assuming that the data types of tensor A, tensor B and tensor C are all Float 32). If the computer device allocates a memory block C of 768*32*32*4=3MB for the output tensor C of the concatenation operator, memory block C is divided, according to the sizes of the input tensors A and B, into two sub-memory blocks A and B of 2MB and 1MB respectively, and sub-memory blocks A and B are used to store tensor A and tensor B respectively. Then, when the neural network model runs, the concatenation operator does not need to perform data transfer operations, avoiding the consumption caused by frequent memory accesses.

可选地,数据处理层算子包括分裂算子,输出张量包括至少两个子输出张量。计算机设备划分输入张量占用的分配内存块,得到至少两个子输入张量各自对应的子内存块;计算机设备将子内存块分配给至少两个子输出张量。Optionally, the data processing layer operator includes a splitting operator, and the output tensor includes at least two sub-output tensors. The computer device divides the allocated memory block occupied by the input tensor to obtain sub-memory blocks corresponding to the at least two sub-input tensors respectively; and the computer device allocates the sub-memory blocks to the at least two sub-output tensors.

分裂算子用于将输入张量划分为至少两个子输入张量,子输出张量指数据处理层算子输出的张量。The splitting operator is used to divide the input tensor into at least two sub-input tensors, and the sub-output tensor refers to the tensor output by the data processing layer operator.

分裂算子可以理解为拼接算子的逆运算,分裂算子是将输入张量按照分裂维度进行分裂,生成多个输出张量。如果分裂算子指定的分裂维度为最高维度或者首个元素个数不为1的维度,则分裂算子的输出张量复用输入张量占用的内存块。在本申请实施例中,计算机设备可以通过使分裂算子的多个输出张量按偏移复用分裂算子的输入张量占用的内存块,来消除神经网络模型运行时分裂算子对应的数据拷贝操作。例如,如图14所示出的分裂算子进行分裂的示意图,张量C=[768,32,32],假设张量C经过分裂维度为0的分裂算子,维度0由768分裂为512和256,对应输出张量A=[512,32,32],张量B=[256,32,32](假设张量A、张量B和张量C的数据类型都为Float 32),如果计算机设备为分裂算子的输入张量C分配了768*32*32*4=3MB的内存块C,那么将内存块C根据输出张量A、张量B的大小划分为大小分别为2MB和1MB的两个子内存块A和子内存块B,并将子内存块A和子内存块B分别分配给张量A和张量B,那么神经网络模型运行时,该场景的分裂算子就无需执行数据搬运操作,避免频繁访存造成的消耗。The split operator can be understood as the inverse operation of the concatenation operator: it splits the input tensor along the split dimension to generate multiple output tensors. If the split dimension specified by the split operator is the highest dimension or the first dimension whose element count is not 1, the output tensors of the split operator reuse the memory block occupied by the input tensor. In the embodiments of the present application, the computer device can eliminate the data copy operations corresponding to the split operator at run time by making the multiple output tensors of the split operator reuse, by offset, the memory block occupied by the input tensor of the split operator. For example, as shown in the schematic diagram of splitting performed by the split operator in FIG. 14, tensor C = [768, 32, 32]. Assume that tensor C passes through a split operator whose split dimension is 0, and dimension 0 is split from 768 into 512 and 256, corresponding to output tensors A = [512, 32, 32] and B = [256, 32, 32] (assuming that the data types of tensors A, B and C are all Float 32). If the computer device allocates a memory block C of 768*32*32*4=3MB for the input tensor C of the split operator, memory block C is divided, according to the sizes of the output tensors A and B, into two sub-memory blocks A and B of 2MB and 1MB respectively, and sub-memory blocks A and B are allocated to tensor A and tensor B respectively. Then, when the neural network model runs, the split operator in this scenario does not need to perform data transfer operations, avoiding the consumption caused by frequent memory accesses.
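For illustration only, the offset reuse of FIG. 13 and FIG. 14 can be demonstrated with slices of a flat NumPy buffer standing in for memory block C and its sub-blocks; the buffer representation and names are assumptions of this sketch.

```python
# Non-limiting illustration of offset reuse for concat/split: the slices
# play the role of sub-memory blocks A and B within the shared block C.
import numpy as np

block_c = np.empty(768 * 32 * 32, dtype=np.float32)  # 3MB memory block C
sub_a = block_c[: 512 * 32 * 32]   # sub-block A: first 2MB, holds tensor A
sub_b = block_c[512 * 32 * 32 :]   # sub-block B: last 1MB, holds tensor B

sub_a[:] = 1.0   # writing tensor A fills the head of block C in place
sub_b[:] = 2.0   # writing tensor B fills the tail of block C in place
# Concat: block C already holds [A; B] with no copy.
# Split: reading A and B back out of block C likewise moves no data.
print(float(block_c[0]), float(block_c[-1]))  # 1.0 2.0
```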

综上所述,本实施例提供的方法,通过获取神经网络模型对应的计算图;基于计算图,确定待分配至网络层算子的内存大小;基于内存大小,通过从空闲内存块列表中的空闲内存块中获取与内存大小匹配的分配内存块,将分配内存块分配给网络层算子。本申请通过复用空闲内存块列表中的空闲内存块,为网络层算子分配到与内存大小匹配的分配内存块,从而减少神经网络模型在运行过程中分配的内存,提高了内存的利用率。In summary, the method provided in this embodiment obtains the calculation graph corresponding to the neural network model; based on the calculation graph, determines the memory size to be allocated to the network layer operator; based on the memory size, obtains the allocated memory block that matches the memory size from the free memory blocks in the free memory block list, and allocates the allocated memory block to the network layer operator. This application allocates the allocated memory block that matches the memory size to the network layer operator by reusing the free memory blocks in the free memory block list, thereby reducing the memory allocated by the neural network model during operation and improving memory utilization.

本实施例提供的方法,通过判断输入张量的大小与张量大小阈值之间的大小,从而确定不同的获取方式;基于不同的获取方式从空闲内存块列表中获取与输入张量的大小匹配的分配内存块,从而减少神经网络模型在运行过程中分配的内存,提高了内存的利用率。The method provided in this embodiment determines different acquisition methods by judging the size between the size of the input tensor and the tensor size threshold; based on the different acquisition methods, the allocated memory block matching the size of the input tensor is obtained from the free memory block list, thereby reducing the memory allocated during the operation of the neural network model and improving memory utilization.

本实施例提供的方法,通过从空闲内存块列表和未分配内存中组合获取与输入张量的大小匹配的分配内存块,从而减少神经网络模型在运行过程中分配的内存,提高了内存的利用率。The method provided in this embodiment reduces the memory allocated during the operation of the neural network model and improves memory utilization by combining a free memory block list and unallocated memory to obtain an allocated memory block that matches the size of the input tensor.

本实施例提供的方法,在输入张量的大小小于或等于张量大小阈值的情况下,直接从未分配内存中获取与输入张量的大小匹配的分配内存块,避免了小的内存块的产生,提高了内存的利用率。The method provided in this embodiment directly obtains an allocated memory block that matches the size of the input tensor from unallocated memory when the size of the input tensor is less than or equal to the tensor size threshold, thereby avoiding the generation of small memory blocks and improving memory utilization.

本实施例提供的方法,在释放内存块时,将当前释放内存块与已释放内存块进行合并,得到大的合并释放内存块,并将合并释放内存块放入空闲内存块列表。通过上述方法将零散的空闲内存块进行合并,从而使得空闲内存块列表中的空闲内存块可以应用于多种分配场景,提高了内存块的分配效率。The method provided in this embodiment, when releasing a memory block, merges the currently released memory block with the released memory block to obtain a large merged released memory block, and puts the merged released memory block into a free memory block list. By merging scattered free memory blocks through the above method, the free memory blocks in the free memory block list can be applied to a variety of allocation scenarios, thereby improving the allocation efficiency of the memory block.

本实施例提供的方法,针对神经网络模型中的数据处理层算子,使数据处理层算子的输入和输出复用同一个内存块,减少了神经网络模型运行时的数据搬运开销,提高了内存的利用率。The method provided in this embodiment enables the input and output of the data processing layer operators in the neural network model to reuse the same memory block, thereby reducing the data transfer overhead when the neural network model is running and improving memory utilization.

图15是本申请一个示例性实施例提供的神经网络模型的内存管理方法的流程图。该方法可以由计算机设备执行,计算机设备可以是终端或服务器。该方法包括以下步骤。Figure 15 is a flow chart of a memory management method for a neural network model provided by an exemplary embodiment of the present application. The method can be executed by a computer device, which can be a terminal or a server. The method includes the following steps.

步骤1501:获取待分配至当前网络层算子的输入张量的大小。Step 1501: Get the size of the input tensor to be allocated to the current network layer operator.

输入张量用以表示输入至网络层算子中的多维数组。 Input tensors are used to represent multidimensional arrays that are input to network layer operators.

以ONNX标准为例,张量以(秩(rank)、形状(shape)、数据类型(data type))三元组来表示。Taking the ONNX standard as an example, a tensor is represented by a (rank, shape, data type) triple.

输入张量的大小用于表示网络层算子的输入数据在神经网络模型运行时需要占用的内存大小。The size of the input tensor is used to indicate the memory size that the input data of the network layer operator needs to occupy when the neural network model is running.

示例性地,计算机设备基于计算图,确定待分配至网络层算子的输入张量的大小。Exemplarily, the computer device determines the size of the input tensor to be allocated to the network layer operator based on the computation graph.

步骤1502:判断输入张量的大小是否小于张量大小阈值。Step 1502: Determine whether the size of the input tensor is less than the tensor size threshold.

张量大小阈值是指能够进行内存复用的最小的内存块的大小。The tensor size threshold refers to the size of the smallest memory block that can be reused.

计算机设备判断输入张量的大小是否小于张量大小阈值,在输入张量的大小小于张量大小阈值的情况下,执行步骤1508;在输入张量的大小大于或等于张量大小阈值的情况下,执行步骤1503。The computer device determines whether the size of the input tensor is less than the tensor size threshold. If the size of the input tensor is less than the tensor size threshold, execute step 1508; if the size of the input tensor is greater than or equal to the tensor size threshold, execute step 1503.

步骤1503:获取空闲内存块列表中最大的空闲内存块。Step 1503: Get the largest free memory block in the free memory block list.

空闲内存块列表用于存放已被分配但被解除占用后的空闲内存块。The free memory block list is used to store free memory blocks that have been allocated but released.

示例性地,在输入张量的大小大于或等于张量大小阈值的情况下,计算机设备获取空闲内存块列表中最大的空闲内存块。Exemplarily, when the size of the input tensor is greater than or equal to the tensor size threshold, the computer device obtains the largest free memory block in the free memory block list.

步骤1504:判断最大的空闲内存块是否大于或等于输入张量的大小。Step 1504: Determine whether the largest free memory block is greater than or equal to the size of the input tensor.

示例性地,在获取空闲内存块列表中最大的空闲内存块后,计算机设备判断最大的空闲内存块是否大于或等于输入张量的大小,在最大的空闲内存块大于或等于输入张量的大小的情况下,执行步骤1505;在最大的空闲内存块小于输入张量的大小的情况下,执行步骤1506。Exemplarily, after obtaining the largest free memory block in the free memory block list, the computer device determines whether the largest free memory block is greater than or equal to the size of the input tensor. If the largest free memory block is greater than or equal to the size of the input tensor, step 1505 is executed; if the largest free memory block is smaller than the size of the input tensor, step 1506 is executed.

步骤1505:将最大的空闲内存块划分成两个内存块,一个与输入张量的大小匹配,分配至网络层算子,另一个放回至空闲内存块列表。Step 1505: Divide the largest free memory block into two memory blocks, one matching the size of the input tensor and allocated to the network layer operator, and the other put back into the free memory block list.

示例性地,在最大的空闲内存块大于或等于输入张量的大小的情况下,计算机设备将最大的空闲内存块划分成两个内存块,一个与输入张量的大小进行匹配,得到与输入张量的大小匹配的分配内存块,并将分配内存块分配至网络层算子;另一个放回至空闲内存块列表,以备下次使用。Exemplarily, when the largest free memory block is greater than or equal to the size of the input tensor, the computer device divides the largest free memory block into two memory blocks, one of which matches the size of the input tensor to obtain an allocated memory block that matches the size of the input tensor, and allocates the allocated memory block to the network layer operator; the other is put back into the free memory block list for next use.

步骤1506:判断最大的空闲内存块是否处于末尾。Step 1506: Determine whether the largest free memory block is at the end.

空闲内存块列表中的空闲内存块由小到大依次排列。The free memory blocks in the free memory block list are arranged in order from small to large.

示例性地,在最大的空闲内存块小于输入张量的大小的情况下,判断最大的空闲内存块是否处于末尾,即判断空闲内存块列表中是否具有空闲内存块;在最大的空闲内存块处于末尾的情况下,执行步骤1507;在最大的空闲内存块未处于末尾的情况下,执行步骤1508。Exemplarily, when the largest free memory block is smaller than the size of the input tensor, determine whether the largest free memory block is at the end, that is, determine whether there are free memory blocks in the free memory block list; when the largest free memory block is at the end, execute step 1507; when the largest free memory block is not at the end, execute step 1508.

步骤1507:取出最大的空闲内存块,并从未分配内存中划分得到合并内存块,将最大的空闲内存块和合并内存块合并后,分配至网络层算子。Step 1507: Take out the largest free memory block, divide it from the unallocated memory to obtain a merged memory block, merge the largest free memory block and the merged memory block, and allocate them to the network layer operator.

示例性地,在最大的空闲内存块处于末尾的情况下,取出最大的空闲内存块,并从未分配内存中划分得到合并内存块,将最大的空闲内存块和合并内存块合并后,分配至网络层算子。Exemplarily, when the largest free memory block is at the end, the largest free memory block is taken out and divided from the unallocated memory to obtain a merged memory block, and the largest free memory block and the merged memory block are merged and allocated to the network layer operator.

步骤1508:判断一级未分配内存中的剩余内存是否大于/等于输入张量的大小。Step 1508: Determine whether the remaining memory in the first-level unallocated memory is greater than/equal to the size of the input tensor.

示例性地,在最大的空闲内存块未处于末尾的情况下,进一步判断一级未分配内存中的剩余内存是否大于/等于输入张量的大小,在一级未分配内存中的剩余内存大于/等于输入张量的大小的情况下,执行步骤1509;在一级未分配内存中的剩余内存小于输入张量的大小的情况下,执行步骤1510。Exemplarily, when the largest free memory block is not at the end, it is further determined whether the remaining memory in the first-level unallocated memory is greater than/equal to the size of the input tensor. When the remaining memory in the first-level unallocated memory is greater than/equal to the size of the input tensor, step 1509 is executed; when the remaining memory in the first-level unallocated memory is less than the size of the input tensor, step 1510 is executed.

步骤1509:从一级未分配内存中划分与输入张量的大小匹配的内存块作为分配内存块,分配至网络层算子。Step 1509: Divide a memory block that matches the size of the input tensor from the first-level unallocated memory as an allocated memory block, and allocate it to the network layer operator.

示例性地,在一级未分配内存中的剩余内存大于/等于输入张量的大小的情况下,直接从一级未分配内存中划分与输入张量的大小匹配的内存块作为分配内存块,并分配至网络层算子。Exemplarily, when the remaining memory in the first-level unallocated memory is greater than/equal to the size of the input tensor, a memory block matching the size of the input tensor is directly divided from the first-level unallocated memory as an allocated memory block and allocated to the network layer operator.

步骤1510:从二级未分配内存中划分与输入张量的大小匹配的内存块作为分配内存块,分配至网络层算子。 Step 1510: Divide a memory block that matches the size of the input tensor from the secondary unallocated memory as an allocated memory block, and allocate it to the network layer operator.

示例性地,在一级未分配内存中的剩余内存小于输入张量的大小的情况下,从二级未分配内存中划分与输入张量的大小匹配的内存块作为分配内存块,分配至网络层算子。Exemplarily, when the remaining memory in the first-level unallocated memory is smaller than the size of the input tensor, a memory block matching the size of the input tensor is divided from the second-level unallocated memory as an allocated memory block and allocated to the network layer operator.
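Purely as a non-limiting summary, the branches of FIG. 15 can be composed into one simplified routine; the handling of steps 1506-1507 is condensed, and all names and the pool representation are assumptions introduced here.

```python
# Non-limiting sketch composing FIG. 15: small tensors skip reuse;
# otherwise the largest free block is split or merged with fresh memory;
# pool allocation prefers the first-level memory.
MB = 2 ** 20

def allocate(size, threshold, free_list, pools):
    if size >= threshold and free_list:          # steps 1502-1504
        free_list.sort()
        largest = free_list.pop()                # largest block sits at the end
        if largest >= size:                      # step 1505: split it
            if largest > size:
                free_list.append(largest - size)
            return "reused"
        take_from_pools(size - largest, pools)   # step 1507: merge with new memory
        return "merged"
    take_from_pools(size, pools)                 # steps 1508-1510
    return "fresh"

def take_from_pools(size, pools):
    for level in ("level1", "level2"):           # first-level memory first
        if pools[level] >= size:
            pools[level] -= size
            return
    raise MemoryError("out of memory")

pools = {"level1": 8 * MB, "level2": 64 * MB}
print(allocate(10 * MB, 1 * MB, [4 * MB], pools))  # 'merged'
print(allocate(512, 1 * MB, [], pools))            # 'fresh' (below threshold)
```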

图16是本申请一个示例性实施例提供的AI芯片的结构图。该AI芯片包括AI处理器1601、内存1602以及内存管理单元1603。FIG16 is a structural diagram of an AI chip provided by an exemplary embodiment of the present application. The AI chip includes an AI processor 1601 , a memory 1602 , and a memory management unit 1603 .

其中,AI处理器1601用于运行神经网络模型。可选的,AI处理器1601可以包括多个处理器核心簇,每个处理器核心簇中包含多个处理器核心。The AI processor 1601 is used to run the neural network model. Optionally, the AI processor 1601 may include multiple processor core clusters, each of which includes multiple processor cores.

可选的,内存1602可以由多级缓存构成。比如,内存1602由L2缓存和L3缓存构成。Optionally, the memory 1602 may be composed of multiple levels of cache. For example, the memory 1602 is composed of an L2 cache and an L3 cache.

内存管理单元1603用于为运行中神经网络模型中的网络层分配内存块。该内存管理单元1603用于实现本申请各个实施例所述的神经网络模型的内存管理方法。The memory management unit 1603 is used to allocate memory blocks to the network layers in the running neural network model. The memory management unit 1603 is used to implement the memory management method of the neural network model described in various embodiments of the present application.

图17示出了本申请一个示例性实施例提供的神经网络模型的内存管理装置的结构示意图。该装置包括以下模块。Fig. 17 shows a schematic diagram of the structure of a memory management device for a neural network model provided by an exemplary embodiment of the present application. The device includes the following modules.

获取模块1701,用于获取神经网络模型对应的计算图,所述计算图中包括至少两个网络层算子,所述网络层算子用于表示所述神经网络模型中的网络层。The acquisition module 1701 is used to obtain a calculation graph corresponding to the neural network model, wherein the calculation graph includes at least two network layer operators, and the network layer operators are used to represent the network layers in the neural network model.

确定模块1702,用于基于所述计算图,确定待分配至所述网络层算子的内存大小,所述内存大小用于表示所述网络层算子在所述神经网络模型运行时需要占用的内存大小。Determination module 1702 is used to determine the memory size to be allocated to the network layer operator based on the calculation graph, and the memory size is used to represent the memory size that the network layer operator needs to occupy when the neural network model is running.

分配模块1703,用于从空闲内存块列表中获取与所述内存大小匹配的分配内存块,将所述分配内存块分配给所述网络层算子。The allocation module 1703 is used to obtain an allocated memory block matching the memory size from the free memory block list, and allocate the allocated memory block to the network layer operator.

其中,所述空闲内存块列表用于存放已被分配但被解除占用后的空闲内存块,所述分配内存块是指被分配给所述网络层算子用于存储数据的内存块。The free memory block list is used to store free memory blocks that have been allocated but released, and the allocated memory blocks refer to memory blocks allocated to the network layer operator for storing data.

在一些实施例中,获取模块1701,用于获取所述网络层算子的排列顺序,所述排列顺序用于表示所述网络层算子在所述神经网络模型运行时的执行顺序。In some embodiments, the acquisition module 1701 is used to obtain the arrangement order of the network layer operators, and the arrangement order is used to represent the execution order of the network layer operators when the neural network model is running.

在一些实施例中,分配模块1703,用于在所述网络层算子的所述输入张量的大小大于张量大小阈值的情况下,从所述空闲内存块列表中获取与所述输入张量的大小匹配的所述分配内存块,所述张量大小阈值是内存复用的最小的内存块大小。In some embodiments, the allocation module 1703 is used to obtain the allocated memory block matching the size of the input tensor from the free memory block list when the size of the input tensor of the network layer operator is greater than a tensor size threshold, and the tensor size threshold is the minimum memory block size for memory reuse.

在一些实施例中,分配模块1703,用于按照所述排列顺序将所述分配内存块分配给所述网络层算子用于存储所述输入张量。In some embodiments, the allocation module 1703 is used to allocate the allocated memory blocks to the network layer operators for storing the input tensors in accordance with the arrangement order.

其中,所述输入张量是指输入至所述网络层算子中的多维数组。The input tensor refers to a multidimensional array input into the network layer operator.

在一些实施例中,分配模块1703,用于在所述输入张量的大小大于所述张量大小阈值,且所述空闲内存块列表中存在至少一个空闲内存块的大小大于或等于所述输入张量的大小的情况下,从所述空闲内存块列表中获取与所述输入张量的大小匹配的所述分配内存块。In some embodiments, the allocation module 1703 is used to obtain the allocated memory block that matches the size of the input tensor from the free memory block list when the size of the input tensor is greater than the tensor size threshold and there is at least one free memory block in the free memory block list whose size is greater than or equal to the size of the input tensor.

在一些实施例中,分配模块1703,用于在所述输入张量的大小大于所述张量大小阈值的情况下,将所述空闲内存块列表中的第一空闲内存块作为所述分配内存块。In some embodiments, the allocation module 1703 is used to use the first free memory block in the free memory block list as the allocated memory block when the size of the input tensor is greater than the tensor size threshold.

在一些实施例中,分配模块1703,用于在所述输入张量的大小大于所述张量大小阈值的情况下,从所述空闲内存块列表中的第二空闲内存块中划分出与所述输入张量的大小匹配的第三内存块,将所述第三内存块作为所述分配内存块。In some embodiments, the allocation module 1703 is used to divide a third memory block matching the size of the input tensor from the second free memory block in the free memory block list when the size of the input tensor is greater than the tensor size threshold, and use the third memory block as the allocated memory block.

其中,所述第一空闲内存块的大小与所述输入张量的大小相同,所述第二空闲内存块的大小大于所述输入张量的大小。The size of the first free memory block is the same as the size of the input tensor, and the size of the second free memory block is larger than the size of the input tensor.
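A minimal sketch of the two cases above (an exact-size match versus splitting a larger block), assuming a simple offset/size block representation; the helper name and the scan order are illustrative, not taken from the patent:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Block:
    offset: int
    size: int

def take_from_free_list(free: List[Block], size: int) -> Optional[Block]:
    # Case 1: a "first free block" whose size equals the request exactly.
    for i, blk in enumerate(free):
        if blk.size == size:
            return free.pop(i)
    # Case 2: a larger "second free block": carve off a "third block" of
    # the requested size and leave the remainder on the free list.
    for blk in free:
        if blk.size > size:
            carved = Block(offset=blk.offset, size=size)
            blk.offset += size
            blk.size -= size
            return carved
    return None
```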

在一些实施例中,分配模块1703,用于在所述输入张量的大小大于所述张量大小阈值,且所述空闲内存块列表中的所述空闲内存块的大小小于所述输入张量的大小的情况下,从所述空闲内存块列表和未分配内存中获取与所述输入张量的大小匹配的所述分配内存块。In some embodiments, the allocation module 1703 is used to obtain the allocated memory block matching the size of the input tensor from the free memory block list and the unallocated memory when the size of the input tensor is greater than the tensor size threshold and the size of the free memory block in the free memory block list is smaller than the size of the input tensor.

其中,所述未分配内存是指存储空间中未被分配占用过的内存。The unallocated memory refers to the memory in the storage space that has not been allocated or occupied.

在一些实施例中,分配模块1703,用于在所述输入张量的大小大于所述张量大小阈值的情况下,将所述空闲内存块列表中的第四空闲内存块,与所述未分配内存中的合并内存块合并,得到所述分配内存块。In some embodiments, the allocation module 1703 is used to merge the fourth free memory block in the free memory block list with the merged memory block in the unallocated memory to obtain the allocated memory block when the size of the input tensor is greater than the tensor size threshold.

其中,所述第四空闲内存块的大小小于所述输入张量的大小,所述合并内存块是从所述未分配内存中划分得到的内存块,所述合并内存块的大小为所述输入张量的大小与所述第四空闲内存块的大小的差值。The size of the fourth free memory block is smaller than the size of the input tensor; the merged memory block is a memory block carved out of the unallocated memory, and the size of the merged memory block is the difference between the size of the input tensor and the size of the fourth free memory block.
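One way to picture this merge, assuming unallocated memory is the region above a single watermark and the fourth free block sits right at that boundary (both are simplifying assumptions for the sketch):

```python
from dataclasses import dataclass

@dataclass
class Block:
    offset: int
    size: int

pool_watermark = 4096  # start of unallocated memory (assumed layout)

def merge_with_unallocated(fourth: Block, size: int) -> Block:
    """Grow a too-small free block that borders unallocated memory."""
    global pool_watermark
    assert fourth.offset + fourth.size == pool_watermark, "must border unallocated memory"
    shortfall = size - fourth.size  # size of the merged memory block
    pool_watermark += shortfall     # carve the merged block from unallocated memory
    fourth.size = size              # the grown block now fits the input tensor
    return fourth

blk = Block(offset=4000, size=96)          # too small for a 256-byte tensor
grown = merge_with_unallocated(blk, 256)   # consumes a 160-byte merged block
assert grown.size == 256 and pool_watermark == 4256
```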

在一些实施例中,分配模块1703,用于在所述输入张量的大小大于所述张量大小阈值,且所述空闲内存块列表中没有空闲内存块的情况下,从所述未分配内存中划分出与所述输入张量的大小匹配的内存块作为所述分配内存块。In some embodiments, the allocation module 1703 is used to allocate a memory block matching the size of the input tensor from the unallocated memory as the allocated memory block when the size of the input tensor is greater than the tensor size threshold and there is no free memory block in the free memory block list.

所述未分配内存包括一级未分配内存和二级未分配内存,所述一级未分配内存的分配优先级高于所述二级未分配内存的分配优先级。The unallocated memory includes first-level unallocated memory and second-level unallocated memory, and the allocation priority of the first-level unallocated memory is higher than the allocation priority of the second-level unallocated memory.

在一些实施例中,分配模块1703,用于在所述输入张量的大小大于所述张量大小阈值,且所述空闲内存块列表中没有空闲内存块的情况下,从所述一级未分配内存或所述二级未分配内存中划分与所述输入张量的大小匹配的内存块作为所述分配内存块。In some embodiments, the allocation module 1703 is used to allocate a memory block matching the size of the input tensor from the first-level unallocated memory or the second-level unallocated memory as the allocated memory block when the size of the input tensor is greater than the tensor size threshold and there is no free memory block in the free memory block list.

在一些实施例中,分配模块1703,用于在所述输入张量的大小大于所述张量大小阈值,所述空闲内存块列表中没有空闲内存块,且所述一级未分配内存中的剩余内存大于或等于所述输入张量的大小的情况下,从所述一级未分配内存中划分与所述输入张量的大小匹配的内存块作为所述分配内存块。In some embodiments, the allocation module 1703 is used to allocate a memory block matching the size of the input tensor from the first-level unallocated memory as the allocated memory block when the size of the input tensor is greater than the tensor size threshold, there are no free memory blocks in the free memory block list, and the remaining memory in the first-level unallocated memory is greater than or equal to the size of the input tensor.

在一些实施例中,分配模块1703,用于在所述输入张量的大小大于所述张量大小阈值,所述空闲内存块列表中没有空闲内存块,且所述一级未分配内存中的剩余内存小于所述输入张量的大小的情况下,从所述二级未分配内存中划分与所述输入张量的大小匹配的内存块作为所述分配内存块。In some embodiments, the allocation module 1703 is used to allocate a memory block matching the size of the input tensor from the secondary unallocated memory as the allocated memory block when the size of the input tensor is greater than the tensor size threshold, there is no free memory block in the free memory block list, and the remaining memory in the first-level unallocated memory is less than the size of the input tensor.
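The level-1/level-2 fallback can be sketched as below; the class name and the capacity bookkeeping are assumptions for illustration:

```python
class TwoLevelPool:
    """Level-1 is small but high-priority; level-2 is the larger fallback."""

    def __init__(self, l1_size: int, l2_size: int):
        self.l1_remaining = l1_size
        self.l2_remaining = l2_size

    def carve(self, size: int) -> str:
        if self.l1_remaining >= size:   # prefer level-1 while it still fits
            self.l1_remaining -= size
            return "L1"
        if self.l2_remaining >= size:   # otherwise fall back to level-2
            self.l2_remaining -= size
            return "L2"
        raise MemoryError("neither level can satisfy the request")

pool = TwoLevelPool(l1_size=1024, l2_size=8192)
assert pool.carve(512) == "L1"   # fits in level-1
assert pool.carve(1024) == "L2"  # level-1 remainder (512) too small, use level-2
```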

在一些实施例中,分配模块1703,用于在所述网络层算子的所述输入张量的大小小于或等于张量大小阈值的情况下,从所述未分配内存中获取与所述输入张量的大小匹配的所述分配内存块,所述未分配内存是指存储空间中未被分配占用过的内存。In some embodiments, the allocation module 1703 is used to obtain the allocated memory block matching the size of the input tensor from the unallocated memory when the size of the input tensor of the network layer operator is less than or equal to the tensor size threshold, wherein the unallocated memory refers to the memory in the storage space that has not been allocated and occupied.
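Putting the threshold rule together with the reuse path gives a small dispatch function; the threshold value and the two callables are illustrative stand-ins for the mechanisms sketched above:

```python
TENSOR_SIZE_THRESHOLD = 512  # assumed value; the minimum block size worth reusing

def allocate_for_tensor(size, try_free_list, carve_unallocated):
    """Dispatch one request: small tensors skip reuse, large ones try it first.

    `try_free_list(size)` returns a reusable block or None;
    `carve_unallocated(size)` always carves fresh memory.
    """
    if size <= TENSOR_SIZE_THRESHOLD:
        return carve_unallocated(size)  # tracking tiny blocks costs more than it saves
    block = try_free_list(size)
    return block if block is not None else carve_unallocated(size)
```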

在一些实施例中,确定模块1702,用于基于所述计算图,确定所述网络层算子的输入张量和输出张量对应的生命周期,其中,输入张量和输出张量在达到所述生命周期后不再被其它网络层算子使用。In some embodiments, the determination module 1702 is used to determine the life cycle corresponding to the input tensor and the output tensor of the network layer operator based on the computation graph, wherein the input tensor and the output tensor are no longer used by other network layer operators after reaching the life cycle.
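Determining lifetimes amounts to recording, in execution order, the last step at which each tensor is touched. A minimal sketch, assuming operators are given as (name, inputs, outputs) triples of tensor names:

```python
def tensor_lifetimes(ops):
    """Map each tensor name to the index of the last operator that touches it."""
    last_use = {}
    for step, (_, inputs, outputs) in enumerate(ops):
        for t in inputs + outputs:
            last_use[t] = step  # overwritten by every later use
    return last_use

ops = [("conv", ["x"], ["a"]), ("relu", ["a"], ["b"]), ("fc", ["b"], ["y"])]
# "a" can be released after step 1, "b" and "y" after step 2.
print(tensor_lifetimes(ops))  # {'x': 0, 'a': 1, 'b': 2, 'y': 2}
```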

在一些实施例中,所述装置还包括释放模块1704,释放模块1704用于响应于已分配内存块列表中的内存块达到所述生命周期,释放所述内存块,并将所述内存块放入所述空闲内存块列表。In some embodiments, the apparatus further comprises a release module 1704, and the release module 1704 is used for releasing the memory block in response to the memory block in the allocated memory block list reaching the life cycle, and putting the memory block into the free memory block list.

其中,所述已分配内存块列表用于存放已被占用的内存块。The allocated memory block list is used to store occupied memory blocks.

在一些实施例中,释放模块1704,用于响应于所述已分配内存块列表中的内存块达到所述生命周期,释放所述内存块。In some embodiments, the release module 1704 is used to release the memory block in response to the memory block in the allocated memory block list reaching the life cycle.

在一些实施例中,所述装置还包括合并模块1705,合并模块1705,用于在当前释放内存块的相邻位置存在已释放内存块的情况下,将所述当前释放内存块与所述已释放内存块合并,得到合并释放内存块。In some embodiments, the device further includes a merging module 1705, which is used to merge the currently released memory block with the released memory block to obtain a merged released memory block when there is a released memory block adjacent to the currently released memory block.

在一些实施例中,合并模块1705,用于将所述合并释放内存块放入所述空闲内存块列表。In some embodiments, the merging module 1705 is used to put the merged released memory block into the free memory block list.
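Release plus neighbor merging can be sketched as a sort-and-coalesce pass over the free list; sorting by offset is one simple way (an assumption, not the patent's required data structure) to find address-adjacent blocks:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Block:
    offset: int
    size: int

def release(block: Block, free: List[Block]) -> None:
    """Return a block to the free list, coalescing address-adjacent free blocks."""
    free.append(block)
    free.sort(key=lambda b: b.offset)
    merged: List[Block] = []
    for b in free:
        if merged and merged[-1].offset + merged[-1].size == b.offset:
            merged[-1].size += b.size  # adjacent neighbor found: merge
        else:
            merged.append(b)
    free[:] = merged

free: List[Block] = [Block(0, 64)]
release(Block(64, 32), free)  # adjacent to [0, 64): coalesces into one block
assert free == [Block(0, 96)]
```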

在一些实施例中,获取模块1701,用于获取所述数据处理层算子对应的输入张量和输出张量。In some embodiments, the acquisition module 1701 is used to obtain the input tensor and output tensor corresponding to the data processing layer operator.

在一些实施例中,所述装置还包括复用模块1706,复用模块1706,用于将所述输出张量配置为复用所述输入张量占用的所述分配内存块。In some embodiments, the apparatus further comprises a multiplexing module 1706, the multiplexing module 1706 being configured to configure the output tensor to reuse the allocated memory block occupied by the input tensor.

在一些实施例中,复用模块1706,用于基于所述输入张量占用的所述分配内存块,将所述输入张量占用的所述分配内存块分配给所述形状重塑张量。In some embodiments, the multiplexing module 1706 is configured to allocate the allocated memory block occupied by the input tensor to the reshaped tensor based on the allocated memory block occupied by the input tensor.

其中,所述形状重塑算子用于调整所述输入张量的形状,但不改变所述输入张量中的数据,所述形状重塑张量指所述形状重塑算子输出的张量。The reshape operator is used to adjust the shape of the input tensor but does not change the data in the input tensor, and the reshape tensor refers to the tensor output by the reshape operator.
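NumPy views give a convenient way to see why a reshape needs no new block: the output is only new shape metadata over the same buffer (this is an analogy to the patent's reuse, not its implementation):

```python
import numpy as np

x = np.arange(12, dtype=np.float32)  # input tensor in one memory block
y = x.reshape(3, 4)                  # reshape output: a view, no copy
assert y.base is x                   # both tensors share the same block
x[0] = 42.0
assert y[0, 0] == 42.0               # writes are visible through either view
```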

在一些实施例中,复用模块1706,用于划分所述输入张量占用的所述分配内存块,得到所述至少两个子输入张量各自对应的子内存块;将子内存块分配给所述至少两个子输出张量。In some embodiments, the multiplexing module 1706 is used to divide the allocated memory block occupied by the input tensor to obtain sub-memory blocks corresponding to each of the at least two sub-input tensors; and allocate the sub-memory blocks to the at least two sub-output tensors.

其中,所述分裂算子用于将所述输入张量分裂为至少两个子输入张量,所述子输出张量是指所述数据处理层算子输出的张量。The splitting operator is used to split the input tensor into at least two sub-input tensors, and the sub-output tensor refers to a tensor output by the data processing layer operator.
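The split case can be pictured the same way: each sub-output aliases one slice of the input tensor's block instead of receiving a copy (again a NumPy analogy, under the assumption of contiguous sub-blocks):

```python
import numpy as np

x = np.arange(8, dtype=np.float32)  # input tensor in one allocated block
sub_a, sub_b = x[:3], x[3:]         # two sub-memory blocks, zero copy
assert sub_a.base is x and sub_b.base is x
x[4] = -1.0
assert sub_b[1] == -1.0             # sub-outputs see the shared block
```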

在一些实施例中,复用模块1706,用于确定所述输出张量占用的所述分配内存块;将至少两个所述输入张量配置为偏移复用所述输出张量占用的所述分配内存块;In some embodiments, the multiplexing module 1706 is used to determine the allocated memory block occupied by the output tensor; configure at least two of the input tensors to offset multiplex the allocated memory block occupied by the output tensor;

其中,所述拼接算子用于拼接至少两个所述输入张量。The concatenation operator is used to concatenate at least two of the input tensors.
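Offset reuse for concatenation works in the opposite direction: the output block is allocated once, and each input lives at a fixed offset inside it, so the concat itself moves no data (NumPy analogy; offsets assumed contiguous):

```python
import numpy as np

out = np.empty(8, dtype=np.float32)  # the concat output's allocated block
in_a = out[:3]                       # first input tensor, offset 0
in_b = out[3:]                       # second input tensor, offset 3
in_a[:] = 1.0                        # producers write directly into the
in_b[:] = 2.0                        # output block at their offsets
assert out[2] == 1.0 and out[3] == 2.0
```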

图18示出了本申请一示例性实施例示出的计算机设备1800的结构框图。该计算机设备可以实现为本申请上述方案中的服务器。所述计算机设备1800包括中央处理单元(Central Processing Unit,CPU)1801、包括随机存取存储器(Random Access Memory,RAM)1802和只读存储器(Read-Only Memory,ROM)1803的系统存储器1804,以及连接系统存储器1804和中央处理单元1801的系统总线1805。所述计算机设备1800还包括用于存储操作系统1809、应用程序1810和其他程序模块1811的大容量存储设备1806。FIG18 shows a block diagram of a computer device 1800 according to an exemplary embodiment of the present application. The computer device can be implemented as the server in the above-mentioned solutions of the present application. The computer device 1800 includes a central processing unit (CPU) 1801, a system memory 1804 including a random access memory (RAM) 1802 and a read-only memory (ROM) 1803, and a system bus 1805 connecting the system memory 1804 and the central processing unit 1801. The computer device 1800 also includes a mass storage device 1806 for storing an operating system 1809, an application program 1810, and other program modules 1811.

所述大容量存储设备1806通过连接到系统总线1805的大容量存储控制器(未示出)连接到中央处理单元1801。所述大容量存储设备1806及其相关联的计算机可读介质为计算机设备1800提供非易失性存储。也就是说,所述大容量存储设备1806可以包括诸如硬盘或者只读光盘(Compact Disc Read-Only Memory,CD-ROM)驱动器之类的计算机可读介质(未示出)。The mass storage device 1806 is connected to the central processing unit 1801 through a mass storage controller (not shown) connected to the system bus 1805. The mass storage device 1806 and its associated computer readable medium provide non-volatile storage for the computer device 1800. That is, the mass storage device 1806 may include a computer readable medium (not shown) such as a hard disk or a Compact Disc Read-Only Memory (CD-ROM) drive.

不失一般性,所述计算机可读介质可以包括计算机存储介质和通信介质。计算机存储介质包括以用于存储诸如计算机可读指令、数据结构、程序模块或其他数据等信息的任何方法或技术实现的易失性和非易失性、可移动和不可移动介质。计算机存储介质包括RAM、可擦除可编程只读存储器(Erasable Programmable Read Only Memory,EPROM)、电子抹除式可复写只读存储器(Electrically-Erasable Programmable Read-Only Memory,EEPROM)、闪存或其他固态存储技术,CD-ROM、数字多功能光盘(Digital Versatile Disc,DVD)或其他光学存储,磁带盒、磁带、磁盘存储或其他磁性存储设备。当然,本领域技术人员可知所述计算机存储介质不局限于上述几种。上述的系统存储器1804和大容量存储设备1806可以统称为存储器。Without loss of generality, the computer-readable media may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, Erasable Programmable Read-Only Memory (EPROM), Electrically-Erasable Programmable Read-Only Memory (EEPROM), flash memory or other solid-state storage technology, CD-ROM, Digital Versatile Disc (DVD) or other optical storage, tape cassettes, magnetic tapes, disk storage, or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media are not limited to the above. The system memory 1804 and the mass storage device 1806 described above may be collectively referred to as memory.

可选地,存储器包括一级未分配内存(未示出)和二级未分配内存(未示出)。中央处理单元1801为了加速访存,通常会采用多级存储的架构:一级未分配内存是距离处理器较近的存储层级,拥有更大的数据传输带宽,但硬件成本更高,所以存储空间较为有限;二级未分配内存是距离处理器较远的存储层级,数据传输带宽较小,但硬件成本低,存储空间较大。Optionally, the memory includes a first-level unallocated memory (not shown) and a second-level unallocated memory (not shown). To speed up memory access, the central processing unit 1801 usually adopts a multi-level storage architecture: the first-level unallocated memory is a storage level close to the processor with a larger data transmission bandwidth but a higher hardware cost, so its storage space is relatively limited; the second-level unallocated memory is a storage level farther from the processor with a smaller data transmission bandwidth but a lower hardware cost and a larger storage space.

根据本公开的各种实施例,所述计算机设备1800还可以通过诸如因特网等网络连接到网络上的远程计算机运行。也即计算机设备1800可以通过连接在所述系统总线1805上的网络接口单元1807连接到网络1808,或者说,也可以使用网络接口单元1807来连接到其他类型的网络或远程计算机系统(未示出)。According to various embodiments of the present disclosure, the computer device 1800 may also operate by connecting, through a network such as the Internet, to a remote computer on the network. That is, the computer device 1800 may be connected to the network 1808 through the network interface unit 1807 connected to the system bus 1805, or the network interface unit 1807 may be used to connect to other types of networks or remote computer systems (not shown).

所述存储器还包括至少一段计算机程序,所述至少一段计算机程序存储于存储器中,中央处理器1801通过执行该至少一段程序来实现上述各个实施例所示的神经网络模型的内存管理方法中的全部或部分步骤。The memory also includes at least one computer program, which is stored in the memory. The central processing unit 1801 implements all or part of the steps in the memory management method of the neural network model shown in the above-mentioned embodiments by executing the at least one program.

本申请实施例还提供一种计算机设备,该计算机设备包括处理器和存储器,该存储器中存储有至少一条程序,该至少一条程序由处理器加载并执行以实现上述各方法实施例提供的神经网络模型的内存管理方法。An embodiment of the present application also provides a computer device, which includes a processor and a memory, wherein the memory stores at least one program, and the at least one program is loaded and executed by the processor to implement the memory management method of the neural network model provided by the above-mentioned method embodiments.

本申请实施例还提供一种计算机可读存储介质,该存储介质中存储有至少一条计算机程序,该至少一条计算机程序由处理器加载并执行以实现上述各方法实施例提供的神经网络模型的内存管理方法。An embodiment of the present application also provides a computer-readable storage medium, which stores at least one computer program. The at least one computer program is loaded and executed by a processor to implement the memory management method of the neural network model provided by the above-mentioned method embodiments.

本申请实施例还提供一种计算机程序产品,所述计算机程序产品包括计算机程序,所述计算机程序存储在计算机可读存储介质中;所述计算机程序由计算机设备的处理器从所述计算机可读存储介质读取并执行,使得所述计算机设备执行以实现上述各方法实施例提供的神经网络模型的内存管理方法。An embodiment of the present application also provides a computer program product, which includes a computer program, and the computer program is stored in a computer-readable storage medium; the computer program is read and executed from the computer-readable storage medium by a processor of a computer device, so that the computer device executes to implement the memory management method of the neural network model provided in the above-mentioned method embodiments.

可以理解的是,在本申请的具体实施方式中,涉及到的数据、历史数据以及画像等与用户身份或特性相关的用户数据,当本申请以上实施例运用到具体产品或技术中时,需要获得用户许可或者同意,且相关数据的收集、使用和处理需要遵守相关国家和地区的相关法律法规和标准。It should be understood that, in the specific implementations of this application, user data related to user identity or characteristics, such as the data, historical data, and profiles involved, require user permission or consent when the above embodiments of this application are applied to specific products or technologies, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.

应当理解的是,在本文中提及的“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。字符“/”一般表示前后关联对象是一种“或”的关系。It should be understood that the "plurality" mentioned in this article refers to two or more. "And/or" describes the association relationship of the associated objects, indicating that there can be three relationships. For example, A and/or B can mean: A exists alone, A and B exist at the same time, and B exists alone. The character "/" generally indicates that the associated objects are in an "or" relationship.

本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。A person skilled in the art will understand that all or part of the steps to implement the above embodiments may be accomplished by hardware or by instructing related hardware through a program, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a disk or an optical disk, etc.

以上所述仅为本申请的可选实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。The above description is only an optional embodiment of the present application and is not intended to limit the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall be included in the protection scope of the present application.

Claims (21)

1. 一种神经网络模型的内存管理方法,所述方法由计算机设备执行,所述方法包括:A memory management method for a neural network model, the method being executed by a computer device, the method comprising: 获取神经网络模型对应的计算图,所述计算图中包括至少两个网络层算子,所述网络层算子用于表示所述神经网络模型中的网络层;Obtaining a computational graph corresponding to a neural network model, wherein the computational graph includes at least two network layer operators, and the network layer operators are used to represent network layers in the neural network model; 基于所述计算图,确定待分配至所述网络层算子的内存大小,所述内存大小用于表示所述网络层算子在所述神经网络模型运行时需要占用的内存大小;Based on the computation graph, determining a memory size to be allocated to the network layer operator, wherein the memory size is used to indicate a memory size that the network layer operator needs to occupy when the neural network model is running; 从空闲内存块列表中获取与所述内存大小匹配的分配内存块,将所述分配内存块分配给所述网络层算子;Acquiring an allocated memory block matching the memory size from a free memory block list, and allocating the allocated memory block to the network layer operator; 其中,所述空闲内存块列表用于存放已被分配但被解除占用后的空闲内存块,所述分配内存块是指被分配给所述网络层算子用于存储数据的内存块。The free memory block list is used to store free memory blocks that have been allocated but released, and the allocated memory blocks refer to memory blocks allocated to the network layer operator for storing data.

2. 根据权利要求1所述的方法,其中,所述内存大小包括所述网络层算子的输入张量的大小;The method according to claim 1, wherein the memory size includes the size of the input tensor of the network layer operator; 所述从空闲内存块列表中获取与所述内存大小匹配的分配内存块,将所述分配内存块分配给所述网络层算子,包括:The acquiring an allocated memory block matching the memory size from a free memory block list and allocating the allocated memory block to the network layer operator comprises: 获取所述网络层算子的排列顺序,所述排列顺序用于表示所述网络层算子在所述神经网络模型运行时的执行顺序;Obtaining an arrangement order of the network layer operators, wherein the arrangement order is used to represent an execution order of the network layer operators when the neural network model is running; 在所述网络层算子的所述输入张量的大小大于张量大小阈值的情况下,从所述空闲内存块列表中获取与所述输入张量的大小匹配的所述分配内存块,所述张量大小阈值是内存复用的最小的内存块大小;When the size of the input tensor of the network layer operator is greater than a tensor size threshold, obtaining the allocated memory block matching the size of the input tensor from the free memory block list, wherein the tensor size threshold is a minimum memory block size for memory reuse; 按照所述排列顺序将所述分配内存块分配给所述网络层算子用于存储所述输入张量;Allocating the allocated memory blocks to the network layer operators for storing the input tensors according to the arrangement order; 其中,所述输入张量是指输入至所述网络层算子中的多维数组。The input tensor refers to a multidimensional array input into the network layer operator.

3. 根据权利要求2所述的方法,其中,所述在所述网络层算子的所述输入张量的大小大于张量大小阈值的情况下,从所述空闲内存块列表中获取与所述输入张量的大小匹配的所述分配内存块,包括:The method according to claim 2, wherein, when the size of the input tensor of the network layer operator is greater than a tensor size threshold, obtaining the allocated memory block matching the size of the input tensor from the free memory block list comprises: 在所述输入张量的大小大于所述张量大小阈值,且所述空闲内存块列表中存在至少一个空闲内存块的大小大于或等于所述输入张量的大小的情况下,从所述空闲内存块列表中获取与所述输入张量的大小匹配的所述分配内存块。When the size of the input tensor is greater than the tensor size threshold and there is at least one free memory block in the free memory block list whose size is greater than or equal to the size of the input tensor, the allocated memory block matching the size of the input tensor is obtained from the free memory block list.
4. 根据权利要求3所述的方法,其中,所述在所述输入张量的大小大于所述张量大小阈值,且所述空闲内存块列表中存在至少一个空闲内存块的大小大于或等于所述输入张量的大小的情况下,从所述空闲内存块列表中获取与所述输入张量的大小匹配的所述分配内存块,包括:The method according to claim 3, wherein, when the size of the input tensor is greater than the tensor size threshold, and there is at least one free memory block in the free memory block list whose size is greater than or equal to the size of the input tensor, obtaining the allocated memory block matching the size of the input tensor from the free memory block list comprises: 在所述输入张量的大小大于所述张量大小阈值的情况下,将所述空闲内存块列表中的第一空闲内存块作为所述分配内存块;When the size of the input tensor is greater than the tensor size threshold, using the first free memory block in the free memory block list as the allocated memory block; 或,在所述输入张量的大小大于所述张量大小阈值的情况下,从所述空闲内存块列表中的第二空闲内存块中划分出与所述输入张量的大小匹配的第三内存块,将所述第三内存块作为所述分配内存块;or, when the size of the input tensor is greater than the tensor size threshold, dividing a third memory block matching the size of the input tensor from a second free memory block in the free memory block list, and using the third memory block as the allocated memory block; 其中,所述第一空闲内存块的大小与所述输入张量的大小相同,所述第二空闲内存块的大小大于所述输入张量的大小。The size of the first free memory block is the same as the size of the input tensor, and the size of the second free memory block is larger than the size of the input tensor.

5. 根据权利要求3或4所述的方法,其中,所述方法还包括:The method according to claim 3 or 4, wherein the method further comprises: 在所述输入张量的大小大于所述张量大小阈值,且所述空闲内存块列表中的所述空闲内存块的大小小于所述输入张量的大小的情况下,从所述空闲内存块列表和未分配内存中获取与所述输入张量的大小匹配的所述分配内存块;When the size of the input tensor is greater than the tensor size threshold and the size of the free memory block in the free memory block list is smaller than the size of the input tensor, acquiring the allocated memory block matching the size of the input tensor from the free memory block list and unallocated memory; 其中,所述未分配内存是指存储空间中未被分配占用过的内存。The unallocated memory refers to the memory in the storage space that has not been allocated or occupied.

6. 根据权利要求5所述的方法,其中,所述在所述输入张量的大小大于所述张量大小阈值,且所述空闲内存块列表中的所述空闲内存块的大小小于所述输入张量的大小的情况下,从所述空闲内存块列表和未分配内存中获取与所述输入张量的大小匹配的所述分配内存块,包括:The method according to claim 5, wherein, when the size of the input tensor is greater than the tensor size threshold, and the size of the free memory block in the free memory block list is smaller than the size of the input tensor, acquiring the allocated memory block matching the size of the input tensor from the free memory block list and the unallocated memory comprises: 在所述输入张量的大小大于所述张量大小阈值的情况下,将所述空闲内存块列表中的第四空闲内存块,与所述未分配内存中的合并内存块合并,得到所述分配内存块;When the size of the input tensor is greater than the tensor size threshold, merging the fourth free memory block in the free memory block list with the merged memory block in the unallocated memory to obtain the allocated memory block; 其中,所述第四空闲内存块的大小小于所述输入张量的大小,所述合并内存块是从所述未分配内存中划分得到的内存块,所述合并内存块的大小为所述输入张量的大小与所述第四空闲内存块的大小的差值。The size of the fourth free memory block is smaller than the size of the input tensor, the merged memory block is a memory block divided from the unallocated memory, and the size of the merged memory block is the difference between the size of the input tensor and the size of the fourth free memory block.
7. 根据权利要求3至6任一所述的方法,其中,所述方法还包括:The method according to any one of claims 3 to 6, wherein the method further comprises: 在所述输入张量的大小大于所述张量大小阈值,且所述空闲内存块列表中没有空闲内存块的情况下,从未分配内存中划分出与所述输入张量的大小匹配的内存块作为所述分配内存块。When the size of the input tensor is greater than the tensor size threshold and there is no free memory block in the free memory block list, a memory block matching the size of the input tensor is allocated from unallocated memory as the allocated memory block.

8. 根据权利要求7所述的方法,其中,所述未分配内存包括一级未分配内存和二级未分配内存,所述一级未分配内存的分配优先级高于所述二级未分配内存的分配优先级;The method according to claim 7, wherein the unallocated memory comprises a first-level unallocated memory and a second-level unallocated memory, and the allocation priority of the first-level unallocated memory is higher than the allocation priority of the second-level unallocated memory; 所述在所述输入张量的大小大于所述张量大小阈值,且所述空闲内存块列表中没有空闲内存块的情况下,从所述未分配内存中划分出与所述输入张量的大小匹配的内存块作为所述分配内存块,包括:When the size of the input tensor is greater than the tensor size threshold and there is no free memory block in the free memory block list, dividing a memory block matching the size of the input tensor from the unallocated memory as the allocated memory block includes: 在所述输入张量的大小大于所述张量大小阈值,且所述空闲内存块列表中没有空闲内存块的情况下,从所述一级未分配内存或所述二级未分配内存中划分与所述输入张量的大小匹配的内存块作为所述分配内存块。When the size of the input tensor is greater than the tensor size threshold and there is no free memory block in the free memory block list, a memory block matching the size of the input tensor is allocated from the first-level unallocated memory or the second-level unallocated memory as the allocated memory block.

9. 根据权利要求8所述的方法,其中,所述在所述输入张量的大小大于所述张量大小阈值,且所述空闲内存块列表中没有空闲内存块的情况下,从所述一级未分配内存或所述二级未分配内存中划分与所述输入张量的大小匹配的内存块作为所述分配内存块,包括:The method according to claim 8, wherein, when the size of the input tensor is greater than the tensor size threshold and there is no free memory block in the free memory block list, allocating a memory block matching the size of the input tensor from the first-level unallocated memory or the second-level unallocated memory as the allocated memory block comprises: 在所述输入张量的大小大于所述张量大小阈值,所述空闲内存块列表中没有空闲内存块,且所述一级未分配内存中的剩余内存大于或等于所述输入张量的大小的情况下,从所述一级未分配内存中划分与所述输入张量的大小匹配的内存块作为所述分配内存块。When the size of the input tensor is greater than the tensor size threshold, there is no free memory block in the free memory block list, and the remaining memory in the first-level unallocated memory is greater than or equal to the size of the input tensor, a memory block matching the size of the input tensor is allocated from the first-level unallocated memory as the allocated memory block.
10. 根据权利要求8或9所述的方法,其中,所述在所述输入张量的大小大于所述张量大小阈值,且所述空闲内存块列表中没有空闲内存块的情况下,从所述一级未分配内存或所述二级未分配内存中划分与所述输入张量的大小匹配的内存块作为所述分配内存块,包括:The method according to claim 8 or 9, wherein, when the size of the input tensor is greater than the tensor size threshold and there is no free memory block in the free memory block list, allocating a memory block matching the size of the input tensor from the first-level unallocated memory or the second-level unallocated memory as the allocated memory block comprises: 在所述输入张量的大小大于所述张量大小阈值,所述空闲内存块列表中没有空闲内存块,且所述一级未分配内存中的剩余内存小于所述输入张量的大小的情况下,从所述二级未分配内存中划分与所述输入张量的大小匹配的内存块作为所述分配内存块。When the size of the input tensor is greater than the tensor size threshold, there is no free memory block in the free memory block list, and the remaining memory in the first-level unallocated memory is less than the size of the input tensor, a memory block matching the size of the input tensor is allocated from the second-level unallocated memory as the allocated memory block.

11. 根据权利要求2至10任一所述的方法,其中,所述方法还包括:The method according to any one of claims 2 to 10, wherein the method further comprises: 在所述网络层算子的所述输入张量的大小小于或等于张量大小阈值的情况下,从未分配内存中获取与所述输入张量的大小匹配的所述分配内存块,所述未分配内存是指存储空间中未被分配占用过的内存。When the size of the input tensor of the network layer operator is less than or equal to a tensor size threshold, the allocated memory block matching the size of the input tensor is obtained from unallocated memory, where the unallocated memory refers to memory in the storage space that has not been allocated or occupied.

12. 根据权利要求1至11任一所述的方法,其中,所述方法还包括:The method according to any one of claims 1 to 11, wherein the method further comprises: 基于所述计算图,确定所述网络层算子的输入张量和输出张量对应的生命周期,其中,输入张量和输出张量在达到所述生命周期后不再被其它网络层算子使用;Based on the computation graph, determining a life cycle corresponding to an input tensor and an output tensor of the network layer operator, wherein the input tensor and the output tensor are no longer used by other network layer operators after reaching the life cycle; 响应于已分配内存块列表中的内存块达到所述生命周期,释放所述内存块,并将所述内存块放入所述空闲内存块列表;In response to a memory block in the allocated memory block list reaching the life cycle, releasing the memory block and placing the memory block into the free memory block list; 其中,所述已分配内存块列表用于存放已被占用的内存块。The allocated memory block list is used to store occupied memory blocks.

13. 根据权利要求12所述的方法,其中,所述响应于已分配内存块列表中的内存块达到所述生命周期,释放所述内存块,并将所述内存块放入所述空闲内存块列表,包括:The method according to claim 12, wherein, in response to a memory block in the allocated memory block list reaching the life cycle, releasing the memory block and placing the memory block in the free memory block list comprises: 响应于所述已分配内存块列表中的内存块达到所述生命周期,释放所述内存块;In response to a memory block in the allocated memory block list reaching the life cycle, releasing the memory block; 在当前释放内存块的相邻位置存在已释放内存块的情况下,将所述当前释放内存块与所述已释放内存块合并,得到合并释放内存块;In the case that there is a released memory block at an adjacent position of the currently released memory block, merging the currently released memory block with the released memory block to obtain a merged released memory block; 将所述合并释放内存块放入所述空闲内存块列表。The merged released memory block is put into the free memory block list.
14. 根据权利要求1至13任一所述的方法,其中,所述网络层算子包括数据处理层算子,所述数据处理层算子用于调整所述神经网络模型中的数据格式;所述方法还包括:The method according to any one of claims 1 to 13, wherein the network layer operator includes a data processing layer operator, and the data processing layer operator is used to adjust the data format in the neural network model; the method further comprises: 获取所述数据处理层算子的输入张量和输出张量;Obtaining input tensors and output tensors of the data processing layer operator; 将所述输出张量配置为复用所述输入张量占用的所述分配内存块。The output tensor is configured to reuse the allocated memory block occupied by the input tensor.

15. 根据权利要求14所述的方法,其中,所述数据处理层算子包括形状重塑算子,所述输出张量包括形状重塑张量;The method of claim 14, wherein the data processing layer operator comprises a reshape operator and the output tensor comprises a reshape tensor; 所述将所述输出张量配置为复用所述输入张量占用的所述分配内存块,包括:The configuring the output tensor to reuse the allocated memory block occupied by the input tensor comprises: 基于所述输入张量占用的所述分配内存块,将所述输入张量占用的所述分配内存块分配给所述形状重塑张量;allocating the allocated memory block occupied by the input tensor to the reshaped tensor based on the allocated memory block occupied by the input tensor; 其中,所述形状重塑算子用于调整所述输入张量的形状,但不改变所述输入张量中的数据,所述形状重塑张量指所述形状重塑算子输出的张量。The reshape operator is used to adjust the shape of the input tensor but does not change the data in the input tensor, and the reshape tensor refers to the tensor output by the reshape operator.

16. 根据权利要求14所述的方法,其中,所述数据处理层算子包括分裂算子,所述输出张量包括至少两个子输出张量;The method of claim 14, wherein the data processing layer operator comprises a split operator, and the output tensor comprises at least two sub-output tensors; 所述将所述输出张量配置为复用所述输入张量占用的所述分配内存块,包括:The configuring the output tensor to reuse the allocated memory block occupied by the input tensor comprises: 划分所述输入张量占用的所述分配内存块,得到所述至少两个子输入张量各自对应的子内存块;Dividing the allocated memory block occupied by the input tensor to obtain sub-memory blocks corresponding to the at least two sub-input tensors; 将所述子内存块分配给所述至少两个子输出张量;Allocating the sub-memory block to the at least two sub-output tensors; 其中,所述分裂算子用于将所述输入张量划分为至少两个子输入张量,所述子输出张量指所述数据处理层算子输出的张量。The splitting operator is used to divide the input tensor into at least two sub-input tensors, and the sub-output tensor refers to the tensor output by the data processing layer operator.

17. 根据权利要求14所述的方法,其中,所述数据处理层算子包括拼接算子;所述方法还包括:The method according to claim 14, wherein the data processing layer operator comprises a concatenation operator; the method further comprising: 确定所述输出张量占用的所述分配内存块;Determining the allocated memory block occupied by the output tensor; 将至少两个所述输入张量配置为偏移复用所述输出张量占用的所述分配内存块;Configuring at least two of the input tensors to offset-reuse the allocated memory block occupied by the output tensor; 其中,所述拼接算子用于拼接至少两个所述输入张量。The concatenation operator is used to concatenate at least two of the input tensors.
18. 一种神经网络模型的内存管理装置,所述装置包括:A memory management device for a neural network model, the device comprising: 获取模块,用于获取神经网络模型对应的计算图,所述计算图中包括至少两个网络层算子,所述网络层算子用于表示所述神经网络模型中的网络层;An acquisition module, used to acquire a computational graph corresponding to a neural network model, wherein the computational graph includes at least two network layer operators, and the network layer operators are used to represent network layers in the neural network model; 确定模块,用于基于所述计算图,确定待分配至所述网络层算子的内存大小,所述内存大小用于表示所述网络层算子在所述神经网络模型运行时需要占用的内存大小;A determination module, used to determine the memory size to be allocated to the network layer operator based on the calculation graph, wherein the memory size is used to indicate the memory size that the network layer operator needs to occupy when the neural network model is running; 分配模块,用于从空闲内存块列表中获取与所述内存大小匹配的分配内存块,将所述分配内存块分配给所述网络层算子;An allocation module, configured to obtain an allocated memory block matching the memory size from a free memory block list, and allocate the allocated memory block to the network layer operator; 其中,所述空闲内存块列表用于存放已被分配但被解除占用后的空闲内存块,所述分配内存块是指被分配给所述网络层算子用于存储数据的内存块。The free memory block list is used to store free memory blocks that have been allocated but released, and the allocated memory blocks refer to memory blocks allocated to the network layer operator for storing data.

19. 一种计算机设备,所述计算机设备包括:处理器和存储器,所述存储器中存储有至少一条计算机程序,至少一条所述计算机程序由所述处理器加载并执行以实现如权利要求1至17中任一项所述的神经网络模型的内存管理方法。A computer device, comprising: a processor and a memory, wherein at least one computer program is stored in the memory, and the at least one computer program is loaded and executed by the processor to implement the memory management method of the neural network model as described in any one of claims 1 to 17.

20. 一种计算机可读存储介质,所述计算机可读存储介质中存储有至少一条计算机程序,至少一条计算机程序由处理器加载并执行以实现如权利要求1至17中任一项所述的神经网络模型的内存管理方法。A computer-readable storage medium, wherein at least one computer program is stored in the computer-readable storage medium, and the at least one computer program is loaded and executed by a processor to implement the memory management method of a neural network model as described in any one of claims 1 to 17.

21. 一种计算机程序产品,所述计算机程序产品包括计算机程序,所述计算机程序存储在计算机可读存储介质中;所述计算机程序由计算机设备的处理器从所述计算机可读存储介质读取并执行,使得所述计算机设备执行如权利要求1至17中任一项所述的神经网络模型的内存管理方法。A computer program product, comprising a computer program, wherein the computer program is stored in a computer-readable storage medium; the computer program is read and executed from the computer-readable storage medium by a processor of a computer device, so that the computer device executes the memory management method of a neural network model as described in any one of claims 1 to 17.
PCT/CN2024/103342 2023-09-11 2024-07-03 Memory management method and apparatus for neural network model, device, medium and product Pending WO2025055495A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202311165933.1 2023-09-11
CN202311165933.1A CN116893904B (en) 2023-09-11 2023-09-11 Memory management method, device, equipment, medium and product of neural network model

Publications (1)

Publication Number Publication Date
WO2025055495A1 true WO2025055495A1 (en) 2025-03-20

Family

ID=88309762

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2024/103342 Pending WO2025055495A1 (en) 2023-09-11 2024-07-03 Memory management method and apparatus for neural network model, device, medium and product

Country Status (2)

Country Link
CN (1) CN116893904B (en)
WO (1) WO2025055495A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116893904B (en) * 2023-09-11 2023-12-26 腾讯科技(深圳)有限公司 Memory management method, device, equipment, medium and product of neural network model
CN117785759B (en) * 2024-02-28 2024-04-23 北京壁仞科技开发有限公司 Data storage method, data reading method, electronic device, and storage medium
CN117892769B (en) * 2024-03-15 2024-06-11 之江实验室 Neural network training method, video memory scheduling method, system, device and product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190332925A1 (en) * 2018-04-30 2019-10-31 International Business Machines Corporation Neural hardware accelerator for parallel and distributed tensor computations
CN114298294A (en) * 2021-12-28 2022-04-08 杭州雄迈集成电路技术股份有限公司 Neural network memory optimization method and device based on hardware accelerator
CN114327844A (en) * 2020-09-29 2022-04-12 华为技术有限公司 Memory allocation method, related device and computer readable storage medium
CN116893904A (en) * 2023-09-11 2023-10-17 腾讯科技(深圳)有限公司 Memory management method, device, equipment, medium and product of neural network model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9389997B2 (en) * 2013-04-24 2016-07-12 International Business Machines Corporation Heap management using dynamic memory allocation
CN110597616B (en) * 2018-06-13 2022-07-29 华为技术有限公司 Memory allocation method and device for neural network
CN114492775A (en) * 2022-01-13 2022-05-13 哲库科技(上海)有限公司 Data processing method and device, neural network accelerator and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190332925A1 (en) * 2018-04-30 2019-10-31 International Business Machines Corporation Neural hardware accelerator for parallel and distributed tensor computations
CN114327844A (en) * 2020-09-29 2022-04-12 华为技术有限公司 Memory allocation method, related device and computer readable storage medium
CN114298294A (en) * 2021-12-28 2022-04-08 杭州雄迈集成电路技术股份有限公司 Neural network memory optimization method and device based on hardware accelerator
CN116893904A (en) * 2023-09-11 2023-10-17 腾讯科技(深圳)有限公司 Memory management method, device, equipment, medium and product of neural network model

Also Published As

Publication number Publication date
CN116893904A (en) 2023-10-17
CN116893904B (en) 2023-12-26

Similar Documents

Publication Publication Date Title
WO2025055495A1 (en) Memory management method and apparatus for neural network model, device, medium and product
CN112529169B (en) Data processing method, model optimization device and model execution device
CN110321223B (en) Coflow collaborative work flow scheduling-aware data flow division method and device
Ni et al. Efficient ranking and selection in parallel computing environments
CN114327844A (en) Memory allocation method, related device and computer readable storage medium
CN106406987A (en) Task execution method and apparatus in cluster
US11676074B2 (en) Heterogeneous processing system for federated learning and privacy-preserving computation
CN113723443B (en) A distributed training method and system for large visual models
CN110413776B (en) High-performance calculation method for LDA (text-based extension) of text topic model based on CPU-GPU (Central processing Unit-graphics processing Unit) collaborative parallel
WO2021115082A1 (en) Job scheduling method and job scheduling apparatus
CN113569511A (en) A method and device for simulating a quantum circuit
CN106502918A (en) A kind of scheduling memory method and device
CN113037800A (en) Job scheduling method and job scheduling device
KR102793524B1 (en) Interconnect device, operation method of interconnect device, and artificial intelligence(ai) accelerator system
CN120123082A (en) Process scheduling method, device, electronic device and storage medium based on NUMA architecture
CN102831102A (en) Method and system for carrying out matrix product operation on computer cluster
CN117112145B (en) Training model distribution method, training model distribution device, computer equipment and storage medium
CN119783812B (en) Optimization Methods for Parallel Training and Inference Adaptation of Next-Generation Heterogeneous Supercomputing Large Models
CN112306675A (en) Data processing method, related device and computer readable storage medium
CN112991144B (en) Method and system for segmenting image data for neural networks
CN115827178A (en) Edge calculation task allocation method and device, computer equipment and related medium
CN113835852A (en) Task data scheduling method and device
CN114296912A (en) Computing power resource allocation method and device and storage medium
Haggarty et al. Distributed response time analysis of GSPN models with MapReduce
CN111461144A (en) Method for accelerating convolutional neural network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24864220

Country of ref document: EP

Kind code of ref document: A1