CN116301874A

CN116301874A - Code compiling method, electronic device and storage medium

Info

Publication number: CN116301874A
Application number: CN202111576033.7A
Authority: CN
Inventors: 请求不公布姓名
Original assignee: Shanghai Cambricon Information Technology Co Ltd
Current assignee: Shanghai Cambricon Information Technology Co Ltd
Priority date: 2021-12-21
Filing date: 2021-12-21
Publication date: 2023-06-23

Abstract

The embodiment of the application discloses a code compiling method, electronic equipment and a storage medium. The method comprises the following steps: acquiring N code blocks, wherein each code block comprises a state identifier for determining the running sequence of the code block, and N is an integer greater than or equal to 2; determining the running sequence of the N code blocks according to the state identifier packaged in each code block; and generating a pipeline according to the running sequence of the N code blocks and the N code blocks. The embodiment of the application is beneficial to improving the production line generation efficiency.

Description

Code compiling method, electronic device and storage medium

技术领域technical field

本申请涉及计算机软件技术领域，具体涉及一种代码编译方法、电子设备及存储介质。The present application relates to the technical field of computer software, in particular to a code compiling method, electronic equipment and a storage medium.

背景技术Background technique

随着科技的发展，处理器运算能力的提高，对运算速度的要求越来越高。目前，流水线处理模式一种比较通用的加速方法。一种常见的流水线的实现方式就使用双缓存(double buffer)机制来实现不同任务流之间的并行，但双缓存机制很难适用于复杂任务流的情况。而随着流水中任务数量的增多，多个任务之间关系也较为复杂，此时对代码的解析越来越繁琐和复杂，基于代码解析确定流水线的实现也会变得困难。因此，如何高效快速地构造出一个包含有多任务的流水线是目前亟待解决的技术问题。With the development of science and technology and the improvement of the computing power of processors, the requirements for computing speed are getting higher and higher. At present, the pipeline processing mode is a relatively general acceleration method. A common pipeline implementation method uses a double buffer (double buffer) mechanism to achieve parallelism between different task flows, but the double buffer mechanism is difficult to apply to complex task flows. With the increase in the number of tasks in the pipeline, the relationship between multiple tasks is also more complicated. At this time, the analysis of the code is becoming more and more cumbersome and complicated, and it will become difficult to determine the implementation of the pipeline based on the code analysis. Therefore, how to efficiently and quickly construct a multi-task pipeline is a technical problem to be solved urgently.

发明内容Contents of the invention

本申请实施例提供了一种代码编译方法、电子设备及存储介质，通过在代码块中设置状态标识，提高了流水线的构造效率。Embodiments of the present application provide a code compiling method, electronic equipment, and a storage medium. By setting state flags in code blocks, the construction efficiency of the pipeline is improved.

第一方面，本申请实施例提供一种代码编译方法，包括：In the first aspect, the embodiment of the present application provides a code compilation method, including:

获取N个代码块，其中，每个所述代码块中包含有用于确定该代码块的运行顺序的状态标识，N为大于或者等于2的整数；Obtaining N code blocks, wherein each of the code blocks contains a status identifier for determining the running sequence of the code block, and N is an integer greater than or equal to 2;

根据每个所述代码块中封装的状态标识，确定所述N个代码块的运行顺序；Determine the running order of the N code blocks according to the state identifier encapsulated in each of the code blocks;

根据所述N个代码块的运行顺序以及所述N个代码块，生成流水线。A pipeline is generated according to the running sequence of the N code blocks and the N code blocks.

在本申请的一个实施方式中，所述状态标识用于指示所述代码块所实现的任务的任务状态，所述根据每个所述代码块中封装的状态标识，确定所述N个代码块的运行顺序，包括：In one embodiment of the present application, the status identifier is used to indicate the task status of the task implemented by the code block, and the N code blocks are determined according to the status identifier encapsulated in each of the code blocks sequence of operations, including:

根据所述N个代码块中的状态标识所指示的任务状态，确定所述N个代码块之间的依赖关系；Determine the dependencies between the N code blocks according to the task states indicated by the state identifiers in the N code blocks;

根据所述N个代码块之间的依赖关系，确定所述N个代码块的运行顺序。The running order of the N code blocks is determined according to the dependencies among the N code blocks.

在本申请的一个实施方式中，所述N个代码块所实现的任务的任务状态包括数据加载任务、数据运算任务以及数据存储任务；In one embodiment of the present application, the task states of the tasks implemented by the N code blocks include data loading tasks, data computing tasks, and data storage tasks;

在一个流水线中，实现所述数据加载任务的代码块、所述数据运算任务的代码块和所述数据存储任务的代码块顺次执行。In one pipeline, the code block for implementing the data loading task, the code block for the data operation task, and the code block for the data storage task are executed sequentially.

在本申请的一个实施方式中，所述流水线还包括代码块；In one embodiment of the present application, the pipeline further includes a code block;

所述流水线中任意两个相邻的代码块之间均包含所述同步代码块；The synchronization code block is included between any two adjacent code blocks in the pipeline;

所述同步代码块用于指示在运行完所述流水线中的第i个代码块之后，运行第i+1个代码块，其中，i为大于或等于1的正整数，且i小于或等于N。The synchronization code block is used to indicate that after the i-th code block in the pipeline is executed, the i+1th code block is run, wherein, i is a positive integer greater than or equal to 1, and i is less than or equal to N .

在本申请的一个实施方式中，每个所述代码块中还包含有位置标识，该位置标识用于标识该代码块在原始代码中的起始位置；所述获取N个代码块，包括：In one embodiment of the present application, each of the code blocks also includes a position identifier, which is used to identify the starting position of the code block in the original code; the acquisition of N code blocks includes:

将所述原始代码中的第j个位置标识和第j+1个位置标识之间的代码，以及所述第j个位置标识，作为第j个代码块，j的取值为1到N的整数。The code between the jth location identifier and the j+1th location identifier in the original code, and the jth location identifier, as the jth code block, and the value of j is from 1 to N integer.

在本申请的一个实施方式中，获取N个代码块之前，所述方法还包括：In one embodiment of the present application, before obtaining the N code blocks, the method further includes:

获取预设标志位的取值；Obtain the value of the preset flag bit;

确定所述预设标志位的取值为预设值时，执行获取N个代码块的操作。When it is determined that the value of the preset flag bit is a preset value, an operation of acquiring N code blocks is performed.

在本申请的一个实施方式中，所述流水线的数量至少为两个，所述方法还包括：In one embodiment of the present application, the number of the pipelines is at least two, and the method also includes:

根据至少两个流水线中各个代码块的状态标识，将不同流水线中处于不同任务状态的代码块确定为并行代码块组，该并行代码块组中的多个代码块能够在同一时间单元内并行执行。According to the state identification of each code block in at least two pipelines, determine the code blocks in different task states in different pipelines as a parallel code block group, and multiple code blocks in the parallel code block group can be executed in parallel within the same time unit .

在本申请的一个实施方式中，所述根据至少两个流水线中各个代码块的状态标识，将不同流水线中处于不同任务状态的代码块确定为并行代码块组，包括：In one embodiment of the present application, according to the state identification of each code block in at least two pipelines, the code blocks in different task states in different pipelines are determined as parallel code block groups, including:

在执行第一个流水线中的第k个代码块时，并行执行第二个流水线中的第k-1个代码块，所述第k-1个代码块的任务状态与所述第一个流水线的任务状态不同；When the kth code block in the first pipeline is executed, the k-1th code block in the second pipeline is executed in parallel, and the task status of the k-1th code block is the same as that of the first pipeline different task status;

其中，所述第一流水线为所述至少两个流水线中的任意一个，所述第二个流水线为所述至少两个流水线中的所述第一个流水线的下一级流水线，k为大于或者等于2的正整数，且k小于或等于N。Wherein, the first pipeline is any one of the at least two pipelines, the second pipeline is the next-level pipeline of the first pipeline in the at least two pipelines, and k is greater than or A positive integer equal to 2, and k is less than or equal to N.

在本申请的一个实施方式中，所述方法还包括：In one embodiment of the present application, the method also includes:

为所述并行代码块组分配内存，其中，所述内存的大小为所述并行代码组中代码块的数量与预设存储空间大小的乘积。Allocating memory for the parallel code block group, wherein the size of the memory is the product of the number of code blocks in the parallel code group and the size of a preset storage space.

将所述N个代码块形成的原始代码编译为目标代码。Compiling the original code formed by the N code blocks into object code.

第二方面，本申请实施例提供一种流水线生成装置，包括：In the second aspect, the embodiment of the present application provides a pipeline generation device, including:

获取单元，用于获取N个代码块，其中，每个所述代码块中包含有用于确定该代码块的运行顺序的状态标识，N为大于或者等于2的整数；An acquisition unit, configured to acquire N code blocks, wherein each of the code blocks contains a status identifier for determining the running sequence of the code block, and N is an integer greater than or equal to 2;

处理单元，用于根据每个所述代码块中封装的状态标识，确定所述N个代码块的运行顺序；A processing unit, configured to determine the running order of the N code blocks according to the state identifier encapsulated in each of the code blocks;

在本申请的一个实施方式中，所述状态标识用于指示所述代码块所实现的任务的任务状态，在根据每个所述代码块中封装的状态标识，确定所述N个代码块的运行顺序方面，所述处理单元，具体用于：In one embodiment of the present application, the status identifier is used to indicate the task status of the task implemented by the code block, and according to the status identifier encapsulated in each of the code blocks, determine the status of the N code blocks In terms of running sequence, the processing unit is specifically used for:

在本申请的一个实施方式中，每个所述代码块中还包含有位置标识，该位置标识用于标识该代码块在原始代码中的起始位置；在获取N个代码块方面，所述获取单元，具体用于：In one embodiment of the present application, each code block also includes a position identifier, which is used to identify the starting position of the code block in the original code; in terms of obtaining N code blocks, the Get unit, specifically for:

在本申请的一个实施方式中，获取N个代码块之前，所述获取单元，还用于获取预设标志位的取值；所述处理单元，还用于确定所述预设标志位的取值为预设值时，执行获取N个代码块的操作。In one embodiment of the present application, before acquiring the N code blocks, the acquisition unit is also used to acquire the value of the preset flag bit; the processing unit is also used to determine the value of the preset flag bit When the value is a preset value, perform the operation of obtaining N code blocks.

在本申请的一个实施方式中，所述流水线的数量至少为两个，所述处理单元，还用于：In one embodiment of the present application, the number of the pipelines is at least two, and the processing unit is further configured to:

在本申请的一个实施方式中，在根据至少两个流水线中各个代码块的状态标识，将不同流水线中处于不同任务状态的代码块确定为并行代码块组方面，所述处理单元，具体用于：In one embodiment of the present application, in terms of determining code blocks in different task states in different pipelines as parallel code block groups according to the state identification of each code block in at least two pipelines, the processing unit is specifically used to :

在本申请的一个实施方式中，所述处理单元，还用于为所述并行代码块组分配内存，其中，所述内存的大小为所述并行代码组中代码块的数量与预设存储空间大小的乘积。In one embodiment of the present application, the processing unit is further configured to allocate memory for the parallel code block group, wherein the size of the memory is equal to the number of code blocks in the parallel code group and the preset storage space The product of sizes.

在本申请的一个实施方式中，所述处理单元，还用于将所述N个代码块形成的原始代码编译为目标代码。In one embodiment of the present application, the processing unit is further configured to compile the original code formed by the N code blocks into an object code.

第三方面，本申请实施例提供一种电子设备，包括：处理器，所述处理器与存储器相连，所述存储器用于存储计算机程序，所述处理器用于执行所述存储器中存储的计算机程序，以使得所述电子设备执行如第一方面所述的方法。In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, the processor is connected to a memory, the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory , so that the electronic device executes the method described in the first aspect.

第四方面，本申请实施例提供一种计算机可读存储介质，所述计算机可读存储介质存储有计算机程序，所述计算机程序使得计算机执行如第一方面所述的方法。In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and the computer program causes a computer to execute the method as described in the first aspect.

第五方面，本申请实施例提供一种计算机程序产品，所述计算机程序产品包括存储了计算机程序的非瞬时性计算机可读存储介质，所述计算机可操作来使计算机执行如第一方面所述的方法。In a fifth aspect, an embodiment of the present application provides a computer program product, the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer is operable to enable the computer to execute the computer program described in the first aspect. Methods.

实施本申请实施例，具有如下有益效果：Implementing the embodiment of the present application has the following beneficial effects:

可以看出，在本申请实施方式中，在代码块置有状态标识，可以直接根据状态标识确定出每个任务的代码块的运行顺序，无需通过分析代码的功能及各个代码块之间的数据流向去确定每个任务的代码块的运行顺序，从而实现快速、高效的构造出流水线。进一步地，由于每个代码块中均封装有状态标识，因此针对复杂的任务流的场景，本申请可以依照上述实现方式，分别为每个任务流生成相应的流水线，且该生成流水线的方式不会受限于硬件资源(如存储分区的数量)，从而能够支持复杂任务流的场景。It can be seen that in the embodiment of the present application, the code block is provided with a state flag, and the running order of the code blocks of each task can be determined directly according to the state flag, without analyzing the function of the code and the data between each code block The flow direction determines the running order of the code blocks of each task, so as to realize the fast and efficient construction of the pipeline. Furthermore, since each code block is encapsulated with a state identifier, for complex task flow scenarios, the present application can generate corresponding pipelines for each task flow according to the above implementation method, and the way of generating pipelines is not It will be limited by hardware resources (such as the number of storage partitions), so that it can support complex task flow scenarios.

附图说明Description of drawings

为了更清楚地说明本申请实施例中的技术方案，下面将对实施例描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图是本申请的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that need to be used in the description of the embodiments will be briefly introduced below. Obviously, the drawings in the following description are some embodiments of the present application. For Those of ordinary skill in the art can also obtain other drawings based on these drawings without making creative efforts.

图1为本申请实施例提供的一种双缓存机制的实现过程的示意图；FIG. 1 is a schematic diagram of an implementation process of a double buffering mechanism provided in an embodiment of the present application;

图2为本申请实施例提供的一种代码编译方法的流程示意图；Fig. 2 is a schematic flow chart of a code compiling method provided by the embodiment of the present application;

图3为本申请实施例提供的一种获取代码块的示意图；FIG. 3 is a schematic diagram of an acquisition code block provided by an embodiment of the present application;

图4为本申请实施例提供的一种组成流水线的示意图；FIG. 4 is a schematic diagram of a composition pipeline provided by an embodiment of the present application;

图5为本申请实施例提供的另一种组成流水线的示意图；FIG. 5 is a schematic diagram of another composition pipeline provided by the embodiment of the present application;

图6为本申请实施例提供的又一种组成流水线的示意图；FIG. 6 is a schematic diagram of another composition pipeline provided by the embodiment of the present application;

图7为本申请实施例提供的一种流水线生成装置的功能单元组成框图；FIG. 7 is a block diagram of functional units of a pipeline generating device provided in an embodiment of the present application;

图8为本申请实施例提供的一种电子设备的结构示意图。FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

具体实施方式Detailed ways

下面将结合本申请实施例中的附图，对本申请实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例是本申请一部分实施例，而不是全部的实施例。基于本申请中的实施例，本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例，都属于本申请保护的范围。The following will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, not all of them. Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of this application.

本申请的说明书和权利要求书及所述附图中的术语“第一”、“第二”、“第三”和“第四”等是用于区别不同对象，而不是用于描述特定顺序。此外，术语“包括”和“具有”以及它们任何变形，意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元，而是可选地还包括没有列出的步骤或单元，或可选地还包括对于这些过程、方法、产品或设备固有的其它步骤或单元。The terms "first", "second", "third" and "fourth" in the specification and claims of the present application and the drawings are used to distinguish different objects, rather than to describe a specific order . Furthermore, the terms "include" and "have", as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or units is not limited to the listed steps or units, but optionally also includes unlisted steps or units, or optionally further includes For other steps or units inherent in these processes, methods, products or apparatuses.

在本文中提及“实施例”意味着，结合实施例描述的特定特征、结果或特性可以包含在本申请的至少一个实施例中。在说明书中的各个位置出现该短语并不一定均是指相同的实施例，也不是与其它实施例互斥的独立的或备选的实施例。本领域技术人员显式地和隐式地理解的是，本文所描述的实施例可以与其它实施例相结合。Reference herein to an "embodiment" means that a particular feature, result, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The occurrences of this phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is understood explicitly and implicitly by those skilled in the art that the embodiments described herein can be combined with other embodiments.

为了高效地执行不同任务，充分地利用硬件的计算资源，往往会将有相互依赖的任务组成一条流水线，这样不同流水线中不存在相互依赖的任务之间可以并行执行，从而可以提高任务的执行效率。针对不同的处理场景，任务流水线的生成及实现可以具有多种不同的方式，本公开的目的在于提供一种针对复杂任务流的多级流水线的生成方法、装置及相关产品。In order to efficiently perform different tasks and make full use of hardware computing resources, interdependent tasks are often formed into a pipeline, so that tasks that do not have interdependence in different pipelines can be executed in parallel, thereby improving task execution efficiency. . For different processing scenarios, task pipelines can be generated and implemented in many different ways. The purpose of the present disclosure is to provide a multi-stage pipeline generation method, device and related products for complex task flows.

一种常见的流水线实现方式是采用存储分区的方式，不同的存储块内的数据可以轮流送至运算单元进行运算，但此种实现方式仅能支持至多两个流水线的简单任务流，难以适用于复杂任务流的场景。例如，图1示出了采用双缓存机制(double buffer)实现任务流水线的过程，下面结合图1介绍一下目前的双缓存机制的实现过程。A common pipeline implementation method is to use storage partitions. The data in different storage blocks can be sent to the computing unit in turn for calculation. However, this implementation method can only support simple task flows with at most two pipelines, which is difficult to apply to Scenarios with complex task flows. For example, FIG. 1 shows the process of implementing a task pipeline by adopting a double buffer mechanism (double buffer). The implementation process of the current double buffer mechanism will be introduced below in conjunction with FIG. 1 .

如图1所示，为减少运算单元等待时间，可以将统一缓存的存储空间分为两个部分，例如将统一缓存拆分为第一缓存和第二缓存。当运算单元对第一缓存中的数据进行读取和计算时，存储单元可并行地将下一份数据写入第二缓存中。而当运算单元切换到对第二缓存中的数据进行读取和计算时，存储单元可以并行地将第一缓存中存储的计算结果写出至存储单元，或存储单元可以并行地将下一份数据写入到第一缓存中。由此，数据的存取任务和运算单元的计算任务可以并行执行，从而运算单元的闲置问题得以有效缓解，提高了运算单元的利用率，且提高了任务的执行效率。然而，双缓存机制针对每个任务的实现，是通过解析底层代码找到每个任务的代码内容，然后基于该代码内容实现该任务。例如，针对一个流水中的运算单元的计算任务，则需要解析底层代码找到能够实现计算任务的代码内容，通过执行该代码内容，实现该计算任务。然而，随着流水中任务数量的增多，对代码的解析越来越繁琐和复杂。因此，如何高效地构造出一个支持多条任务的流水线是目前亟待解决的技术问题。As shown in FIG. 1 , in order to reduce the waiting time of the computing unit, the storage space of the unified cache can be divided into two parts, for example, the unified cache can be divided into a first cache and a second cache. When the operation unit reads and calculates the data in the first cache, the storage unit can write the next piece of data into the second cache in parallel. When the operation unit switches to read and calculate the data in the second cache, the storage unit can write out the calculation results stored in the first cache to the storage unit in parallel, or the storage unit can write the next copy in parallel Data is written into the first cache. Thus, the data access task and the computing task of the computing unit can be executed in parallel, thereby effectively alleviating the idle problem of the computing unit, improving the utilization rate of the computing unit, and improving the execution efficiency of the task. However, the implementation of the double buffering mechanism for each task is to find the code content of each task by parsing the underlying code, and then implement the task based on the code content. For example, for the calculation task of a computing unit in a pipeline, it is necessary to analyze the underlying code to find the code content that can realize the calculation task, and realize the calculation task by executing the code content. However, as the number of tasks in the pipeline increases, the parsing of the code becomes more and more cumbersome and complex. Therefore, how to efficiently construct a pipeline that supports multiple tasks is a technical problem that needs to be solved urgently.

参阅图2，图2为本申请实施例提供的一种代码编译方法的流程示意图。该方法可以由运行于一处理器(如通用处理器CPU等)上的编译器实现。该方法可以包括以下步骤内容：Referring to FIG. 2 , FIG. 2 is a schematic flowchart of a code compiling method provided by an embodiment of the present application. The method can be implemented by a compiler running on a processor (such as a general-purpose processor, CPU, etc.). The method may include the following steps:

201：获取N个代码块，N为大于或者等于2的整数。201: Obtain N code blocks, where N is an integer greater than or equal to 2.

其中，每个代码块的功能为实现一个任务，因此本申请所提到的执行一个代码块，本质上就是执行一个任务，两者在本质上是一致的，无需区分。此外，不同的代码块具有不同的功能，即不同的代码块用于实现类型不同的任务。在本申请中，主要以N个代码块所实现的任务为数据加载任务(data_load)、数据运算任务(data_computer)以及数据存储任务(data_store)进行举例说明，在实际应用中，还可以有其他的任务状态，不再详细描述。Among them, the function of each code block is to realize a task, so the execution of a code block mentioned in this application is essentially the execution of a task, and the two are essentially the same without distinction. In addition, different code blocks have different functions, that is, different code blocks are used to implement different types of tasks. In this application, the tasks implemented by N code blocks are mainly illustrated as data loading tasks (data_load), data computing tasks (data_computer) and data storage tasks (data_store). In practical applications, there may also be other The status of the task will not be described in detail.

本申请可以首先获得原始代码，并基于该原始代码解析获得N个代码块。示例性的，原始代码中预先设置有至少一个代码块的标识，即位置标识；这样在解析原始代码时，可以将该位置标识和下一个位置标识之间的代码作为一个代码块，以获取N个代码块。The application can first obtain the original code, and obtain N code blocks based on the analysis of the original code. Exemplarily, at least one code block identifier, i.e. a position identifier, is preset in the original code; in this way, when parsing the original code, the code between the position identifier and the next location identifier can be used as a code block to obtain N code block.

具体地，该位置标识可以置于每个代码块中，位置标识用于标识该代码块在原始代码中的起始位置。因此针对该N个代码块，在原始代码中设置N个位置标识。在解析原始代码时，将第j个位置标识与第j+1个位置标识之间的代码，以及该第j个位置标识作为第j个代码块，j的取值为1到N的整数。Specifically, the position identifier can be placed in each code block, and the position identifier is used to identify the starting position of the code block in the original code. Therefore, for the N code blocks, N position identifiers are set in the original code. When parsing the original code, the code between the jth location identifier and the j+1th location identifier, and the jth location identifier are used as the jth code block, and the value of j is an integer from 1 to N.

可选地，本申请中通块级作用域对每个任务对应的代码块进行封装，获得原始代码。本申请所提到的位置标识可以使用块级标识符进行表示，例如，with IR.block。则在解析原始代码的过程中，若识别到with IR.block标识，则可以将该with IR.block，以及该with IR.block与下一个with IR.block之间的代码作为一个代码块，以获取到N个代码块。其中，位置标识符中的IR表示原始代码的代码层级。在一种可实现方式中，原始代码的代码层级包括但不限于，高层级的Python代码、低层级的Python代码以及更低层级的目标代码，该目标代码可以是采用类C语言形成的代码(例如CUDAC)。其中，该低层级的Python代码可以是基于Python语言构建的张量计算原语形成的代码，例如，TCP(Tensor ComputerPrimitive)或TIK(Tensor Iterator Kernel)。Optionally, in this application, the code block corresponding to each task is encapsulated through the block-level scope to obtain the original code. The location identifier mentioned in this application can be represented by a block-level identifier, for example, with IR.block. Then, in the process of parsing the original code, if the with IR.block logo is recognized, the with IR.block and the code between the with IR.block and the next with IR.block can be used as a code block to Get N code blocks. Wherein, the IR in the location identifier represents the code level of the original code. In a practicable manner, the code levels of the original code include, but are not limited to, high-level Python codes, low-level Python codes, and lower-level object codes. The object codes may be codes formed in a C-like language ( such as CUDAC). Wherein, the low-level Python code may be a code formed based on a tensor computing primitive constructed in the Python language, for example, TCP (Tensor Computer Primitive) or TIK (Tensor Iterator Kernel).

举例来说，针对本申请所要实现的三个任务，可以将with tcp.block作为位置标识，则如图3所示，可以将第一个with tcp.block，以及第一个with tcp.block与第二个with tcp.block之间的代码作为第一个代码块，将第二个with tcp.block，以及第二个with tcp.block与第三个with tcp.block之间的代码作为第二个代码块，将第三个withtcp.block与代码结束位置之间的代码作为第三个代码块。For example, for the three tasks to be achieved in this application, with tcp.block can be used as a location identifier, as shown in Figure 3, the first with tcp.block and the first with tcp.block can be combined with The code between the second with tcp.block is the first code block, the second with tcp.block, and the code between the second with tcp.block and the third with tcp.block are the second A code block, the code between the third withtcp.block and the end of the code is the third code block.

202：根据每个代码块中封装的状态标识，确定N个代码块的运行顺序。202: Determine the running order of the N code blocks according to the state identifier encapsulated in each code block.

其中，每个代码块中包含有用于确定该代码块的运行顺序的状态标识。Wherein, each code block includes a status identifier for determining the running order of the code block.

示例性的，每个代码块中的状态标识可以用于指示该代码块所实现的任务的任务状态。可选地，该状态标识可以以每个代码块的本身功能表示，即通过每个代码块本身的功能去指示每个代码块所实现的任务的任务状态。Exemplarily, the status identifier in each code block may be used to indicate the task status of the task implemented by the code block. Optionally, the status identifier may be represented by the function of each code block itself, that is, the function of each code block itself indicates the task status of the task realized by each code block.

例如，如图3所示，针对第一个用于实现数据加载任务的代码块来说，其状态标识可以为“stage_scope＝load”，因此，在解析出该代码块的状态标识为“tage_scope＝load”时，则确定该代码所实现的任务为数据加载任务。针对第二个用于实现数据运算任务的代码块来说，其状态标识可以为“stage_scope＝computer”，因此，在解析出该代码块的状态标识为“stage_scope＝computer”时，则确定该代码所实现的任务为数据运算任务。针对第三个用于实现数据存储任务的代码块来说，其状态标识可以为“stage_scope＝store”，因此，在解析出该代码块的状态标识为“stage_scope＝store”时，则确定该代码所实现的任务为数据存储任务。通过上述步骤，本申请可以分别得到三个代码块分别用于实现数据加载任务、数据运算任务以及数据存储任务。之后，本申请可以根据三个代码块实现任务之间的依赖关系，确定三个代码块的运行顺序。For example, as shown in Figure 3, for the first code block used to implement the data loading task, its state identifier can be "stage_scope=load", therefore, after parsing out the state identifier of the code block is "tage_scope= load", it is determined that the task implemented by the code is a data loading task. For the second code block used to realize the data operation task, its state identifier can be "stage_scope=computer", therefore, when the state identifier of the code block is parsed as "stage_scope=computer", then determine the code The implemented task is a data operation task. For the third code block used to implement the data storage task, its state identifier can be "stage_scope=store", therefore, when the state identifier of the code block is parsed as "stage_scope=store", the code is determined The implemented tasks are data storage tasks. Through the above steps, the present application can respectively obtain three code blocks for realizing the data loading task, the data operation task and the data storage task respectively. Afterwards, the application can determine the running sequence of the three code blocks according to the dependencies among the tasks implemented by the three code blocks.

进一步地，根据该N个代码中的状态标识所指示的任务状态，确定该N个代码块的依赖关系。所谓依赖关系，就是各个代码块之间的数据的流向。因此可根据该N个代码块之间的依赖关系，确定N个代码块之间的运行顺序。Further, according to the task status indicated by the status identifiers in the N codes, the dependencies of the N code blocks are determined. The so-called dependency is the flow of data between each code block. Therefore, the running order among the N code blocks can be determined according to the dependencies among the N code blocks.

例如，针对数据加载任务、数据运算任务、数据存储任务来说，其依赖关系为数据运算任务依赖于数据加载任务，数据存储任务依赖于数据运算任务，因此，这三个任务所对应的代码块之间的依赖关系为：用于实现数据运算任务的代码块依赖于用于实现数据加载任务的代码块，用于实现数据存储任务的代码块依赖于用于实现数据运算任务的代码块。因此，在流水线中，需要顺次执行用于实现数据加载任务的代码块、用于实现数据运算任务的代码块和用于实现数据存储任务的代码块。应当清楚的是，此处的数据加载任务所加载的数据为数据运算任务的输入数据，数据存储任务所写出的数据为数据运算任务的输出数据，因此，三个任务之间存在数据依赖关系。否则，在其他场景下，上述三个任务之间并不必然存在数据依赖关系。For example, for data loading tasks, data computing tasks, and data storage tasks, the dependencies are that data computing tasks depend on data loading tasks, and data storage tasks depend on data computing tasks. Therefore, the code blocks corresponding to these three tasks The dependencies among them are: the code block for realizing the data operation task depends on the code block for realizing the data loading task, and the code block for realizing the data storage task depends on the code block for realizing the data operation task. Therefore, in the pipeline, it is necessary to sequentially execute the code block for implementing the data loading task, the code block for implementing the data operation task, and the code block for implementing the data storage task. It should be clear that the data loaded by the data loading task here is the input data of the data operation task, and the data written by the data storage task is the output data of the data operation task. Therefore, there is a data dependency between the three tasks . Otherwise, in other scenarios, data dependencies do not necessarily exist among the above three tasks.

在一个可能的实现方式中，可以直接在每个代码块中封装该代码块的运行顺序，例如，可将每个代码块的状态标识直接设置为该代码块的运行顺序。比如，针对用于实现数据加载任务的代码块来说，可以将其状态标识设置为“1”，这样在获取到该代码块之后，可以基于该代码块的状态标识“1”直接确定出该代码块的运行顺序为第一个，无需再根据每个代码块所实现得任务的任务状态确定N个代码块的依赖关系，再根据依赖关系确定N个代码块的运行顺序，进而提高了流水线的构造效率。In a possible implementation manner, the running order of the code block may be directly encapsulated in each code block, for example, the state identifier of each code block may be directly set as the running order of the code block. For example, for the code block used to implement the data loading task, its status flag can be set to "1", so that after the code block is obtained, the code block can be directly determined based on the status flag "1". The running order of the code blocks is the first one. It is no longer necessary to determine the dependencies of the N code blocks according to the task status of the tasks implemented by each code block, and then determine the running order of the N code blocks according to the dependencies, thereby improving the pipeline. construction efficiency.

203：根据N个代码块的运行顺序以及N个代码块，生成流水线。203: Generate a pipeline according to the running order of the N code blocks and the N code blocks.

示例性的，根据该N个代码块的运行顺序，自动将该N个代码块进行组合，生成流水线。Exemplarily, according to the running sequence of the N code blocks, the N code blocks are automatically combined to generate a pipeline.

举例来说，针对本申请的三个代码块，可以分别按照该三个代码块的运行顺序，将该三个代码块组成如图4所示的流水线。For example, for the three code blocks of the present application, the three code blocks can be composed into a pipeline as shown in FIG. 4 according to the running order of the three code blocks.

可以看出，在本申请实施方式中，在代码块中置有状态标识，可以直接根据状态标识确定出每个任务的代码块的运行顺序，无需通过分析代码的功能及各个代码块之间的数据流向去确定每个任务的代码块的运行顺序，从而实现快速、高效的构造出流水线。进一步地，由于每个代码块中均封装有状态标识，因此针对复杂的任务流的场景，本申请可以依照上述实现方式，分别为每个任务流生成相应的流水线，且该生成流水线的方式不会受限于硬件资源(如存储分区的数量)，从而能够支持复杂任务流的场景。It can be seen that, in the embodiment of the present application, the code block is provided with a state flag, and the running order of the code blocks of each task can be determined directly according to the state flag, without analyzing the function of the code and the relationship between each code block. The data flow direction determines the running order of the code blocks of each task, so as to realize the fast and efficient construction of the pipeline. Furthermore, since each code block is encapsulated with a state identifier, for complex task flow scenarios, the present application can generate corresponding pipelines for each task flow according to the above implementation method, and the way of generating pipelines is not It will be limited by hardware resources (such as the number of storage partitions), so that it can support complex task flow scenarios.

在本申请的一个实施方式中，该流水线中的任意两个相邻的代码块之间均包含一个同步代码块，该同步代码块用于指示在运行完流水线中的第i个代码块之后，运行第i+1个代码块，其中，i为大于或等于1的正整数，且i小于或等于N。虽然，将N个代码块按照运行顺序进行组合，但是，某些代码块的运行时间可能比较长，为了能够完全保证顺次执行该N个代码块，在两个相邻的代码块之间插入一个同步代码块，这样针对该两个相邻的两个代码块，只会在执行完前一个代码块之后，才会去执行下一个代码块。从任务执行角度来说，每个同步代码块用于实现一个同步任务，即相当于在任意两个相邻的代码块所实现的两个相邻的任务之间插入了一个同步任务，这样针对流水线中两个相邻的任务，只会在执行完前一个任务之后，才会去执行下一个任务，从而实现顺次执行流水线中的N个任务。在流水线中的任意两个相邻的代码块之间插入同步代码块，可以生成如图5所示的插入有同步代码块的流水线。In one embodiment of the present application, a synchronization code block is included between any two adjacent code blocks in the pipeline, and the synchronization code block is used to indicate that after running the i-th code block in the pipeline, Run the i+1th code block, where i is a positive integer greater than or equal to 1, and i is less than or equal to N. Although the N code blocks are combined according to the running order, the running time of some code blocks may be relatively long. In order to completely ensure the sequential execution of the N code blocks, insert between two adjacent code blocks A synchronous code block, so that for the two adjacent code blocks, the next code block will only be executed after the previous code block is executed. From the perspective of task execution, each synchronization code block is used to implement a synchronization task, which is equivalent to inserting a synchronization task between two adjacent tasks implemented by any two adjacent code blocks. Two adjacent tasks in the pipeline will only execute the next task after the previous task is executed, so as to realize the sequential execution of N tasks in the pipeline. Inserting a synchronization code block between any two adjacent code blocks in the pipeline can generate a pipeline with the synchronization code block inserted as shown in FIG. 5 .

在本申请的一个实施方式中，上述生成的流水线的数量至少为两个，即可以将该N个代码块组成两个或多个流水线。针对每个流水线，均是按照该N个代码块的运行顺序进行组合得到，不再叙述。举例来说，针对本申请的三个代码块，则可以按照各个代码块之间的顺序，组成如图6所示的三个流水线，并在每个流水线中的任意两个相邻的代码块之间均插入同步代码块，从而保证每个流水线中的三个代码块均是顺次执行。In an embodiment of the present application, the number of the generated pipelines is at least two, that is, the N code blocks can be composed into two or more pipelines. For each pipeline, it is combined according to the running order of the N code blocks, and will not be described again. For example, for the three code blocks of this application, the three pipelines shown in Figure 6 can be formed according to the order between the code blocks, and any two adjacent code blocks in each pipeline Synchronous code blocks are inserted between each pipeline to ensure that the three code blocks in each pipeline are executed sequentially.

如图2或图6所示，上述的流水线可以是指时间线形成的串行流水线。本申请的流水线还可以包括基于不同的任务状态形成的并行流水线。进一步地，本申请可根据该至少两个流水线中各个代码块的状态标识，将不同流水线中处于不同任务状态的代码块确定为并行代码块组，则该并行代码块组中的多个代码块在同一个时间单元内并行执行，该并行代码块组可以形成一个并行流水线。As shown in FIG. 2 or FIG. 6 , the aforementioned pipeline may refer to a serial pipeline formed by timelines. The pipelines of the present application may also include parallel pipelines formed based on different task states. Further, the present application can determine the code blocks in different task states in different pipelines as a parallel code block group according to the state identification of each code block in the at least two pipelines, then the multiple code blocks in the parallel code block group Executed in parallel within the same time unit, the group of parallel code blocks can form a parallel pipeline.

具体地，该至少两个流水线可以有优先级。比如，针对图6组成的流水线来说，最上面的一个流水线的优先级最高，其他流水线的优先级依次降低。本申请中至少两个流水线的优先级可以是基于不同的迭代周期确定的。例如，该至少两个流水线可以是同一个循环中的不同迭代周期。从时间线上看，在先被执行的流水线的优先级高于在后被执行的流水线的优先级，其中，在先被执行的流水线所处的迭代周期小于在后被执行的流水线所处的迭代周期。如图6所示，一个循环中可以包括3个迭代周期，其中，3个迭代周期按照其优先级分别对应图6中的三条流水线。本申请中为提高不同流水线之间任务的并行度，提高任务的执行效率，可以通过将不同流水线中处于不同任务状态的代码块确定为并行代码块组，则该并行代码块组中的多个代码块在同一个时间单元内并行执行，从而可以实现不同代码块所实现任务的任务流水线，提高运算效率和硬件利用率。Specifically, the at least two pipelines may have priorities. For example, for the pipelines formed in FIG. 6 , the top pipeline has the highest priority, and the other pipelines have lower priorities in turn. In this application, the priorities of at least two pipelines may be determined based on different iteration periods. For example, the at least two pipelines may be different iteration cycles in the same loop. From the perspective of the timeline, the priority of the pipeline executed earlier is higher than that of the pipeline executed later, and the iteration period of the pipeline executed earlier is smaller than that of the pipeline executed later iteration cycle. As shown in FIG. 6 , one cycle may include three iteration cycles, wherein the three iteration cycles correspond to the three pipelines in FIG. 6 according to their priorities. In this application, in order to improve the parallelism of tasks between different pipelines and improve the execution efficiency of tasks, code blocks in different task states in different pipelines can be determined as parallel code block groups, and then multiple parallel code block groups in the parallel code block groups The code blocks are executed in parallel in the same time unit, so that the task pipeline of the tasks implemented by different code blocks can be realized, and the operation efficiency and hardware utilization rate can be improved.

由于N个代码块分别用于实现不同的串行任务，且相邻两个任务之间分别调用不同的硬件资源(如相邻两个任务分别调用存储和计算两个不同的资源)，因此，针对该至少两个流水线，组成代码块组的方式可以为：Since the N code blocks are respectively used to implement different serial tasks, and two adjacent tasks call different hardware resources respectively (for example, two adjacent tasks call two different resources for storage and calculation respectively), therefore, For the at least two pipelines, the way to form a code block group can be:

在执行第一个流水线中的第k个代码块时，并行执行第二个流线水中的第k-1个代码块，即将第一流水线中的第k个代码块和第二个流水线中的第k-1个代码块作为一个代码块组；其中，第一流水线为至少两个流水线中的任意一个，第二个流水线为至少两个流水线中的第一个流水线的下一级流水线，k的取值为从2到N的整数。When the kth code block in the first pipeline is executed, the k-1th code block in the second pipeline is executed in parallel, that is, the kth code block in the first pipeline and the kth code block in the second pipeline The k-1th code block is used as a code block group; wherein, the first pipeline is any one of at least two pipelines, and the second pipeline is the next-level pipeline of the first pipeline in at least two pipelines, k The value of is an integer from 2 to N.

具体地，对于硬件资源来说是有限的，一般不会在一个时间单元内同时执行任务状态相同的两个任务，因此，针对一种任务状态的任务，在一个时间单元内，会分配相应的硬件资源来执行该任务。示例性的，针对本申请的三个任务来说，在第一个时间单元，可以为第一个流水线的数据加载任务分配存储资源，从而执行第一个流水线的第一个代码块；在第二个时间单元内，由于第一个流水线的第一个代码块已经执行完成，则可以为第一个流水线的数据运算任务分配计算资源，并且由于存储资源闲置，则可以将第一个流水线的第二个代码块和第二个流水线的第一个代码块作为一个代码块组，因此在第二时间单元内可以使用该存储资源和计算资源并行执行该代码块组中的两个代码块。依次类推，在第三个时间单元内，可以将第一个流水线中的第三个代码块、第二个流水线中的第二个代码块以及第三个流水线中的第一个代码块作为一个代码块组，在第三个时间单元内可以使用存储资源和计算资源并行执行该代码块组中的三个代码块。因此，针对多条流水线中均包含有N个代码块的情况，在第N个时间单元内，可以将该N个代码块作为一个代码块组，从而并行执行N个代码块，实现N个任务的并行执行。Specifically, hardware resources are limited, and generally two tasks with the same task status will not be executed simultaneously within a time unit. Therefore, for a task with a task status, within a time unit, the corresponding hardware resources to perform the task. Exemplarily, for the three tasks of this application, in the first time unit, storage resources can be allocated for the data loading task of the first pipeline, so as to execute the first code block of the first pipeline; Within two time units, since the first code block of the first pipeline has been executed, computing resources can be allocated to the data operation tasks of the first pipeline, and because storage resources are idle, the first pipeline can be allocated The second code block and the first code block of the second pipeline serve as a code block group, so the storage resources and computing resources can be used to execute the two code blocks in the code block group in parallel within the second time unit. By analogy, in the third time unit, the third code block in the first pipeline, the second code block in the second pipeline, and the first code block in the third pipeline can be used as a In the code block group, three code blocks in the code block group can be executed in parallel by using storage resources and computing resources in the third time unit. Therefore, in the case that multiple pipelines contain N code blocks, in the Nth time unit, the N code blocks can be regarded as a code block group, so that N code blocks can be executed in parallel to realize N tasks of parallel execution.

进一步地，本申请还用于在不同流水线之间插入同步任务，以保证代码块能够并行执行。也就是说，在每个迭代周期之间的相邻代码块中间插入同步代码块，以实现同步任务。例如，在第一个迭代周期中用于完成数据存储任务的代码块和第二个迭代周期中用于完成数据加载任务的代码块之间插入同步代码块，以实现同步任务。Furthermore, the present application is also used to insert synchronization tasks between different pipelines to ensure that code blocks can be executed in parallel. That is to say, a synchronous code block is inserted between adjacent code blocks between each iteration cycle to realize synchronous tasks. For example, a synchronization code block is inserted between the code block for completing the data storage task in the first iteration cycle and the code block for completing the data loading task in the second iteration cycle, so as to realize the synchronization task.

在本申请的一个实施方式中，所述方法还可以包括为该代码块组分配内存。In an embodiment of the present application, the method may further include allocating memory for the code block group.

可选地，该内存的大小为该并行代码块组中代码块的数量与预设存储空间大小的乘积，该内存的大小还可以等于并行代码块中所能容纳的代码块的最大数量。其中，该预设空间大小与待处理数据大小相关。可以理解，针对每个代码块组来说，在同一个时间单元内，需要并行执行该代码块组中的所有代码块，则每个代码块需要有相应的内存缓存该代码块的运行结果，因此根据代码块组中的代码块的数量一次性分配为所有的代码块分配内存，无需单独为每个代码块分配内存，可以提高内存的分配效率。Optionally, the size of the memory is the product of the number of code blocks in the parallel code block group and the size of the preset storage space, and the size of the memory may also be equal to the maximum number of code blocks that can be accommodated in the parallel code block group. Wherein, the preset space size is related to the size of the data to be processed. It can be understood that for each code block group, within the same time unit, all code blocks in the code block group need to be executed in parallel, and each code block needs to have a corresponding memory to cache the running result of the code block. Therefore, according to the number of code blocks in the code block group, memory is allocated for all code blocks at one time, without allocating memory for each code block separately, which can improve memory allocation efficiency.

例如，待处理数据为二维数据，则该二维数据的大小可以用m乘以n表示，其中，m用于表示待处理数据在第一个维度上的大小，n用于表示待处理数据在第二个维度上的大小。如原始代码要完成该待处理数据的处理过程，则需m个迭代周期，每个迭代周期处理数据大小为n的待处理数据。每个迭代周期内均执行图6中沿着时间线的一个流水线的至少三个代码块，即每个迭代周期内按顺序完成数据加载任务、数据计算任务和数据存储任务。针对每个迭代周期的数据处理，需要分配一预设存储空间以完成数据大小为n的待处理数据的处理过程。本申请实施例中，为支持多层级流水线的实现，可以一次性为整个并行代码块组分配内存，其中，该内存的大小可以等于并行代码块中的数量N乘以单个迭代周期内所需的预设存储空间大小n。如图6所示，该内存的大小可以为3n。For example, if the data to be processed is two-dimensional data, the size of the two-dimensional data can be represented by m times n, where m is used to represent the size of the data to be processed in the first dimension, and n is used to represent the data to be processed Size in the second dimension. If the original code needs to complete the processing of the data to be processed, m iteration cycles are required, and each iteration cycle processes the data to be processed with a data size of n. At least three code blocks of a pipeline along the timeline in FIG. 6 are executed in each iteration cycle, that is, data loading tasks, data calculation tasks, and data storage tasks are completed in sequence in each iteration cycle. For the data processing of each iterative cycle, a preset storage space needs to be allocated to complete the processing process of the data to be processed with a data size of n. In the embodiment of the present application, in order to support the implementation of multi-level pipelines, memory can be allocated for the entire parallel code block group at one time, wherein the size of the memory can be equal to the number N in the parallel code block multiplied by the number required in a single iteration cycle The preset storage space size n. As shown in Figure 6, the size of this memory may be 3n.

在本申请的一个实施方式中，可以包括自动流水线生成模式和普通模式，其中，自动流水线模式是指根据代码块中的状态标识，自动生成流水线，如上述所示的方法步骤。普通模式是指按照传统的代码分析方式生成流水线的方式。为支持上述两种模式之间的切换，本公开实施还可以在原始代码中设置用于实现该模式切换的预设标志位。具体地，在解析原始代码的过程中，当确定该预设标志位的取值为预设值时，则确定使能自动流水线生成模式，当确定该预设标志位的取值不是预设值时，则确定使能普通模式，其中，该预设值可以为1或者其他值。In an embodiment of the present application, an automatic pipeline generation mode and a normal mode may be included, wherein the automatic pipeline mode refers to automatically generating a pipeline according to the state identification in a code block, such as the method steps shown above. Normal mode refers to the way to generate pipelines according to the traditional code analysis method. In order to support switching between the above two modes, the implementation of the present disclosure may also set a preset flag bit for implementing the mode switching in the original code. Specifically, in the process of parsing the original code, when it is determined that the value of the preset flag is a preset value, it is determined to enable the automatic pipeline generation mode, and when it is determined that the value of the preset flag is not a preset value , it is determined to enable the normal mode, where the preset value can be 1 or other values.

在本实施方式中，预先设置标志位，当读取到预设标志位的取值为预设值时，确定需要使能自动流水线生成模式，这个时候会去获取N个代码块，并将N个代码块按照图2或图6示出的方式组成流水线。具体地，如果确定使能该自动流水线生成模式，则可以自动完成上述的内存分配操作。若该预设标志位的取值不是预设值，则确定开启普通模式，这个时候就不用获取N个代码块，直接根据原始代码的编写顺序，按照传统的代码分析方式生成流水线。In this embodiment, the flag bit is set in advance. When the value of the preset flag bit is read as the preset value, it is determined that the automatic pipeline generation mode needs to be enabled. At this time, N code blocks will be acquired, and N A code block forms a pipeline in the manner shown in Figure 2 or Figure 6. Specifically, if it is determined to enable the automatic pipeline generation mode, the above memory allocation operation may be automatically completed. If the value of the preset flag bit is not the default value, it is determined to enable the normal mode. At this time, there is no need to obtain N code blocks, and the pipeline is generated directly according to the writing sequence of the original code and according to the traditional code analysis method.

在本申请的一个实施方式中，上述方法还可以包括：In one embodiment of the present application, the above method may also include:

将N个代码块形成的原始代码编译为目标代码，其中，该目标代码可以为类C语言表示的代码(如CudaC)。本申请实施例中，通过上述对原始代码进行自动的流水线优化，可以将完成流水线优化的原始代码编译生成目标代码。进一步地，本申请还可以将该目标代码编译为硬件平台能够执行的二进制指令。该硬件平台包括但不限于处理单元和存储单元，其中的处理单元和存储单元可以按照图6所示的流水线的方式完成相应的操作。The original code formed by the N code blocks is compiled into an object code, where the object code can be a code expressed in a C-like language (such as CudaC). In the embodiment of the present application, through the above-mentioned automatic pipeline optimization of the original code, the pipeline-optimized original code can be compiled to generate the target code. Furthermore, the present application can also compile the object code into binary instructions executable by the hardware platform. The hardware platform includes, but is not limited to, a processing unit and a storage unit, wherein the processing unit and the storage unit can complete corresponding operations in a pipeline manner as shown in FIG. 6 .

参阅图7，图7本申请实施例提供的一种流水线生成装置的功能单元组成框图。流水线生成装置700包括：Referring to FIG. 7 , FIG. 7 is a block diagram of functional units of a pipeline generation device provided by an embodiment of the present application. The pipeline generation device 700 includes:

获取单元701，用于获取N个代码块，其中，每个所述代码块中包含有用于确定该代码块的运行顺序的状态标识，N为大于或者等于2的整数；An acquisition unit 701, configured to acquire N code blocks, wherein each of the code blocks contains a status identifier for determining the running sequence of the code block, and N is an integer greater than or equal to 2;

处理单元702，用于根据每个所述代码块中封装的状态标识，确定所述N个代码块的运行顺序；A processing unit 702, configured to determine the running order of the N code blocks according to the status identifier encapsulated in each of the code blocks;

在本申请的一个实施方式中，所述状态标识用于指示所述代码块所实现的任务的任务状态，在根据每个所述代码块中封装的状态标识，确定所述N个代码块的运行顺序方面，所述处理单元702，具体用于：In one embodiment of the present application, the status identifier is used to indicate the task status of the task implemented by the code block, and according to the status identifier encapsulated in each of the code blocks, determine the status of the N code blocks In terms of running sequence, the processing unit 702 is specifically used to:

在本申请的一个实施方式中，每个所述代码块中还包含有位置标识，该位置标识用于标识该代码块在原始代码中的起始位置；在获取N个代码块方面，所述获取单元702，具体用于：In one embodiment of the present application, each code block also includes a position identifier, which is used to identify the starting position of the code block in the original code; in terms of obtaining N code blocks, the The acquiring unit 702 is specifically used for:

在本申请的一个实施方式中，获取N个代码块之前，所述获取单元701，还用于获取预设标志位的取值；所述处理单元702，还用于确定所述预设标志位的取值为预设值时，执行获取N个代码块的操作。In one embodiment of the present application, before acquiring N code blocks, the acquiring unit 701 is also used to acquire the value of the preset flag bit; the processing unit 702 is also used to determine the preset flag bit When the value of is a preset value, the operation of obtaining N code blocks is performed.

在本申请的一个实施方式中，所述流水线的数量至少为两个，所述处理单元702，还用于：In one embodiment of the present application, the number of the pipelines is at least two, and the processing unit 702 is further configured to:

在本申请的一个实施方式中，在根据至少两个流水线中各个代码块的状态标识，将不同流水线中处于不同任务状态的代码块确定为并行代码块组方面，所述处理单元702，具体用于：In one embodiment of the present application, the processing unit 702 specifically uses At:

在本申请的一个实施方式中，所述处理单元702，还用于为所述并行代码块组分配内存，其中，所述内存的大小为所述并行代码组中代码块的数量与预设存储空间大小的乘积。In one embodiment of the present application, the processing unit 702 is further configured to allocate memory for the parallel code block group, where the size of the memory is the number of code blocks in the parallel code group and the preset storage The product of the size of the space.

在本申请的一个实施方式中，所述处理单元702，还用于将所述N个代码块形成的原始代码编译为目标代码。In one embodiment of the present application, the processing unit 702 is further configured to compile the original code formed by the N code blocks into an object code.

参阅图8，图8为本申请实施例提供的一种电子设备的结构示意图。如图8所示，电子设备800包括收发器801、处理器802和存储器803。它们之间通过总线804连接。存储器803用于存储计算机程序和数据，并可以将存储器803存储的数据传输给处理器802。Referring to FIG. 8 , FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. As shown in FIG. 8 , an electronic device 800 includes a transceiver 801 , a processor 802 and a memory 803 . They are connected through a bus 804 . The memory 803 is used to store computer programs and data, and can transmit the data stored in the memory 803 to the processor 802 .

处理器802用于读取存储器803中的计算机程序执行以下操作：The processor 802 is used to read the computer program in the memory 803 to perform the following operations:

其中，处理器802的具体功能可以参照上述的处理单元702以及获取单元701的具体功能，不再叙述。For the specific functions of the processor 802, reference may be made to the specific functions of the above-mentioned processing unit 702 and the obtaining unit 701, and will not be described again.

本申请实施例还提供一种计算机可读存储介质，所述计算机可读存储介质存储有计算机程序，所述计算机程序被处理器执行以实现如上述方法实施例中记载的任何一种代码编译方法的部分或全部步骤。The embodiment of the present application also provides a computer-readable storage medium, the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement any code compilation method as described in the above-mentioned method embodiments some or all of the steps.

本申请实施例还提供一种计算机程序产品，所述计算机程序产品包括存储了计算机程序的非瞬时性计算机可读存储介质，所述计算机程序可操作来使计算机执行如上述方法实施例中记载的任何一种代码编译方法的部分或全部步骤。The embodiment of the present application also provides a computer program product, the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to enable the computer to execute the method described in the above method embodiments Some or all of the steps of any one method of code compilation.

需要说明的是，对于前述的各方法实施例，为了简单描述，故将其都表述为一系列的动作组合，但是本领域技术人员应该知悉，本申请并不受所描述的动作顺序的限制，因为依据本申请，某些步骤可以采用其他顺序或者同时进行。其次，本领域技术人员也应该知悉，说明书中所描述的实施例均属于可选实施例，所涉及的动作和模块并不一定是本申请所必须的。It should be noted that for the foregoing method embodiments, for the sake of simple description, they are expressed as a series of action combinations, but those skilled in the art should know that the present application is not limited by the described action sequence. Depending on the application, certain steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by the application.

在上述实施例中，对各个实施例的描述都各有侧重，某个实施例中没有详述的部分，可以参见其他实施例的相关描述。In the foregoing embodiments, the descriptions of each embodiment have their own emphases, and for parts not described in detail in a certain embodiment, reference may be made to relevant descriptions of other embodiments.

在本申请所提供的几个实施例中，应该理解到，所揭露的装置，可通过其它的方式实现。例如，以上所描述的装置实施例仅仅是示意性的，例如所述单元的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式，例如多个单元或组件可以结合或者可以集成到另一个系统，或一些特征可以忽略，或不执行。另一点，所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口，装置或单元的间接耦合或通信连接，可以是电性或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed device can be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of the units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components can be combined or can be Integrate into another system, or some features may be ignored, or not implemented. In another point, the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical or other forms.

所述作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

另外，在本申请各个实施例中的各功能单元可以集成在一个处理单元中，也可以是各个单元单独物理存在，也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现，也可以采用软件程序模块的形式实现。In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented not only in the form of hardware, but also in the form of software program modules.

所述集成的单元如果以软件程序模块的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储器中。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储器中，包括若干指令用以使得一台计算机设备(可为个人计算机、服务器或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储器包括：U盘、只读存储器(ROM，Read-Only Memory)、随机存取存储器(RAM，Random Access Memory)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。The integrated units may be stored in a computer-readable memory if implemented in the form of a software program module and sold or used as an independent product. Based on this understanding, the technical solution of the present application is essentially or part of the contribution to the prior art, or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a memory. Several instructions are included to make a computer device (which may be a personal computer, server or network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned memory includes: U disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), mobile hard disk, magnetic disk or optical disk and other media that can store program codes.

本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令相关的硬件来完成，该程序可以存储于一计算机可读存储器中，存储器可以包括：闪存盘、只读存储器(英文：Read-Only Memory，简称：ROM)、随机存取器(英文：Random Access Memory，简称：RAM)、磁盘或光盘等。Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above-mentioned embodiments can be completed by instructing related hardware through a program, and the program can be stored in a computer-readable memory, and the memory can include: a flash disk , Read-only memory (English: Read-Only Memory, abbreviated: ROM), random access device (English: Random Access Memory, abbreviated: RAM), magnetic disk or optical disk, etc.

以上对本申请实施例进行了详细介绍，本文中应用了具体个例对本申请的原理及实施方式进行了阐述，以上实施例的说明只是用于帮助理解本申请的方法及其核心思想；同时，对于本领域的一般技术人员，依据本申请的思想，在具体实施方式及应用范围上均会有改变之处，综上所述，本说明书内容不应理解为对本申请的限制。The embodiments of the present application have been introduced in detail above, and specific examples have been used in this paper to illustrate the principles and implementation methods of the present application. The descriptions of the above embodiments are only used to help understand the methods and core ideas of the present application; meanwhile, for Those skilled in the art will have changes in specific implementation methods and application scopes based on the ideas of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims

1. A code compiling method, comprising:

acquiring N code blocks, wherein each code block comprises a state identifier for determining the running sequence of the code block, and N is an integer greater than or equal to 2;

Determining the running sequence of the N code blocks according to the state identifier packaged in each code block;

and generating a pipeline according to the running sequence of the N code blocks and the N code blocks.

2. The method of claim 1, wherein the status identifiers are used to indicate task status of tasks implemented by the code blocks, and wherein determining the running order of the N code blocks according to the status identifiers encapsulated in each code block comprises:

determining the dependency relationship among the N code blocks according to the task state indicated by the state identification in the N code blocks;

and determining the running sequence of the N code blocks according to the dependency relationship among the N code blocks.

3. The method according to claim 1 or 2, wherein the task states of the tasks implemented by the N code blocks include a data loading task, a data operation task and a data storage task;

in one pipeline, the code blocks implementing the data loading task, the code blocks of the data operation task, and the code blocks of the data storage task are sequentially executed.

4. A method according to any one of claims 1-3, wherein each of said code blocks further comprises a location identifier for identifying the starting location of the code block in the original code; the acquiring N code blocks includes:

And taking a code between a j-th position identifier and a j+1th position identifier in the original code and the j-th position identifier as a j-th code block, wherein the j value is an integer from 1 to N.

5. The method of any of claims 1-4, wherein the number of pipelines is at least two, the method further comprising:

according to the state identification of each code block in at least two pipelines, code blocks in different task states in different pipelines are determined to be parallel code block groups, and a plurality of code blocks in the parallel code block groups can be executed in parallel in the same time unit.

6. The method of claim 5, wherein determining code blocks in different task states in different pipelines as parallel code block groups based on the state identification of each code block in at least two pipelines, comprises:

executing a kth code block in a first pipeline in parallel with a kth-1 code block in a second pipeline, the task state of the kth-1 code block being different from the task state of the first pipeline;

the first pipeline is any one of the at least two pipelines, the second pipeline is a next stage pipeline of the first pipeline of the at least two pipelines, k is a positive integer greater than or equal to 2, and k is less than or equal to N.

7. The method according to claim 5 or 6, characterized in that the method further comprises:

and allocating a memory for the parallel code block group, wherein the size of the memory is the product of the number of the code blocks in the parallel code block group and the size of a preset storage space.

8. The method according to any one of claims 1-7, further comprising:

acquiring the value of a preset marker bit;

and when the value of the preset flag bit is determined to be a preset value, entering an automatic pipeline generation mode to automatically generate a pipeline according to the state identification packaged in each code block.

9. The method according to any one of claims 1-8, further comprising:

and compiling the original codes formed by the N code blocks into target codes.

10. An electronic device, comprising: a processor and a memory, the processor being connected to the memory, the memory being for storing a computer program, the processor being for executing the computer program stored in the memory to cause the electronic device to perform the method of any one of claims 1-9.

11. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program, which is executed by a processor to implement the method of any one of claims 1-9.