
CN114861870A - A method, apparatus and device for configuring a neural network architecture - Google Patents

A method, apparatus and device for configuring a neural network architecture

Info

Publication number
CN114861870A
CN114861870A (application CN202210372773.7A)
Authority
CN
China
Prior art keywords
environment
neural network
decision
trained
architecture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210372773.7A
Other languages
Chinese (zh)
Inventor
徐波
唐伟
徐博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202210372773.7A
Publication of CN114861870A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING OR CALCULATING; COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G06N3/08 — Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention discloses a method, apparatus and device for configuring a neural network architecture. The method includes: accessing a decision problem of a neural network to be trained; obtaining a first environment of the decision problem according to the decision problem; encapsulating the decision problem with the first environment to obtain an encapsulated second environment; accessing the neural network to be trained; accessing an architecture algorithm according to the second environment and the neural network to be trained; adapting the second environment, the neural network to be trained and the architecture algorithm to generate trajectory data attributes; and optimizing the neural network to be trained according to the trajectory data attributes to obtain the neural network to be trained with a configured architecture. In this way, the invention improves the versatility and extensibility of neural network architecture configuration.

Description

A method, apparatus and device for configuring a neural network architecture

Technical Field

The present invention relates to the technical field of machine learning, and in particular to a method, apparatus and device for configuring a neural network architecture.

Background Art

Deep reinforcement learning combines deep learning with decision-capable reinforcement learning to form an optimized decision-making method that can directly take high-dimensional, complex information as input. It not only brings the convenience of end-to-end automatic optimization to reinforcement learning, but also frees reinforcement learning from being limited to low-dimensional spaces, enabling it to solve more complex decision problems and greatly expanding its range of application.

In recent years, deep reinforcement learning has been applied to more and more intelligent decision-making fields, but it is still developing rapidly: algorithms of various types keep emerging, and problem environments are diverse. At the algorithm level, the forms and implementation details of the various algorithms are difficult to unify; because deep reinforcement learning algorithms have no fixed computation pattern and require high concurrency, the mature training frameworks from computer vision, natural language processing and speech processing are not suitable for the field. At the environment level, on the one hand, the problem environments to be solved are so varied that there is no unified standard for applying deep reinforcement learning to the environments of various intelligent decision problems, and work often has to start from scratch at enormous cost; on the other hand, many problems provide only an adjudication engine and no standardized process for turning that engine into a problem environment, so there is no unified framework flow to refer to when implementing an environment from an adjudication engine.

Some framework projects have attempted to address the above problems at both the algorithm and environment levels, but the existing frameworks rely on complex abstractions and all suffer from insufficient generality and extensibility. Because of these shortcomings, the ease of use of each framework implementation is also difficult to guarantee.

Summary of the Invention

To solve the above problems, embodiments of the present invention provide a method, apparatus and device for configuring a neural network architecture.

According to one aspect of the embodiments of the present invention, a method for configuring a neural network architecture is provided, including:

accessing a decision problem of a neural network to be trained;

obtaining a first environment of the decision problem according to the decision problem;

encapsulating the decision problem with the first environment to obtain an encapsulated second environment;

accessing the neural network to be trained;

accessing an architecture algorithm according to the second environment and the neural network to be trained;

adapting the second environment, the neural network to be trained and the architecture algorithm to generate trajectory data attributes;

optimizing the neural network to be trained according to the trajectory data attributes to obtain the neural network to be trained with a configured architecture.

Optionally, obtaining the first environment of the decision problem according to the decision problem includes:

if the decision problem comes with an environment, obtaining the first environment according to the environment of the decision problem;

if the decision problem has no environment, defining a new environment for the decision problem to obtain a defined third environment, and then obtaining the first environment according to the third environment.

Optionally, if the decision problem comes with an environment, obtaining the first environment according to that environment includes:

if the environment of the decision problem satisfies a preset condition, determining the environment of the decision problem as the first environment;

if the environment of the decision problem does not satisfy the preset condition, making the environment of the decision problem compatible with a general environment, and determining the compatible environment as the first environment.

Optionally, obtaining the first environment according to the third environment includes:

making the third environment compatible with the general environment, and determining the compatible environment as the first environment.

Optionally, after obtaining the encapsulated second environment, the method further includes:

obtaining a state space of the second environment according to the second environment, where the state space refers to the space of states generated while the neural network to be trained is running.

Optionally, after accessing the architecture algorithm, the method further includes:

judging, according to the architecture algorithm, the attributes of the data to be pre-generated by the neural network to be trained.

Optionally, adapting the second environment, the neural network to be trained and the architecture algorithm to generate trajectory data attributes includes:

adapting the second environment, the neural network to be trained and the architecture algorithm, and then generating trajectory data attributes according to the state space of the adapted second environment, the adapted neural network to be trained, and the pre-generated data of the adapted neural network to be trained.

According to another aspect of the embodiments of the present invention, an apparatus for configuring a neural network architecture is provided, the apparatus including:

a first access module, configured to access a decision problem of a neural network to be trained;

a processing module, configured to obtain a first environment of the decision problem according to the decision problem, and to encapsulate the decision problem with the first environment to obtain an encapsulated second environment;

a second access module, configured to access the neural network to be trained;

a third access module, configured to access an architecture algorithm according to the second environment and the neural network to be trained;

an adaptation module, configured to adapt the second environment, the neural network to be trained and the architecture algorithm to generate trajectory data attributes, and to optimize the neural network to be trained according to the trajectory data attributes to obtain the neural network to be trained with a configured architecture.

According to yet another aspect of the embodiments of the present invention, a computing device is provided, including a processor, a memory, a communication interface and a communication bus, where the processor, the memory and the communication interface communicate with one another through the communication bus;

the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the above method for configuring a neural network architecture.

According to still another aspect of the embodiments of the present invention, a computer storage medium is provided, in which at least one executable instruction is stored, the executable instruction causing a processor to perform the operations corresponding to the above method for configuring a neural network architecture.

According to the solution provided by the above embodiments of the present invention, by accessing a decision problem of a neural network to be trained; obtaining a first environment of the decision problem according to the decision problem; encapsulating the decision problem with the first environment to obtain an encapsulated second environment; accessing the neural network to be trained; accessing an architecture algorithm according to the second environment and the neural network to be trained; adapting the second environment, the neural network to be trained and the architecture algorithm to generate trajectory data attributes; and optimizing the neural network to be trained according to the trajectory data attributes to obtain the neural network to be trained with a configured architecture, the versatility and extensibility of neural network architecture configuration are improved.

The above description is only an overview of the technical solutions of the embodiments of the present invention. In order that the technical means of the embodiments may be understood more clearly and implemented according to the contents of the specification, and in order that the above and other objects, features and advantages of the embodiments may be more readily apparent, specific implementations of the embodiments of the present invention are set forth below.

Description of the Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are provided only for the purpose of illustrating the preferred embodiments and are not to be regarded as limiting the embodiments of the present invention. Throughout the drawings, the same components are denoted by the same reference numerals. In the drawings:

FIG. 1 shows a flowchart of a method for configuring a neural network architecture provided by an embodiment of the present invention;

FIG. 2 shows a flowchart of a specific method for configuring a neural network architecture provided by an embodiment of the present invention;

FIG. 3 shows a schematic diagram of the abstraction layers of the architecture in the specific method for configuring a neural network architecture shown in FIG. 2;

FIG. 4 shows a schematic structural diagram of an apparatus for configuring a neural network architecture provided by an embodiment of the present invention;

FIG. 5 shows a schematic structural diagram of a computing device provided by an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present invention, it should be understood that the invention may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present invention will be understood more thoroughly and its scope conveyed fully to those skilled in the art.

FIG. 1 shows a flowchart of a method for configuring a neural network architecture provided by an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:

Step 11: accessing a decision problem of a neural network to be trained;

Step 12: obtaining a first environment of the decision problem according to the decision problem;

Step 13: encapsulating the decision problem with the first environment to obtain an encapsulated second environment;

Step 14: accessing the neural network to be trained;

Step 15: accessing an architecture algorithm according to the second environment and the neural network to be trained;

Step 16: adapting the second environment, the neural network to be trained and the architecture algorithm to generate trajectory data attributes;

Step 17: optimizing the neural network to be trained according to the trajectory data attributes to obtain the neural network to be trained with a configured architecture.

In this embodiment, a decision problem of a neural network to be trained is accessed; a first environment of the decision problem is obtained according to the decision problem; the decision problem is encapsulated with the first environment to obtain an encapsulated second environment; the neural network to be trained is accessed; an architecture algorithm is accessed according to the second environment and the neural network to be trained; the second environment, the neural network to be trained and the architecture algorithm are adapted to generate trajectory data attributes; and the neural network to be trained is optimized according to the trajectory data attributes to obtain the neural network to be trained with a configured architecture, which improves the versatility and extensibility of neural network architecture configuration.

In an optional embodiment of the present invention, step 12 may include:

Step 121: if the decision problem comes with an environment, obtaining the first environment according to that environment, where the first environment is a runnable environment;

Step 122: if the decision problem has no environment, defining a new environment for the decision problem to obtain a defined third environment, and then obtaining the first environment according to the third environment.

FIG. 2 shows a flowchart of a specific method for configuring a neural network architecture provided by an embodiment of the present invention. As shown in FIG. 2, in this embodiment, the environment access flow is determined according to the existing environment or engine of the decision problem. If the scenario of the decision problem provides only an adjudication engine, the environment needs to be implemented according to the specific requirements of the problem. During implementation, the problem layer may need to be accessed according to constraints such as whether the problem itself has complete information and whether it is turn-based or real-time (but not limited to the above), and the environment can then be implemented according to the complete-information and turn-based properties of the problem. For example, for a specific Gomoku (five-in-a-row) scenario, if the decision problem does not yet have an environment implementation satisfying the standard gym interface, but an engine already implements the adjudication rule "a side wins when five of its stones are arranged consecutively in a horizontal, vertical or diagonal line", then that adjudication rule can be made compatible with the standard gym interface.
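As an illustration, the Gomoku adjudication rule above can be wrapped in a gym-style reset/step interface. The following is a minimal sketch only; the class and its internals are the author's own illustration (it omits move-legality and draw handling) and are not part of the patent.

```python
class GomokuEnv:
    """Gym-style wrapper around the bare adjudication rule:
    five consecutive stones in a row, column, or diagonal wins."""

    def __init__(self, size=15):
        self.size = size
        self.board = None
        self.player = 1  # 1 and -1 denote the two sides

    def reset(self):
        self.board = [[0] * self.size for _ in range(self.size)]
        self.player = 1
        return self._obs()

    def step(self, action):
        # action is a flat board index; no legality check in this sketch
        r, c = divmod(action, self.size)
        self.board[r][c] = self.player
        done = self._wins(r, c)            # the adjudication rule
        reward = 1.0 if done else 0.0      # reward for the side that just moved
        self.player = -self.player
        # gym convention: (observation, reward, done, info)
        return self._obs(), reward, done, {}

    def _obs(self):
        return [row[:] for row in self.board]

    def _wins(self, r, c):
        p = self.board[r][c]
        for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
            count = 1
            for sign in (1, -1):
                rr, cc = r + sign * dr, c + sign * dc
                while (0 <= rr < self.size and 0 <= cc < self.size
                       and self.board[rr][cc] == p):
                    count += 1
                    rr += sign * dr
                    cc += sign * dc
            if count >= 5:
                return True
        return False
```

Because the wrapper exposes only reset/step, any gym-compatible training code can interact with the adjudication rule without knowing its internals.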

In yet another optional embodiment of the present invention, step 121 may include:

Step 1211: if the environment of the decision problem satisfies a preset condition, determining the environment of the decision problem as the first environment;

Step 1212: if the environment of the decision problem does not satisfy the preset condition, making the environment of the decision problem compatible with a general environment, and determining the compatible environment as the first environment.

In this embodiment, the preset condition may be set as satisfying the gym standard interface, but is not limited to the above. If the environment of the decision problem satisfies the gym standard interface, that environment is directly taken as the first environment, and the environment encapsulation layer is then accessed to realize the interaction between the environment and the agent. If the environment of the decision problem does not satisfy the gym standard interface, the environment is first made compatible with the gym standard interface, the compatible environment is determined as the first environment, and the environment encapsulation layer is then accessed to realize the interaction between the environment and the agent. Compatibility and encapsulation may also be performed at the same time; that is, the environment of the decision problem is directly encapsulated in the gym-compatible environment layer to realize the interaction between the environment and the agent.
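A compatibility step of this kind can be sketched as a thin adapter. The engine method names below (restart, apply, observe, score, finished) are hypothetical, chosen only for illustration; the gym-style (observation, reward, done, info) return contract is the part the patent's preset condition refers to.

```python
class GymCompatWrapper:
    """Adapts a bare adjudication/simulation engine to gym-style reset/step."""

    def __init__(self, engine):
        self.engine = engine

    def reset(self):
        self.engine.restart()
        return self.engine.observe()

    def step(self, action):
        self.engine.apply(action)
        # gym convention: (observation, reward, done, info)
        return (self.engine.observe(), self.engine.score(),
                self.engine.finished(), {})
```

Once wrapped, the environment can be passed unchanged to the environment encapsulation layer, since both cases then present the same interface.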

In yet another optional embodiment of the present invention, in step 122, obtaining the first environment according to the third environment may include:

Step 1221: making the third environment compatible with the general environment, and determining the compatible environment as the first environment.

In this embodiment, the third environment is first made compatible with the gym standard interface to obtain a compatible environment, which is determined as the first environment; the environment encapsulation layer is then accessed to realize the interaction between the environment and the agent. Compatibility and encapsulation may also be performed at the same time; that is, the third environment is directly encapsulated in the gym-compatible environment layer to realize the interaction between the environment and the agent.

In yet another optional embodiment of the present invention, after step 13, the method may further include:

Step 131: obtaining a state space of the second environment according to the second environment, where the state space refers to the space of states generated while the neural network to be trained is running.

Specifically, the states include the state of the agent itself and the actions generated by the agent, but are not limited to the above.

In this embodiment, after step 13 and before step 14, the state space needs to be determined according to the type and scale of the second environment, and the structure of the neural network to be trained needs to be determined. The logical structure of rules also needs to be determined according to whether knowledge rules are accessed, but is not limited to the above. After step 14, the neural network to be trained and the rules may also be connected to the inference layer.

In yet another optional embodiment of the present invention, after step 15, the method may further include:

Step 151: judging, according to the architecture algorithm, the attributes of the data to be pre-generated by the neural network to be trained.

In this embodiment, the attributes of the pre-generated data of the neural network to be trained include the number of pre-generated actions in that data, but are not limited to the above. Taking the number of pre-generated actions as an example: with a single-agent algorithm, the number of actions output by the training side of the neural network to be trained is 1; with a multi-agent algorithm, the number of actions output by the training side is greater than 1.
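The action-count judgment above can be stated as a small helper. This is a hypothetical sketch of the single-agent versus multi-agent distinction only; the function name and the string labels are not from the patent.

```python
def pregenerated_action_count(algorithm_kind, num_agents=1):
    """Number of actions the training side outputs per step:
    1 for single-agent algorithms, one per agent for multi-agent ones."""
    if algorithm_kind == "single_agent":
        return 1
    if algorithm_kind == "multi_agent":
        return num_agents  # greater than 1, equal to the number of agents
    raise ValueError(f"unknown algorithm kind: {algorithm_kind}")
```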

In yet another optional embodiment of the present invention, after step 151, the method may further include:

Step 152: obtaining, according to the architecture algorithm, the dimensions of the attributes of the pre-generated data of the neural network to be trained.

In yet another optional embodiment of the present invention, step 16 may include:

Step 161: adapting the second environment, the neural network to be trained and the architecture algorithm, and then generating trajectory data attributes according to the state space of the adapted second environment, the adapted neural network to be trained, and the pre-generated data of the adapted neural network to be trained.

In this embodiment, the dimensions of each attribute of the pre-generated data are determined according to the state space of the adapted second environment, the adapted neural network to be trained, and the pre-generated data of the adapted neural network to be trained; the (A, T, N, D) dimensional data of each attribute is then constructed; the sampling logic of actions is determined according to the algorithm and the state space and connected to the agent adaptation layer; finally, the training layer is accessed according to the optimization logic of the algorithm. Here, A denotes the dimension of the number of actions output overall by the agent to be trained in the neural network to be trained: for a single agent, A is 1; for multiple agents, A is greater than 1 and equals the number of agents. T denotes the dimension of the number of interaction steps between the agent to be trained and the environment; for example, if the trajectory data is trained on once every 100 steps, T is 100. N denotes the dimension of the number of simultaneous parallel environment interactions, which may be equal to or greater than 1 as required (but is not limited to the above): for training on a single machine with a single environment, N may be 1; for training on a single machine with multiple parallel environments, N may be the number of parallel environments; for training on multiple machines with multiple parallel environments, N may be the total number of parallel environments. D denotes the dimension of a specific trajectory attribute; for example, if the dimension of the state information in the trajectory is (2, 3, 4), the D used for storing the state information is (2, 3, 4). When constructing the (A, T, N, D) dimensional data of each attribute, for every attribute in the trajectory data, such as environment state information, reward information, and whether-an-episode-is-finished information, the Buffer uses a consistent (A, T, N, D) quadruple to represent the data dimension of the specific attribute. This satisfies the general representation needs of various trajectory attributes and makes storage and reading convenient in a uniform way.
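The (A, T, N, D) quadruple described above can be sketched as a small NumPy buffer. This is an illustrative sketch only; the class and method names are assumptions, not the patent's implementation.

```python
import numpy as np

class TrajectoryBuffer:
    """Stores every trajectory attribute as an (A, T, N, *D) array:
    A = actions output per step (1 for a single agent),
    T = interaction steps per training round,
    N = number of parallel environments,
    D = the attribute's own dimension, e.g. (2, 3, 4) for state info."""

    def __init__(self, A, T, N, attr_shapes):
        self.T = T
        self.t = 0  # current write position along the T axis
        self.data = {name: np.zeros((A, T, N) + tuple(shape), dtype=np.float32)
                     for name, shape in attr_shapes.items()}

    def insert(self, **attrs):
        # write one step of every attribute, then advance (wrapping at T)
        for name, value in attrs.items():
            self.data[name][:, self.t] = value
        self.t = (self.t + 1) % self.T

    def get(self, name):
        return self.data[name]
```

Because every attribute, whether state, reward, or episode-done flag, shares the same quadruple layout, storage and reading can be written once in a generic way, which is the generality the paragraph above describes.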

In yet another optional embodiment of the present invention, after step 17 the method may further include:

Step 18: train the architecture-configured neural network.

In this embodiment, during the actual training of the architecture-configured neural network, the agent adaptation layer interacts directly with the environment: it inputs actions and obtains observation data. The observation data is passed to the inference layer to obtain the direct output of the policy, and by mapping that direct output and sampling from the action distribution, the action for interacting with the environment at the next moment is obtained. The trajectory data produced during interaction is stored in the Buffer; the number of interaction steps is generally a hyperparameter that needs to be tuned to the problem at hand. The training layer uses batches of trajectory data to optimize the parameters of the deep neural network model and pushes the optimization results to the inference layer. This completes one training step. The process is repeated until the evaluation metric being optimized converges, which completes the overall training flow.
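As a purely illustrative sketch of this interact-then-optimize loop, the toy program below uses stand-ins for the environment, policy, Buffer, and training layer rather than the architecture's real components; every name and the two-armed-bandit task are assumptions made only for the sketch:

```python
import random

class ToyEnv:
    """Two-armed bandit stand-in: arm 1 pays 1, arm 0 pays 0."""
    def reset(self):
        return 0                          # single dummy state
    def step(self, action):
        return 0, float(action == 1), True, {}

class ToyPolicy:
    """Stand-in for the inference layer's policy output."""
    def __init__(self):
        self.pref = [0.0, 0.0]            # per-action preferences
    def sample(self, obs):
        if random.random() < 0.1:         # explore 10% of the time
            return random.randrange(2)
        return max((0, 1), key=lambda a: self.pref[a])

def train(env, policy, steps_per_batch=32, iterations=50):
    for _ in range(iterations):
        batch = []                        # Buffer stand-in
        obs = env.reset()
        # agent adaptation layer: interact for steps_per_batch steps
        for _ in range(steps_per_batch):
            a = policy.sample(obs)
            obs2, r, done, _ = env.step(a)
            batch.append((a, r))
            obs = env.reset() if done else obs2
        # training layer: optimize on the batched trajectory data,
        # then "push" the updated parameters back (shared object here)
        for a, r in batch:
            policy.pref[a] += 0.1 * (r - policy.pref[a])
    return policy

random.seed(0)
p = train(ToyEnv(), ToyPolicy())
print(p.pref[1] > p.pref[0])  # True: the paying arm is preferred
```

The outer loop repeats until convergence; here a fixed iteration count stands in for the convergence check on the evaluation metric that the text describes.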

FIG. 2 shows a flowchart of a specific method for configuring a neural network architecture provided by an embodiment of the present invention. As shown in FIG. 2, the method realizes a general training architecture for decision training based on deep reinforcement learning through the following four points. The architecture provides standard engine/environment, model, and algorithm access methods, reducing the difficulty and workload of applying deep reinforcement learning to all kinds of intelligent decision-making problems:

First, establish a general environment access method;

Second, in scenarios where there is only an adjudication engine and no concrete environment, provide a general engine access method;

Third, establish a general algorithm access method;

Fourth, establish a general policy abstraction that meets the access needs of all kinds of deep neural network models as well as hand-written rule models.

Specifically, general access methods are provided for engines and environments, for all kinds of deep reinforcement learning algorithms, and for all kinds of deep neural network models or hand-written rule models.

The following uses a concrete scenario to illustrate how, for a neural network to be trained, the specific method of configuring a neural network architecture shown in FIG. 2 realizes the overall flow from the start of access to its completion:

The "XX Soccer Environment" is a novel open-source reinforcement learning environment that provides a physics-based 3D soccer simulation modeled on popular soccer video games. Controlling the players, learning how to pass, and learning how to beat the opponent's defense to score goals are challenging decision-making problems for an agent.

Based on the specific method of configuring a neural network architecture shown in FIG. 2, the soccer-game decision problem can be solved with a single-agent PPO algorithm by following these standard steps:

Step 1, environment access: the soccer environment itself satisfies the standard gym interface, so the engine layer, the problem definition layer, and the Gym-compatible environment layer need not be involved. The environment's existing step and reset methods are connected to the step and reset standard interfaces provided by the environment encapsulation layer; the step interface returns the reward, state, and other information resulting from one step of interaction with the environment. From the definitions of the action space and state space, two standard interfaces for obtaining them are constructed: get_observationspace (observation space) and get_actionspace (action space).
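A minimal sketch of what such an environment encapsulation layer might look like is given below, using a toy stand-in for the real soccer environment. Only the four interface names (step, reset, get_observationspace, get_actionspace) come from the text; the toy environment's spaces, reward rule, and episode length are invented for the sketch:

```python
class ToySoccerEnv:
    """Tiny stand-in for a gym-style environment (illustrative only)."""
    observation_space = ("Box", (115,))   # e.g. flattened player/ball features
    action_space = ("Discrete", 19)       # e.g. 19 discrete football actions

    def __init__(self):
        self._t = 0

    def reset(self):
        self._t = 0
        return [0.0] * 115                # initial observation

    def step(self, action):
        self._t += 1
        obs = [0.0] * 115
        reward = 1.0 if action == 12 else 0.0  # pretend action 12 scores
        done = self._t >= 10
        return obs, reward, done, {}

class EnvWrapper:
    """Environment encapsulation layer: exposes the standard step/reset
    interfaces plus the two space accessors named in the text."""
    def __init__(self, env):
        self.env = env
    def reset(self):
        return self.env.reset()
    def step(self, action):
        return self.env.step(action)      # reward, state, etc. for one step
    def get_observationspace(self):
        return self.env.observation_space
    def get_actionspace(self):
        return self.env.action_space

w = EnvWrapper(ToySoccerEnv())
print(w.get_actionspace())  # ('Discrete', 19)
```

Downstream layers only ever see the wrapper's four methods, which is what makes the environment swappable without touching the rest of the stack.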

Step 2, model access: many kinds of deep neural network models can be chosen to solve the soccer problem; for example, in view of the continuity of a soccer match, a model can be built on a long short-term memory (LSTM) network. Based on the constructed model, the standard interfaces of the inference layer are defined: initialization (init), obtaining parameters (get_parameter), setting parameters (set_parameter), obtaining gradients (get_gradient), setting gradients (set_gradient), and inference (inference). This completes model access.
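One possible shape of this inference-layer abstraction is sketched below. The six method names follow the text; the concrete rule-model subclass is a made-up example chosen to show that a hand-written rule model can satisfy the same interface as a deep neural network model:

```python
from abc import ABC, abstractmethod

class InferenceModel(ABC):
    """Inference-layer abstraction with the six standard interfaces
    named in the text (bodies below are illustrative)."""

    @abstractmethod
    def init(self): ...
    @abstractmethod
    def get_parameter(self): ...
    @abstractmethod
    def set_parameter(self, params): ...
    @abstractmethod
    def get_gradient(self): ...
    @abstractmethod
    def set_gradient(self, grads): ...
    @abstractmethod
    def inference(self, obs): ...

class RulePolicy(InferenceModel):
    """A hand-written rule model behind the same abstraction."""
    def init(self):
        self.params = {}
    def get_parameter(self):
        return self.params
    def set_parameter(self, params):
        self.params = params
    def get_gradient(self):
        return None            # rule models carry no gradients
    def set_gradient(self, grads):
        pass
    def inference(self, obs):
        return 12 if obs[0] > 0 else 0  # toy rule: shoot when ball is ahead

p = RulePolicy()
p.init()
print(p.inference([1.0]))  # 12
```

An LSTM-backed subclass would implement the same six methods, with get_gradient/set_gradient returning and installing real gradients; callers cannot tell the two apart.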

Step 3, algorithm access: the algorithm is trained on trajectory data, which generally contains multiple attributes whose data dimensions differ. According to the chosen algorithm, the data dimension of each trajectory attribute within the data buffer (Buffer) dimensions (A, T, N, D) is defined. Data dimensions common to reinforcement learning, such as the state and action attributes in the trajectory data, can be determined from the two standard interfaces for obtaining the environment action space and state space defined in Step 1. Additional attribute data required by the algorithm can be defined according to the algorithm's needs: for example, the PPO algorithm needs a value attribute to represent the output of the critic network, and if that output's data dimension is (123,), then the data dimension defined for the value attribute is (123,). Since a single-agent algorithm is chosen, the number of action outputs A in the Buffer dimensions (A, T, N, D) is 1. The number of environments N running in parallel can be determined from the available computing resources. T, the number of soccer-environment interaction steps per training round, can be tuned as a hyperparameter of the training process. Many deep reinforcement learning algorithms could be used to solve the soccer problem; here, in view of training efficiency and convergence stability, PPO is chosen as the training algorithm. Based on the core logic of the training algorithm, the update (_update) standard interface of the training layer is defined. Based on the action space, the action-sampling (handle_sample) standard interface of the agent adaptation layer is defined. Algorithm access is then complete.
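For illustration, the per-attribute Buffer dimensions for this single-agent PPO setup might be written down as follows. Only A = 1, T as a tunable step count, N chosen from compute resources, and the (123,) value dimension come from the text; the observation shape 115 and the concrete values of T and N are assumptions:

```python
# Illustrative sketch of defining per-attribute Buffer dimensions for
# single-agent PPO; attribute names and shapes are examples, not the
# architecture's fixed schema.
A = 1      # single-agent algorithm: one action output
T = 128    # interaction steps per training round (hyperparameter)
N = 8      # parallel environments, chosen from available compute

# shapes common to RL come from get_observationspace / get_actionspace;
# algorithm-specific extras (PPO's critic "value") are added by hand
attr_dims = {
    "obs":    (A, T, N, 115),  # D assumed from the observation space
    "action": (A, T, N),       # discrete action: one scalar per step
    "reward": (A, T, N),
    "done":   (A, T, N),
    "value":  (A, T, N, 123),  # D = (123,), per the text's example
}

for name, dims in attr_dims.items():
    print(name, dims)
```

Swapping in a multi-agent algorithm would change only A, and swapping the environment would change only the D entries, leaving the quadruple layout itself untouched.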

Step 4, training convergence: after the three steps above, access to the training architecture for solving the soccer problem is complete. Once access is complete, the training flow can be started and the algorithm hyperparameters adjusted until the score converges, completing the solution of the soccer problem.

FIG. 3 shows a schematic diagram of the layered architectural abstraction used in the specific method of configuring a neural network architecture shown in FIG. 2. As shown in FIG. 3, in the above embodiments, including but not limited to the embodiment given for FIG. 2, the framework for configuring the neural network is abstracted into seven layers:

Layer 1, the training layer: the training layer implements the deep reinforcement learning algorithm, uses the sampled trajectory data stored in the Buffer to optimize the network model, and updates the network parameters into the inference layer;

Layer 2, the inference layer: the inference layer provides a unified abstraction over deep neural network models and rules, and is responsible for executing policy inference and producing its output;

Layer 3, the agent adaptation layer: the adaptation layer calls the inference layer to perform inference, applies policy mapping and action sampling to the inference results, and interacts with the environment, storing the resulting trajectory data in the Buffer. When the agent adaptation layer interacts with the environment, it hides concepts such as single-agent versus multi-agent from the deep reinforcement learning algorithm, treating all of one side's agents as a whole in the interaction with the environment;

Layer 4, the environment encapsulation layer: this layer realizes the general standard interaction with the agent adaptation layer;

Layer 5, the Gym-compatible environment layer: compatible with the widely used standard interface of OpenAI gym (a toolkit for developing and comparing reinforcement learning algorithms);

Layer 6, the problem definition layer: defines the problem to be solved, and defines and implements questions such as whether the game has complete information and whether it is turn-based or real-time;

Layer 7, the engine layer: responsible for connecting the simulator engine.
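A highly simplified sketch of how the bottom five of these layers might be wired together, bottom-up from the engine to the agent adaptation layer, is given below; all class names, method bodies, and the call chain are illustrative, not the patent's implementation:

```python
class EngineLayer:
    """Drives the simulator engine."""
    def tick(self, cmd):
        return {"raw": cmd}

class ProblemDefinitionLayer:
    """Defines the problem: complete information, real-time vs turn-based."""
    def __init__(self, engine):
        self.engine, self.realtime = engine, True

class GymCompatLayer:
    """Exposes a gym-style step/reset over the problem definition."""
    def __init__(self, problem):
        self.problem = problem
    def step(self, action):
        return self.problem.engine.tick(action), 0.0, False, {}
    def reset(self):
        return {}

class EnvEncapsulationLayer:
    """Standard interaction point for the agent adaptation layer."""
    def __init__(self, gym_env):
        self.env = gym_env
    def step(self, action):
        return self.env.step(action)

class AgentAdaptationLayer:
    """Calls the inference layer (here a plain callable) and interacts
    with the encapsulated environment as one whole agent."""
    def __init__(self, env, policy):
        self.env, self.policy = env, policy
    def rollout(self, obs):
        action = self.policy(obs)
        return self.env.step(action)

# bottom-up wiring: engine -> problem -> gym -> encapsulation -> adaptation
stack = AgentAdaptationLayer(
    EnvEncapsulationLayer(GymCompatLayer(ProblemDefinitionLayer(EngineLayer()))),
    policy=lambda obs: "pass_ball",
)
print(stack.rollout({})[0])  # {'raw': 'pass_ball'}
```

When an environment already satisfies the gym interface, as in the soccer example, the bottom three layers collapse away and the encapsulation layer wraps the environment directly.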

Coupling the seven layers obtained from this abstraction with the architecture for configuring the neural network achieves the following:

1. The intelligent decision training flow based on deep reinforcement learning is standardized, and the environments, algorithms, and models in the training flow are abstracted on a unified, extensible architecture. The training architecture imposes no hard constraints on whether training is single-machine or distributed, synchronous or asynchronous, and so accommodates new deep reinforcement learning algorithms in the future.

2. Through clear and detailed layering, the data flows between layers are fixed, which simplifies the complexity of applying deep reinforcement learning to intelligent decision-making problems and provides a "follow-the-map" approach to agent training.

3. Based on the neural network architecture in FIG. 2, flexible combinations of the abstract components can support the architectures of the existing mainstream distributed reinforcement learning frameworks.

4. Deep neural network models and rule models are abstracted uniformly, giving better generality in game problems grounded in knowledge and experience.

5. By abstracting different concepts in deep reinforcement learning algorithms, such as single-agent and multi-agent, into a unified whole, the complexity of interaction between different kinds of agents and the environment is reduced.

The beneficial effects of the method of configuring a neural network architecture provided by the embodiments of the present invention are illustrated below with three concrete scenarios:

Scenario 1, solving an intelligent decision-making problem on the general architecture: when faced with an intelligent decision-making problem, a training scheme can be built according to the method of configuring a neural network architecture provided by the embodiments of the present invention, using a deep reinforcement learning algorithm to generate the agent model. The training architecture constrains the definition and behavior of each component and the data flows between them, so that the training scheme has rules to follow and can be completed simply by following the map.

Scenario 2, using existing deep reinforcement learning algorithms to solve a new intelligent decision-making problem: when faced with a new intelligent decision-making problem, the method of configuring a neural network architecture provided by the embodiments of the present invention requires no new deep reinforcement learning algorithm to be connected; initial validation of the problem can be achieved with the previously connected algorithms alone, greatly reducing the work of building a problem-solving pipeline.

Scenario 3, using existing intelligent decision-making problems to validate a new deep reinforcement learning algorithm: when a new algorithm needs to be studied and validated, the method of configuring a neural network architecture provided by the embodiments of the present invention requires no new environment to be connected; the validation pipeline for the new algorithm can be built with the previously connected intelligent decision-making environments alone.

In the above embodiments of the present invention, the deep-reinforcement-learning-based agent training flow is abstracted end to end, providing a general layered division and access method for the whole training flow, including engine/environment access, algorithm access, model access, and Buffer data access, which greatly reduces the difficulty and workload of connecting new environments and new algorithms. The interaction of all kinds of algorithms, single-agent and multi-agent alike, with the environment is abstracted as the interaction of an overall agent with the environment, which greatly extends the applicability of the training architecture across different combinations of algorithm and environment. The inference layer's unified model abstraction over deep neural networks and rules makes it applicable to all kinds of knowledge-based adversarial-game intelligent decision problems.

FIG. 4 shows a schematic structural diagram of an apparatus 40 for configuring a neural network architecture provided by an embodiment of the present invention. As shown in FIG. 4, the apparatus includes:

a first access module 41, configured to access the decision problem of the neural network to be trained;

a processing module 42, configured to obtain a first environment of the decision problem according to the decision problem, and to encapsulate the decision problem with the first environment to obtain an encapsulated second environment;

a second access module 43, configured to access the neural network to be trained;

a third access module 44, configured to access an architecture algorithm according to the second environment and the neural network to be trained; and

an adaptation module 45, configured to adapt the second environment, the neural network to be trained, and the architecture algorithm to generate trajectory data attributes, and to optimize the neural network to be trained according to the trajectory data attributes to obtain the architecture-configured neural network.

Optionally, the processing module 42 is further configured to, if the decision problem carries an environment, obtain the first environment according to the environment carried by the decision problem;

and, if the decision problem has no environment, define a new environment for the decision problem to obtain a defined third environment, and then obtain the first environment according to the third environment.

Optionally, the processing module 42 is further configured to, if the environment carried by the decision problem satisfies a preset condition, determine the environment carried by the decision problem as the first environment;

and, if the environment carried by the decision problem does not satisfy the preset condition, make the environment carried by the decision problem compatible with the general environment and determine the compatible environment as the first environment.

Optionally, the processing module 42 is further configured to make the third environment compatible with the general environment and determine the compatible environment as the first environment.

Optionally, the processing module 42 is further configured to obtain, according to the second environment, the state space of the second environment, the state space being the space of states produced while the neural network to be trained is running.

Optionally, the third access module 44 is further configured to determine, according to the architecture algorithm, the attributes of the data to be pre-generated by the neural network to be trained.

Optionally, the adaptation module 45 is further configured to adapt the second environment, the neural network to be trained, and the architecture algorithm, and then generate the trajectory data attributes according to the adapted state space of the second environment, the adapted neural network to be trained, and the pre-generated data of the adapted neural network to be trained.

It should be understood that the above description of the method embodiments illustrated in FIG. 1 to FIG. 3 merely explains the technical solution of the present invention by way of optional examples and does not limit the method of configuring a neural network architecture involved in the present invention. In other implementations, the execution steps and their order may differ from the above embodiments, and the embodiments of the present invention place no limit on this.

It should be noted that this embodiment is an apparatus embodiment corresponding to the above method embodiment; all implementations of the above method embodiment apply to this apparatus embodiment and achieve the same technical effect.

An embodiment of the present invention provides a non-volatile computer storage medium storing at least one executable instruction, the computer-executable instruction being able to execute the method of configuring a neural network architecture in any of the above method embodiments.

FIG. 5 shows a schematic structural diagram of a computing device provided by an embodiment of the present invention; the specific embodiments of the present invention do not limit the specific implementation of the computing device.

As shown in FIG. 5, the computing device may include a processor, a communications interface, a memory, and a communication bus.

The processor, the communications interface, and the memory communicate with one another over the communication bus. The communications interface is used to communicate with network elements of other devices such as clients or other servers. The processor is configured to execute a program, and specifically may execute the relevant steps in the above embodiments of the method of configuring a neural network architecture for a computing device.

Specifically, the program may include program code comprising computer operation instructions.

The processor may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The one or more processors included in the computing device may be processors of the same type, such as one or more CPUs, or of different types, such as one or more CPUs together with one or more ASICs.

The memory is used for storing the program. The memory may include high-speed RAM and may also include non-volatile memory, for example at least one disk memory.

Specifically, the program may be used to cause the processor to execute the method of configuring a neural network architecture in any of the above method embodiments. For the specific implementation of each step in the program, reference may be made to the corresponding descriptions of the steps and units in the above embodiments of the method of configuring a neural network architecture, which are not repeated here. Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the devices and modules described above may be found in the corresponding process descriptions in the preceding method embodiments and are likewise not repeated here.

The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teaching herein, and the structure required to construct such systems is apparent from the description above. Moreover, the embodiments of the present invention are not directed at any particular programming language. It should be understood that the content of the embodiments of the invention described herein may be implemented in a variety of programming languages, and the descriptions above of specific languages are given to disclose the best mode of carrying out the embodiments of the invention.

In the specification provided here, numerous specific details are set forth. It will be understood, however, that the embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this specification.

Similarly, it should be understood that in the above description of exemplary embodiments of the invention, the various features of the embodiments are sometimes grouped together into a single embodiment, figure, or description thereof in order to streamline the embodiments of the invention and aid understanding of one or more of the inventive aspects.

Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components.

In addition, those skilled in the art will understand that although some embodiments herein include certain features included in other embodiments but not others, combinations of features from different embodiments are meant to fall within the scope of the present invention and to form different embodiments.

The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components according to the embodiments of the present invention. The embodiments of the present invention may also be implemented as device or apparatus programs (for example, computer programs and computer program products) for performing some or all of the methods described herein. Such a program implementing the embodiments of the present invention may be stored on a computer-readable medium or may take the form of one or more signals; such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

It should be noted that the above embodiments illustrate rather than limit the embodiments of the present invention, and that the word "a" or "an" preceding an element does not exclude the presence of multiple such elements. The embodiments of the invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. The use of the words first, second, third, and so on does not denote any order; these words may be interpreted as names. Unless otherwise specified, the steps in the above embodiments should not be understood as limiting the order of execution.

Claims (10)

1. A method of configuring a neural network architecture, wherein the method comprises:
accessing a decision problem of a neural network to be trained;
obtaining, according to the decision problem, a first environment of the decision problem;
encapsulating the decision problem with the first environment to obtain an encapsulated second environment;
accessing the neural network to be trained;
accessing an architecture algorithm according to the second environment and the neural network to be trained;
adapting the second environment, the neural network to be trained, and the architecture algorithm to generate trajectory data attributes; and
optimizing the neural network to be trained according to the trajectory data attributes to obtain an architecture-configured neural network.

2. The method of configuring a neural network architecture according to claim 1, wherein obtaining, according to the decision problem, the first environment of the decision problem comprises:
if the decision problem carries an environment, obtaining the first environment according to the environment carried by the decision problem; and
if the decision problem has no environment, defining a new environment for the decision problem to obtain a defined third environment, and then obtaining the first environment according to the third environment.

3. The method of configuring a neural network architecture according to claim 2, wherein, if the decision problem carries an environment, obtaining the first environment according to the environment carried by the decision problem comprises:
if the environment carried by the decision problem satisfies a preset condition, determining the environment carried by the decision problem as the first environment; and
if the environment carried by the decision problem does not satisfy the preset condition, making the environment carried by the decision problem compatible with a general environment and determining the compatible environment as the first environment.

4. The method of configuring a neural network architecture according to claim 2, wherein obtaining the first environment according to the third environment comprises:
making the third environment compatible with a general environment and determining the compatible environment as the first environment.

5. The method of configuring a neural network architecture according to claim 1, further comprising, after obtaining the encapsulated second environment:
obtaining, according to the second environment, a state space of the second environment, the state space being the space of states produced while the neural network to be trained is running.

6. The method of configuring a neural network architecture according to claim 5, further comprising, after accessing the architecture algorithm:
determining, according to the architecture algorithm, attributes of pre-generated data of the neural network to be trained.

7. The method of configuring a neural network architecture according to claim 6, wherein adapting the second environment, the neural network to be trained, and the architecture algorithm to generate trajectory data attributes comprises:
adapting the second environment, the neural network to be trained, and the architecture algorithm, and then generating the trajectory data attributes according to the adapted state space of the second environment, the adapted neural network to be trained, and the pre-generated data of the adapted neural network to be trained.

8. An apparatus for configuring a neural network architecture, wherein the apparatus comprises:
a first access module, configured to access a decision problem of a neural network to be trained;
a processing module, configured to obtain a first environment of the decision problem according to the decision problem, and to encapsulate the decision problem with the first environment to obtain an encapsulated second environment;
a second access module, configured to access the neural network to be trained;
a third access module, configured to access an architecture algorithm according to the second environment and the neural network to be trained; and
an adaptation module, configured to adapt the second environment, the neural network to be trained, and the architecture algorithm to generate trajectory data attributes, and to optimize the neural network to be trained according to the trajectory data attributes to obtain an architecture-configured neural network.

9. A computing device, comprising a processor, a memory, a communications interface, and a communication bus, the processor, the memory, and the communications interface communicating with one another over the communication bus;
the memory being used to store at least one executable instruction which, when run, causes the processor to execute the method of configuring a neural network architecture according to any one of claims 1-7.

10. A computer storage medium, the storage medium storing at least one executable instruction which, when run, causes a computing device to execute the method of configuring a neural network architecture according to any one of claims 1-7.
A computer storage medium, wherein at least one executable instruction is stored in the storage medium, and when the executable instruction is executed, the computing device executes the method of configuring the neural network architecture according to any one of claims 1-7. method.
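The configuration flow claimed above (claims 1-4 and 7) can be sketched in code. This is a minimal, hypothetical sketch only: the patent discloses no concrete identifiers, so every class, function, and dictionary key below (`DecisionProblem`, `resolve_first_environment`, `configure_network`, the gym-like `reset`/`step` interface assumed as the "preset condition") is an illustrative assumption, not the applicant's implementation.

```python
# Hypothetical sketch of the claimed configuration flow.  All names are
# illustrative assumptions; the patent does not disclose an implementation.

class DecisionProblem:
    """A decision problem to be solved by the network under training."""
    def __init__(self, name, environment=None):
        self.name = name
        self.environment = environment  # may be None (claim 2, second branch)

# Assumed "preset condition": the environment already exposes a gym-like API.
GENERAL_ENV_KEYS = {"reset", "step"}

def meets_preset_condition(env):
    # Claim 3: the carried environment satisfies the preset condition when it
    # already provides the interface the general environment expects.
    return GENERAL_ENV_KEYS <= set(env)

def make_compatible(env):
    # Claims 3-4: wrap a non-conforming environment so that it matches the
    # general environment's interface (stub callables stand in for the
    # missing entry points).
    wrapped = dict(env)
    for key in GENERAL_ENV_KEYS:
        wrapped.setdefault(key, lambda *args, **kwargs: None)
    return wrapped

def resolve_first_environment(problem):
    # Claim 2: use the problem's own environment if it carries one; otherwise
    # define a new (third) environment and make it compatible.
    env = problem.environment
    if env is None:
        return make_compatible({})  # freshly defined "third environment"
    return env if meets_preset_condition(env) else make_compatible(env)

def configure_network(problem, network, architecture_algorithm):
    # Claim 1: encapsulate the problem with its first environment into a
    # second environment, adapt the (environment, network, algorithm) triple
    # to produce trajectory data attributes, and use those attributes to
    # optimize the network.
    first_env = resolve_first_environment(problem)
    second_env = {"problem": problem.name, "env": first_env}        # encapsulation
    trajectory_attrs = architecture_algorithm(second_env, network)  # adaptation
    network["trajectory_attrs"] = trajectory_attrs                  # optimization step
    return network
```

Under these assumptions, a caller would pass a problem (with or without its own environment), a network description, and an architecture algorithm callable; the two compatibility branches of claims 2-3 are covered by `resolve_first_environment`.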
CN202210372773.7A 2022-04-11 2022-04-11 A method, apparatus and device for configuring a neural network architecture Pending CN114861870A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210372773.7A CN114861870A (en) 2022-04-11 2022-04-11 A method, apparatus and device for configuring a neural network architecture


Publications (1)

Publication Number Publication Date
CN114861870A true CN114861870A (en) 2022-08-05

Family

ID=82628942

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210372773.7A Pending CN114861870A (en) 2022-04-11 2022-04-11 A method, apparatus and device for configuring a neural network architecture

Country Status (1)

Country Link
CN (1) CN114861870A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190114511A1 (en) * 2017-10-16 2019-04-18 Illumina, Inc. Deep Learning-Based Techniques for Training Deep Convolutional Neural Networks
CN111782068A (en) * 2019-04-04 2020-10-16 阿里巴巴集团控股有限公司 Method, device and system for generating mouse track and data processing method
CN113095463A (en) * 2021-03-31 2021-07-09 南开大学 Robot confrontation method based on evolution reinforcement learning
CN113221444A (en) * 2021-04-20 2021-08-06 中国电子科技集团公司第五十二研究所 Behavior simulation training method for air intelligent game
CN113633994A (en) * 2021-07-16 2021-11-12 中国科学院自动化研究所 Man-machine intelligent game system
CN114053712A (en) * 2022-01-17 2022-02-18 中国科学院自动化研究所 Action generation method, apparatus and device for a virtual object


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XU ZHENG et al.: "《海鹰智库丛书——人工智能篇》" (Haiying Think Tank Series — Artificial Intelligence Volume), 31 January 2021, Beijing Institute of Technology Press (北京理工大学出版社), pages: 34 *

Similar Documents

Publication Publication Date Title
Du et al. A survey on multi-agent deep reinforcement learning: from the perspective of challenges and applications
CN111427549B (en) An artificial intelligence reinforcement learning service platform
CN103201719B (en) Device emulation in a virtualized computing environment
CN111282272B (en) Information processing method, computer readable medium and electronic device
CN115292044B (en) Data processing method, device, electronic equipment and storage medium
CN112488826A (en) Method and device for optimizing bank risk pricing based on deep reinforcement learning
US20240338570A1 (en) Deep reinforcement learning intelligent decision-making platform based on unified artificial intelligence framework
CN113887708A (en) Multi-agent learning method based on mean field, storage medium and electronic device
CN114861826A (en) Large-scale reinforcement learning training framework system based on distributed design
WO2024007919A1 (en) Lbm-based quantum flow simulation method and apparatus, medium, and device
CN118171554A (en) Automatic driving decision model training method, device and storage medium
CN116362349A (en) Reinforced learning method and device based on environment dynamic model
Kapukotuwa et al. MultiROS: ROS-based robot simulation environment for concurrent deep reinforcement learning
EP3805995A1 (en) Method of and apparatus for processing data of a deep neural network
Badica et al. An approach of temporal difference learning using agent-oriented programming
CN114861870A (en) A method, apparatus and device for configuring a neural network architecture
CN114327916B (en) A training method, device and equipment for a resource allocation system
CN111443806B (en) Interactive task control method and device, electronic equipment and storage medium
Araùjo et al. URNAI: A Multi-Game Toolkit for Experimenting Deep Reinforcement Learning Algorithms
CN113392952A (en) Dynamic dominance function modeling method and device, storage medium and electronic equipment
CN114881239A (en) Method and apparatus for constructing quantum generator, medium, and electronic apparatus
CN114037049A (en) Multi-agent reinforcement learning method and related device based on value function reliability
Leite et al. Evolving characters in role playing games
Caldeira et al. Torcs training interface: An auxiliary api for developing torcs drivers
Zhu et al. Deep neuro-evolution: Evolving neural network for character locomotion controller

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination