CN116126995A - Index information generation method and device and computer readable storage medium - Google Patents
- Publication number
- CN116126995A (application number CN202211488395.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- machine learning
- learning model
- file
- type
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/219—Managing data history or versioning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
Embodiments of this specification provide a method, an apparatus, and a computer-readable storage medium for generating index information. The method includes: after each machine learning model completes training, acquiring the first-type data and second-type data produced by the training; saving the first-type data of every machine learning model to a model-file dataset of the model training platform, and the second-type data to a text-file dataset of the model training platform; and generating file index information from the model-file dataset and the text-file dataset, the file index information including access-entry information for the first-type data and second-type data of the corresponding machine learning model. By training multiple models on a model training platform and continuously acquiring and storing the data produced during training in a unified manner, the method solves the technical problem that team members cannot conveniently view the experimental results of other members of their team, and improves the efficiency with which team members review experimental results.
Description
Technical Field
The present disclosure relates to the technical field of machine learning, and in particular to a method, an apparatus, and a computer-readable storage medium for generating index information.
Background
Machine learning and big data have become increasingly popular in recent years, and their impact on society continues to expand. Many industries rely more and more on machine learning algorithms and artificial intelligence models to make critical decisions that affect businesses and individuals every day. A complete machine learning experiment life cycle involves many machine learning artifacts, such as datasets, model training code, experiment metric evaluation data, and model files.
In the prior art, in a team-based development task involving multiple machine learning models, team members cannot conveniently view the experimental results of other members of the team.
Summary of the Invention
In view of this, multiple embodiments of this specification aim to provide a method for generating index information, so as to alleviate, to a certain extent, the technical problem that in a team-based development task involving multiple machine learning models, team members cannot conveniently view the experimental results of other members of the team.
Multiple embodiments of this specification provide a method for generating index information. The method is applied to a model training platform and includes: acquiring the first-type data and second-type data produced by machine learning model training, where the first-type data includes at least the model file data of the machine learning model, and the second-type data includes at least the code file data of the machine learning model and the metafile data of the model files; saving the first-type data of every machine learning model to a model-file dataset of the model training platform, and the second-type data to a text-file dataset of the model training platform; and generating file index information from the model-file dataset and the text-file dataset, where the file index information includes access-entry information for the first-type data and second-type data of the machine learning model.
One embodiment of this specification provides a method for displaying index information. The method includes: receiving index information sent by a model training platform, where the index information is obtained with the aforementioned method for generating index information; and forming an index page from the index information, where the index page contains access-entry information for the first-type data and second-type data of the corresponding machine learning model, or text identifiers bound to that access-entry information.
One embodiment of this specification provides an apparatus for generating index information. The apparatus is applied to a model training platform used for training multiple machine learning models, and includes: an acquisition unit configured to acquire the first-type data and second-type data produced by machine learning model training, where the first-type data includes at least the model file data of the machine learning model, and the second-type data includes at least the code file data of the machine learning model and the metafile data of the model files; a storage unit configured to store the first-type data of every machine learning model in the model-file dataset of the model training platform, and the second-type data in the text-file dataset of the model training platform; and a generation unit configured to generate file index information from the model-file dataset and the text-file dataset, where the file index information includes access-entry information for the first-type data and second-type data of the corresponding machine learning model.
One embodiment of this specification provides an electronic device comprising a memory and a processor, the memory storing a computer program, and the processor implementing the above method when executing the computer program.
One embodiment of this specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method.
In the embodiments provided by this specification, during a team-based development task involving multiple machine learning models, the models are trained on a model training platform; after training completes, the corresponding first-type data and second-type data are acquired and stored in a unified manner to form a model-file dataset and a text-file dataset, from which file index information is built. Team members can then conveniently look up the experimental results of other members through the index information, enabling quick viewing of experimental results and improving the efficiency of review and comparison.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an application environment of a method for generating index information according to an embodiment of this specification.
FIG. 2 is a schematic flowchart of a method for generating index information according to an embodiment of this specification.
FIG. 3 is a block diagram of a model training platform according to an embodiment of this specification.
FIG. 4 is a schematic logic diagram of the working process of a model training platform according to an embodiment of this specification.
FIG. 5 is a schematic flowchart of the working process of a model training platform according to an embodiment of this specification.
FIG. 6 is a schematic flowchart of a method for displaying index information according to an embodiment of this specification.
FIG. 7 is a block diagram of an apparatus for generating index information according to an embodiment of this specification.
FIG. 8 is a block diagram of an electronic device according to an embodiment of this specification.
Detailed Description
In related technologies, machine learning parses data with algorithms and learns from it to obtain a machine learning model. The obtained model can then perform inference and prediction on data to accomplish specific tasks, such as classification tasks. The machine learning model may be a neural network model, a linear model, a deep learning model, a support vector machine, or another type of machine learning model obtained through training; it is a function, learned from data by some algorithm, that realizes a particular mapping and can recognize particular types of patterns. A machine learning model file generally includes a file describing the model structure (for example, the structure of a convolutional neural network), such as a meta file, and a file describing the model parameters (for example, the connection weights between layers), such as a ckpt file.
A complete machine learning task usually involves several kinds of data, for example training data for the machine learning model, code files for model training, evaluation metric files, and model files, so tracking and managing each kind of data within a task is very important. In a team-based development task involving multiple machine learning models, under existing practice some team members develop locally and, after training, store the model file data and evaluation metric file data directly on their own machines or on a file server; other members develop on a model development server and store those files in the cloud; and some members do not manage the above data at all. As a result, it is inconvenient for members of the team to view one another's experimental results: a member cannot easily look up the results of other members, nor can experimental results be conveniently shared within the team.
In summary, it is necessary to provide a method for generating index information that trains multiple machine learning models on a model training platform, acquires the corresponding first-type data and second-type data after training completes, and stores them in a unified manner. This solves the technical problem that team members cannot conveniently view the experimental results of other members of the team, enables quick viewing of experimental results by team members, and improves the efficiency of viewing and comparison.
As shown in FIG. 1, an embodiment of this specification provides a system for generating index information, which may include a terminal and a server. The server may be an electronic device with certain computing and processing capabilities, for example a server of a distributed system, or a system in which multiple processors, memories, network communication modules, and so on operate cooperatively. The server may also be a cloud server, an intelligent cloud computing server or intelligent cloud host with artificial-intelligence technology, or a cluster formed by several servers. Alternatively, as science and technology develop, the server may be a new technical means capable of realizing the corresponding functions of the embodiments described herein, for example a new form of "server" based on quantum computing.
In the embodiments of this specification, the terminal may be an electronic device with network access capability, for example a desktop computer, tablet computer, notebook computer, smartphone, digital assistant, shopping-guide terminal, or television. Alternatively, the terminal may be software running on such an electronic device.
The network may be of any type and may use any of a variety of available protocols (including but not limited to TCP/IP, SNA, and IPX) to support data communication. The one or more networks may be a local area network (LAN), an Ethernet-based network, a token ring, a wide area network (WAN), the Internet, a virtual network, a virtual private network (VPN), an intranet, an extranet, a public switched telephone network (PSTN), an infrared network, a wireless network (for example Bluetooth or Wi-Fi), and/or any combination of these and/or other networks.
The system for generating index information may also include one or more databases. A database used by the server may be local to the server, or may be remote from it and communicate with it via a network-based or dedicated connection. The databases may be of different types; in some embodiments, the database used by the server is a relational database. One or more of these databases may store, update, and retrieve data in response to commands.
As shown in FIG. 2, an embodiment of this specification provides a method for generating index information. The method is applied to a model training platform used for training multiple machine learning models, and may include the following steps.
Step S101: after each machine learning model completes training, acquire the first-type data and second-type data produced by the training, where the first-type data includes at least the model file data of the machine learning model, and the second-type data includes at least the code file data of the machine learning model and the metafile data of the model files.
In some cases, a team-based development task involving multiple machine learning models may require developing several different kinds of models, and different versions may exist even for the same kind of model. The development of any particular model involves many kinds of data and corresponding files. Specifically, training data must be prepared before training is performed; it may take the form of dataset files, and different kinds of preprocessing applied to those files produce different versions of the dataset files.
After several training iterations the loss function converges, yielding the model file data and the evaluation metric file data of the machine learning model. If parsing the evaluation metric file data reveals that the metrics of the obtained model differ greatly from the expected metrics, the model may need to be retrained one or more times. Before retraining, the hyperparameters, network parameters, and structure of the model may need to be modified, so multiple different versions of the model code file data are produced. Each retraining likewise yields the corresponding model file data and evaluation metric file data.
As the above process shows, the life cycle of a complete model training effort involves many kinds of data that need to be managed, and the same kind of data can have multiple versions. A unified data version management approach is therefore needed.
In this embodiment, the above data may be classified into first-type data and second-type data according to data size. The first-type data may include the model file data of the machine learning model. In some embodiments, the model file data produced by training differs across machine learning frameworks. For example, a model trained with the Keras framework mainly produces file data in h5 format, which contains the model structure, the model weights, the training configuration, and the optimizer state. A model trained with the TensorFlow framework mainly produces file data in meta format, which stores the model graph structure, and file data in ckpt format, which stores variables such as the network weight parameters.
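As an illustration of the size-based split described above, the following sketch routes training artifacts into the two categories. The extension lists and the size threshold are assumptions made for illustration, not values specified by the embodiment:

```python
from pathlib import Path

# Extensions typically associated with large binary artifacts (first-type data)
# and with small text artifacts (second-type data); these lists and the size
# threshold below are illustrative assumptions.
FIRST_TYPE_EXTS = {".h5", ".ckpt", ".meta", ".pt"}
SECOND_TYPE_EXTS = {".py", ".yaml", ".json", ".dvc"}
SIZE_THRESHOLD = 1 * 1024 * 1024  # 1 MiB

def classify_artifact(name: str, size_bytes: int) -> str:
    """Return 'first' for large/binary model artifacts, 'second' for text files."""
    ext = Path(name).suffix
    if ext in FIRST_TYPE_EXTS or size_bytes >= SIZE_THRESHOLD:
        return "first"
    if ext in SECOND_TYPE_EXTS:
        return "second"
    # Fall back to the size rule for unknown extensions.
    return "first" if size_bytes >= SIZE_THRESHOLD else "second"
```

In practice a platform would likely combine such a rule with explicit configuration per artifact kind.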
In this embodiment, besides the model file data of the machine learning model, the first-type data may also include the evaluation metric file data of the machine learning model and the training data of the machine learning model. The training data may include image data, video data, text data, and so on used as training samples. The evaluation metric file data represents performance metrics used to evaluate the machine learning model; these metrics may include accuracy, precision, recall, the precision-recall (P-R) curve, the F1 score, and so on.
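For reference, the scalar metrics named above can be computed from the counts of a binary confusion matrix. This is a generic sketch, not part of the claimed embodiment:

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

The P-R curve is obtained by computing precision and recall at many decision thresholds rather than from a single confusion matrix.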
The second-type data may include the code file data of the machine learning model and the metafile data of the model files. The code file data is the code actually run in the machine learning framework. The metafile data serves as index data for the concrete storage location of the first-type data, and may record that storage location and the data volume of the first-type data. In some embodiments, a cryptographic hash function is used to compute a hash value of the first-type data; the hash value is stored in the metafile data, and the file name of the first-type data is modified accordingly. The uniqueness of the MD5 value can then be used to distinguish multiple first-type files, and the storage location, data volume, and hash value recorded in the metafile data make it easy to locate the required first-type data.
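A minimal sketch of this metafile construction, using MD5 from the Python standard library; the field names of the metafile entry are illustrative assumptions:

```python
import hashlib
import os

def make_metafile_entry(path: str, chunk_size: int = 1 << 20) -> dict:
    """Hash a first-type file in chunks and build a metafile entry for it."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return {
        "md5": md5.hexdigest(),          # used to rename and deduplicate the file
        "size": os.path.getsize(path),   # data volume recorded in the metafile
        "path": path,                    # original location of the first-type data
    }
```

Chunked reading keeps memory use constant even for multi-gigabyte model files.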
In this embodiment, a tracking tool for the model training process may be used to acquire the first-type data and second-type data; for example, the MLflow tool may be used to track the training process. To make MLflow applicable to the model training platform, secondary development can be performed on it: combined with the HTTP protocol, the MLflow tool can be invoked through access requests, so that it can track the machine learning models deployed on the platform.
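One way such an HTTP-based invocation could look is sketched below: building a request for logging a metric via MLflow's documented REST endpoint. The tracking-server URL is a placeholder assumption; only the request is constructed here, nothing is sent:

```python
import json
import time

TRACKING_SERVER = "http://mlflow.example.internal:5000"  # placeholder host

def log_metric_request(run_id: str, key: str, value: float) -> tuple:
    """Build (url, body) for MLflow's REST 'runs/log-metric' endpoint."""
    url = f"{TRACKING_SERVER}/api/2.0/mlflow/runs/log-metric"
    body = json.dumps({
        "run_id": run_id,
        "key": key,
        "value": value,
        "timestamp": int(time.time() * 1000),  # milliseconds since epoch
    })
    return url, body
```

The pair can then be posted with any HTTP client, for example `urllib.request.urlopen(urllib.request.Request(url, body.encode(), {"Content-Type": "application/json"}))`.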
In a team-based development task involving multiple machine learning models, the MLflow tool can be used to promptly acquire the first-type data and second-type data. In some embodiments, other tracking tools for the model training process, such as TensorBoard or Trains, may be used instead.
In this embodiment, as shown in FIG. 3, a development team can train multiple machine learning models on a model training platform whose main purpose is to provide an online programming environment. The platform may be deployed on a server or in a cloud environment and may include a model training module, a data version control module, and a storage module. Specifically, the model training module can be implemented with cloud technology and container technology, and every team member can train models in it; the MLflow tool is also configured in it to track the experimental results. The data version control module includes a DVC (data version control) module and a Git module (Git being an open-source distributed version control system) and is used to apply version control to the experimental results. The storage module stores the experimental results in a unified manner. In particular, the data version control module can wrap the DVC and Git modules, after some secondary development, as a software service exposing an access port based on the HTTP protocol. When the model training platform needs to call the DVC or Git module, it sends an HTTP access request to that port; parameters carried in the access request serve as control instructions that cause the DVC or Git module to execute the corresponding function.
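A minimal sketch of such a wrapper: mapping the parameters of an HTTP access request onto DVC/Git command lines. The parameter names and the set of actions are assumptions for illustration, while the resulting commands are standard DVC/Git CLI invocations. The commands are only constructed here, not executed:

```python
def build_version_control_command(params: dict) -> list:
    """Translate request parameters into a DVC or Git command line."""
    action = params["action"]
    if action == "dvc_add":          # track a large first-type file with DVC
        return ["dvc", "add", params["path"]]
    if action == "dvc_push":         # upload tracked data to the remote store
        return ["dvc", "push"]
    if action == "git_commit":       # version second-type (text/metafile) data
        return ["git", "commit", "-m", params["message"]]
    raise ValueError(f"unsupported action: {action}")
```

On the server side, the returned list would typically be handed to something like `subprocess.run` inside the repository's working directory.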
In this embodiment, referring to FIG. 4, annotators can label samples on the model training platform to obtain a sample dataset. Algorithm engineers can use the online programming environment provided by the platform for model development and model debugging, and then train the debugged machine learning model on the sample dataset to obtain training results. The training results include the model file data of the machine learning model, its metric file data and hyperparameter file data, and the metafile data generated for the model file data and the evaluation metric file data. Based on the platform, algorithm engineers can verify whether the trained machine learning model performs as desired. If the result is considered unsatisfactory, they can manually adjust the parameters in the model file data, debug the model, and run the training process again; alternatively, they can feed back data-optimization suggestions to improve the sample data in the sample dataset. Once the machine learning model is considered to have achieved the desired effect, processing of the model can end.
In some embodiments, the online programming environment is preconfigured with the tracking tool for the model training process described above, for example MLflow, as well as with machine learning frameworks such as PyTorch, TensorFlow, XGBoost, and scikit-learn. The online programming environment may be implemented as follows. For example, when the model training platform is deployed in a cloud environment, members of the development team input configuration information through an application program interface (API) or a graphical user interface (GUI). The configuration information is sent to the model training platform, which creates a corresponding Docker container from it. When training is executed, the container is started and, according to the configuration information, a GPU and storage are mounted to it; the container then runs the machine learning framework and the MLflow tool and reads the dataset files for training. In some embodiments, a member of the development team who needs to modify the code of a machine learning model can adjust the code online using Jupyter Notebook.
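As an illustration, the configuration-to-container step could translate the submitted configuration information into a container launch command roughly as follows. The configuration keys are assumptions; the flags are standard `docker run` options:

```python
def docker_run_command(config: dict) -> list:
    """Build a `docker run` command line from platform configuration info."""
    cmd = ["docker", "run", "--rm"]
    if config.get("gpus"):                               # e.g. "all" or "device=0"
        cmd += ["--gpus", config["gpus"]]
    for host_dir, container_dir in config.get("mounts", []):
        cmd += ["-v", f"{host_dir}:{container_dir}"]     # mount storage
    cmd.append(config["image"])                          # image with framework + MLflow
    cmd += config.get("entrypoint", ["python", "train.py"])
    return cmd
```

A real platform would more likely drive the Docker Engine API or an orchestrator rather than shelling out, but the mapping from configuration to GPU and storage mounts is the same.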
In this embodiment, considering the characteristics of machine learning model development tasks, the files obtained after each machine learning model finishes training are divided into first-type data and second-type data, laying the foundation for subsequent file storage management.
Step S102: Save the first-type data of each machine learning model to the model file data set of the model training platform, and save the second-type data to the text file data set of the model training platform.
In some cases, after the first-type and second-type data are obtained through the tracking tool for the model training process, these files also need to be stored for later viewing or use.
In this embodiment, the model file data set stores the first-type data, and correspondingly, the text file data set stores the second-type data. In some implementations, the model file data set can be a remote storage repository, for example FS/HDFS/NFS/NAS, and the text file data set can be a remote code repository, for example Gitee/GitLab/GitHub. Specifically, a common data storage scheme is built for the different data types above (model file data of the machine learning model, evaluation-metric file data, training data, code file data, and the corresponding metafile data) to provide unified data version management. For the model file data, the evaluation-metric file data, and the training data, each file is first fed into a cryptographic hash function to generate a hash value. The hash value is stored in the corresponding metafile data, the first-type data file is named after its hash value and stored in the remote storage repository, and the corresponding metafile data is stored in the remote code repository. In this way, the hash value establishes an association between the metafile data and the first-type data, so the first-type data can be conveniently located from the content of the metafile data.
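The content-addressed storage step above can be sketched as follows, using only the Python standard library. The function name and the metafile layout are illustrative assumptions for this sketch, not the platform's actual API; local directories stand in for the remote storage and code repositories.

```python
import hashlib
import json
from pathlib import Path

def store_first_type_file(src: Path, store_dir: Path, meta_dir: Path) -> Path:
    """Hash a first-type file, store a copy named after the hash value,
    and record the hash in a small metafile kept under version control."""
    content = src.read_bytes()
    # Cryptographic hash of the file content, used as its storage name.
    digest = hashlib.sha256(content).hexdigest()
    store_dir.mkdir(parents=True, exist_ok=True)
    meta_dir.mkdir(parents=True, exist_ok=True)
    # The copy in the (remote) storage repository is named by its hash value.
    (store_dir / digest).write_bytes(content)
    # The metafile, stored in the (remote) code repository, records the hash,
    # associating the metafile data with the first-type data.
    metafile = meta_dir / (src.name + ".meta.json")
    metafile.write_text(json.dumps({"file": src.name, "sha256": digest}))
    return metafile
```

Looking up a file then only requires reading the hash from the metafile and fetching the object of that name from the storage repository.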
The code data of the machine learning model is stored in the remote code repository, and Git can be used for version management of the code file data. Git could likewise be used to version the model file data, the evaluation-metric file data, and the training data, reducing dependence on other components. For binary files, however, Git must store the changes of every commit, so each modification of a binary file generates additional commit volume. This causes the amount of data to grow sharply and the remote code repository to swell quickly. To keep the remote code repository itself small, in some implementations DVC can be introduced for data version control. For example, the remote code repository stores only the metafile data of the model file data, the evaluation-metric file data, and the training data, while the model file data, evaluation-metric file data, and training data themselves are stored in the remote storage repository.
In this embodiment, the saving action can be realized with the two tools Git and DVC. Specifically, Git handles the code file data of the machine learning model and the metafile data of the model file data, the evaluation-metric file data, and the training data; DVC handles the model file data, the evaluation-metric file data, and the training data themselves. More specifically, DVC first uploads the real data sources (the model file data, the evaluation-metric file data, and the training data) to the remote storage repository. Then `git push` pushes DVC's mapping data, which is the metafile data mentioned above, to a remote code repository such as GitHub or GitLab.
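The Git-plus-DVC save step corresponds roughly to the command sequence assembled below. The helper only builds the commands (a real implementation would execute them, e.g. with `subprocess.run`), and the exact flags are a sketch of typical DVC/Git usage rather than the platform's actual invocation.

```python
from typing import List

def build_save_commands(tracked_paths: List[str], message: str) -> List[List[str]]:
    """Assemble the DVC/Git command sequence that sends the real data to
    the remote storage repository and the metafiles to the code repository."""
    commands: List[List[str]] = []
    for path in tracked_paths:
        # 'dvc add' hashes the file and writes a small .dvc metafile.
        commands.append(["dvc", "add", path])
        # Git tracks only the metafile, keeping the code repository small.
        commands.append(["git", "add", path + ".dvc"])
    commands.append(["git", "commit", "-m", message])
    # Upload the real data sources to the remote storage repository.
    commands.append(["dvc", "push"])
    # Push the DVC mapping data (metafiles) to GitHub/GitLab.
    commands.append(["git", "push"])
    return commands
```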
In some implementations, a directory structure can be specified for the model file data, the evaluation-metric file data, the training data, and the code file data of a given machine learning model development task. A unified directory structure ensures that team members can easily locate the relevant files when troubleshooting. It also enables multi-version data merging: by defining corresponding convention data, conflicts between data files can be effectively avoided. The unified directory structure can include the following six first-level directories: a dvc directory (data version management), a git directory (code version management), a data_set directory (machine learning model training data), a model_file directory (model file data), a model_metric directory (evaluation-metric file data), and a source_code directory (code file data).
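Initializing this layout for a new development task can be sketched as follows; the directory names come from the text above, while the function itself is illustrative.

```python
from pathlib import Path
from typing import List

# The six first-level directories of the unified structure described above.
FIRST_LEVEL_DIRS = {
    "dvc": "data version management",
    "git": "code version management",
    "data_set": "machine learning model training data",
    "model_file": "model file data",
    "model_metric": "evaluation-metric file data",
    "source_code": "code file data",
}

def init_task_layout(task_root: Path) -> List[Path]:
    """Create the unified directory layout for one model development task."""
    created = []
    for name in FIRST_LEVEL_DIRS:
        d = task_root / name
        d.mkdir(parents=True, exist_ok=True)
        created.append(d)
    return created
```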
In this embodiment, a concrete example illustrates how the first-type and second-type data are stored; see Figure 5. In this example, GitLab stores the text file data set, HDFS stores the model file data set, scikit-learn is the machine learning framework, and MLflow is the tracking tool for the model training process. First, an experiment task for a new local machine learning model can be created in the model training platform, and the platform checks whether the output directory of the new model exists. If it exists, nothing is done; if not, a new directory is created, a new version branch is created, and the branch is switched to. Algorithm engineers can then use the online editing environment provided by the platform to develop and debug the model; specifically, an initial version of the model is built on the machine learning framework and then debugged.
Next, the data set formed from the samples labeled by annotators on the model training platform is used to train the machine learning model. MLflow tracks the training process and monitors whether training succeeds. If training fails, the experiment task ends; if it succeeds, the experiment task ends after the training result is obtained. The training result includes the model file data of the machine learning model and its evaluation-metric file, together with the metafile data generated for the model file data and the evaluation-metric file data. The metafile data is added to the staging area, committed to the local code repository, and pushed from the local code repository to GitLab. DVC can then be invoked to transfer the model file data and the evaluation-metric file data to HDFS for storage.
Step S103: Generate file index information according to the model file data set and the text file data set, where the file index information includes the access entry information of the first-type data and the second-type data of the machine learning model.
In some cases, in a team-based development task covering multiple machine learning models, after tracking of the development process and unified storage and version control of the data have been realized, team members also need to be able to conveniently view the experimental results of the other members of the team.
In this embodiment, the file index information can include the access entry information of the first-type and second-type data. The access entry information is the information on the actual storage addresses of the first-type and second-type data of the corresponding machine learning model. In some implementations, besides the access entry information, the file index information can also include information carried by the first-type and second-type data themselves. For example, the information carried by the code file data can include the concrete code content and its version, and the information carried by the evaluation-metric file data can include the evaluation metrics of a given machine learning model. Team members can use the file index information to query the information included in the first-type and second-type data, or to download the first-type and second-type data locally, and so on.
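One way to picture a single record of this file index information is as a mapping from a model to its access entries (actual storage addresses) plus the information the files carry. The field names and address formats below are illustrative assumptions for this sketch.

```python
def build_index_entry(model_name: str, model_file_addr: str,
                      metric_file_addr: str, code_addr: str,
                      metrics: dict, code_version: str) -> dict:
    """Assemble one file-index record: access entry information plus
    information carried by the first-type and second-type data."""
    return {
        "model": model_name,
        "access_entries": {
            "model_file": model_file_addr,    # e.g. an HDFS path
            "metric_file": metric_file_addr,  # e.g. an HDFS path
            "code_file": code_addr,           # e.g. a GitLab URL and branch
        },
        "carried_info": {
            "evaluation_metrics": metrics,    # e.g. {"accuracy": 0.97}
            "code_version": code_version,
        },
    }
```

A team member browsing the index reads `carried_info` directly and follows `access_entries` only when the underlying file needs to be downloaded.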
Specifically, in some implementations, the first-type data can include the model file data and the evaluation-metric file data of the machine learning model, and the second-type data can include the code file data of the machine learning model and the metafile data of the model file data and the evaluation-metric file data. In this case, team members can use the file index information to query the model file data of a given machine learning model and obtain its model parameters, network architecture, and so on. They can query the evaluation-metric file data to obtain the model's concrete evaluation metrics. They can query the metafile data of the evaluation-metric file data and, from that metafile data, determine the real storage location of the evaluation-metric file data and download it locally. Further, team members can decide, based on the concrete evaluation metrics, whether to retrain the machine learning model.
In this embodiment, file index information is generated from the model file data set and the text file data set, and team members can use it to conveniently view the experimental results of the other members of their team. This solves the technical problem that team members cannot conveniently view each other's experimental results, enables rapid viewing of those results, and improves the efficiency of viewing and comparison. This embodiment also provides the model file data set and the text file data set for storing the first-type and second-type data, which facilitates disaster-recovery handling of that data and avoids data loss caused by disk damage. Correspondingly, this centralized way of managing the first-type and second-type data also facilitates versioned management of experimental results and tracking of experiments.
In some implementations, before the step of obtaining the first-type and second-type data generated by training after each machine learning model finishes training, the method further includes: receiving a request sent by one data port to execute multiple machine learning model training tasks, where the request includes configuration information of the training tasks; creating corresponding workspaces for the multiple training tasks according to the configuration information; and executing the training tasks in the workspaces.
In some cases, in a team-based development task covering multiple machine learning models, at a given moment only one team member needs to execute machine learning model training tasks, but needs to execute several of them at the same time. Therefore, in the situation where there is only one team member but multiple machine learning models under development, unified file management and storage is likewise required.
The one data port means that a certain team member, using a terminal device, has logged into the model training platform through that device's account to execute machine learning model training tasks. The team member may log in via a web page configured on the terminal, via a client configured on the terminal, or via another type of user terminal. The team member submits the request to execute multiple machine learning model training tasks on the web page or client, and the request is then sent to the server.
The configuration information can be a concrete configuration file. Specifically, when a team member enters the relevant information on the web page or in the client, the page or client generates a configuration file from that information, which can include the type of model training framework, the code version, runtime parameters, and so on. Alternatively, the team member directly selects the configuration from a configuration template provided by the web page or client, which then generates the configuration file. The configuration file is carried in the request sent by the account to execute multiple machine learning model training tasks. After receiving it, the server parses the configuration file to obtain, for the multiple training tasks, the type of model training framework, the code version, the runtime parameters, the path of the data set files, and so on. The runtime parameters can further include the system version, memory size, number of GPU cards, and the like.
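Server-side parsing of such a configuration file might look like the following; the JSON field names and defaults are illustrative assumptions, not the platform's actual schema.

```python
import json

def parse_training_config(raw: str) -> dict:
    """Parse a training-task configuration file into the fields the server
    needs to create a workspace: framework, code version, runtime
    parameters, and data set path."""
    cfg = json.loads(raw)
    return {
        "framework": cfg["framework"],        # e.g. "scikit-learn"
        "code_version": cfg["code_version"],  # e.g. a Git branch or tag
        "dataset_path": cfg["dataset_path"],
        "runtime": {
            "system_version": cfg.get("system_version", "default"),
            "memory_gb": cfg.get("memory_gb", 8),
            "gpu_cards": cfg.get("gpu_cards", 0),
        },
    }
```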
The workspace can be understood as an isolated and complete training environment built from an image using containerization technology; a corresponding workspace, or training environment, is built for each of the multiple training tasks. Specifically, in some implementations, a container instance can be quickly deployed from a built container image to run the interactive development notebook in the image (for example Jupyter Notebook), the tracking tool for the model training process (for example MLflow), and the data version control tools (Git and DVC). A team member A can log into the interactive development notebook to control the training of multiple machine learning models, for example to modify the hyperparameters of one of them. Meanwhile, the experimental results of these models are collected by the tracking tool of the model training process and stored separately through the data version control tools. At this point, another team member B can log into the web page or client to view the training results of these models; for example, concrete training results can be rendered by MLflow to a front-end page for display.
This method addresses the situation where there is only one team member but multiple machine learning model training tasks to execute. Containerization technology is used to create a corresponding workspace for each training task, and the corresponding task is executed in that workspace. Correspondingly, other team members who do not need to execute training tasks can still log into the model training platform to view the experimental results of the multiple machine learning models. The method covers one special case of team-based multi-model development, ensures that team members can quickly view and compare experimental results, and improves the efficiency of team development tasks.
In some implementations, before the step of obtaining the first-type and second-type data generated by machine learning model training, the method further includes: receiving requests sent by multiple data ports to execute machine learning model training tasks, where the requests include the multiple training tasks to execute and their configuration information; creating a corresponding workspace for each data port according to the configuration information; and executing the training tasks in the workspaces.
In some cases, in a team-based development task covering multiple machine learning models, at a given moment multiple team members need to execute machine learning model training tasks, and at least one of them needs to execute several training tasks at the same time. Unified file management and storage is likewise required in this situation.
The multiple data ports mean that multiple team members have logged into the model training platform through their respective terminal devices to execute machine learning model training tasks. Creating a corresponding workspace for each account according to the configuration information means creating a corresponding workspace for each data port; that is, one data port corresponds to one container instance. Some container instances execute a single training task, while others execute several.
A request may include one machine learning model training task and its corresponding configuration information, or multiple training tasks and their corresponding configuration information. A request may also include only one training task, or only the configuration information corresponding to one training task; likewise, it may include only multiple training tasks, or only the configuration information corresponding to those tasks.
This method addresses the situation in a team-based multi-model development task where multiple team members need to execute machine learning model training tasks and at least one of them needs to execute several simultaneously. Containerization technology is used to create a corresponding workspace for each data port, and the corresponding training tasks are executed in that workspace. Correspondingly, any member of the team can log into the model training platform to view the experimental results of the other members' machine learning models. The method covers another special case of team-based multi-model development, ensures that team members can quickly view and compare experimental results, and improves the efficiency of team development tasks.
In some implementations, machine learning model training data, including training set data, is pre-stored in the model file data set, and a machine learning model training framework is pre-configured in the workspace. The step of executing the machine learning model training task in the workspace then includes: obtaining the training data from the model file data set and the code file data of the machine learning model from the text file data set according to the configuration information, and executing the training task using the training data, the code file data, and the training framework.
In some cases, in a team-based multi-model development task, different machine learning models use different training data. For example, a model for image recognition is trained on image data, while a model for speech recognition is trained on speech data. For the same model, there may also be multiple versions of the data set. Therefore, in a team-based multi-model development task, the training data likewise needs unified management. Different models may use different training frameworks and, correspondingly, different code file data; to save team members the time of configuring the corresponding development tools and environments, the training framework can be pre-configured directly in the workspace. Correspondingly, the training data is stored in advance in the model file data set, and the code file data of the machine learning model is stored in advance in the text file data set; when needed, they are fetched from the model file data set and the text file data set.
The machine learning model training data includes training set data, which can be divided into image data files, text data files, speech data files, video data files, and the like.
The machine learning model training framework is pre-configured, for example a framework such as PyTorch, TensorFlow, XGBoost, or scikit-learn. The code file is the file of the code corresponding to the machine learning model.
This method can greatly save team members the time spent preparing data sets and configuring the corresponding development tools and environments, improving their productivity.
In some implementations, the first-type data further includes machine learning model evaluation-metric file data, and after the step of obtaining the first-type and second-type data generated by machine learning model training, the method further includes: comparing the evaluation metrics in the evaluation-metric file data with preset evaluation-metric thresholds, and continuing to execute the machine learning model training task if a preset condition is satisfied.
In some cases, a machine learning model needs multiple rounds of iterative training before its loss function converges, and even after convergence the resulting model is not necessarily what the current task requires. For example, some evaluation metrics of the model may fall short of the task's requirements, so further training is needed to meet them.
The machine learning model evaluation-metric file data represents the data evaluating the model's performance metrics, which can include accuracy, precision, recall, the precision-recall (P-R) curve, the false positive rate (FPR), the F1 score, and so on.
The evaluation-metric thresholds can be set manually, with the concrete values depending on the specific task. Thresholds can be set for all evaluation metrics, or only for a few key ones: as long as, after a training run, the key metrics included in the resulting evaluation-metric file data reach their thresholds, no further training is needed. Which evaluation metrics are chosen as the key metrics can likewise be set according to the specific task.
The constraint condition can mean continuing to execute the training task when a given evaluation metric is below its preset threshold, or continuing when a given metric is above its preset threshold. Specifically, in one concrete training task, the accuracy threshold can be set to 99%, and the constraint condition then means continuing to train while the accuracy is below 99%. In another concrete training task, the false-positive-rate threshold can be set to 1%, and the constraint condition means continuing to train while the false positive rate is above 1%.
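The threshold comparison above can be sketched as follows. The direction of each comparison depends on the metric (higher is better for accuracy, lower is better for the false positive rate), and the metric names and the set of lower-is-better metrics are illustrative assumptions.

```python
def should_continue_training(metrics: dict, thresholds: dict) -> bool:
    """Return True if any key metric fails its preset threshold,
    meaning the training task should continue."""
    # Metrics where a lower value is better, e.g. the false positive rate.
    lower_is_better = {"fpr"}
    for name, threshold in thresholds.items():
        value = metrics[name]
        if name in lower_is_better:
            if value > threshold:    # e.g. FPR above 1% -> keep training
                return True
        elif value < threshold:      # e.g. accuracy below 99% -> keep training
            return True
    return False
```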
This method accounts for the practical situation in which the evaluation indicators of a trained machine learning model may not meet the actual requirements of the task, so further training is needed. It decides whether to train again by comparing the evaluation indicators represented in the evaluation-indicator file with the preset thresholds. This has clear practical value and improves the working efficiency of team members.
In some embodiments, the method further includes: receiving an access request for the file index information; and, in response to a terminal's access request for the file index information, sending the file index information to the terminal to form an index page for display, wherein the index page includes the access entry information of the first type of data and the second type of data of the machine learning model, or a text identifier bound to the access entry information.
In some cases, a team-based task of developing multiple machine learning models raises the following need: each team member expects to query or browse, on a single terminal page, the evaluation indicators, model file data, code file data, and so on obtained after each model finishes training. That is, each member expects to quickly view and compare the experimental results of the other members of the team. Specifically, suppose members A and B train the same kind of machine learning model at the same time. Once A's model finishes training, B can quickly learn of this through a front-end page and stop training. Providing a terminal page on which all members can query their own or other members' experimental results therefore improves the team's working efficiency.
The index page is a page for visualizing the first type of data and the second type of data; it mainly displays the information they contain. The access entry information may be the actual storage addresses of the first and second types of data of the corresponding machine learning model. The text identifier bound to the access entry information may represent the information carried by the first and second types of data.
Specifically, for example, the first type of data includes the model file data, evaluation-indicator file data, and training data of the machine learning model. The second type of data includes the code file data of the model, the metafile data of the model file, the metafile data of the evaluation-indicator file data, and the metafile data of the training data. The access entry information may then be the actual storage addresses of these data, and the bound text identifiers may represent the information they carry. For example, the model file data carries the model's type name, model parameters, and so on, while the evaluation-indicator file data carries the values of the various evaluation indicators of the specific model. More specifically, the index page is obtained by visualizing the data contained in the model file data, the evaluation-indicator file data, and the code file data of the machine learning model: the model type name, evaluation indicators, and code version can all be visualized. The visualization may take many forms, such as a line chart, a pie chart, or a table. When a table is used, its header may include training start time, training duration, team member, model name, evaluation indicators, hyperparameters, and so on.
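A minimal sketch of the tabular form of such an index page is shown below. The column names follow the example table header above; the record fields and values are hypothetical:

```python
# Render file index information as a simple text table, one row per
# training run. Missing fields are left blank.
COLUMNS = ["start_time", "duration", "member", "model", "accuracy"]

def render_index_page(records: list[dict]) -> str:
    header = " | ".join(COLUMNS)
    rows = [" | ".join(str(r.get(c, "")) for c in COLUMNS) for r in records]
    return "\n".join([header] + rows)

page = render_index_page([
    {"start_time": "2022-11-25 09:00", "duration": "2h", "member": "A",
     "model": "demo-cnn", "accuracy": 0.98},
])
```

In a real front end the same rows would feed a chart or table widget rather than plain text.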
This method recognizes that, in a team-based multi-model development task, each team member can query the experimental results of the other members on one front-end page. This makes querying and comparing experimental results more efficient, further improves the team's working efficiency, and has good practical value.
As shown in FIG. 6, an embodiment of this specification provides a method for displaying index information, which may be applied to a terminal. The method includes the following steps.
Step S201: receive index information sent by the model training platform, the index information being obtained by the index-information generation method described above.
Step S202: form an index page according to the index information, the index page including the access entry information of the first type of data and the second type of data of the corresponding machine learning model, or a text identifier bound to the access entry information.
Step S203: display the index page.
In some cases, in a team-based multi-model development task, each member expects to query or browse, on one front-end page, the evaluation indicators, model files, code files, and so on obtained after each model finishes training. A method for displaying index information, applied in a terminal, can therefore be provided: the terminal receives the index information sent by the server and forms an index page on the front end according to it. Team members can then quickly view and compare experimental results through this page.
In some implementations, the index information may be sent from the server to the terminal over the HTTP protocol or the WebSocket protocol. The corresponding front-end system in the terminal may be built with the Vue framework (a progressive framework for building user interfaces) following the MVVM (Model-View-ViewModel) architecture pattern, and the index information may be displayed with data-visualization chart libraries such as ECharts or AntV.
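As a non-limiting sketch of the HTTP transfer, index information can be serialized into a JSON response body as shown below. The field names and path are illustrative assumptions; the method only requires that the access-entry information and bound text identifiers reach the front end for rendering:

```python
import json

def build_index_response(index_info: dict) -> bytes:
    """Wrap index information in a minimal HTTP/1.1 JSON response."""
    body = json.dumps(index_info, ensure_ascii=False).encode("utf-8")
    headers = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: application/json; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return headers + body

# Hypothetical index entry: a model name, its storage address, and a label.
resp = build_index_response(
    {"model": "demo-cnn", "entry": "/data/model_files/abc123", "label": "run-1"})
```

A Vue/ECharts front end would parse the JSON body of such a response and bind it to its view model.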
As shown in FIG. 7, an embodiment of this specification also provides an apparatus for generating index information. The apparatus is applied to a model training platform used for training multiple machine learning models, and includes: an acquisition unit for acquiring the first type of data and the second type of data produced by machine learning model training, where the first type of data includes at least the model file data of the machine learning model, and the second type of data includes at least the code file data of the machine learning model and the metafile data of the model file; a storage unit for storing the first type of data of each machine learning model in the model file data set of the platform and the second type of data in the text file data set of the platform; and a generation unit for generating file index information from the model file data set and the text file data set, the file index information including the access entry information of the first and second types of data of the corresponding machine learning model.
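A minimal sketch of these three units is given below, under assumed directory layout and field names: storage files the two kinds of data into a model-file data set and a text-file data set, and generation emits index entries whose access-entry information is the actual storage address:

```python
import tempfile
from pathlib import Path

class IndexInfoGenerator:
    def __init__(self, root: str):
        self.model_set = Path(root) / "model_files"  # first type of data
        self.text_set = Path(root) / "text_files"    # second type of data
        self.model_set.mkdir(parents=True)
        self.text_set.mkdir(parents=True)
        self.index = []

    def store(self, name: str, model_bytes: bytes, code_text: str) -> dict:
        """Store one model's data in both data sets and index it."""
        model_path = self.model_set / f"{name}.bin"
        code_path = self.text_set / f"{name}.py"
        model_path.write_bytes(model_bytes)
        code_path.write_text(code_text)
        entry = {"model": name,
                 "first_entry": str(model_path),   # access entry info
                 "second_entry": str(code_path)}
        self.index.append(entry)
        return entry

with tempfile.TemporaryDirectory() as root:
    gen = IndexInfoGenerator(root)
    entry = gen.store("demo-cnn", b"\x00weights", "print('train')")
```

The accumulated `gen.index` list plays the role of the file index information sent to the terminal.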
In some embodiments, the generating apparatus further includes: a request receiving unit for receiving a request, sent by one data port, to execute multiple machine learning model training tasks, the request including the configuration information of the training tasks; a space creation unit for creating corresponding workspaces for the multiple training tasks according to the configuration information; and a task execution unit for executing the training tasks in the workspaces.
In some embodiments, the generating apparatus further includes: a request receiving unit for receiving requests, sent by multiple data ports, to execute machine learning model training tasks, each request including the training tasks to be executed and their configuration information; a space creation unit for creating a corresponding workspace for each data port according to the configuration information; and a task execution unit for executing the training tasks in the workspaces.
In some embodiments, machine learning model training data, including training-set data, is pre-stored in the model file data set, and a machine learning model training framework is pre-configured in the workspace. The task execution unit includes: a data acquisition module for obtaining, according to the configuration information, the training data from the model file data set and the code file data of the machine learning model from the text file data set; and an execution module for executing the training task using the training data, the code file data, and the training framework.
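The data-acquisition and execution modules can be sketched as follows, with the data sets modeled as plain dictionaries and the training framework stubbed out; all keys and names are hypothetical:

```python
# The config names which training data and which code file a task needs;
# both are fetched from their data sets before the training framework
# (here a caller-supplied function) runs.
def run_training_task(config, model_set, text_set, train_fn):
    train_data = model_set[config["train_data_key"]]
    code_file = text_set[config["code_key"]]
    return train_fn(train_data, code_file)

model_set = {"mnist-train": [1, 2, 3]}          # stand-in training data
text_set = {"train.py": "def main(): ..."}      # stand-in code file data
config = {"train_data_key": "mnist-train", "code_key": "train.py"}
result = run_training_task(config, model_set, text_set,
                           lambda data, code: len(data))
```

In the apparatus itself the dictionaries would be the platform's data sets and `train_fn` the pre-configured training framework.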
In some embodiments, the generating apparatus further includes: a calculation unit for computing a hash value of the first type of data using a cryptographic hash function; and a hash value processing module for storing the hash value in the metafile data and using the hash value as the file name of the first type of data.
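This content-addressing step can be sketched with the standard library's `hashlib`. SHA-256 is one possible choice of cryptographic hash function, and the metafile fields are illustrative:

```python
import hashlib

def content_address(data: bytes, meta: dict) -> tuple[str, dict]:
    """Hash first-type data; record the digest in the metafile data
    and return it for use as the file name."""
    digest = hashlib.sha256(data).hexdigest()
    meta = dict(meta, sha256=digest)   # store hash in the metafile data
    return digest, meta                # digest doubles as the file name

name, meta = content_address(b"model weights", {"model": "demo"})
```

Because identical bytes always hash to the same name, re-storing an unchanged model file yields the same file name, which also makes duplicates easy to detect.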
In some embodiments, the generating apparatus further includes a response module for, in response to a terminal's access request for the file index information, sending the file index information to the terminal to form an index page for display, the index page including the access entry information of the first and second types of data of the machine learning model, or a text identifier bound to the access entry information.
An embodiment of this specification also provides an apparatus for displaying index information. The display apparatus includes: a receiving unit for receiving index information sent by the model training platform, the index information being obtained by the aforementioned generation method; an index forming unit for forming an index page according to the index information, the index page including the access entry information of the first and second types of data of the corresponding machine learning model, or a text identifier bound to the access entry information; and a display unit for displaying the index page.
As shown in FIG. 8, an embodiment of this specification also provides an electronic device capable of running application programs, such as a smartphone, tablet computer, or e-reader. The electronic device in this embodiment may include one or more of the following components: a processor, a network interface, a memory, a non-volatile memory, and one or more application programs, where the application programs may be stored in the non-volatile memory and configured to be executed by the one or more processors to perform the methods described in the foregoing method embodiments.
The embodiments of this specification also provide a computer-readable storage medium on which a computer program is stored; when executed by a computer, the program causes the computer to perform the index-information generation method of any of the above embodiments.
The embodiments of this specification also provide a computer program product containing instructions which, when executed by a computer, cause the computer to perform the index-information generation method of any of the above embodiments.
It can be understood that the specific examples herein are only intended to help those skilled in the art better understand the embodiments of this specification, and do not limit the scope of the invention.
It can be understood that, in the various embodiments of this specification, the sequence numbers of the processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and does not constitute any limitation on the implementation of the embodiments.
It can be understood that the various embodiments described in this specification may be implemented alone or in combination, and the embodiments are not limited in this respect.
Unless otherwise stated, all technical and scientific terms used in the embodiments of this specification have the meanings commonly understood by those skilled in the technical field of this specification. The terms used here serve only to describe specific embodiments and are not intended to limit the scope of this specification. The term "and/or" as used in this specification includes any and all combinations of one or more of the associated listed items. As used in the embodiments and the appended claims, the singular forms "a", "the above", and "the" are also intended to include the plural forms unless the context clearly indicates otherwise.
It can be understood that the processor of the embodiments of this specification may be an integrated circuit chip with signal-processing capability. During implementation, the steps of the above method embodiments may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this specification. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in connection with the embodiments of this specification may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software module may reside in a storage medium mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium resides in the memory, and the processor reads the information in the memory and completes the steps of the above methods in combination with its hardware.
It can be understood that the memory in the embodiments of this specification may be volatile or non-volatile, or may include both. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory; the volatile memory may be random access memory (RAM). It should be noted that the memory of the systems and methods described herein is intended to include, without limitation, these and any other suitable types of memory.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functions differently for each particular application, but such implementations should not be regarded as going beyond the scope of this specification.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this specification, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in actual implementation; multiple units or components may be combined or integrated into another system, and some features may be ignored or not executed. Further, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this specification may be integrated into one processing unit, or each unit may physically exist alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this specification, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium and including several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of this specification. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The above are only specific embodiments of this specification, but the protection scope of the invention is not limited to them; any changes or substitutions readily conceivable by those familiar with this technical field within the technical scope disclosed here shall be covered by the protection scope of this specification. The protection scope of the invention shall therefore be determined by the protection scope of the claims.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211488395.5A CN116126995A (en) | 2022-11-25 | 2022-11-25 | Index information generation method and device and computer readable storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN116126995A true CN116126995A (en) | 2023-05-16 |
Family
ID=86298133
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116126995A (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110554995A (en) * | 2019-08-13 | 2019-12-10 | 武汉中海庭数据技术有限公司 | Deep learning model management method and system |
| CN111177100A (en) * | 2020-01-02 | 2020-05-19 | 腾讯科技(深圳)有限公司 | Training data processing method and device and storage medium |
| US20210295104A1 (en) * | 2020-03-17 | 2021-09-23 | Microsoft Technology Licensing, Llc | Storage and automated metadata extraction using machine teaching |
| CN114116684A (en) * | 2022-01-27 | 2022-03-01 | 中国传媒大学 | Docker containerization-based deep learning large model and large data set version management method |
| CN114861773A (en) * | 2022-04-18 | 2022-08-05 | 深圳市欢太科技有限公司 | Model training visualization method and device and cloud platform |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||