WO2025063857A1

WO2025063857A1 - Method and system for moving data in a cloud environment

Info

Publication number: WO2025063857A1
Application number: PCT/RU2023/000276
Authority: WO
Inventors: Роман Игоревич БОРИСОВ; Евгений Сергеевич ТРЕТЬЯКОВ; Евгений Александрович ЖИХАРЕВ; Владислав Викторович ЛЕВШИНСКИЙ; Павел Владимирович БУЗИН; Юлия Александровна ЛАРИОНОВА; Савелий Алексеевич ВИНОКУРОВ
Original assignee: Obshchestvo S Ogranichennoj Otvetstvennost'yu "oblachnye Tekhnologii"
Current assignee: Obshchestvo S Ogranichennoj Otvetstvennost'yu "oblachnye Tekhnologii"
Priority date: 2023-09-18
Filing date: 2023-09-18
Publication date: 2025-03-27
Anticipated expiration: 2026-03-18

Abstract

A method for moving data in a cloud environment includes receiving a request to move data from data sources to a target source, generating a sequence of data movement operations with the aid of an API service, and sending said sequence to a task scheduler. The sequence of operations is added to a task queue of a worker node in accordance with a given priority parameter, a set of interfaces for working with a data source is determined, and the possibility of performing the task using the resources available is checked. An optimal route for movement of the data is then determined and the data are read from an initial data source and transferred to a target data source in accordance with the determined route. The technical result is an increase in the reliability of data movement.

Description

METHOD AND SYSTEM FOR MOVING DATA IN A CLOUD ENVIRONMENT

TECHNICAL FIELD

[0001] The invention relates generally to the field of data storage and processing, and in particular to a method and system for moving data in a cloud environment that ensure the timely, uninterrupted delivery and movement of large volumes (from a few terabytes upward) of structured, unstructured, and/or heterogeneous (multimodal) data, including for the purpose of training machine learning models.

BACKGROUND ART

[0002] A solution for managing data storage is known from the prior art, disclosed in patent US 8156086 B2, published 10.04.2012. The known solution involves:

- performing a first storage operation on a first data set at a first location to generate a plurality of data blocks, where performing the first storage operation for at least one of the data blocks further includes: generating a second data set associated with the first data set; creating a first set of metadata associated with the first data set; and storing the second data set and the first set of metadata on a storage device;

- storing a copy of the first set of metadata in an index remote from the storage device;

- generating a second set of metadata associated with the second data set stored on the storage device;

- comparing the first set of metadata in the index with the second set of metadata to check for differences between the first and second data sets without accessing the first and second data sets;

- performing a second storage operation on the second data set, the second storage operation including: generating a third data set associated with the second data set; generating a third set of metadata containing information for retrieving the third data set; and storing the third data set at a second location; and

- comparing the first set of metadata in the index with the third set of metadata to check whether there are differences between the first and third data sets.

[0003] Also known are a method and system for multiplexing pipelined data for backup purposes, described in patent US 7315923 B2, published 01.01.2008. The known solution involves: receiving a first data stream containing first data, the first data being obtained by a first application-specific data agent; receiving a second data stream containing second data, the second data being obtained by a second application-specific data agent; combining the first and second data streams into a single stream of one or more data blocks of an archive file, including writing the first data from the first data stream and the second data from the second data stream into a first data block of the one or more data blocks; transmitting the one or more data blocks over a transport channel to a backup medium; and storing the one or more data blocks on the backup medium.

[0004] Also known are an information management method and system described in application US 2020/0319694 A1, published 14.02.2023. The known system comprises a networked data storage system comprising computer hardware configured to: configure power management parameters for a media agent, the media agent being a component of the information management system that interacts with secondary storage devices to perform operations on data stored using the secondary storage devices; identify, using a power management module, a task to be performed in the networked data storage system, the task comprising an operation type and being associated with the media agent; determine, based at least in part on the operation type of the task, a power state of the media agent associated with the task; and, upon determining that the power state of the media agent is at or below a specified level: add the task to a task list associated with the media agent, the task list containing tasks of that operation type; determine whether the task list exceeds a specified threshold; and, when the power management module determines that the threshold is exceeded, instruct the media agent to execute one or more tasks in the task list.

[0005] Also known are systems and methods for single-instancing data blocks in a data storage system, described in patent US 8578120 B2, published 05.11.2013. That document describes a computing system comprising: one or more storage devices residing on physical media; one or more logical containers including a plurality of deduplicated data blocks corresponding to data objects; one or more data structures indicating whether the data blocks are referenced; one or more databases storing information indicating whether the data blocks are referenced; and a secondary storage computing device programmed to: receive instructions to delete a first set of data blocks from a first logical container; for each data block in the first set, determine from the databases whether the data block is referenced and, if it is not, update the data structure to indicate that the data block is not referenced; determine from the data structures that a threshold number of contiguous unreferenced data blocks in the first logical container has been reached; and make available for storage the portion of the one or more physical media corresponding to the contiguous data blocks in the first logical container, wherein the data structures and databases are not part of the native file systems of the storage devices.

[0006] Also known are data storage systems and methods described in patent US 7343453 B2, published 11.03.2008. That document discloses a hierarchical data storage system that provides methods for assessing the state of stored data relative to consumer needs using weighted parameters that can be defined by the user.

[0007] The development of technologies and methods for processing big data on powerful computing systems and in cloud environments additionally calls for new mechanisms for managing large data sets, because transfer times grow as network resources and storage throughput are limited, and because data must be ready at the required moment when building continuous data processing pipelines.

DISCLOSURE OF THE INVENTION

[0008] The technical problem addressed by this technical solution is the creation of a simple and reliable method and system for moving large volumes of data in a cloud environment.

[0009] The technical result is increased reliability when moving data from different data sources in accordance with a specified time by which the movement must be completed and with the data movement conditions.

[0010] This technical result is achieved by a method for moving data in a cloud environment, comprising the steps of:

- receiving, via API service 20, a request to move a first data set from a first data source 30 and a second data set from a second data source 31 to a target data source 40, the request containing at least information about the data movement conditions;

- generating, via API service 20, a request containing a sequence of data movement operations and a movement priority parameter that determines the order in which the movement operation is executed, and sending the generated request to Task Scheduler 50;

- adding, via Task Scheduler 50, a new data movement task to a queue in accordance with said priority parameter;

- when the task is released from the queue for execution, passing the task to Worker Node 60;

- determining, via Worker Node 60, a list of interfaces for working with data sources 30, 31 and 40;

- assessing, via Worker Node 60, whether the data movement task can be performed with the available resources;

- determining, via Worker Node 60, the method of interacting with target data source 40 based on the volume and format of the data being moved and stored in said data sources;

- determining, via Worker Node 60, the optimal route for moving the data from data sources 30 and 31 to data source 40;

- checking, via Worker Node 60, the operability of the data sources involved in the data movement;

- reading, via Worker Node 60, the first and second data sets from data sources 30 and 31 and transferring said data to target data source 40 in accordance with the data movement conditions.
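The steps listed above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the class names `MoveTask`, `TaskScheduler` and `WorkerNode`, the in-memory dictionaries standing in for data sources, the byte budget used as the "available resources", and the convention that a lower priority value is served first are all assumptions made for illustration. Route selection and interface discovery are omitted.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class MoveTask:
    priority: int                      # lower value is served first (assumption)
    seq: int                           # tie-breaker: FIFO order within one priority
    sources: tuple = field(compare=False)
    target: str = field(compare=False)
    conditions: dict = field(compare=False, default_factory=dict)


class TaskScheduler:
    """Hypothetical stand-in for Task Scheduler 50: a priority queue of tasks."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, sources, target, priority, conditions=None):
        task = MoveTask(priority, next(self._counter), tuple(sources),
                        target, conditions or {})
        heapq.heappush(self._heap, task)
        return task

    def next_task(self):
        return heapq.heappop(self._heap) if self._heap else None


class WorkerNode:
    """Hypothetical stand-in for Worker Node 60, working on in-memory stores."""

    def __init__(self, stores, free_bytes):
        self.stores = stores            # name -> {object_name: bytes}
        self.free_bytes = free_bytes    # resources available for one transfer

    def run(self, task):
        # Check that every data source involved is known.
        for name in (*task.sources, task.target):
            if name not in self.stores:
                raise LookupError(f"data source {name!r} is not registered")
        # Assess whether the task fits the available resources.
        total = sum(len(blob) for s in task.sources
                    for blob in self.stores[s].values())
        if total > self.free_bytes:
            raise RuntimeError("not enough resources for this task")
        # Read each source and write its objects to the target.
        for s in task.sources:
            for key, blob in self.stores[s].items():
                self.stores[task.target][f"{s}/{key}"] = blob
        return total


stores = {"ds30": {"a.csv": b"1,2,3"}, "ds31": {"b.bin": b"\x00\x01"}, "ds40": {}}
scheduler = TaskScheduler()
scheduler.submit(["ds31"], "ds40", priority=5)
scheduler.submit(["ds30", "ds31"], "ds40", priority=1)
worker = WorkerNode(stores, free_bytes=1024)
first = scheduler.next_task()          # the priority-1 task is released first
moved = worker.run(first)
```

The sequence counter in `MoveTask` keeps tasks with equal priority in submission order, which a bare heap of priorities would not guarantee.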

[0011] In one particular embodiment of the method, data sources 30, 31 and 40 are located in the same cloud (i.e., the same logical pool).

[0012] In another particular embodiment of the method, at least one of data sources 30, 31 and 40 is located in a different cloud from the other data sources.

[0013] In another particular embodiment of the method, data is transferred to target data source 40 taking into account an error correction code, a data increment, or parallel data transfer parameters.

[0014] In another particular embodiment of the method, an additional step is performed of checking that all data sources participating in the data movement are registered.

[0015] In another particular embodiment of the method, the data movement request contains information about at least one additional target data source to which the first and second data sets are to be moved, and Task Scheduler 50 generates several data movement tasks with the same priority parameter, their number depending on the number of additional data sources.
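As a rough illustration of the fan-out in [0015], the sketch below creates one task description per target, all sharing one priority. The function name `fan_out` and the dictionary layout are hypothetical.

```python
def fan_out(sources, targets, priority):
    """Return one task description per target, all with the same priority."""
    return [
        {"sources": list(sources), "target": target, "priority": priority}
        for target in targets
    ]


# One main target plus two additional targets gives three tasks.
tasks = fan_out(["ds30", "ds31"], ["ds40", "ds41", "ds42"], priority=2)
```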

[0016] In another particular embodiment of the method, Task Scheduler 50 additionally performs the steps of: extracting from the data movement request the time by which the data transfer must be completed; retrieving from memory data characterizing the execution time of similar movement tasks along the given route; determining, based on the retrieved data, the predicted execution time of the movement task; and repositioning the data movement task in the queue so that it is executed by the specified time, taking into account the predicted execution times of the other tasks in the queue and their priority values.
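One possible reading of [0016] is sketched below. The source does not say how the prediction is computed or how the task is repositioned, so both choices here are assumptions: the predicted duration is the mean of past runs on the same route, and the task is moved forward from its priority position only as far as needed to meet its deadline. The names `predict_duration` and `place_task` are hypothetical, and the sketch does not re-check whether tasks pushed back still meet their own deadlines.

```python
from statistics import mean


def predict_duration(history, route, default=60.0):
    """Predict a duration in seconds as the mean of past runs on the route."""
    runs = history.get(route)
    return mean(runs) if runs else default


def place_task(queue, task, now, history):
    """Insert `task` into `queue`, a list already ordered for execution.

    The task starts at its priority position and is moved forward only as
    far as needed for its predicted finish time to meet its deadline.
    """
    # Default position: behind every task of equal or more urgent priority.
    pos = sum(1 for t in queue if t["priority"] <= task["priority"])
    own = predict_duration(history, task["route"])
    while pos > 0:
        wait = sum(predict_duration(history, t["route"]) for t in queue[:pos])
        if now + wait + own <= task["deadline"]:
            break
        pos -= 1
    queue.insert(pos, task)
    return pos


history = {("ds30", "ds40"): [100.0, 140.0], ("ds31", "ds40"): [30.0]}
queue = [
    {"name": "A", "priority": 1, "route": ("ds30", "ds40"), "deadline": 1000.0},
    {"name": "B", "priority": 2, "route": ("ds30", "ds40"), "deadline": 1000.0},
]
urgent = {"name": "C", "priority": 3, "route": ("ds31", "ds40"), "deadline": 200.0}
pos = place_task(queue, urgent, now=0.0, history=history)
```

In this example task C would miss its deadline behind both A and B (predicted 120 s each), so it is moved ahead of B but stays behind A.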

[0017] In another particular embodiment of the method, to determine the optimal route, Worker Node 60 performs the steps of: determining the routes from data sources 30 and 31 to data source 40, as well as the routes to data source 40 from data sources that store copies of the data of data source 30 or 31 that is to be moved; and selecting the optimal route for moving the data, taking into account the data sources that store copies of the data.
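The replica-aware selection in [0017] can be sketched as picking the cheapest origin among a data source and its copies. The cost metric is not specified in the source; the `link_cost` table below (for instance, estimated seconds per gigabyte) and the function name `best_route` are assumptions.

```python
def best_route(source, replicas, target, link_cost):
    """Pick the cheapest origin (primary or replica) for reaching the target.

    `link_cost` maps (origin, target) to an estimated cost. Origins with no
    known link to the target are skipped.
    """
    candidates = [source, *replicas.get(source, [])]
    reachable = [
        (link_cost[(origin, target)], origin)
        for origin in candidates
        if (origin, target) in link_cost
    ]
    if not reachable:
        raise LookupError(f"no route from {source!r} or its replicas to {target!r}")
    cost, origin = min(reachable)
    return origin, cost


replicas = {"ds30": ["ds30-replica"]}
link_cost = {
    ("ds30", "ds40"): 12.0,
    ("ds30-replica", "ds40"): 4.5,
    ("ds31", "ds40"): 7.0,
}
route_30 = best_route("ds30", replicas, "ds40", link_cost)
route_31 = best_route("ds31", replicas, "ds40", link_cost)
```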

[0018] In another particular embodiment of the method, an additional step of verifying the integrity of the moved data is performed.
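The source does not name an integrity-check mechanism. One common approach, shown here purely as an assumption, is to compare cryptographic digests of the source and target contents. The helper names `sha256_of` and `verify_move` are hypothetical; `hashlib.sha256` is from the Python standard library.

```python
import hashlib


def sha256_of(chunks):
    """Hash an iterable of byte chunks without loading everything into memory."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify_move(source_chunks, target_chunks):
    """Return True when source and target contents hash to the same digest."""
    return sha256_of(source_chunks) == sha256_of(target_chunks)


# Chunk boundaries do not affect the digest, only the byte content does.
intact = verify_move([b"ab", b"c"], [b"abc"])
corrupted = verify_move([b"abc"], [b"abd"])
```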

[0019] In another particular embodiment of the method, Worker Node 60 additionally performs a step of synchronizing the completion of the data movement from data sources 30 and 31, including taking the increment into account.

[0020] In another particular embodiment of the method, Worker Node 60 additionally performs version control of the data being moved.

[0021] In another particular embodiment of the method, metrics related to the data movement operation and resource usage are additionally recorded.

[0022] In another particular embodiment of the method, the data transport operation is performed according to a specified schedule.

[0023] In another particular embodiment of the method, the first or second data set contains structured, unstructured, and/or heterogeneous (multimodal) data.

[0024] In another preferred embodiment of the claimed solution, a system for moving data in a cloud environment is provided, comprising at least three data sources, an API service, a Task Scheduler, and at least one Worker Node, the system being configured to carry out the method described above.

BRIEF DESCRIPTION OF THE DRAWINGS

[0025] The features and advantages of the present technical solution will become apparent from the following detailed description of the invention and the accompanying drawings, in which:

- Fig. 1 shows the general scheme for registering data sources;

- Fig. 2 shows a general diagram of the data movement system;

- Figs. 3 and 4 show the general scheme of data processing;

- Fig. 5 shows an example of the general form of a computing device.

IMPLEMENTATION OF THE INVENTION

[0026] The concepts and terms needed to understand this technical solution are described below.

[0027] In this technical solution, a system means, among other things, a computer system (in particular, an information system, computing complexes, computing clusters), a computer, a CNC (computer numerical control) system, a PLC (programmable logic controller), computerized control systems, specialized devices that perform computations using FPGA and ASIC technologies, and any other devices capable of performing a given, clearly defined sequence of operations (actions, instructions).

[0028] A command processing device means an electronic unit, a computing device, or an integrated circuit (microprocessor) that executes machine instructions (programs).

[0029] The command processing device reads and executes machine instructions (programs) from one or more data storage devices. Data storage devices may include, but are not limited to, hard disk drives (HDD), flash memory, ROM (read-only memory), solid-state drives (SSD), and optical drives.

[0030] A program is a sequence of instructions intended for execution by a computer's control unit or by a command processing device.

[0031] A database (DB) is a collection of data organized according to a conceptual structure describing the characteristics of these data and the relationships among them, the collection supporting one or more application areas (ISO/IEC 2382:2015, 2121423 "database").

[0032] A signal is the physical embodiment of a message for use in transmitting, processing, and storing information.

[0033] A logic element is an element that implements certain logical relationships between input and output signals. Logic elements are typically used to build the logic circuits of computers and discrete circuits for automatic monitoring and control. All types of logic elements, regardless of their physical nature, are characterized by discrete values of their input and output signals.

[0034] An automated system (AS) is an organizational and technical system that supports decision-making through the automation of information processes.

[0035] В соответствии со схемой, приведенной на фиг. 1 , на первом этапе осуществляют регистрацию источника 30 данных в системе перемещения данных в облачной среде. Операция регистрации источника 30 данных выполняется заранее до начала перемещения данных как в отношении источника перемещаемых данных, так и в отношении объекта хранения, т.е. целевого источника данных, в котором данные будут сохранены после перемещения. [0035] In accordance with the diagram shown in Fig. 1, in the first stage, the data source 30 is registered in the data movement system in the cloud environment. The operation of registering the data source 30 is performed in advance before the start of the data movement both with respect to the source of the data being moved and with respect to the storage object, i.e. the target data source in which the data will be stored after the movement.

[0036] To register the data source 30, the user, via the user device 10, fills out a request form to the API service 20 using the graphical user interface 11 or in command-line mode (the user UI). The user device 10 may be a portable or desktop computer, telephone, smartphone, tablet, or other computing device equipped with wired and/or wireless communication means.

[0037] The request form contains the storage identifier (a name or an identifier), the data source type ("cold storage" or "hot storage"), the file system type (HDFS, S3, NFS, etc.), and authorization information (a token or a login-password pair). The completed form can be saved in the user UI module 11 for re-entry as well as for automated data processing.

[0038] Additionally, the user specifies the parameters for access to the data source 30: the access type (permanent, session-based, or scheduled), encryption keys for establishing a secure connection, the data transfer rate, the data transfer protocol (transfer with or without delivery confirmation, multi-channel transfer), and the data transfer mode (synchronous or asynchronous).
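The registration form fields listed above can be pictured as a simple record. The following is a minimal sketch only: the class name `RegistrationForm`, its field names, the default values, and the `validate` helper are illustrative assumptions and are not taken from the application.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class StorageTier(Enum):
    COLD = "cold storage"
    HOT = "hot storage"


class AccessType(Enum):
    PERMANENT = "permanent"
    SESSION = "session"
    SCHEDULED = "scheduled"


@dataclass
class RegistrationForm:
    """Hypothetical container for the data source registration request."""
    storage_id: str                       # name or identifier of the storage
    tier: StorageTier                     # cold or hot storage
    file_system: str                      # e.g. "HDFS", "S3", "NFS"
    credentials: Dict[str, str]           # token, or login and password
    access_type: AccessType = AccessType.PERMANENT
    encryption_key: Optional[str] = None  # key for the secure connection
    transfer_rate_mbps: Optional[float] = None
    confirmed_delivery: bool = True       # transfer with delivery confirmation
    multichannel: bool = False
    synchronous: bool = True

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the form is usable."""
        errors = []
        if not self.storage_id:
            errors.append("storage_id is required")
        keys = set(self.credentials)
        if "token" not in keys and not {"login", "password"} <= keys:
            errors.append("credentials need a token or a login/password pair")
        return errors
```

A form built this way could be serialized and stored for re-entry, which mirrors the saving of the completed form in the UI module.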

[0039] Accordingly, the completed request form is sent by the device 10 to the API service 20, which, upon receiving it, generates a request to the data source 30 to confirm its availability and sends that request to the data source. To receive and process data, the API service 20 can be implemented on at least one server equipped with software and logic elements for performing the functions assigned to it. The API service provides interfaces for registering data sources and initiating data transport operations, as well as interfaces to relevant metrics of the operations performed by the System; it maintains an event log and relays task execution commands from the User to the Scheduler 50 and the Worker Node 60, as well as reports and notifications on the status of scheduled and completed operations.

[0040] Upon receiving the availability confirmation request, the data source 30 starts a verification procedure that includes a source availability check (network reachability / network connectivity), a compatibility check (data formats and data transfer protocols), authorization of the user on the data source 30, and a check for permission to access the data stored in the data source, specifying the set of access rights (read only, write only, read and write, or no permissions). The API service 20 waits for a response from the data source during a waiting period preset by the developer in the settings of the API service 20. For example, the waiting period may range from a fraction of a second to tens of seconds.

[0041] The data source 30 returns to the API service 20 a response with the results of the check, including confirmation of network availability, confirmation of format and protocol compatibility, confirmation of user authorization, and confirmation of permission to access the data. The API service 20 analyzes the content of the response, taking the waiting period into account.

[0042] If, after the specified waiting period has expired, the API service 20 has not received a response from the data source confirming that the data source is available over the network, the API service considers the data source unavailable and sends a message about the unavailability of the data source to the user device 10 for display to the user via the UI 11.

[0043] If the API service 20 receives a response confirming incompatibility within the specified waiting period, or does not receive a response from the data source 30 confirming compatibility of data formats and/or data transfer protocols by the time the waiting period expires, the API service 20 considers the data source 30 unavailable and sends a message about a compatibility error during connection establishment to the user device 10 for display to the user via the UI 11.

[0044] If the API service 20 receives, within the specified waiting period, a response from the data source 30 containing information about a refusal to authorize the user, the API service 20 sends a message about the authorization refusal (access prohibited) to the user device 10 for display to the user via the UI 11.

[0045] If, after the specified waiting period has expired, the API service 20 has not received a response from the data source 30 confirming authorization of the user on the data source, the API service 20 considers the data source 30 unavailable and sends a message about the unavailability of the data source 30, with the status "data source is not responding", to the user device 10 for display to the user via the UI 11.

[0046] If the API service 20 receives, within the specified waiting period, a response from the data source 30 containing information about a denial of access to the data stored on the source, the API service 20 sends a message about the denial of access to the data to the user device 10 for display to the user via the UI 11.

[0047] If, after the specified waiting period has expired, the API service 20 has not received a response confirming the set of access rights to the data stored on the source, the API service 20 sends a message about the denial of access to the data to the user device 10 for display to the user via the UI 11.

[0048] If all the requested confirmations are successfully received from the data source 30, the API service 20 registers the data source 30 and then sends a confirmation of the registration of the data source to the user device 10 for display to the user via the UI 11.
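The decision logic of paragraphs [0042] to [0048] can be summarized as a small function. This is a sketch under stated assumptions: the response is modeled as a dictionary with the hypothetical keys `network`, `compatibility`, `authorization`, and `access`, where `True` means confirmed, `False` means explicitly refused, and a missing key or `None` means that no confirmation arrived within the waiting period. Treating an explicit `False` for `network` the same as a timeout is also an assumption.

```python
from typing import Dict, Optional


def registration_outcome(response: Optional[Dict[str, Optional[bool]]]) -> str:
    """Map the verification response from a data source to a user message."""
    r = response or {}  # no response at all is treated as an empty response

    # [0042] no confirmation of network availability within the waiting period
    if r.get("network") is not True:
        return "data source unavailable"

    # [0043] incompatibility reported, or compatibility not confirmed in time
    if r.get("compatibility") is not True:
        return "compatibility error"

    # [0044] explicit refusal; [0045] no authorization confirmation in time
    auth = r.get("authorization")
    if auth is False:
        return "authorization refused"
    if auth is None:
        return "data source is not responding"

    # [0046] explicit denial; [0047] access rights not confirmed in time
    if r.get("access") is not True:
        return "access denied"

    # [0048] every confirmation received: the data source is registered
    return "registered"
```

For example, a response that confirms network availability and compatibility but carries no authorization result yields the "data source is not responding" status.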

[0049] Once the required number of data sources 30 has been registered, the process of moving/transporting data in the cloud environment using the registered sources can be started. Either a one-time data transport between the registered sources or a scheduled data transport between the registered sources can be performed.

[0050] A one-time transport is a single operation that the user does not plan to repeat without changing its parameters and whose repeated execution the user does not automate. A one-time transport is typically the main operation when developing and debugging data processing algorithms, as well as when performing data exploration tasks. The data movement process is described below with reference to Figs. 2, 3 and 4.

[0051] In the first step (100), the user, by means of the device 10, forms a request to merge/move a first data set from a first data source (DS) 30 and a second data set from a second DS 31 into a target DS 40, wherein DS 30, DS 31, or DS 40 may be located either in the same cloud (i.e., logical pool) as the other mentioned DSs or in different clouds. In an alternative embodiment of the presented solution, the user may specify several target DSs 40 so that the first and second data sets are moved to several DSs 40 in parallel.

[0052] To form the request, the user fills out a request form to the API service 20 using the user UI 11. In this request form the user may specify: the identifiers of DS 30, DS 31, and DS 40 (e.g., the DS name, IP address, etc.), the transport type (one-time or scheduled), authorization information (tokens, login-password pairs), the addresses and identifiers of the objects containing the data to be moved and a list of the data to be moved, the addresses and identifiers of the objects to which the data will be moved, and the priority parameters of the operation. The completed form may be saved in the user UI module 11 for scheduled transport as well as for automated data processing. Information about the data movement condition is also specified (deleting the data from the source after the movement, or keeping copies of the data with an indication of the data set version, or without one).

[0053] The request form and the request may also contain fields for entering and transmitting parameters such as: the need to apply an error correction code; transmission of a data increment (addition); the data integrity verification algorithm to be applied; conditions, where the Scheduler is to execute the task once those conditions are met; information about metadata sources and instructions for moving metadata; and parallel data transfer parameters (moving data from several sources with or without consolidation, or moving data to several target sources, i.e. data recipients).

[0054] For example, the user may send a request to merge, in DS 40, at least one photograph stored in DS 30 with metadata stored in DS 31, for example metadata characterizing the properties of the file, namely the file name, creation date, file size, hash sum, etc. Accordingly, this data merging request will additionally contain:

- the user ID;

- the user authorization data;

- instructions for merging the data: one-time or on a schedule;

- the movement priority parameter, which determines the order in which the movement operation is executed;

- at least one condition selected from: the need to apply an error correction code, for example to load the first or second data set from an alternative DS in which a copy of the data is stored; transmission of a data increment (addition), for example one indicating the author of the first data set; the ID of the data integrity verification algorithm to be applied; a condition for performing the data transfer, which may, for example, indicate that the transfer to DS 40 is to be performed only if free storage space is available, or may specify a time by which the transfer must be completed; information about metadata sources and instructions for moving metadata, for example metadata characterizing the properties of an image, such as the surnames of the people shown in a photograph or tags indicating where the photograph was taken; parallel data transfer parameters indicating whether data is moved from several sources with or without consolidation, or is moved to several target sources (data recipients).

[0055] After receiving this request (101), the API service 20 analyzes the instructions for merging the data and identifies the request, for example, as a request for a one-time data transport between the registered DS 30, DS 31, and DS 40, after which it checks the registration of all data sources participating in the data movement. To check the registration, the API service 20 accesses the memory with which it is equipped, which contains information about the registered DSs. If the API service 20 determines that at least one DS is not registered, a notification about the presence of an unregistered DS is sent to the user device 10.

[0056] If all DSs are registered, the API service 20 generates a request containing the necessary sequence of data movement operations, in particular a command to move the first data set from DS 30 and the second data set from DS 31 to DS 40, together with the movement priority parameter that determines the order in which the movement operation is executed, and sends the generated request to the Task Scheduler 50. The Task Scheduler 50 can be implemented on at least one computing device equipped with logic elements for performing the functions assigned to it. Other information contained in the data movement request may additionally be sent to the Scheduler 50. The Task Scheduler 50 processes the queue of transport tasks, polls the system for events that trigger the initiation of the corresponding operations, and maintains an event log.
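The registration check and the generation of the operation sequence can be sketched as follows. The function name `build_move_request`, the dictionary layout, and the string labels for the data sources are hypothetical and serve only to illustrate the flow described above.

```python
from typing import Dict, List, Set


def build_move_request(
    registered: Set[str],
    sources: List[str],
    targets: List[str],
    priority: int,
) -> Dict[str, object]:
    """Check registration of every data source, then build the operation list."""
    # Registration check: every source and target must be known to the service.
    unregistered = [ds for ds in sources + targets if ds not in registered]
    if unregistered:
        # Corresponds to the notification about an unregistered data source.
        return {"status": "error", "unregistered": unregistered}

    # One move command per (source, target) pair, in source order per target.
    operations = [
        {"op": "move", "from": src, "to": dst}
        for dst in targets
        for src in sources
    ]
    # The priority parameter travels with the sequence to the task scheduler.
    return {"status": "ok", "priority": priority, "operations": operations}
```

With `registered = {"DS30", "DS31", "DS40"}`, a request for sources `["DS30", "DS31"]` and target `["DS40"]` produces two move commands, while an unknown target yields an error that lists it.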

[0057] Upon receiving this movement request, the system Scheduler 50 adds a new data movement (transport) task to the queue in accordance with the priority parameter and sends a confirmation of queuing to the API service 20. The Scheduler 50 may also include in the confirmation the predicted start time of the task, the position of the task in the queue, the priority of the task (if the Scheduler 50 has several queues of different priorities), and other information.

[0058] If the data movement request contains several target DSs 40, the Task Scheduler 50 generates several data movement tasks with the same priority parameter, one per DS 40, and adds all of them to the queue. Each process of transporting data to a DS 40 is defined as a separate, independent process at the execution stage, and all processes in a parallel data transport are created with the same priority.

[0059] Upon receiving confirmation from the Scheduler 50 that the task has been queued, the API service 20 sends a confirmation of queuing to the user device 10 for display to the user via the UI 11.

[0060] The system Scheduler 50 processes the task queue in accordance with predefined prioritization algorithms and the priority parameters of the specific task. The prioritization methods include, but are not limited to: no additional prioritization (the FIFO method, first in, first out); prioritization by priority queues (each queue has its own priority, and tasks from the queue with the higher priority are always executed first); prioritization by proportional resource allocation (a higher-priority queue receives more resources for executing tasks; for example, with three queues, four tasks are executed from the highest-priority queue, then two tasks from the medium-priority queue, then one task from the lowest-priority queue, after which the cycle repeats); and other methods and algorithms.
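The difference between strict priority queues and proportional allocation is easiest to see on a small example. The sketch below is illustrative only: the function names and the queue representation are assumptions, and the 4-2-1 quotas are the ones given as an example in the text.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple


def strict_priority_order(
    queues: Dict[str, List[str]], order: Sequence[str]
) -> List[str]:
    """Drain queues one after another, highest priority first."""
    result: List[str] = []
    for name in order:
        result.extend(queues[name])
    return result


def proportional_order(
    queues: Dict[str, List[str]], quotas: Sequence[Tuple[str, int]]
) -> List[str]:
    """Take up to `quota` tasks from each queue per cycle, then repeat."""
    pending = {name: deque(tasks) for name, tasks in queues.items()}
    result: List[str] = []
    while any(pending.values()):          # an empty deque is falsy
        for name, quota in quotas:
            for _ in range(quota):
                if pending[name]:
                    result.append(pending[name].popleft())
    return result


queues = {
    "high": ["h1", "h2", "h3", "h4", "h5"],
    "medium": ["m1", "m2", "m3"],
    "low": ["l1", "l2"],
}
print(strict_priority_order(queues, ["high", "medium", "low"]))
print(proportional_order(queues, [("high", 4), ("medium", 2), ("low", 1)]))
```

Under strict priority the low-priority tasks wait until both other queues are empty, whereas with the 4-2-1 quotas the first low-priority task runs in the first cycle, after four high-priority and two medium-priority tasks.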

[0061] When a task must be executed upon a condition, the Scheduler 50 suspends processing of that task until the condition is met, after which the task is executed with high or highest priority.

[0062] Additionally, the Scheduler 50 may be configured to reorder a movement task within the queue. For example, the Scheduler 50 may be equipped with a memory containing data that describe the execution time of similar movement tasks, taking the data transfer route into account. These data may be entered by the developer of the scheduler or saved by the scheduler as historical data when similar data movement tasks are executed.

[0063] Accordingly, the Scheduler 50 extracts from the data movement request the time by which the data transfer must be completed, retrieves from the memory the data characterizing the execution time of similar movement tasks along the given route, and on that basis determines the predicted execution time of the movement task. Then, taking into account the predicted execution times of the other tasks in the queue and their priority values, it reorders the movement task in the queue so that the task is completed by the specified time. The task may be moved forward in the queue so that it is completed by its specified time, or backward so that other tasks are completed by the times specified for them.
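One possible way to realize such reordering is shown below. The text does not fix a specific algorithm, so this sketch makes several assumptions: the predicted duration is the mean of the historical durations for the route (with a hypothetical default when no history exists), tasks are ordered by their latest admissible start time (deadline minus predicted duration), and ties are broken in favor of the higher priority. The names `MoveTask`, `predicted_duration`, and `reorder` are illustrative.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple

Route = Tuple[str, str]


@dataclass
class MoveTask:
    name: str
    route: Route        # (source, target)
    deadline: float     # seconds from now by which the move must finish
    priority: int       # larger value means more important


def predicted_duration(
    task: MoveTask, history: Dict[Route, List[float]], default: float = 60.0
) -> float:
    """Mean historical duration for the route, or a default without history."""
    samples = history.get(task.route)
    return mean(samples) if samples else default


def reorder(queue: List[MoveTask], history: Dict[Route, List[float]]) -> List[MoveTask]:
    """Sort by latest admissible start time; higher priority wins ties."""
    return sorted(
        queue,
        key=lambda t: (t.deadline - predicted_duration(t, history), -t.priority),
    )
```

A task with a loose deadline can thus be moved backward in the queue when another task would otherwise miss the time specified for it.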

[0064] When a task reaches the head of the queue for execution, the system Scheduler 50 passes the task for execution (102) to the Worker Node 60, which, upon receiving the task, performs the following set of actions (steps).

[0065] At step 103, the Worker Node 60 determines the interfaces for working with DS 30, DS 31, and DS 40, and then connects to these DSs. Determining these interfaces and establishing the connections can be carried out by known methods, either automatically or based on previously configured settings of the network and computing environment, taking into account the data transfer protocols implemented at various layers of the OSI network model (https://ru.wikipedia.org/wiki/Сетевая_модель_OSI), including the Application layer (for example, when taking conditions and triggers into account and applying them), the Presentation layer (for example, when applying data encryption during transfer between data sources), and the Session layer (for example, for establishing communication sessions for scheduled transfers). The DSs may be connected to the transmission network at the Transport, Network, and Data Link layers.

[0066] After connecting to these IDs, the Worker Node 60 makes a preliminary assessment of whether the data movement task can be performed with the available resources. For example, the Worker Node 60 can determine the total volume of data stored in ID 30 and ID 31 that is intended for movement and check whether ID 40 has enough free storage space. If the Worker Node 60 determines that the available resources are insufficient for the task, it sends a corresponding notification to the user device 10 via the API service 20. If the resources are sufficient, the Worker Node 60 proceeds to the next step (105).
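As a non-limiting illustration, the preliminary resource check of paragraph [0066] can be sketched as a comparison of the total volume selected for movement with the free space of the target. The `DataSource` class, the `can_move` function and the byte counts are hypothetical and are not part of the described system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DataSource:
    """Hypothetical stand-in for an ID (data source)."""
    name: str
    used_bytes: int = 0  # volume of data selected for movement
    free_bytes: int = 0  # free capacity, relevant for the target ID


def can_move(sources: list[DataSource], target: DataSource) -> tuple[bool, int]:
    """Return (feasible, required_bytes) for moving all source data to the target."""
    required = sum(s.used_bytes for s in sources)
    return required <= target.free_bytes, required


id30 = DataSource("ID 30", used_bytes=3 * 10**12)
id31 = DataSource("ID 31", used_bytes=2 * 10**12)
id40 = DataSource("ID 40", free_bytes=4 * 10**12)

feasible, required = can_move([id30, id31], id40)
if not feasible:
    # Assumption: in the described system this text would be delivered
    # to the user device 10 via the API service 20.
    message = (
        f"Insufficient space in {id40.name}: "
        f"need {required} bytes, have {id40.free_bytes}"
    )
```

In this sketch the check fails because 5 TB is requested while only 4 TB is free, so the task would not proceed to step 105.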

[0067] At the next step (105), the Worker Node 60 determines the method of interaction with the target ID 40 based on the volume and format of the data being moved and of the data stored in said IDs. For example, the Worker Node 60 can determine that the files in the target ID 40 are stored in jpeg format or are all of the same size and, on that basis, determine as the method of interaction with ID 40 that only data of that format, i.e. jpeg, should be transferred and that the size of the data must not exceed the size of the files stored in ID 40, unless the developer has set permissions allowing data of any format and volume to be stored in ID 40. The jpeg format is given only as an example; the method described in this application can process datasets of structured and unstructured files containing audio and voice information in WAV, MP3 and other formats, video information in AVI, MP4 and other formats, as well as files represented as plain text, csv or other tabular formats, in encoded form (for example, Base64), in binary form, in JSON and xml, and in other formats and representations.

[0068] Accordingly, if permissions to store data of any format and volume in ID 40 have not been set and at least one of the files being moved from ID 30 or ID 31 is stored in a format other than jpeg (for example, png), or its size is several times greater than the size of the files stored in ID 40, then, in accordance with the previously determined method of interaction with ID 40, that file is not transferred and a corresponding notification is sent to the user of device 10.
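The format and size rule of paragraphs [0067] and [0068] can be illustrated, purely as a sketch, by a filter that separates transferable files from rejected ones. The function `split_by_policy`, the `allow_any` flag (standing in for the developer-set permission) and the fixed `max_bytes` threshold are assumptions made for this example.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def split_by_policy(
    files: dict[str, int],
    allowed_ext: set[str],
    max_bytes: int,
    allow_any: bool = False,
) -> tuple[list[str], list[str]]:
    """Split {file name: size in bytes} into (accepted, rejected) lists."""
    accepted: list[str] = []
    rejected: list[str] = []
    for name, size in files.items():
        ext = PurePosixPath(name).suffix.lower().lstrip(".")
        if allow_any or (ext in allowed_ext and size <= max_bytes):
            accepted.append(name)
        else:
            rejected.append(name)  # would trigger a notification to the user
    return accepted, rejected


files = {"cat.jpg": 120_000, "dog.png": 90_000, "scan.jpeg": 9_000_000}
accepted, rejected = split_by_policy(files, {"jpg", "jpeg"}, max_bytes=500_000)
```

Here `dog.png` is rejected for its format and `scan.jpeg` for its size; with `allow_any=True` all three files would be accepted.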

[0069] After the method of interaction with the target ID 40 has been determined, the Worker Node 60 proceeds to step 106, in which it constructs an optimal route for moving the data from ID 30 and ID 31 to ID 40 over the data transmission network. The data transmission network may be: a local area network; an intranet divided into segments and domains; a virtual computing environment in a cloud computing infrastructure; a virtual computing environment that transmits data over dedicated channels via the Internet or other networks; or a combination of these networks.

[0070] The optimal route can be determined by widely known methods. To do so, the Worker Node 60 determines the routes from ID 30 and ID 31 to ID 40, as well as the routes to ID 40 from any IDs that store copies of the data of ID 30 or ID 31 to be moved, and then selects the optimal route, for example by analyzing the transmission delay along the routes from ID 30 and ID 31 to ID 40 and from the IDs storing the copies to ID 40. Information about the IDs holding copies of the data stored in ID 30 and ID 31 can be stored in advance in the memory of the Worker Node 60 or supplied by the user in the data movement request.
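One possible reading of the delay-based selection in paragraph [0070] is sketched below: for each data set, the source (original ID or an ID holding a copy) with the lowest measured delay to the target is chosen. The function name, the replica labels and the delay values are hypothetical, and other known routing methods could be used instead.

```python
from __future__ import annotations


def choose_routes(candidates: dict[str, dict[str, float]]) -> dict[str, str]:
    """For each data set, pick the source with the lowest delay (ms) to the target."""
    return {
        dataset: min(delays, key=delays.get)
        for dataset, delays in candidates.items()
    }


# Measured delay to ID 40 from each original ID and from each ID holding a copy.
candidates = {
    "first_set": {"ID 30": 42.0, "replica A": 17.5},
    "second_set": {"ID 31": 8.0, "replica B": 23.0},
}
routes = choose_routes(candidates)
```

In this example the first data set would be read from the copy and the second from its original ID.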

[0071] After the optimal data transmission route has been determined, the Worker Node 60 sends a signal to the IDs involved in the data movement to check that they are operational and ready to transmit and receive data, respectively. Then, depending on the data integrity verification technology used, it assigns data integrity parameters to the first and second data sets intended for movement. The integrity parameter may be a checksum or hash computed by the Worker Node 60 by known methods for the first and second data sets, or it may be the data size.
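The three kinds of integrity parameter named in paragraph [0071] (checksum, hash, data size) can be sketched with the Python standard library. The function `integrity_parameter` and the choice of CRC-32 and SHA-256 are assumptions for illustration; the description does not prescribe particular algorithms.

```python
import hashlib
import zlib


def integrity_parameter(data: bytes, method: str = "sha256"):
    """Compute an integrity parameter for a data set held in memory."""
    if method == "size":
        return len(data)
    if method == "crc32":
        return zlib.crc32(data)  # checksum
    if method == "sha256":
        return hashlib.sha256(data).hexdigest()  # hash
    raise ValueError(f"unknown method: {method}")


first_set = b"abc"
size_param = integrity_parameter(first_set, "size")
hash_param = integrity_parameter(first_set, "sha256")
```

The same function would be applied to the data after transfer, and the two values compared.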

[0072] The Worker Node 60 then proceeds to the data movement step 107, which is carried out in accordance with the data movement conditions. Specifically, the Worker Node 60 reads the data from ID 30 and ID 31, or from the IDs that store copies of the data, according to the previously determined route, and transmits (transports) the data to the target ID 40, taking into account the error correction code, the data increment (added data) and the parallel transfer parameters. For example, in accordance with the data movement conditions, the first and second data sets stored in ID 40 can be supplemented with the specified metadata; or, if a plurality of photographs is being moved, the metadata can be stored either for each photograph or for a specified number of photographs.

[0073] During transport, the Worker Node 60 collects and analyzes information about the integrity of the transmitted data (108) by comparing the integrity parameters of the first and second data sets with the integrity parameters of the same data after transfer to the target ID 40, computed in the manner described above. In an alternative embodiment, data integrity can be verified by encoding the first and second data sets received from the IDs, transferring them to the target ID 41 and then decoding them. If the decoded data matches the data of the first or second data set, for example in size, the Worker Node 60 concludes that the data transfer has completed successfully.
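The alternative embodiment of paragraph [0073] (encode, transfer, decode, compare) can be sketched as follows. Base64 is used only because it is mentioned elsewhere in the description as a possible encoded representation; the function names are hypothetical, and the byte-for-byte comparison goes beyond the size check given as an example in the text.

```python
import base64


def encode_for_transfer(data: bytes) -> bytes:
    """Encode a data set before it is sent to the target."""
    return base64.b64encode(data)


def verify_after_decoding(received: bytes, original: bytes) -> bool:
    """Decode what arrived at the target and compare it with the original."""
    try:
        decoded = base64.b64decode(received, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return len(decoded) == len(original) and decoded == original


original = b"photo-bytes"
encoded = encode_for_transfer(original)
ok = verify_after_decoding(encoded, original)
damaged = verify_after_decoding(encoded[:-4], original)  # simulated truncation
```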

[0074] Likewise, the Worker Node 60 concludes that the data transfer has completed successfully if said data integrity parameters match. The Worker Node 60 sends confirmation of the successful transport of the corresponding data set and/or fragment to the Scheduler 50; if no such confirmation is received, the Scheduler 50 instructs the Worker Node 60 to retransmit until success is confirmed. The Scheduler 50 also issues a retransmission command for the data set and/or fragment if problems or failures occur during the transfer.

[0075] Notifications and reports on the operations performed and their status are also sent via the API service 20 to the user device 10, enriched where necessary with the parameters of the task being performed.

[0076] Additionally, the Worker Node 60 is configured to synchronize the completion of the data movement from ID 30 and ID 31, taking the increment into account. Synchronization ensures, in particular, that by the time the transfer of both the photographs and the metadata is complete, all pairs have been assembled: each photograph is matched with its metadata and each set of metadata with its photograph. Accordingly, the Worker Node 60 checks whether the first and second data sets and the increment have been transferred to ID 40 before moving on to the next task or the next stage of the task.
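The retransmission behaviour of paragraph [0074] can be sketched as a loop that repeats a transfer until it is confirmed. The function `move_with_retry` and the `send` callback are hypothetical. The description repeats the transfer until success is confirmed; the `max_attempts` cap is an assumption added so that the sketch always terminates.

```python
from typing import Callable


def move_with_retry(send: Callable[[], bool], max_attempts: int = 5) -> int:
    """Call send() until it confirms success; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if send():
            return attempt
    raise RuntimeError(f"no confirmation after {max_attempts} attempts")


# Simulated transfer that fails twice before success is confirmed.
outcomes = iter([False, False, True])
attempts_used = move_with_retry(lambda: next(outcomes))
```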

[0077] In addition, before moving the first or second data set, the Worker Node 60 checks whether ID 40 already contains a similar data set. If it does not, the Worker Node 60 assigns the first version to the data set after moving it. If ID 40 already contains a similar data set, the data set being moved is assigned the next version, based on the version of the similar data set.
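The versioning rule of paragraph [0077] can be sketched as below. The `catalog` dictionary stands in for whatever record ID 40 keeps of stored data sets, and matching "similar" data sets by name is an assumption of this example.

```python
from __future__ import annotations


def assign_version(catalog: dict[str, list[int]], dataset: str) -> int:
    """Assign version 1 to a new data set, or the next version to an existing one."""
    versions = catalog.setdefault(dataset, [])
    new_version = max(versions, default=0) + 1
    versions.append(new_version)
    return new_version


catalog = {"photos": [1, 2]}
v_existing = assign_version(catalog, "photos")  # similar data set already present
v_new = assign_version(catalog, "metadata")     # no similar data set yet
```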

[0078] After all operations of the assigned tasks have completed, the Scheduler 50, which monitors the data movement performed by the Worker Node 60, records metrics (110) related to the operation itself and to resource usage, in particular the time taken to move the files of the first and second data sets, the amount of memory they occupy, the number of files transferred, confirmation that the operation was performed at the specified time, the data transfer delay, and so on. Through the API service 20, the user of device 10, as well as other modules and systems, can be given access to the relevant metrics for analysis and reporting.

[0079] If the transport plan so provides, the Task Scheduler 50, upon receiving confirmation, sends the API service 20 a confirmation of the successful transport of the data set and/or fragment, on the basis of which the API service 20 instructs the ID to delete the corresponding data so that redundant copies of the data are not retained.

[0080] Scheduled data transport is an operation that is repeated with unchanged parameters and whose repeated execution can be automated. It is typically the main operation in the production use of information systems and data processing computing complexes, including systems operating in real time. If, when forming the data movement request, the user specified scheduled data movement as the data merging instruction, the data processing system operates as follows.

[0081] At the first step (100), the user forms, by means of the device 10 and in the manner described earlier, a request to merge the first data set from the first data source (ID) 30 and the second data set from the second ID 31 in the target ID 40. The rules for prioritizing compliance with conditions specified in the request include, but are not limited to: checking that the data is ready for transport; checking for a transport permission external to the system (business logic); completing transport tasks before a specific time set by the user; starting the task according to the schedule; and other parameters. The specific priorities of these rules and the conditions for applying them may vary depending on the conditions under which the system that is the subject of the invention is used.

[0082] Next, in the manner described earlier (101), the data movement request is sent to the API service 20. Upon receiving the request, the API service 20 analyzes it, identifies it as a request for scheduled data transport, checks that all data sources involved in the data movement are registered, generates a request containing the required sequence of data movement operations, and sends the generated request to the Task Scheduler 50.

[0083] When a scheduled movement request is received for the first time, the Scheduler 50 adds a new transport task to the beginning of the queue, receives confirmation from the data source that the data is ready for transport, and sends a queuing confirmation to the API service 20. The Scheduler 50 may also include in the confirmation the predicted start time of the task, its position in the queue, its priority (if the Scheduler 50 maintains several queues of different priority), and other information.

[0084] The Scheduler 50 then calculates the time required for the scheduled data transport operation as the ratio of the data volume to the data transfer rate measured during previous data transports along the specific route. If the predicted completion time of the transport falls after the time specified by the user, the Scheduler 50 raises the priority of the task and moves it forward in the execution queue.
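The estimate in paragraph [0084] is the data volume divided by the measured transfer rate. A minimal sketch follows; the function names, the averaging of the rate over a history of (bytes, seconds) records and the sample figures are assumptions.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def measured_rate(history: list[tuple[int, float]]) -> float:
    """Average rate in bytes/s from (bytes_moved, seconds_taken) records of one route."""
    total_bytes = sum(b for b, _ in history)
    total_seconds = sum(s for _, s in history)
    return total_bytes / total_seconds


def predicted_finish(start: datetime, volume_bytes: int, rate: float) -> datetime:
    """Predicted completion time: start + volume / rate."""
    return start + timedelta(seconds=volume_bytes / rate)


history = [(2_000_000, 2.0), (4_000_000, 4.0)]  # previous transports on this route
rate = measured_rate(history)
start = datetime(2025, 1, 1, 0, 0)
finish = predicted_finish(start, 3_600_000_000, rate)
deadline = datetime(2025, 1, 1, 0, 30)  # completion time set by the user
boost_priority = finish > deadline      # True means the task is moved forward
```

With these figures the rate is 1,000,000 bytes/s, so 3.6 GB takes one hour, the deadline is missed, and the priority would be raised.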

[0085] The Scheduler 50 collects, analyzes and stores the metrics needed to describe the data transport process, including the data transfer rates along specific routes as measured during previous data transport tasks.

[0086] When the task's turn for execution comes, the Scheduler 50 checks the confirmation from the data source that the data is ready for transport, compares the current time with the scheduled start time of the task, and compares the predicted completion time with the completion time specified by the user. If the scheduled start time has not yet arrived and the transport task will still be completed by the required time, the Scheduler 50 moves the task to second place in the queue, recalculates the predicted completion time of the transport task, compares it with the completion time specified by the User, and, if the task will still complete on time, takes up the next-priority task in the queue for which no execution time is set.

[0087] If the predicted completion time of the transport task that was moved to second place in the queue falls after the completion time specified by the User, the scheduled transport task is returned to first place in the queue and the Scheduler 50 begins executing it.
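The deferral logic of paragraphs [0086] and [0087] can be read as the decision sketched below. This is one interpretation only: the recalculated completion time is assumed to account for the duration of the task that would run first, and the `decide` function and all its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def decide(
    now: datetime,
    scheduled_start: datetime,
    own_duration: timedelta,
    deadline: datetime,
    next_task_duration: timedelta,
) -> str:
    """Return 'run' to execute the scheduled task now, or 'defer' to run another first."""
    if now >= scheduled_start:
        return "run"
    # If deferred, the task starts once the other task is done,
    # but not before its own scheduled start time.
    finish_if_deferred = max(now + next_task_duration, scheduled_start) + own_duration
    return "defer" if finish_if_deferred <= deadline else "run"


now = datetime(2025, 1, 1, 12, 0)
scheduled = datetime(2025, 1, 1, 13, 0)
deadline = datetime(2025, 1, 1, 14, 0)
short_other = decide(now, scheduled, timedelta(minutes=30), deadline, timedelta(minutes=20))
long_other = decide(now, scheduled, timedelta(minutes=30), deadline, timedelta(hours=2))
```

With a 20-minute task ahead of it the scheduled task is deferred; with a 2-hour task ahead it would miss the deadline, so it returns to first place and runs.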

[0088] If the queue contains two or more data transport tasks, the tasks are first reordered by scheduled start time, taking the predicted completion time into account. If the start time has not yet arrived for any of these tasks and they are all predicted to complete before the time specified by the User, the Scheduler 50 takes up the task with priority No. 3 or lower, by analogy with the algorithm described above.

[0089] If the predicted completion times of two or more transport tasks require a change in the order of execution under the prioritization conditions described earlier, the Scheduler 50 makes the corresponding change to the task queue.

[0090] At the same time as the first receipt of a scheduled movement request, the Scheduler 50 adds the new transport task to the transport schedule for future periods, reserving a place in the queue.

[0091] Because the volume of data to be transported may change between repeated executions, the Scheduler 50 calculates the predicted start and completion times of the operation in advance of the scheduled execution time and also checks that the data sources are available and ready for transport. The lead time for this advance forecast is set when the system is installed and is adjusted by a fixed or variable amount depending on whether the previous transport task along the corresponding route completed on time. To ensure that transport tasks are executed on time, a threshold interval can also be set in the system as a time reserve to compensate for forecasting error.
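The adjustment of the advance-forecast lead time in paragraph [0091] can be sketched with a fixed step. The direction of the adjustment (longer lead time after a late completion, shorter after a timely one), the step size and the lower bound are assumptions; the description states only that the period is adjusted by a fixed or variable amount.

```python
def adjust_lead_time(
    lead_s: float,
    finished_on_time: bool,
    step_s: float = 60.0,
    min_lead_s: float = 0.0,
) -> float:
    """Return the new forecast lead time in seconds for a given route."""
    if finished_on_time:
        return max(min_lead_s, lead_s - step_s)
    return lead_s + step_s


after_success = adjust_lead_time(600.0, finished_on_time=True)
after_failure = adjust_lead_time(600.0, finished_on_time=False)
```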

[0092] If the data sources are not ready for transport (unavailable, not authorized on the source, lacking access rights, etc.), the Scheduler 50 sends a corresponding notification to the API service 20, which in turn forwards a warning (alert) about the risk that the transport will not be performed to the User's UI 11 so that the necessary actions can be taken to restore the availability of the sources.

[0093] If the sources are ready, information about the predicted start and completion times of the transport and the readiness status of the sources is stored in the event logs of the Scheduler 50.

[0094] After all priority and execution conditions for the transport have been met, the system's Scheduler passes the task to the Worker Node 60 for execution. The Worker Node moves the data in the manner described earlier, in accordance with steps 102 - 109.

[0095] During transportation, the Scheduler 50 collects and analyzes information about the integrity of the transmitted data and receives confirmation that the corresponding data set and/or data fragment was transported successfully. If no such confirmation is received, it controls and performs retransmission until success is confirmed.

[0096] The API service 20 sends notifications and reports on the operations performed and their status to the User, enriching the notifications with the parameters of the task being performed where necessary. In the event of problems or failures during transmission, the Scheduler 50 retransmits the corresponding data set and/or data fragment.
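The retransmit-until-confirmed behaviour can be sketched in Python as follows. This is only an illustration: the function name `send_with_retransmit`, the use of a SHA-256 digest as the integrity confirmation, and the cap on the number of attempts are assumptions; the source states only that retransmission continues until success is confirmed.

```python
import hashlib
from typing import Callable


def send_with_retransmit(fragment: bytes,
                         send: Callable[[bytes], str],
                         max_attempts: int = 5) -> int:
    """Send a fragment until the receiver confirms it; return the attempts used.

    `send` is a hypothetical transport callback that returns the digest the
    receiver computed, or raises ConnectionError if the link fails.
    """
    expected = hashlib.sha256(fragment).hexdigest()
    for attempt in range(1, max_attempts + 1):
        try:
            if send(fragment) == expected:
                return attempt  # integrity confirmed by matching digests
        except ConnectionError:
            pass  # no confirmation received: fall through and retransmit
    raise RuntimeError(f"fragment not confirmed after {max_attempts} attempts")
```

A fragment counts as delivered only when the digest reported by the receiver matches the one computed locally, which covers both lost and corrupted transmissions.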

[0097] After the assigned tasks have been completed, the System records metrics related to the operation itself and to the resources used. The User and other modules and systems are given access to the relevant metrics for analysis and reporting.

[0098] The Scheduler 50 polls the system for suitable events that will trigger the initialization of the required operations.

[0099] The User can stop a deterministic process (scheduled or already running) through the user interface. To do so, a stop request is sent via the User's UI 11 to the API service 20, including the identifiers of the operations, the execution conditions (the period of time in which these operations must be performed) and other necessary parameters. On receiving the stop request, the API service 20 asks for confirmation of the stop operation and, after receiving the User's confirmation, sends the corresponding command to the Scheduler and the Worker Node.

[0100] When an operation is interrupted, the actions necessary to preserve the data must be taken, including pausing or canceling operations that delete duplicated data.

[0101] In the case of parallel data transfer from a data source to a target source, the system treats each transportation process as a separate, independent process at the execution stage. All processes in a parallel data transportation must be created with the same priority.

[0102] Duplicated data is deleted once the parallel transfer processes have completed for a portion of the data or for the entire set of data being moved.

[0103] When forecasting the execution time of parallel data transportations, the duration of the process with the longest execution time is used as the basis for the forecast.
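Paragraphs [0101] to [0103] can be illustrated with a short Python sketch. The function names, the thread-based parallelism and the interpretation of duplicate deletion as dropping repeated items from the received set are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def forecast_parallel(durations_s: Iterable[float]) -> float:
    """The forecast for a parallel group is the duration of its slowest process."""
    return max(durations_s)


def move_in_parallel(parts: list[bytes],
                     transfer: Callable[[bytes, int], bytes],
                     priority: int) -> list[bytes]:
    """Run one independent transfer per part, all with the same priority,
    and remove duplicates only after every transfer has finished."""
    if not parts:
        return []
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        # Every process is created with the same priority value.
        received = list(pool.map(lambda p: transfer(p, priority), parts))
    unique, seen = [], set()
    for item in received:  # deduplication happens after all transfers complete
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique
```

For example, `forecast_parallel([12.0, 40.5, 7.0])` yields `40.5`, because the group cannot finish before its slowest member does.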

[0104] If the Scheduler 50 and the Worker Node 60 record a large number of transmission errors, or do not receive confirmation of the integrity of the transmitted data, the Scheduler interrupts the transmission process and sends a notification to the User via the API service 20 so that the necessary actions can be taken, while simultaneously generating an error report.
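One possible shape for this abort logic is sketched below in Python. The class `TransferMonitor`, the numeric error threshold and the `notify` callback standing in for the notification sent through the API service are hypothetical; the source does not define what counts as a large number of errors.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TransferMonitor:
    max_errors: int                 # hypothetical threshold for "a large number"
    notify: Callable[[dict], None]  # stand-in for the notification and error report
    errors: int = 0
    aborted: bool = False

    def record(self, ok: bool, integrity_confirmed: bool = True) -> bool:
        """Record one transmission result; return True if the transfer may continue."""
        if self.aborted:
            return False
        if not ok:
            self.errors += 1
        if self.errors >= self.max_errors or not integrity_confirmed:
            # Interrupt the transfer and emit the notification with an error report.
            self.aborted = True
            self.notify({"status": "aborted",
                         "errors": self.errors,
                         "integrity_confirmed": integrity_confirmed})
            return False
        return True
```

Once `record` returns `False`, the caller is expected to stop sending; later calls keep returning `False` without sending a second notification.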

[0105] The presented solution provides guaranteed, timely, controlled and secure movement of large and very large volumes of structured and unstructured data and data streams in cloud environments and in local environments built using cloud technologies, including virtualization, scaling, access restriction, collaborative work with data, streaming and online data processing, continuous and batch data processing, data consolidation and transformation, data labeling, metadata collection, synchronous and asynchronous processing of data streams, backup, delivery of data by a required time or upon readiness, removal of redundant instances and copies of data, and other technologies and methods.

[0106] In general (see Fig. 5), the computing device (200) contains the following components, connected by a common information exchange bus: one or more processors (201), including modules for accelerating computations such as a GPU (graphics processing unit), TPU (Tensor Processing Unit) or IPU (Integrated Processing Unit), as well as modules based on FPGA and ASIC microprocessors; memory means such as RAM (202) and ROM (203); input/output interfaces (204); input/output devices (means) (205); and a device (means) for network interaction (206).

[0107] The processor (201) (or several processors, a multi-core processor, etc.) can be implemented on the basis of a CISC, RISC, ARM or other architecture and selected from the range of devices in wide use today, for example from manufacturers such as Intel™, AMD™, Apple™, Samsung Exynos™, MediaTek™, Qualcomm Snapdragon™, Huawei Kunpeng™, Nvidia™, IBM™, Baikal™, IVA™, etc. The processor, or one of the processors used in the device (200), should also be understood to include a computing accelerator implemented, for example, as a graphics processor such as an NVIDIA GPU with a CUDA-compatible programming model, Graphcore, a TPU (Tensor Processing Unit), an IPU (Integrated Processing Unit), or modules based on FPGA and ASIC microprocessors. Accelerators of these types are also suitable for carrying out the method in whole or in part, and can be used for training and applying machine learning models in various information systems.

[0108] The RAM (202) is random access memory and is intended for storing machine-readable instructions executed by the processor (201) to perform the necessary logical data processing operations. The RAM (202) typically contains executable instructions of the operating system and of the corresponding software components (applications, software modules, etc.). The available memory of a graphics card, graphics processor or other computing accelerator can also serve as the RAM (202).

[0109] The ROM (203) is one or more permanent data storage devices, for example a hard disk drive (HDD), a solid-state drive (SSD), flash memory (EEPROM, NAND, etc.), optical storage media (CD-R/RW, DVD-R/RW, Blu-ray Disc, MD), etc.

[0110] Various types of I/O interfaces (204) are used to organize the operation of the components of the device (200) and of externally connected devices. The choice of interfaces depends on the specific design of the computing device; they may be, without limitation: PCI, AGP, PS/2, IrDA, FireWire, COM, SATA, IDE, USB (2.0, 3.0, 3.1, micro, mini, type C), HDMI, DVI, VGA, RJ45, single-mode and multi-mode optical interfaces SC and ST, Wi-Fi, NVLink, InfiniBand, STM, ATM, etc.

[0111] Various information I/O means (205) are used to enable the user to interact with the computing device (200), for example a keyboard, a display (monitor), a touch display, a touchpad, a joystick, a mouse, a light pen, a stylus, a touch panel, a trackball, speakers, a microphone, augmented reality means, optical sensors, a tablet, light indicators, a projector, a camera, biometric identification means (a retina scanner, a fingerprint scanner, a voice recognition module), etc.

[0112] The network interaction means (206) provides data transmission over an internal or external computer network, for example an intranet, the Internet, a LAN, etc. One or more of the means (206) may be, without limitation: an Ethernet card, an LTE modem, a 5G modem, a satellite communication module, an NFC module, a Wi-Fi module, single-mode and multi-mode optical interface modules, NVLink, InfiniBand, STM, ATM, etc.

[0113] In addition, satellite navigation means, for example GPS, GLONASS, BeiDou or Galileo, may be used as part of the device (200). Precise time signals received from these satellite navigation means may be used to synchronize the operation of geographically distributed devices.

[0114] The specific choice of elements of the device (200) for implementing various hardware and software architectures may vary, provided the required functionality is preserved.

[0115] Modifications and improvements to the above-described embodiments of the present technical solution will be apparent to those skilled in the art. The foregoing description is provided by way of example only and is not limiting. Accordingly, the scope of the present technical solution is limited only by the scope of the appended claims.

Claims

CLAIMS

1. A method for moving data in a cloud environment, comprising the steps of:

- receiving, via API service 20, a request to move a first data set from a first data source (ID) 30 and a second data set from a second ID 31 to a target ID 40, the request containing at least information about the conditions for moving the data;

- generating, via API service 20, a request containing a sequence of data movement operations and a movement priority parameter that determines the order of execution of the movement operation, and sending the generated request to Task Scheduler 50;

- adding, by means of Task Scheduler 50, a new data movement task to the queue in accordance with said priority parameter;

- when the task is retrieved from the queue for execution, transferring the task to Worker Node 60;

- determining, by means of Worker Node 60, a list of interfaces for working with ID 30, ID 31 and ID 40;

- assessing, by means of Worker Node 60, whether the data movement task can be carried out with the available resources;

- determining, by means of Worker Node 60, the method of interaction with the target ID 40 based on the volume and the format of the data being moved and stored in said IDs;

- determining, by means of Worker Node 60, the optimal route for moving data from ID 30 and ID 31 to ID 40;

- checking, by means of Worker Node 60, the operability of the IDs used to move the data;

- reading, by means of Worker Node 60, the first and second data sets from ID 30 and ID 31 and transferring said data to the target ID 40 in accordance with the conditions for moving the data.

2. The method according to claim 1, characterized in that ID 30, ID 31 and ID 40 are located in the same cloud (i.e., logical pool).

3. The method according to claim 1, characterized in that at least one of ID 30, ID 31 and ID 40 is located in a different cloud from the other said IDs.

4. The method according to claim 1, characterized in that the data is transmitted to the target ID 40 taking into account error correction code, data increment or parallel data transmission parameters.

5. The method according to claim 1, characterized in that it additionally contains a step of checking the registration of all data sources participating in the movement of data.

6. The method according to claim 1, characterized in that the request to move data contains information about at least one additional target ID to which the first and second data sets should be moved, and the Task Scheduler 50 generates several data movement tasks with the same priority parameter, depending on the number of additional IDs.

7. The method according to claim 1, characterized in that the Task Scheduler 50 additionally performs the following steps:

- extracting from the data movement request the time value by which the data movement must be completed;

- retrieving from memory data characterizing the time taken to complete similar movement tasks along the given route;

- determining, based on the retrieved data, the predicted completion time of the movement task;

- repositioning the data movement task in the queue so that it is executed by the specified time value, taking into account the predicted execution times of the other tasks in the queue and their priority values.

8. The method according to claim 1, characterized in that, in order to determine the optimal route, the following steps are performed by means of Worker Node 60:

- determining the routes from ID 30 and ID 31 to ID 40, as well as the routes to ID 40 from the IDs that store copies of the data of ID 30 or ID 31 that is to be moved;

- selecting the optimal route for moving the data, taking into account the IDs in which copies of the data are stored.

9. The method according to claim 1, characterized in that an additional step of checking the integrity of the moved data is performed.

10. The method according to claim 1, characterized in that a step of synchronizing the completion of the process of moving data from IDs 30 and 31, including taking the increment into account, is additionally performed by means of Worker Node 60.

11. The method according to claim 1, characterized in that the Worker Node 60 additionally performs version control of the moved data.

12. The method according to claim 1, characterized in that metrics associated with the data movement operation and the use of resources are additionally recorded.

13. The method according to claim 1, characterized in that the data transportation operation is executed in accordance with a specified schedule.

14. The method according to claim 1, characterized in that the first or second data set contains structured and/or unstructured and/or heterogeneous (multimodal) data.

15. A system for moving data in a cloud environment, containing at least three IDs, an API service, a Task Scheduler and at least one Worker Node, wherein the system is configured to implement the method according to any of claims 1-14.