RU2817534C1

RU2817534C1 - Method for automatic detection of objects using a computer vision system installed on a UAV

Info

Publication number: RU2817534C1
Application number: RU2023121143A
Authority: RU
Inventors: Дмитрий Михайлович Мокачев; Вячеслав Константинович Барбасов
Original assignee: Limited Liability Company "ДРОН СОЛЮШНС"
Filing date: 2023-09-20
Publication date: 2024-04-16

Abstract

FIELD: physics.

SUBSTANCE: the invention relates to methods for automatic detection of objects using a computer vision system installed on an unmanned aerial vehicle (UAV). The method includes the steps of receiving, at a computing device, frames of a video stream from the UAV camera; determining the offset, in pixels, of the current frame of the video stream relative to the previous processed frame; determining the size and position of the tracking window on the current frame; processing the frame fragment inside the tracking window with a neural network to detect objects; and determining the positions of the objects detected in the fragment within the full frame. Next, the displacement of previously detected objects that are still inside the frame but do not fall into the current tracking window is determined by adding the inter-frame pixel displacement to the object coordinates obtained at previous processing stages, and the detection result is output.

EFFECT: high accuracy of detecting small-sized objects and objects located far away in the frame.

6 cl, 4 dwg

Description

TECHNICAL FIELD

This technical solution relates to the field of computer vision and unmanned aerial vehicles (UAVs), namely to a method for automatically detecting objects using a computer vision system installed on a UAV.

BACKGROUND OF THE ART

Traditional neural-network object detection methods have difficulty detecting small objects and objects located far away in the frame, because such objects are represented by few pixels in the image and lack detail. Known methods increase the accuracy of small-object detection by recognizing each frame piece by piece, sliding a window over the frame. However, this sharply reduces detection speed because of the increased computational cost of processing each frame.
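To make that cost concrete, the sketch below counts how many network passes a full sliding-window scan needs per frame. The frame size, window size and stride are illustrative assumptions, not values from the source, and the function name is hypothetical.

```python
import math


def sliding_window_count(frame_w, frame_h, win, stride=None):
    """Network passes needed to cover a frame with a square sliding window.

    The last window in each row/column is assumed to be clamped to the frame
    border, so coverage is complete even when the sizes do not divide evenly.
    """
    stride = stride or win  # default: non-overlapping tiles
    cols = max(1, math.ceil((frame_w - win) / stride) + 1)
    rows = max(1, math.ceil((frame_h - win) / stride) + 1)
    return cols * rows
```

For a 1920×1080 frame and a 640×640 window, `sliding_window_count(1920, 1080, 640)` gives 6 passes per frame, and 15 with a half-window stride of 320, compared with a single pass when only one tracking window is processed.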

Other ways to improve small-object detection accuracy modify the neural network architecture, for example by adding modules that extract features of small objects and removing modules responsible for large objects. However, this approach requires continual research and adaptation of the solution to each new model.

The prior art includes WO 2022074643 A1, published on April 14, 2022, which discloses a georegistration method using machine-learning-based object identification. The system implementing the method includes a video camera mounted on a vehicle, such as a UAV, that takes aerial photographs of the area. The success rate and accuracy of geosynchronization algorithms are improved by using a trained feedforward artificial neural network (ANN) to identify dynamic objects, which change over time, in frames captured by the video camera. Such frames are tagged, for example by adding metadata. The tagged frames can then be used in a geosynchronization algorithm, which may be based on comparison with reference images or on the same or a different ANN, either by removing the dynamic object from the frame or by excluding the tagged frame from the algorithm. A dynamic object may change over time due to environmental conditions such as weather changes or geographic changes.

The prior art also includes US 20220028048 A1, published on January 27, 2022, which discloses a system and method for determining the position of an object using images obtained from a mobile image acquisition device. The method includes obtaining images of a target geographic area together with telemetry of the image acquisition device at the time of capture, analyzing each image to identify objects, and determining the positions of the objects. The method further includes determining the image capture altitude; determining the image position using the capture altitude and the telemetry; transforming the image based on the capture altitude and the telemetry; identifying objects in the transformed image; determining first pixel locations in the transformed image; performing an inverse transformation of the first pixel locations to determine second pixel locations in the captured image; and determining the positions of the objects in the area based on the second pixel locations in the captured image and the determined image position.

The proposed solution differs from the known solutions in that it does not need to scan every image frame in full: since information about objects persists across several frames, it is sufficient to process it in one of them. In addition, adapting the size and position of the tracking window to the speed and altitude of the UAV increases the accuracy of small-object detection.

SUMMARY OF THE INVENTION

The technical problem addressed by the claimed solution is detecting small objects and objects located far away in the frame while keeping the video stream processing speed close to real time and without changing the neural network architecture.

The technical result achieved by solving this problem is increased accuracy in detecting small objects and objects located far away in the frame, while the video stream processing speed is kept close to real time, by exploiting the overlap between adjacent frames of the video stream. At processing speeds of 15 or more frames per second, adjacent frames overlap to a high degree. This means that information about objects persists across several consecutive frames; given the speed and direction of the optical flow, which depend on the speed, direction and altitude of the UAV flight, image regions already processed in previous frames do not need to be processed again. Adapting the size and position of the tracking window to the speed and altitude of the UAV increases the accuracy of small-object detection.

The claimed technical result is achieved by a method for automatically detecting objects using a computer vision system installed on an unmanned aerial vehicle (UAV), the method comprising the following steps:

receiving, at a computing device, frames of a video stream from the UAV camera;

determining the offset, in pixels, of the current frame of the video stream relative to the previous processed frame;

determining the size and position of the tracking window on the current frame;

processing the frame fragment inside the tracking window with a neural network to detect objects;

determining the positions of the objects detected in the fragment within the full frame;

determining the displacement in the frame of previously detected objects that are inside the frame but do not fall into the current tracking window, by adding the inter-frame pixel displacement to the object coordinates obtained at previous processing stages;

outputting the detection result.
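The claimed steps can be sketched as one per-frame function. This is a minimal sketch under stated assumptions: the offset estimator, window selector, cropper and detector are hypothetical callables supplied by the caller, and previously detected objects that land inside the new window are dropped on the assumption that the window's fresh detections supersede them (the text does not specify that case).

```python
from dataclasses import dataclass, replace


@dataclass
class Obj:
    x: float     # object centre x, full-frame pixels
    y: float     # object centre y, full-frame pixels
    cls: int
    conf: float


def process_frame(frame, prev_frame, tracked, frame_w, frame_h,
                  estimate_offset, choose_window, crop, detect):
    """One iteration of the claimed steps (hypothetical component callables).

    estimate_offset(prev, cur) -> (dx, dy)       inter-frame offset
    choose_window(w, h, dx, dy) -> (x0, y0, s)   tracking window
    crop(frame, x0, y0, s) -> fragment
    detect(fragment) -> [Obj in fragment pixels]  neural network
    """
    dx, dy = (0.0, 0.0) if prev_frame is None else estimate_offset(prev_frame, frame)
    x0, y0, s = choose_window(frame_w, frame_h, dx, dy)

    # Fragment coordinates -> full-frame coordinates.
    fresh = [replace(o, x=o.x + x0, y=o.y + y0) for o in detect(crop(frame, x0, y0, s))]

    # Shift older objects by (dx, dy); keep those still in the frame but
    # outside the window (objects inside the window are superseded).
    carried = []
    for o in tracked:
        m = replace(o, x=o.x + dx, y=o.y + dy)
        in_frame = 0 <= m.x < frame_w and 0 <= m.y < frame_h
        in_window = x0 <= m.x < x0 + s and y0 <= m.y < y0 + s
        if in_frame and not in_window:
            carried.append(m)

    return fresh + carried  # detection result for this frame
```

The returned list would be passed back in as `tracked` for the next processed frame, so objects keep being carried forward until they leave the frame.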

In a particular embodiment of the method, the offset, in pixels, of the current frame relative to the previous processed frame is determined using an optical flow algorithm.

In a particular embodiment of the method, the offset, in pixels, of the current frame relative to the previous processed frame is determined using the block matching method.

In a particular embodiment of the method, the offset, in pixels, of the current frame relative to the previous processed frame is computed from the altitude and speed of the UAV and the orientation of the camera.

In a particular embodiment of the method, the tracking window is located in the center of the frame.

In a particular embodiment of the method, the tracking window moves over the part of the frame that will soonest leave the frame.

DESCRIPTION OF DRAWINGS

The implementation of the invention is described below with reference to the accompanying drawings, which are provided to explain the essence of the invention and do not limit its scope in any way. The following drawings are attached to the application:

Fig. 1 illustrates a scheme in which frames skipped because of the running time of the neural-network object detection algorithm are not taken into account.

Fig. 2 illustrates a scheme in which the video stream shifts along one axis at a constant speed and the frame is scanned with a tracking window.

Fig. 3 illustrates a scheme in which the video stream shifts along both the X and Y axes and the frame is scanned with a tracking window.

Fig. 4 illustrates an example implementation of the proposed solution.

DETAILED DESCRIPTION OF THE INVENTION

The following detailed description sets forth numerous implementation details intended to provide a clear understanding of the present invention. However, it will be apparent to a person skilled in the art how the present invention can be used both with and without these implementation details. In other cases, well-known methods, procedures and components are not described in detail so as not to obscure the features of the present invention.

In addition, it will be clear from the above that the invention is not limited to the implementation described. Numerous possible modifications, alterations, variations and substitutions that retain the spirit and form of the present invention will be apparent to those skilled in the art.

The proposed method is carried out by a system consisting of a UAV and a computing device.

The computing device receives frames from the video stream of the UAV camera. The UAV camera forms an image using optical lenses and an image sensor (matrix) that divides the image into individual pixels. The resulting frame is a digital image, that is, a set of numbers corresponding to the brightness of each pixel. Frames are received in real time. At this step, the most recent available decoded frame is taken from the video stream.
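Taking "the most recent available decoded frame" is commonly done with a single-slot buffer shared between the decoder and the detector. The sketch below is an assumption about how that could look, not the source's implementation; the class name is hypothetical. Older frames are overwritten, so stale frames never queue up behind a slow detector.

```python
import threading


class LatestFrame:
    """Single-slot buffer: the decoder overwrites, the detector takes the newest."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None
        self._index = -1

    def put(self, index, frame):
        """Called by the decoder thread for every decoded frame."""
        with self._lock:
            self._index, self._frame = index, frame

    def take(self):
        """Return (index, frame) of the newest frame and empty the slot, or None."""
        with self._lock:
            if self._frame is None:
                return None
            item = (self._index, self._frame)
            self._frame = None
            return item
```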

Next, the offset, in pixels, of the current frame of the video stream relative to the previous processed frame is determined. The previous frame used is the last one processed by this algorithm; frames skipped because of the running time of the neural-network object detection algorithm, if any, are not taken into account (Fig. 1).
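Which frames end up being processed depends on the detector latency. The simulation below is a sketch under simplifying assumptions (constant frame rate, constant detection latency, detector always grabs the newest arrived frame); the function name is hypothetical. It shows why the offset must be measured against the previous *processed* frame rather than the immediately preceding camera frame.

```python
import math


def processed_frames(n_frames, fps, latency_s):
    """Indices of camera frames a detector with fixed latency actually processes."""
    period = 1.0 / fps
    processed = []
    t_free = 0.0  # moment the detector becomes free
    while True:
        # Newest frame that has arrived by t_free (epsilon guards float error).
        newest = math.floor(t_free / period + 1e-9)
        if processed and newest <= processed[-1]:
            newest = processed[-1] + 1  # no new frame yet: wait for the next one
        if newest >= n_frames:
            break
        start = max(t_free, newest * period)
        processed.append(newest)
        t_free = start + latency_s
    return processed
```

With a 10 fps stream and a 250 ms detection latency, `processed_frames(10, 10, 0.25)` returns `[0, 2, 5, 7]`; each offset is then computed between consecutive entries of this list, and the skipped frames are ignored.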

Various algorithms can be used to determine the offset between frames in a video stream.

One of the most common is optical flow, which determines a displacement vector for each pixel of the image. Optical flow is based on the assumption that the brightness of each image point is preserved between adjacent frames. It computes the brightness gradients of the image and applies an estimation method, such as the Lucas-Kanade method, to find the pixel displacement vectors between two frames.
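A minimal pure-Python sketch of the Lucas-Kanade idea, assuming the frame offset is a single global translation (camera pointing straight down over roughly flat ground) and that the shift is small, about one pixel or less. Frames are plain lists of rows of grayscale values; a production system would more likely use a pyramidal implementation such as OpenCV's `calcOpticalFlowPyrLK`.

```python
def lucas_kanade_shift(prev, cur):
    """Estimate one global translation (dx, dy) of image content from prev to cur.

    Lucas-Kanade least squares over a single window covering the whole image:
    brightness constancy gives Ix*dx + Iy*dy = -It at every pixel, solved via
    [Sxx Sxy; Sxy Syy] [dx dy]^T = -[Sxt Syt]^T.
    """
    h, w = len(prev), len(prev[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0  # central differences
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0
            it = cur[y][x] - prev[y][x]                    # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("texture too weak to estimate the shift (aperture problem)")
    dx = (-syy * sxt + sxy * syt) / det
    dy = (sxy * sxt - sxx * syt) / det
    return dx, dy
```

Because the linearization only holds for small shifts, larger inter-frame offsets would need a coarse-to-fine (pyramidal) scheme or one of the other two methods.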

Another method is block matching, which compares blocks of pixels in the current frame with the corresponding blocks in the previous frame; the comparison yields a displacement vector for each block.
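Block matching yields a vector per block; the sketch below, an assumption for illustration, matches only one central block and uses its vector as the global frame offset. The sum of absolute differences (SAD) criterion, block size and search range are illustrative choices, and the function name is hypothetical.

```python
def block_match(prev, cur, block=16, search=8):
    """Integer shift (dx, dy) of the central block of prev as found in cur.

    Exhaustive search over [-search, search]^2 minimising the SAD.
    """
    h, w = len(prev), len(prev[0])
    y0, x0 = (h - block) // 2, (w - block) // 2
    best, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate positions that would leave the current frame.
            if not (0 <= y0 + dy and y0 + dy + block <= h
                    and 0 <= x0 + dx and x0 + dx + block <= w):
                continue
            sad = 0
            for y in range(block):
                for x in range(block):
                    sad += abs(cur[y0 + dy + y][x0 + dx + x] - prev[y0 + y][x0 + x])
            if sad < best_sad:
                best, best_sad = (dx, dy), sad
    return best
```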

The third method computes the pixel offset from the altitude and speed of the UAV using the following formulas:

dx = V * time_interval * frame_resolution_x / (2 * H * tan(A))

dy = V * time_interval * frame_resolution_y / (2 * H * tan(A))

where:

dx - offset between frames along the x axis, in pixels;

dy - offset between frames along the y axis, in pixels;

H - flight altitude of the UAV above the surface;

V - speed of the UAV;

A - camera viewing angle; for 2 * H * tan(A) to equal the width of the ground area covered by the frame, A is taken as half of the field of view;

time_interval - time interval between frames (update time), which can be taken as the inverse of the frame rate;

frame_resolution_x - frame resolution along the x axis, in pixels;

frame_resolution_y - frame resolution along the y axis, in pixels.

To use these formulas, the altitude and speed of the UAV, obtained from the UAV flight controller, and the camera viewing angle must be known. The viewing angle determines the camera's field of view and hence the number of pixels per degree of viewing angle. The camera must point straight down.
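The formula can be evaluated per axis as in the sketch below. Assumptions, not stated in the source: A is half of the field of view along that axis (so that 2 * H * tan(A) is the ground footprint), the speed passed in is the component of the UAV ground speed along that image axis, and time_interval is the time between the two frames actually compared, which is longer than 1/fps when frames are skipped. The function name is hypothetical.

```python
import math


def pixel_offset(speed_mps, time_interval_s, resolution_px, altitude_m, half_angle_deg):
    """dx (or dy) = V * time_interval * frame_resolution / (2 * H * tan(A)).

    2 * H * tan(A) is the ground width covered by the frame along this axis
    when A is half the field of view and the camera points straight down.
    """
    ground_width_m = 2.0 * altitude_m * math.tan(math.radians(half_angle_deg))
    return speed_mps * time_interval_s * resolution_px / ground_width_m
```

For example, at 10 m/s, 100 m altitude, a 60° horizontal field of view (A = 30°), 1920 px frame width and 30 fps, `pixel_offset(10, 1 / 30, 1920, 100, 30)` gives about 5.5 px per frame.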

Next, the size and position of the tracking window on the current frame are determined. The effective size of the tracking window depends on the architecture of the neural network used for object detection (whether it has modules that extract features of small objects), the size of the objects in the training dataset, and the expected size of objects in pixels. The minimum window size depends on the frame shift speed: the window must be able to process the entire image area inside the frame before that area moves out of the frame.

The tracking window can be kept permanently in the center of the frame (or at any other fixed location), improving detection accuracy for objects whose image falls into that region. In this case, the survey must be planned so that, as the camera moves, the central region of the frame sweeps the entire area of interest, also taking into account the movement of the objects being recognized.

Another approach is to scan the frame sequentially with the tracking window. In the degenerate case, when the camera points straight down and is aligned with the UAV's direction of motion, the shift of the video stream can be represented as a shift along one axis at a constant speed (Fig. 2). The tracking window is a square of width W_win; the square shape matches the proportions of the input layers of the neural networks used for object detection. The frame width is W, and the offset between adjacent frames is Δ_pix.

The number of iterations needed to scan the entire frame width with the tracking window is:

N = ceil(W / W_win)

To scan the entire frame width with the tracking window before that region moves out of the frame, the inter-frame offset must be less than the width of the tracking window divided by the number of scanning iterations across the width, rounded up:

Δ_pix < W_win / ceil(W / W_win)

The minimum width of the tracking window can be calculated (neglecting the rounding of N) as:

W_win_min = sqrt(W * Δ_pix)

In other words, the area of the tracking window must be no less than the area of the part of the frame that moves completely out of the frame by the next frame.

In the method under consideration, the tracking window moves over the part of the frame that will soonest leave the frame. Any traversal order can be used to move the tracking window, for example from left to right or from the center toward the edges.
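A simplified left-to-right traversal for the one-axis case can be sketched as follows. Assumptions: the caller knows which horizontal edge the content leaves through first, and the window stays on the strip at that edge while sweeping it; the function name is hypothetical.

```python
import math


def window_schedule(frame_w, frame_h, win_w, n_steps, leaving_edge="bottom"):
    """Top-left corners (x0, y0) of the tracking window for n_steps frames.

    The window stays on the strip at `leaving_edge` ("top" or "bottom"), which
    leaves the frame first, and sweeps it left to right; the last column is
    clamped so the window stays inside the frame.
    """
    n = math.ceil(frame_w / win_w)
    y0 = 0 if leaving_edge == "top" else frame_h - win_w
    return [(min((k % n) * win_w, frame_w - win_w), y0) for k in range(n_steps)]
```

For a 1920×1080 frame and a 640 px window, the schedule cycles through x0 = 0, 640, 1280 and then starts again from 0.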

When the video stream shifts along both the X and Y axes (Fig. 3), the minimum window size is likewise determined by the area of the part of the frame that moves completely out of the frame by the next frame. The position of the tracking window is chosen according to the residual area, i.e. the part of the frame that will leave the frame minus the area covered by the tracking window in the previous frame, denoted S_w and S_h in Fig. 3. The next tracking window is placed on the side with the larger residual area: if S_w > S_h, the new tracking window is placed to the right of the last tracking window in the horizontal row; if S_h > S_w, it is placed above the last tracking window in the vertical row.
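For the two-axis case, the part of the frame that leaves between consecutive frames is an L-shaped band of area W * H - (W - |dx|) * (H - |dy|). The sketch below computes it, the resulting minimum square window side, and the rule that the next window goes on the side with the larger residual area. How S_w and S_h are accumulated from previous windows is not specified in the text, so they are taken as inputs; the tie-breaking choice and the function names are assumptions.

```python
import math


def leaving_area(frame_w, frame_h, dx, dy):
    """Area of the L-shaped part of the frame that leaves it after a shift (dx, dy).

    W*H - (W - |dx|)*(H - |dy|) = |dx|*H + |dy|*W - |dx|*|dy|.
    """
    ax, ay = min(abs(dx), frame_w), min(abs(dy), frame_h)
    return ax * frame_h + ay * frame_w - ax * ay


def min_window_side(frame_w, frame_h, dx, dy):
    """Side of a square window whose area is not less than the leaving area."""
    return math.sqrt(leaving_area(frame_w, frame_h, dx, dy))


def next_window_row(s_w, s_h):
    """The side with the larger residual area receives the next window."""
    return "horizontal" if s_w > s_h else "vertical"  # tie -> vertical (assumption)
```

With a 1920×1080 frame shifting by (10, 5) px per frame, the leaving area is 20350 px², so the window side must be at least about 143 px.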

Next, the frame fragment inside the tracking window is processed by a neural network to detect objects.

The fragment is passed to the neural network according to the size of its input layer and is resized if necessary. For example, if the input layer is a 1280×1280×3 matrix and the fragment is 640×640×3 (an RGB image with three color channels), the fragment is scaled up to the input-layer size. Any common scaling algorithm can be used, for example linear interpolation. Scaling may either preserve the aspect ratio or not. If the fragment's proportions do not match the input layer, the area outside the image can be filled with zeros (meaning zero brightness).

Any neural network designed for the object detection task (determining the coordinates and class of each object in an image) can be used to detect objects inside the tracking window, for example YOLO, SSD, RetinaNet and others. For each object, the network outputs the relative coordinates of the object's center, its width and height, its class, and the network's confidence. To handle objects at different scales, detection networks rely on multi-scale (pyramidal) image processing, in which the image scale is varied during training. Standard models are therefore designed to detect objects across a range of sizes.

Feeding the network a fragment of the image instead of the whole frame effectively enlarges a small object. Its apparent size then better matches the scales of the objects the network was trained on, so the network detects its features more reliably.
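The network reports coordinates relative to its input layer, so they must be mapped back into fragment pixels before they can be placed in the full frame. The sketch below assumes that the detector output has already been decoded into rows of (xc, yc, w, h, class, confidence). It also assumes the fragment was scaled and zero-padded to fit the input layer, with the scale factor and padding offsets known. The `Detection` type, the function name and the 0.25 confidence threshold are illustrative assumptions, not YOLOv5's API. The example also shows the enlargement effect: a box 6.4 pixels wide in a 640×640 fragment occupies 12.8 pixels at a 1280×1280 input.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    x: float     # box centre, pixels inside the fragment
    y: float
    w: float     # box width and height, pixels
    h: float
    cls: int     # class index
    conf: float  # network confidence


def to_fragment_pixels(raw, input_size, scale, pad, conf_threshold=0.25):
    """Convert rows of (xc, yc, w, h, cls, conf), with coordinates relative to the
    square network input, into pixel boxes inside the original fragment."""
    pad_x, pad_y = pad
    detections = []
    for xc, yc, w, h, cls, conf in raw:
        if conf < conf_threshold:  # discard low-confidence outputs
            continue
        detections.append(Detection(
            x=(xc * input_size - pad_x) / scale,  # undo the padding, then the scaling
            y=(yc * input_size - pad_y) / scale,
            w=w * input_size / scale,             # sizes are scaled, not shifted
            h=h * input_size / scale,
            cls=int(cls),
            conf=conf,
        ))
    return detections


# A 640x640 fragment upscaled 2x into a 1280x1280 input (scale 2.0, no padding).
raw = [(0.5, 0.5, 0.01, 0.02, 3, 0.9), (0.1, 0.1, 0.1, 0.1, 0, 0.1)]
for d in to_fragment_pixels(raw, 1280, 2.0, (0, 0)):
    print(d)
```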

Next, the positions of the objects detected in the fragment are determined in the full frame. Each position is the object's pixel coordinates inside the fragment, offset by the position of the fragment within the frame. To do this, the absolute coordinates $X_{fr}$ and $Y_{fr}$ of the fragment inside the frame are added to the absolute coordinates $X_1$ and $Y_1$ of the object inside the fragment. The global coordinates of the object in the frame are thus $X_{gl} = X_1 + X_{fr}$ and $Y_{gl} = Y_1 + Y_{fr}$.
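The coordinate shift is a plain addition. This sketch assumes a tracking window centred in the frame (the placement of claim 5), a 1920×1080 frame and a 640×640 window. These sizes and the helper names are hypothetical. Only the box centre is shifted; the width and height are unchanged.

```python
def centred_window_origin(frame_w, frame_h, win_w, win_h):
    """Top-left corner (X_fr, Y_fr) of a tracking window centred in the frame."""
    return (frame_w - win_w) // 2, (frame_h - win_h) // 2


def fragment_to_frame(x1, y1, x_fr, y_fr):
    """Global coordinates: X_gl = X1 + X_fr, Y_gl = Y1 + Y_fr."""
    return x1 + x_fr, y1 + y_fr


x_fr, y_fr = centred_window_origin(1920, 1080, 640, 640)  # (640, 220)
print(fragment_to_frame(320, 320, x_fr, y_fr))            # (960, 540): the frame centre
```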

Next, the method handles previously detected objects that are still inside the frame but fall outside the current tracking window. Their displacement within the frame is determined using data received from the UAV flight controller (flight altitude, UAV speed): the inter-frame pixel displacements dx and dy are added to the object's absolute coordinates X and Y obtained at previous processing stages. It is then checked whether the object is still in the frame. If the object has moved outside the frame, its position is no longer calculated on subsequent frames. The new coordinates of the object within the frame are thus:

$X_{new} = X + dx$, subject to $0 < X < X_{max}$

$Y_{new} = Y + dy$, subject to $0 < Y < Y_{max}$

where $X_{max}$ and $Y_{max}$ are the frame width and height in pixels.
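A minimal sketch of this step follows. It rests on assumptions the patent does not state: a nadir-looking pinhole camera over flat ground, the top of the image facing the direction of flight, square pixels and a known horizontal field of view. The pixel scale follows from the ground footprint width $2h\tan(\mathrm{FOV}/2)$, and the inter-frame shift is the distance flown in one frame interval times that scale. For example, at 50 m altitude with a 90° field of view the footprint is 100 m wide, giving 19.2 px/m across a 1920-pixel frame. At 15 m/s and 30 frames per second, the scene therefore moves 9.6 px per frame. Propagated objects that land inside the current tracking window are dropped, on the assumption that the network re-detects them there. The frame-bounds check is applied to the updated coordinates. All names and parameters are hypothetical.

```python
import math


def pixel_shift_from_flight_data(altitude_m, v_right_mps, v_forward_mps, dt_s, frame_w, hfov_deg):
    """Inter-frame scene shift (dx, dy) in pixels for a nadir-looking pinhole camera.

    Assumes flat ground, square pixels, the image x-axis pointing to the UAV's right
    and the top of the image facing the direction of flight."""
    ground_width_m = 2.0 * altitude_m * math.tan(math.radians(hfov_deg) / 2.0)
    px_per_m = frame_w / ground_width_m
    # The scene moves opposite to the camera: flying right moves objects left (dx < 0),
    # flying forward moves them towards the bottom of the image (dy > 0).
    dx = -v_right_mps * dt_s * px_per_m
    dy = v_forward_mps * dt_s * px_per_m
    return dx, dy


def propagate_outside_window(objects, dx, dy, frame_w, frame_h, window):
    """Shift previously detected object centres by (dx, dy) and keep those still in
    the frame (0 < X < X_max, 0 < Y < Y_max) but outside the tracking window,
    whose contents the detector examines again on the current frame."""
    wx, wy, ww, wh = window  # tracking window: top-left corner, width, height
    kept = []
    for x, y in objects:
        x_new, y_new = x + dx, y + dy
        if not (0 < x_new < frame_w and 0 < y_new < frame_h):
            continue  # the object has left the frame: stop tracking it
        if wx <= x_new < wx + ww and wy <= y_new < wy + wh:
            continue  # inside the window: a fresh detection replaces it
        kept.append((x_new, y_new))
    return kept


# 50 m altitude, 90 deg FOV, 1920 px wide frame, flying right at 15 m/s, 30 fps.
dx, dy = pixel_shift_from_flight_data(50.0, v_right_mps=15.0, v_forward_mps=0.0,
                                      dt_s=1 / 30, frame_w=1920, hfov_deg=90.0)
print(round(dx, 2), round(dy, 2))  # -9.6 0.0
tracked = [(100.0, 100.0), (5.0, 500.0), (960.0, 540.0)]
# Keeps only the first object: the second leaves the frame, the third enters the window.
print(propagate_outside_window(tracked, dx, dy, 1920, 1080, (640, 220, 640, 640)))
```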

The detection result is then output. Results are transmitted as [x, y, w, h, class] for each object in the frame. Here x and y are the relative coordinates of the object's center, w and h are its relative width and height, and class is the index of the object's class. The network's confidence in the class can also be transmitted. The output takes whatever form of presenting the detected objects is required. Examples include bounding boxes and class names drawn directly in the video stream, or results passed to external algorithms for further processing.
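The output format can be sketched as a conversion from full-frame pixel boxes to rows normalised by the frame size. The input tuple layout (centre, size, class, confidence), the optional confidence column and the function name are assumptions for illustration.

```python
def to_output_rows(boxes, frame_w, frame_h, with_conf=False):
    """Format full-frame pixel boxes (xc, yc, w, h, cls, conf) as
    [x, y, w, h, class] rows with coordinates relative to the frame size."""
    rows = []
    for xc, yc, w, h, cls, conf in boxes:
        row = [xc / frame_w, yc / frame_h, w / frame_w, h / frame_h, int(cls)]
        if with_conf:
            row.append(conf)  # optional network confidence in the class
        rows.append(row)
    return rows


print(to_output_rows([(960, 540, 192, 108, 2, 0.87)], 1920, 1080))
# [[0.5, 0.5, 0.1, 0.1, 2]]
```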

An example implementation of the invention is described below with reference to FIG. 4. A quadcopter UAV based on the PX4 autopilot is used. The camera (1) transmits the video stream over a Wi-Fi wireless link using the H.264 codec. The flight controller (4) transmits UAV position information via the MAVLink protocol over the same link. The computing device is a personal computer running Windows 10. The processing chain is as follows:

- frame capture (2) uses the OpenCV library;
- the position of the tracking window (3) is calculated by a software function;
- object detection is performed by a YOLOv5 neural network (5);
- the positions of objects are calculated by a software function.

According to the scheme, the video stream from the camera (1) is transmitted to the personal computer, where a frame is captured (2) using the OpenCV library. The tracking-window positioning algorithm (3) then calculates the position of the tracking window in the frame. Object detection is performed on the image by the YOLOv5 neural network (5). The algorithm for calculating the positions of detected objects (6) uses the UAV position data received from the flight controller (4) to compute the displacement of previously detected objects that did not fall into the current tracking window. The detection results are also passed to this algorithm (6), which calculates the positions of the objects in the image. The detection result is then output.
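The tracking-window function (3) is not specified in detail. The sketch below is one hypothetical reading of claims 5 and 6. By default the window is centred. In a "leading-edge" mode it is pushed against the edge through which scene content is leaving, inferred from the sign of the inter-frame offset (dx, dy). The mode names, the `eps` dead-band and the snap-to-edge rule are assumptions, not the patent's algorithm.

```python
def window_position(frame_w, frame_h, win_w, win_h, dx=0.0, dy=0.0, mode="center", eps=0.5):
    """Top-left corner of the tracking window.

    mode="center": window centred in the frame (claim 5).
    mode="leading-edge": along each axis with noticeable motion, snap the window to the
    edge that scene content is moving towards, i.e. the part of the frame that will
    leave it first (one reading of claim 6)."""
    if mode not in ("center", "leading-edge"):
        raise ValueError(f"unknown mode: {mode}")
    cx, cy = (frame_w - win_w) // 2, (frame_h - win_h) // 2
    if mode == "center":
        return cx, cy
    # dx > 0: content moves right and exits through the right edge; dx < 0: the left edge.
    x = cx if abs(dx) < eps else (frame_w - win_w if dx > 0 else 0)
    y = cy if abs(dy) < eps else (frame_h - win_h if dy > 0 else 0)
    return x, y


print(window_position(1920, 1080, 640, 640))                                # (640, 220)
print(window_position(1920, 1080, 640, 640, dx=-9.6, mode="leading-edge"))  # (0, 220)
```

Snapping fully to the edge keeps the sketch simple. A smoother rule, such as shifting in proportion to speed, would fit the claim equally well.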

The UAV position is obtained from the flight controller (4), which transmits it via the MAVLink protocol over the Wi-Fi wireless link. Computations are performed on a personal computer, which can be any PC running Windows 10 with a discrete graphics card that supports CUDA.
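The patent does not say which MAVLink messages are used. One possible source is the standard GLOBAL_POSITION_INT message. It carries the altitude above home (`relative_alt`, millimetres), the ground velocity (`vx` positive north, `vy` positive east, both in cm/s) and the heading (`hdg`, centidegrees, 65535 if unknown). The sketch converts these values to SI units and rotates the velocity into forward/right components, assuming the camera is aligned with the vehicle heading. It takes a plain dict so that it runs without a live link. With pymavlink, such a dict could be obtained from `recv_match(type="GLOBAL_POSITION_INT", blocking=True).to_dict()`. The function name is hypothetical.

```python
import math


def flight_state_from_global_position_int(msg):
    """Return (altitude_m, v_forward_mps, v_right_mps) from GLOBAL_POSITION_INT fields."""
    if msg["hdg"] == 65535:  # UINT16_MAX means the heading is unknown
        raise ValueError("heading unknown")
    altitude_m = msg["relative_alt"] / 1000.0   # mm -> m, altitude above home
    v_north = msg["vx"] / 100.0                 # cm/s -> m/s
    v_east = msg["vy"] / 100.0
    heading = math.radians(msg["hdg"] / 100.0)  # cdeg -> rad, clockwise from north
    # Rotate the north/east ground velocity into the vehicle's forward/right axes.
    v_forward = v_north * math.cos(heading) + v_east * math.sin(heading)
    v_right = -v_north * math.sin(heading) + v_east * math.cos(heading)
    return altitude_m, v_forward, v_right


# Heading east (90 deg) while moving north at 10 m/s: the UAV drifts to its left.
print(flight_state_from_global_position_int({"relative_alt": 50000, "vx": 1000, "vy": 0, "hdg": 9000}))
```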

A computing system that provides the data processing needed to implement the claimed solution generally comprises:

- one or more processors;
- at least one memory;
- data storage means;
- input/output interfaces;
- input means;
- network communication means.

Machine-readable instructions held in RAM, when executed, configure the processor to perform the basic computing operations needed for the device, or one or more of its components, to function. The memory is typically RAM, into which the program logic providing the required functionality is loaded. The amount of memory the proposed solution requires is allocated when it runs.

Data storage means, such as HDDs, SSDs, RAID arrays, network storage or flash memory, provide long-term storage of various types of information. The interfaces are standard means for connecting and operating peripheral and other devices, for example USB, RS232, RJ45, COM, HDMI, PS/2 or Lightning. In any embodiment of a system implementing the described method, a keyboard, joystick, display (touch display), projector, touchpad, mouse, trackball, light pen, speakers, microphone, etc. can serve as data input means.

Network communication means are devices that provide network reception and transmission of data, for example an Ethernet card, WLAN/Wi-Fi module, Bluetooth module, BLE module, NFC module, IrDA module, RFID module or GSM modem. They provide data exchange over a wired or wireless channel, for example WAN, PAN, LAN, intranet, Internet, WLAN, WMAN or GSM. The device components are interconnected via a common data bus.

These application materials present a preferred embodiment of the claimed technical solution. It should not be construed as limiting other particular embodiments that remain within the scope of the legal protection sought and are obvious to those skilled in the relevant art.

Claims

1. A method for automatically detecting objects using a technical vision system installed on an unmanned aerial vehicle (UAV), comprising the following steps:

receive, at a computing device, frames of the video stream from the UAV camera;

determine the offset in pixels of the current frame of the video stream relative to the previously processed frame;

determine the size and position of the tracking window on the current frame;

process the frame fragment inside the tracking window using a neural network to detect objects;

determine the positions, in the full frame, of the objects detected in the fragment;

determine the displacement in the frame of previously detected objects that are located inside the frame but do not fall into the current tracking window, by adding the inter-frame displacement in pixels to the object coordinates obtained at previous processing stages;

output the detection result.

2. The method according to claim 1, characterized in that the pixel offset of the current frame relative to the previously processed frame is determined using an optical flow algorithm.

3. The method according to claim 1, characterized in that the pixel offset of the current frame relative to the previously processed frame is determined using the block matching method.

4. The method according to claim 1, characterized in that the pixel offset of the current frame relative to the previously processed frame is determined from the altitude and speed of the UAV and the orientation of the camera.

5. The method according to claim 1, characterized in that the tracking window is located in the center of the frame.

6. The method according to claim 1, characterized in that the tracking window is moved over the part of the frame that will soonest move outside the frame.
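Claim 3 names block matching for estimating the inter-frame offset but gives no details. The following is a minimal sketch assuming grayscale frames and a purely translational global shift. It slides a central block of the previous frame over a search range in the current frame and keeps the displacement with the smallest sum of absolute differences (SAD). The block size, search radius and function name are illustrative choices, not taken from the patent.

```python
import numpy as np


def block_matching_offset(prev, curr, block=64, search=16):
    """Estimate the global shift (dx, dy) such that content at prev[y, x]
    appears at curr[y + dy, x + dx], by exhaustive SAD block matching."""
    h, w = prev.shape
    y0, x0 = (h - block) // 2, (w - block) // 2  # reference block at the frame centre
    ref = prev[y0:y0 + block, x0:x0 + block].astype(np.int32)
    best_offset, best_sad = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ys, xs = y0 + dy, x0 + dx
            if ys < 0 or xs < 0 or ys + block > h or xs + block > w:
                continue  # candidate block would fall outside the frame
            cand = curr[ys:ys + block, xs:xs + block].astype(np.int32)
            sad = int(np.abs(cand - ref).sum())  # sum of absolute differences
            if best_sad is None or sad < best_sad:
                best_offset, best_sad = (dx, dy), sad
    return best_offset


rng = np.random.default_rng(0)
prev = rng.integers(0, 250, size=(240, 320), dtype=np.uint8)
curr = np.roll(prev, shift=(3, -5), axis=(0, 1))  # scene moved 5 px left and 3 px down
print(block_matching_offset(prev, curr))           # (-5, 3)
```

A single central block is the simplest choice. Matching several blocks and taking the median offset would be more robust to moving objects in the scene.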