RU2839709C1

RU2839709C1 - Method for spatially parameterized estimation of motion vectors

Info

Publication number: RU2839709C1
Application number: RU2024106874A
Authority: RU
Inventors: Петр ПОГЛ; Сергей Юрьевич Подлесный; Евгений Андреевич МОСКОВЦЕВ; Тимур Эркинович АЛИЕВ; Александр Викторович ЯКОВЕНКО
Original assignee: Samsung Electronics Co., Ltd.
Filing date: 2024-03-15
Publication date: 2025-05-12

Abstract

FIELD: physics.

SUBSTANCE: the invention relates to image processing, and more specifically to the estimation of motion vectors in a sequence of photographic or video images. A method for estimating motion over an image field comprises the steps of: obtaining at least two images from an input bitstream comprising a plurality of images; preprocessing the images, the preprocessing comprising at least a brightness adjustment; estimating a map M of parameters of a motion estimation (ME) algorithm for said pair of images by applying a preconfigured model for predicting spatial parameters of the motion estimation algorithm to said pair of images; storing said parameter map as an array whose dimensions correspond to the desired motion vector field (MVF); performing motion estimation for said pair of images depending on said parameter map; and outputting the motion vector field as a two-dimensional (2D) array of displacements from the position of each pixel in the first image to at least one position of the corresponding pixel in the second image of said pair. A system implementing the method is also disclosed.

EFFECT: high accuracy of estimating motion vectors over an image field.

17 cl, 8 dwg

Description

Technical field

The invention relates to the field of image processing, and more specifically to the estimation of motion vectors in a sequence of photo or video images. The invention can be used in various technologies that involve estimating a dense motion vector field (MVF). In particular, it can be used in computer vision and in photo and video imaging, for example in high dynamic range (HDR) imaging or in multi-frame capture.

State of the art

Multi-frame capture is currently one of the technologies used to shoot high dynamic range (HDR) photos and videos. However, because moving objects may be present in the scene while several frames are captured in sequence, their motion must be compensated to produce a high-quality final HDR image, especially in night shooting at relatively long exposure times. Modern image processing algorithms estimate a dense optical flow (motion vector field, MVF), which assigns to each pixel of the image a two-dimensional vector describing the displacement of that pixel, in particular by means of so-called block matching.

Currently known image processing algorithms suffer, in particular, from a problem caused by the assumption of constant illumination across the captured sequence of images.

Known motion estimation (ME) algorithms assume that illumination parameters remain constant. In practice, the following characteristics of shooting conditions violate this assumption:

a) different exposure and image sensor gain settings for different frames within an HDR bracketing sequence;

b) flicker of electric lighting;

c) frame-to-frame changes in natural lighting due to motion;

d) blur caused by motion of objects and/or the camera, as well as by liquids flowing or steam drifting in the frame;

e) automatic white balance adjustment in the camera and/or a shift in color correction from frame to frame.
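The constant-illumination assumption behind the list above can be written compactly. The notation here is not taken from the source: I_1 and I_2 are assumed to denote the intensities of the two frames and (u, v) the displacement of a pixel. The second line is a deliberately simplified model of how items such as exposure changes or flicker break the assumption; the gain g, offset b, and noise term n are illustrative only.

```latex
% Brightness constancy: a pixel keeps its intensity as it moves by (u, v).
I_1(x, y) = I_2(x + u,\; y + v)

% Simplified violation model (an assumption for illustration):
% gain g, offset b, and noise n differ between frames.
I_2(x + u,\; y + v) = g \, I_1(x, y) + b + n(x, y)
```

When g differs from 1 or b from 0, a matching cost based on raw intensity differences is nonzero even at the true displacement, which is why brightness adjustment precedes matching.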

Another problem inherent in existing approaches to motion estimation is noise generated by the image sensor. Known block-matching motion estimation algorithms can estimate the motion vector field (MVF) with high accuracy, but they can be highly sensitive to the quality of the image content and to its noise level. In video captured in low light, the recursive temporal noise reduction (TNR) process produces moving artifacts (so-called "boiling noise") that significantly reduce perceived image quality. In night HDR shooting, frames captured with different exposure parameters may have different noise levels after brightness adjustment, owing to different combinations of exposure time and image sensor sensitivity. The presence of noise, and the difference in noise levels between frames, complicates the matching of image blocks, which is the basis of block-matching ME algorithms.
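The effect of brightness adjustment on noise can be illustrated with a small numeric sketch. It assumes, for illustration only, that the adjustment is a single multiplicative gain and that sensor noise is Gaussian; the signal level, noise sigma, and gain value are hypothetical and do not come from the source.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical short-exposure frame: flat signal of 20 plus Gaussian noise (sigma 2).
short = [20.0 + rng.gauss(0.0, 2.0) for _ in range(2000)]

# Assumed exposure ratio used to bring the short frame up to the long frame's brightness.
gain = 4.0
adjusted = [gain * p for p in short]

sigma_short = statistics.pstdev(short)
sigma_adjusted = statistics.pstdev(adjusted)

# Scaling the pixel values by a gain scales the noise standard deviation by the same gain,
# so the adjusted short frame is noisier than a long frame of equal brightness would be.
print(f"sigma before: {sigma_short:.3f}, after: {sigma_adjusted:.3f}")
```

Under this model the two frames reach the same mean brightness but not the same noise level, which is the mismatch that makes block matching harder.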

When shooting under artificial lighting, for example from electric light sources, the scene may contain flicker, which the motion vector estimation algorithm mistakes for motion in the frame. During temporal noise reduction, the sensor noise mentioned above may also produce false motion vectors and cause errors to accumulate from frame to frame, so that the contours of objects in the resulting image are distorted by the incorrect motion vector estimates.

An ME algorithm based on exhaustive-search block matching is widely known in the art. For each block in the first frame, the corresponding block in the second frame is found as the one that minimizes an error functional. The sum of absolute differences (SAD), that is, the sum of the absolute differences between the intensity values of corresponding pixels in the two blocks, is often used as the error functional. The drawbacks of exhaustive search are its high computational cost and low motion estimation accuracy, the latter because exhaustive block matching does not take the physical nature of motion in the frame into account. To improve ME accuracy, the error functional is regularized, for example by constraining the total variation of the motion vector field.
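Exhaustive-search block matching with SAD, as described above, can be sketched as follows. This is a minimal illustration, not code from the patent: the function names, the list-of-rows image representation, and the square search window of half-width `radius` are all assumptions.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def sad(a: Image, b: Image, ax: int, ay: int, bx: int, by: int, size: int) -> int:
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(a[ay + j][ax + i] - b[by + j][bx + i])
        for j in range(size)
        for i in range(size)
    )


def full_search(first: Image, second: Image, x: int, y: int,
                size: int, radius: int) -> Tuple[int, int]:
    """Return the displacement (dx, dy) that minimizes SAD within +/- radius."""
    height, width = len(second), len(second[0])
    best_cost, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            bx, by = x + dx, y + dy
            # Skip candidates that fall outside the second frame.
            if bx < 0 or by < 0 or bx + size > width or by + size > height:
                continue
            cost = sad(first, second, x, y, bx, by, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv
```

Every block is compared against (2 * radius + 1)^2 candidates, which is the source of the high computational cost noted above, and nothing ties the vector of one block to those of its neighbors, which is why regularization is added in practice.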

US 10341574 B2 (Apple Inc., published 2019-07-02) discloses a method for capturing a high dynamic range (HDR) image of a scene, in which a plurality of images are captured and stored at a first exposure level. A first image is then captured at a second exposure level, and a second image is selected from said plurality of captured images. A composite HDR image of the scene is formed from the first and second images. To this end, the images are aligned using binarization of the luminance channel with a plurality of thresholds, the alignment being performed at full resolution and at one or more reduced image scales. Block matching is performed on these image pairs and essentially consists of comparing salient features in the images. A drawback of this method is the lack of regularization, which can cause the general problems described above: noise sensitivity, false motion vectors, and so on. The results of this known method may be close in character to those of exhaustive-search ME algorithms.

US 20210306528 A1 (Oppo Mobile, published 2021-09-30) discloses a motion estimation method in which a plurality of inter-block error functional values are obtained between a block to be matched and each of a plurality of reference blocks. At least one candidate block is determined among the reference blocks on the basis of said plurality of error values. A range of random numbers is determined according to the offset between the block to be matched and each of the reference blocks. A plurality of random motion vectors is obtained using the motion vector between the block to be matched and each candidate block as a base motion vector, and random numbers within said range as increments. A target motion vector for the block to be matched is determined from said random motion vectors. This block matching algorithm uses a matching loss function that combines SAD, the block edge gradient, the Euclidean distance, and a penalty term. Candidate blocks can be sorted by different components of this loss function. A drawback of this known solution is that it generates too many random candidate motion vectors from the comparison with the candidate blocks, which limits the performance of a system using the algorithm.
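The idea of perturbing a base motion vector with random increments can be sketched as below. This is only a loose illustration of the general technique, not the algorithm of the cited publication: the names `random_candidates` and `pick_target` are hypothetical, the increment range `max_step` is simply passed in (how the cited method derives it from block offsets is not modeled), and the cost function is left to the caller.

```python
import random
from typing import Callable, List, Tuple

Vector = Tuple[int, int]


def random_candidates(base: Vector, max_step: int, count: int,
                      rng: random.Random) -> List[Vector]:
    """Perturb a base motion vector with random increments in [-max_step, max_step]."""
    return [
        (base[0] + rng.randint(-max_step, max_step),
         base[1] + rng.randint(-max_step, max_step))
        for _ in range(count)
    ]


def pick_target(candidates: List[Vector], cost: Callable[[Vector], float]) -> Vector:
    """Choose the candidate with the lowest matching cost."""
    return min(candidates, key=cost)
```

Each candidate requires one evaluation of the matching cost, so the number of random candidates directly drives the runtime, which is the performance limitation noted above.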

US 9153027 B2 (Nvidia Corp., published 2015-10-06) discloses a method for fast non-rigid alignment of at least two images in an image stack to form a high dynamic range image. The method includes the steps of forming a warped image from a set of corresponding pixels, analyzing the warped image to detect unreliable pixels in it, and forming corrected pixel values for each unreliable pixel in the warped image. The set of corresponding pixels includes a plurality of pixels of the source image, each of which is associated with a potential feature in the source image and is matched to a pixel in the reference image that substantially corresponds to it. A drawback of this known solution is that real-time processing of this kind requires a graphics processing unit (GPU) of desktop-class performance. In addition, this known solution performs motion compensation directly, without constructing an optical flow. It is therefore unsuitable for frame rate up-conversion (FRUC) of video and other similar applications that require true motion estimation.

CN 115619816 A (Xiamen University, published 2023-01-17) discloses a method for dynamic displacement measurement that eliminates illumination interference through dynamic correction of a spatially varying illumination map. The method comprises extracting a low-frequency illumination component from the image using the illumination map, constructing a two-dimensional gamma correction function and combining it with said illumination map, and tracking target objects in the corrected image by template matching, the method being characterized by precise determination of pixel coordinates on the template. The intensity of dark regions is adjusted in the space of illumination values, which makes it possible to render details of the input image under low, uneven illumination. A drawback of this known solution is the restrictive assumption that the true image is a low-frequency component and that light is a high-frequency modifier, which does not always reflect real-world conditions in photo and video capture.

CN 114862902 A (Liaoning University of Science and Technology, published 2022-08-05) discloses a method for extracting and matching ORB (Oriented FAST and Rotated BRIEF) features that adapts itself to the illumination of the imaged scene. The method comprises obtaining parameters of the camera motion state; constructing an image pyramid (that is, deriving reduced images at scales of ½, ¼, and so on from the original image) according to the camera motion state; extracting keypoints and adding or discarding them using an improved quadtree algorithm; obtaining pairs of matched feature points; and discarding erroneous pairs using random sample consensus (RANSAC), an algorithm that estimates model parameters from random samples. Drawbacks of this known solution are that it produces only a sparse optical flow rather than a full motion vector field (MVF), and that the use of RANSAC prevents real-time operation when capturing images at 4K+ quality.
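The image pyramid mentioned above (reduced copies at scales of ½, ¼, and so on) can be sketched as follows. The sketch assumes downsampling by averaging non-overlapping 2x2 blocks, which is one common choice and is not stated in the source; the function names are hypothetical, and the dependence on the camera motion state described in the cited publication is not modeled.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def downsample_2x(img: Image) -> Image:
    """Halve each dimension by averaging non-overlapping 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * y][2 * x] + img[2 * y][2 * x + 1]
             + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(w)
        ]
        for y in range(h)
    ]


def build_pyramid(img: Image, levels: int) -> List[Image]:
    """Return [full, 1/2, 1/4, ...] with at most `levels` entries."""
    pyramid = [img]
    # Stop early once the coarsest level can no longer be halved.
    while (len(pyramid) < levels
           and len(pyramid[-1]) >= 2 and len(pyramid[-1][0]) >= 2):
        pyramid.append(downsample_2x(pyramid[-1]))
    return pyramid
```

A displacement of d pixels at full resolution corresponds to roughly d/2 pixels one level up, so a small search window at a coarse level covers a large motion at full resolution.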

GB 2599217 B (Intel Corp., published 2023-03-29) discloses a method for estimating optical flow based on deep learning and on merging overlapping portions of the optical flow. The method processes pairs of images, the first and second images of each pair being divided into first and second sets of fragments, respectively. A deep-learning-based optical flow generation process is applied either to third and fourth fragments from the first and second sets of fragments, respectively, or to downsampled images corresponding to the first and second images, respectively, to produce second optical flow results, after which the first and second optical flow results are merged into an optical flow map for the image pair. A drawback of this known solution is that deep-learning-based motion estimation methods are demanding in memory and computing power, which limits their use in mobile devices. In addition, the method requires a dedicated means of merging the optical flows in overlapping fragments.

US 20210099727 A1 (Intel Corp., published 2021-04-01) discloses a method for estimating motion in a structure of adjacent blocks for video coding. It uses image blocks of variable size with vertically oriented, horizontally oriented, and square shapes. Predefined candidate sets are used for the different block sizes, and the number of candidates is constant for each block size. However, this method cannot adapt to the image content.

WO 2021025375 A1 (Samsung Electronics Co., Ltd., published 2021-02-11) discloses a motion estimation method based on "pyramidal" exhaustive search, after which the motion vector field is refined by removing outlier motion vectors using quadratic constraints on a vertex grid derived from the image structure. The method receives a reference image and a non-reference image, divides the reference image into a plurality of blocks, determines a motion vector map using a coarse-to-fine motion vector estimation algorithm, and generates an output frame from the motion vector map, the reference image, and the non-reference image.

A drawback of this method is that it relies on exhaustive search (a full search over the entire image field) to find the initial motion vectors, although it does use physically plausible regularization to correct the vectors found.

US 11343526 B2 (Realtek Semiconductor Corp., published 2022-05-24) discloses a video processing method that includes dividing a frame into a plurality of blocks and forming a motion vector for each block from the current block of that frame and the corresponding block of the previous frame. A global motion vector is then formed from the resulting set of motion vectors, and the sum of absolute pixel differences is computed for each block of the current frame on the basis of said global motion vector. The distribution of the sum of absolute pixel differences over a region containing a set of blocks of the current frame is compared with a plurality of models to determine the best-fitting model, and on the basis of the selected model each block in said region of the current frame is labeled as either a foreground block or a background block. In this known method, motion segmentation is based on the SAD distribution and on refinement of the motion vector field near the boundary of the moving object. Drawbacks of this method are the need first to perform global motion estimation, then to evaluate SAD for each block relative to the global motion, and then to search for the model that best fits said distribution.

The solution of US 20210306528, described above, can be regarded as the closest prior art to the proposed invention.

Given the shortcomings of the prior art, a solution is needed that improves the accuracy of optical flow (motion vector field, MVF) estimation under changing illumination and over a wide range of image noise levels.

Disclosure of the invention

This section discloses various aspects of the claimed invention and is intended as a brief overview of the claimed subject matter and its embodiments. Detailed characteristics of the technical means and methods that implement the combinations of features of the present invention are given below. Neither this disclosure nor the detailed description that follows, together with the accompanying drawings, should be regarded as defining the scope of legal protection of the claimed invention. The scope of legal protection is defined solely by the claims that follow.

The technical problem solved by the present invention is to reduce the influence of difficult shooting conditions (different illumination, different exposure and different noise levels between frames, flickering electric lighting) on optical flow estimation when capturing a high dynamic range (HDR) photographic image or a video image.

The technical result achieved by the claimed invention is higher accuracy of motion vector estimation over the image field. In addition, high processing speed of the system is ensured when estimating motion vectors over the image field.

The object of the invention is to provide a method and a system for spatially parameterized estimation of a motion vector field (optical flow generation) in which the shortcomings of the known prior-art solutions described above are at least partially or completely eliminated.

In a first aspect of the present invention, this object is achieved by a method for estimating motion in a sequence of images, comprising the steps of: obtaining at least two images from an input bitstream containing a plurality of images; performing pre-processing of the images, comprising at least a brightness adjustment; estimating a parameter map M of a motion estimation (ME) algorithm for said at least two images by applying a pre-configured model for predicting spatial parameters of the motion estimation algorithm to said at least two images, wherein the dimensions of the parameter map M depend on the method of dividing the image into blocks; storing the parameter map M; performing motion estimation by applying the motion estimation algorithm and the parameters from the parameter map M to each block of said at least two images, wherein the parameter map M defines the parameters of the motion estimation algorithm separately for each block; and outputting a motion vector field (MVF) in the form of a two-dimensional (2D) array of displacements from the position of each pixel in the first image to at least one position of the corresponding pixel in the second image of said at least two images.
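By way of a non-limiting illustration, the dependence of the parameter map dimensions on the block partitioning can be sketched as follows. The sketch assumes a uniform grid of non-overlapping rectangular blocks, which is only one possible partitioning; the function names and the `default` parameter value are hypothetical and do not appear in the source.

```python
def param_map_shape(height, width, block_h, block_w):
    """Rows and columns of a per-block parameter map.

    Assumption: uniform non-overlapping blocks; partial blocks at the
    image border still get their own map entry (ceiling division).
    """
    rows = (height + block_h - 1) // block_h
    cols = (width + block_w - 1) // block_w
    return rows, cols


def make_param_map(height, width, block_h, block_w, default):
    """Allocate a parameter map M with one entry per block."""
    rows, cols = param_map_shape(height, width, block_h, block_w)
    return [[default for _ in range(cols)] for _ in range(rows)]


def params_for_pixel(param_map, y, x, block_h, block_w):
    """Look up the parameters of the block that contains pixel (y, x)."""
    return param_map[y // block_h][x // block_w]


# A 1920x1080 frame split into 16x16 blocks.
M = make_param_map(1080, 1920, 16, 16, default=0)
M[67][119] = 3  # hypothetical parameter index for the bottom-right block
```

With this layout, changing the block size changes the map dimensions, while the lookup for any pixel remains a pair of integer divisions.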

Pre-processing of said at least two images may comprise a step of downscaling the images to form an image pyramid with at least two reduced scales. Downscaling the images may comprise a step of spatially filtering the pixels of the raw image. Obtaining said at least two images from the bitstream may further comprise a step of obtaining at least one of a preliminary MVF estimate, encoded metadata relating to image capture parameters, and data from non-imaging sensors, or a step of obtaining at least one of luminance channel data and three-channel color images in the RGB or YUV color spaces.
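As a minimal sketch of such downscaling, an image pyramid can be built by repeatedly averaging 2x2 pixel neighbourhoods. The 2x2 box filter and the factor of two between levels are assumptions made for illustration; the source does not specify the spatial filter or the scale ratio.

```python
def downscale_2x(image):
    """Halve the resolution by averaging each 2x2 neighbourhood.

    `image` is a list of rows of pixel intensities. An odd trailing
    row or column is dropped (an assumption made for simplicity).
    """
    h = len(image) // 2 * 2
    w = len(image[0]) // 2 * 2
    return [
        [
            (image[y][x] + image[y][x + 1]
             + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def build_pyramid(image, reduced_levels):
    """Return [full resolution, 1/2 scale, 1/4 scale, ...]."""
    pyramid = [image]
    for _ in range(reduced_levels):
        pyramid.append(downscale_2x(pyramid[-1]))
    return pyramid


frame = [
    [0, 2, 4, 6],
    [2, 4, 6, 8],
    [4, 6, 8, 10],
    [6, 8, 10, 12],
]
pyramid = build_pyramid(frame, reduced_levels=2)
```

Here `pyramid[1]` is the half-scale image and `pyramid[2]` the quarter-scale image, giving the two reduced scales mentioned above.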

In one or more embodiments, the pre-configured model for predicting spatial parameters of the motion estimation algorithm has the architecture of a convolutional neural network or of a discriminative or regression machine learning model. The motion estimation algorithm may be selected from at least one of: an exhaustive-search block matching algorithm that minimizes an error functional given by the sum of absolute differences (SAD) of the intensity values of each pixel in the block; a 3D recursive search (3DRS) algorithm; a block erosion algorithm with sub-pixel refinement of motion vectors (MV); and an algorithm with sub-pixel sampling of candidate blocks.

In a second aspect, this object is achieved by a system for estimating motion in a sequence of images, comprising: an image pre-processing unit configured to pre-process at least two images obtained from an input bitstream containing a plurality of images; a unit for generating a parameter map M of a motion estimation (ME) algorithm for said at least two images, configured to generate the parameter map M using a pre-configured model for predicting spatial parameters of the motion estimation algorithm and to store the parameter map M, wherein the dimensions of the parameter map M depend on the method of dividing the image into blocks; and a motion estimation unit configured to estimate motion in said at least two images by applying the motion estimation algorithm and the parameters from the parameter map M to each block of said at least two images, wherein the parameter map M defines the parameters of the motion estimation algorithm separately for each block, and to output a motion vector field (MVF) in the form of a two-dimensional (2D) array of displacements from the position of each pixel in the first image to at least one position of the corresponding pixel in the second image of said at least two images.

The system may further comprise a unit for training the model for predicting spatial parameters of the motion estimation algorithm, configured to train said model by iteratively executing the parameter estimation algorithm with N parameter variants and predicting, for each image block, the parameter value that gives the best result. The parameter value that gives the best result for each image block may be determined by computing a loss function between the vectors of the motion vector field and reference motion vectors.

The system may be further configured to obtain at least one of a preliminary MVF estimate, encoded metadata relating to image capture parameters, and data from non-imaging sensors. The system may be configured to obtain the pair of images as at least one of luminance channel data and three-channel color images in the RGB or YUV color spaces.

The pre-configured model for predicting spatial parameters of the motion estimation algorithm may have the architecture of a convolutional neural network or of a discriminative or regression machine learning model. In one or more embodiments of the invention, the motion estimation algorithm may be selected from at least one of: an exhaustive-search block matching algorithm that minimizes an error functional given by the sum of absolute differences (SAD) of the intensity values of each pixel in the block; a 3D recursive search (3DRS) algorithm; a block erosion algorithm with sub-pixel refinement of motion vectors (MV); and an algorithm with sub-pixel sampling of candidate blocks.

In a third aspect, the invention relates to a method for training a model for predicting spatial parameters of a motion estimation algorithm, comprising the steps of: selecting an index k of a pair of frames; computing an estimated parameter map y: y=Y(SRC_k, REF_k), where SRC_k is the source frame and REF_k is the reference frame; forming a motion vector field MVF_k(y) from MVF_{i, k}, where i=1..N, with respect to the estimated parameter map y by computing a weighted sum of MVF_1..N, so as to estimate the output of the spatially adaptive ME algorithm; computing a loss function L between the vectors of the motion vector field MVF_{y, k} and reference motion vectors GT_k; updating the model for predicting spatial parameters of the motion estimation algorithm by backpropagating the gradient of the loss function L; determining whether the required convergence of the model for predicting spatial parameters of the motion estimation algorithm has been reached; and storing the model for predicting spatial parameters of the motion estimation algorithm in memory.

At the step of forming the motion vector field, the weighted sum may be computed using weighting coefficients obtained by applying the Softmax function.
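A minimal sketch of such a Softmax-weighted sum for a single block is given below. It assumes that the prediction model outputs one score per parameter variant for each block and that the Softmax is taken over those N scores; the source states only that the weights are obtained by applying the Softmax function, so the names and the data layout here are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable Softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def blend_block_vectors(candidates, scores):
    """Weighted sum of N candidate motion vectors for one block.

    candidates: N (dx, dy) vectors, one per parameter variant (MVF_i).
    scores: N per-block scores (assumed model output for that block).
    """
    weights = softmax(scores)
    dx = sum(w * c[0] for w, c in zip(weights, candidates))
    dy = sum(w * c[1] for w, c in zip(weights, candidates))
    return dx, dy


# Two parameter variants produced different vectors for the same block.
candidates = [(2.0, 0.0), (0.0, 4.0)]
undecided = blend_block_vectors(candidates, [0.0, 0.0])
confident = blend_block_vectors(candidates, [50.0, 0.0])
```

With equal scores the result is the plain average of the candidates; with one dominant score it approaches that candidate's vector. A plausible reason for this choice is that the weighted sum is differentiable with respect to the scores, which is what gradient backpropagation of the loss function L requires, whereas a hard selection of one variant is not.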

In a fourth aspect, the invention relates to a system for training a model for predicting spatial parameters of a motion estimation algorithm, comprising: a rendering unit configured to render frame sequences with a reference (ground truth, GT) motion vector field; a motion estimation (ME) unit configured to estimate motion for said frame sequences with each parameter set from the parameter map M of the motion estimation algorithm; a unit for training the model for predicting spatial parameters of the motion estimation algorithm, configured to train the model by computing a loss function L between the vectors of the motion vector field generated by the motion estimation unit and the reference motion vectors, updating the model by backpropagating the gradient of the loss function L, and determining whether the required convergence of the model has been reached; and a unit for storing the weight coefficients of the trained model for predicting spatial parameters of the motion estimation algorithm.

It will be apparent to those skilled in the art that the inventive concept is not limited to the aspects set out above, and that the invention may take the form of other subject matter, such as at least one device, a computer program, or a computer program product comprising a computer-readable medium on which the computer program is recorded. Additional features that may characterize particular embodiments of the present invention will be apparent to those skilled in the art from the detailed description of embodiments given below.

Brief description of the drawings

The drawings are provided herein to facilitate understanding of the present invention. The drawings are schematic and not drawn to scale. They serve for illustration only and are not intended to define the scope of the present invention.

Fig. 1 is a flowchart of a method for estimating motion over an image field according to one or more embodiments of the invention;

Fig. 2 is a flowchart of the process of configuring a model for predicting spatial parameters of a motion estimation algorithm according to one or more embodiments of the invention;

Fig. 3 is a flowchart of the process of training a model for predicting spatial parameters of a motion estimation algorithm according to one or more embodiments of the invention;

Fig. 4 is a schematic representation of possible sizes and orientations of the motion block windows in the adaptive motion estimation algorithm according to various embodiments of the invention;

Fig. 5 is a schematic diagram of a system for estimating motion in a sequence of images according to one or more embodiments of the invention;

Fig. 6 is a schematic diagram of a system for training a model Y for predicting spatial parameters of a motion estimation algorithm according to one or more embodiments of the present invention;

Fig. 7 is a schematic representation of the structure of a model for generating a map of spatial parameters of a motion estimation algorithm according to at least one embodiment of the invention;

Fig. 8 is a schematic representation of the structure of a model for generating a map of spatial parameters of a motion estimation algorithm according to at least one other embodiment of the invention.

Implementation of the invention

Illustrative embodiments of the present invention are described in detail below and illustrated in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements that have the same or similar functions. The embodiments described with reference to the accompanying drawings are illustrative only, serve solely to explain the present invention, and should not be regarded as limiting the scope of the invention in any way.

According to the invention, an algorithm is proposed for estimating a dense optical flow, or motion vector field (MVF), using block matching. Such an algorithm may be useful in many applications, for example in computer vision, image noise reduction, video compression, 3D scene reconstruction, simultaneous localization and mapping (SLAM), recognition, detection and segmentation of objects in video, augmented or virtual reality (AR/VR), motion control of autonomous vehicles, and the like. The algorithm according to the invention may be implemented as at least one software module, at least one hardware unit, or any combination of software modules and hardware units that may be apparent to a person skilled in the art. It should be understood that the invention is not limited to a specific combination of hardware and software units used to implement it in any particular embodiment.

According to the invention, the input data for motion vector field (MVF) estimation are at least two frames of a photographic or video image. The output data are a motion vector field (optical flow), which is a two-dimensional (2D) array of displacements from the position of each pixel in the first frame to the position of the corresponding pixel in the second frame.

As will be shown below, the accuracy of MVF estimation according to the invention is achieved through adaptive motion estimation (ME), which makes it possible, in particular, to adapt the applied algorithm to:

- different noise levels in different frames caused by different light sensitivity (ISO), gain or exposure time when capturing a series of frames in HDR bracketing;

- different exposure parameters across the series of frames used to produce an HDR image;

- changes in natural lighting between frames of the series caused by motion;

- motion blur in one or more frames;

- automatic white balance adjustment and/or in-camera color correction shifts between frames; and

- the magnitude (intensity) of motion.

The proposed invention uses an improved ME algorithm based on block matching. For each block in the first frame, the corresponding block in the second frame is found as the one that minimizes an error functional. A commonly used error functional is the SAD: the sum of absolute differences between the intensity values of corresponding pixels in the two blocks.
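The basic (non-adaptive) block matching step described above can be sketched as an exhaustive search that minimizes the SAD. This is a simplified illustration and not the improved algorithm of the invention; the square block, the integer search radius and all function names are assumptions made for the example.

```python
def sad(src, ref, sy, sx, ry, rx, block):
    """Sum of absolute differences between two block x block windows."""
    return sum(
        abs(src[sy + j][sx + i] - ref[ry + j][rx + i])
        for j in range(block)
        for i in range(block)
    )


def full_search(src, ref, by, bx, block, radius):
    """Find the displacement (dx, dy) of the block at (by, bx) in `src`
    that minimizes the SAD within +/- radius pixels in `ref`."""
    h, w = len(ref), len(ref[0])
    best_cost, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = by + dy, bx + dx
            # Skip candidates that fall outside the second frame.
            if ry < 0 or rx < 0 or ry + block > h or rx + block > w:
                continue
            cost = sad(src, ref, by, bx, ry, rx, block)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost


def blank(size):
    return [[0] * size for _ in range(size)]


# A 2x2 patch at row 2, column 2 moves to row 3, column 4.
first, second = blank(8), blank(8)
patch = [[10, 20], [30, 40]]
for j in range(2):
    for i in range(2):
        first[2 + j][2 + i] = patch[j][i]
        second[3 + j][4 + i] = patch[j][i]

mv, cost = full_search(first, second, by=2, bx=2, block=2, radius=2)
```

In this example the search recovers the displacement of two pixels to the right and one pixel down with zero residual error. The cost of the search grows with the square of the radius, which is one reason the search parameters are worth choosing per block.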

In addition, the proposed algorithm is fast because it eliminates the need for additional ME iterations. It uses a lightweight feed-forward model to predict a spatial map of motion estimation algorithm parameters, and motion is then estimated adaptively in each block using the algorithm parameter selected for that block.

Method for estimating motion in a sequence of images

In a first aspect, the present invention relates to a method for estimating motion over an image field, which receives as input at least a pair of images from an input bitstream containing a plurality of images and outputs a motion vector field in the form of a two-dimensional (2D) array of displacements from the position of each pixel in the first image to at least one position of the corresponding pixel in the second image of said pair of images. It should be noted that in particular embodiments of the invention, as an alternative or in addition to said at least two images, the method may comprise obtaining at least one image frame. In one or more embodiments of the invention, obtaining the pair of images may further comprise a step of obtaining at least one of a preliminary MVF estimate, encoded metadata relating to image capture parameters, and data from non-imaging sensors. In one or more embodiments, obtaining the pair of images may comprise obtaining at least one of luminance channel data and three-channel color images in the RGB or YUV color spaces.

In general, in one or more embodiments, the method according to the first aspect of the present invention comprises steps S1-S12, which are described in detail below with reference to Fig. 1.

In step S1, a pair of images is obtained from an incoming bitstream containing a plurality of images. Those skilled in the art will appreciate that, in particular non-limiting embodiments, the incoming bitstream may contain a video image as well as a sequence of one or more still (photographic) images. As described above, the pair of images from the incoming bitstream may be represented by frames (images) in a raw, uncompressed format (as a non-limiting example, the Color Filter Array, CFA, format) or in another known format. The images may be, as a non-limiting example, three-channel color images in the RGB color space (red, green and blue channels) or in the YUV color space (a luminance component Y and two color-difference components U and V; the YUV components may be derived from the RGB components as YCbCr).
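As an illustration of deriving YCbCr components from RGB, the following minimal sketch uses the full-range BT.601 (JFIF) coefficients for 8-bit values. The choice of this particular matrix is an assumption; the description does not state which YCbCr variant is used, and the function name `rgb_to_ycbcr` is hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range BT.601 (JFIF) YCbCr.

    Assumption: the coefficient set is chosen for illustration only.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


# A neutral gray pixel carries no chroma: Cb and Cr stay at the 128 midpoint.
luma, chroma_b, chroma_r = rgb_to_ycbcr(200, 200, 200)
```

Only the Y component would be needed in embodiments that work on the luminance channel alone.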

In step S2, the pair of images obtained in step S1 is preprocessed. As a non-limiting example, the preprocessing may comprise at least one of brightness adjustment and at least one downscaling stage.

Thus, in one or more non-limiting embodiments, the brightness of a pair of images consisting of a reference (REF) image and a source (SRC) image may be adjusted taking into account the following metadata:

- exposure values EV_ref, EV_src

- exposure times T_ref, T_src

- total gain values G_ref, G_src

- bit depth per pixel BPP

- black level BL

Brightness adjustment is needed to approximately equalize pixel intensities when exposure levels differ, so that blocks can be compared, for example, with the SAD function. For example, if the first frame is captured with an exposure of 1 and the second frame with an exposure 16 times smaller, the pixel intensities of the second frame must be multiplied by 16.

Brightness adjustment may be performed, for example, as follows:
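The formula that belongs at this point is absent from this text. Purely as a sketch, and under the assumptions that the data are linear raw values, that exposure is proportional to the product of exposure time and total gain, and that the black level is subtracted before scaling, the adjustment might look as follows. The function name `adjust_brightness` and the exact use of T, G, BL and BPP are hypothetical and are not taken from the description.

```python
def adjust_brightness(src, t_ref, g_ref, t_src, g_src, black_level, bpp):
    """Scale SRC pixel values to the REF exposure level.

    Assumptions (not from the source): linear raw data, exposure proportional
    to time * gain, black level removed before scaling and restored after.
    """
    ratio = (t_ref * g_ref) / (t_src * g_src)   # assumed exposure ratio
    white = (1 << bpp) - 1                      # largest code for this bit depth
    adjusted = []
    for row in src:
        adjusted.append([
            min(white, max(0.0, (p - black_level) * ratio + black_level))
            for p in row
        ])
    return adjusted


# SRC was exposed 16 times less than REF, so its signal is multiplied by 16.
src_frame = [[64, 80], [96, 127]]
equalized = adjust_brightness(src_frame, t_ref=16.0, g_ref=1.0,
                              t_src=1.0, g_src=1.0, black_level=64, bpp=10)
```

Values that exceed the representable range are clipped to the white level, which is one reason the equalization is only approximate.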

Downscaling is essentially the process of building a "pyramid" of images at different downscaling levels:

- given a reference image in CFA format Î_ref (a Bayer mosaic), at least a first downscaled frame and a second downscaled frame are formed in YUV format (in the preferred embodiment, at ½ and ¼ of the scale of the input image, respectively): Î_ref¹, Î_ref²

- given a source image in CFA format Î_src (a Bayer mosaic), at least a first downscaled frame and a second downscaled frame are formed in YUV format (in the preferred embodiment, at ½ and ¼ of the scale of the input image, respectively): Î_src¹, Î_src².
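The pyramid construction can be sketched as repeated 2×2 averaging. This sketch is a simplification: it operates on a single-channel (for example, luminance) image held as a list of rows, whereas the description forms YUV frames from a CFA mosaic, and the box filter is an assumed choice of downscaling kernel. The names `downscale_2x` and `build_pyramid` are hypothetical.

```python
def downscale_2x(img):
    """Halve the resolution by averaging each 2x2 group of pixels."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * y][2 * x] + img[2 * y][2 * x + 1]
             + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(cols)
        ]
        for y in range(rows)
    ]


def build_pyramid(img, levels=2):
    """Return [full scale, 1/2 scale, 1/4 scale, ...] with `levels` reductions."""
    pyramid = [img]
    for _ in range(levels):
        pyramid.append(downscale_2x(pyramid[-1]))
    return pyramid


full = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
full_scale, half_scale, quarter_scale = build_pyramid(full)
```

The same routine would be applied to both the reference and the source frame so that corresponding pyramid levels share a resolution.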

In step S3, a preconfigured model Y for predicting spatial parameters of the motion estimation algorithm is received as input. The process of training the model Y according to one or more particular embodiments is described below. It should be noted that in one or more embodiments the model Y may be stored in the same system that implements the method according to the invention, or may be received from external sources, in particular from one or more external servers, computing devices and the like over one or more network connections.

In step S4, a map M of parameters of the motion estimation (ME) algorithm is predicted for said pair of images. To this end, the preconfigured model for predicting spatial parameters of the motion estimation algorithm is applied to said pair of images.

The dimensions of the parameter map M of the motion estimation algorithm depend on how the image is partitioned into blocks. It should be noted here that the scope of protection of the invention is not limited to particular methods of partitioning images into blocks, which are well known in the art. The sizes and shapes of the blocks may also vary: as a non-limiting example, the blocks may be square and/or rectangular, and may have the same shape and size across the whole image or different shapes and sizes depending on the image region. In all cases, the parameter map M provides a predicted parameter for every block in the image, regardless of the shape, size or location of the block.

Various models may be used as the model for predicting spatial parameters of the motion estimation algorithm, as a non-limiting example, models with a convolutional neural network architecture. However, essentially any suitable classifier may be used for this prediction, such as a machine learning model, as a non-limiting example, a logistic regression model, a decision tree and the like.

In addition, in one or more non-limiting embodiments, motion estimation may use an algorithm from the family of 3D recursive search (3DRS) algorithms. These build a "pyramid" from the image at its original resolution and from versions of the image reduced in resolution by a factor of two or four (also referred to herein as downscaling), and estimate motion in the resulting frames in turn; a different structure of motion vector candidates and a different number of iterations are used at a given level of said "pyramid".

In addition, a block erosion algorithm with sub-pixel refinement of motion vectors (MV), or an algorithm for sub-pixel sampling of candidate blocks, may be used.

In at least one non-limiting embodiment, to estimate the map M, a pair of input frames (reference and source) is obtained at a first spatial resolution: Î_ref¹, Î_src¹. Luminance channel data Y_ref, Y_src are extracted from these frames. A preconfigured model, such as a convolutional neural network, may be applied to these data. The model input is a 2-channel tensor (2 × H × W) formed from Y_ref and Y_src, where H and W are the height and width of the image fragment being processed. The model outputs a tensor of size N × H/8 × W/8, where N is the cardinality of the set of admissible values of the ME parameter set. The parameter map M is obtained by applying the argmax function across the N channels, so M is an H/8 × W/8 matrix, each element of which takes one of N values.

The resulting parameter map is stored as a data array M, for example of integer type.
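The argmax step can be sketched as follows. The model output is represented here as nested lists indexed `scores[n][y][x]`, which is an assumed layout matching the N × H/8 × W/8 tensor; the function name `parameter_map` is hypothetical.

```python
def parameter_map(scores):
    """Reduce an N x rows x cols score tensor to an integer map M.

    M[y][x] is the index n (0..N-1) of the channel with the highest score,
    i.e. the argmax across the N channels at that block position.
    """
    n_values = len(scores)
    rows, cols = len(scores[0]), len(scores[0][0])
    return [
        [max(range(n_values), key=lambda n: scores[n][y][x]) for x in range(cols)]
        for y in range(rows)
    ]


# Three candidate parameter values (N=3) scored for a 1 x 2 grid of blocks.
scores = [
    [[0.1, 0.7]],
    [[0.8, 0.2]],
    [[0.1, 0.1]],
]
M = parameter_map(scores)
```

Each entry of `M` is an integer index into the set of admissible parameter values, which is consistent with storing the map as an integer array.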

In step S5, the iteration counter of the motion estimation (ME) algorithm is initialized: S = 0. In step S6, the indices of the blocks of the images being processed are initialized: x, y = 0, 0.

In step S7, motion estimation (ME) is performed for each block (x, y) using the parameter M[x, y] from the parameter map M of the motion estimation algorithm estimated in step S4.

For this purpose, one or more embodiments use a 3D recursive search (3DRS) algorithm. The algorithm is initialized, and a first spatial resolution (as a non-limiting example, image scale ¼) is selected for the frames of the image "pyramid" formed in the preprocessing step S2.

At scale ¼, motion from the first frame to the second frame of the pair is estimated using the parameter subsampled from the parameter map M for the block with index x, y. At scale ¼, motion from the second frame to the first frame of the pair is likewise estimated using the parameter subsampled from the parameter map M for the block with index x, y. A second spatial resolution (as a non-limiting example, image scale ½) is then selected for the frames of the image "pyramid" formed in the preprocessing step S2. Motion from the second frame to the first frame of the pair is estimated at scale ½ using the parameter from the parameter map M that corresponds to the index of the given block in the 3DRS algorithm. The main parameter of the 3DRS motion estimation algorithm is the absolute value of the per-pixel sum of absolute differences (SAD) of blocks, to which a clipping threshold (SadClippingValue) is applied; in one or more embodiments this threshold may lie within a given range (as a non-limiting example, 0..19 (N=20)).

In this step, the motion vector for the current block (x, y) is selected from the candidate vectors.
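Selecting a vector from the candidates can be sketched as a minimum-cost search. The description does not specify how the clipping value enters the cost, so this sketch assumes that each per-pixel absolute difference is capped at the clipping value before summation. It also assumes that every candidate keeps the displaced block inside the image. The names `clipped_sad` and `select_vector` are hypothetical.

```python
def clipped_sad(ref, src, bx, by, mv, size, clip):
    """SAD between a REF block and the SRC block displaced by mv = (dx, dy).

    Assumption: every per-pixel absolute difference is capped at `clip`.
    """
    dx, dy = mv
    total = 0
    for y in range(size):
        for x in range(size):
            diff = abs(ref[by + y][bx + x] - src[by + y + dy][bx + x + dx])
            total += min(diff, clip)
    return total


def select_vector(ref, src, bx, by, candidates, size, clip):
    """Return the candidate vector with the lowest clipped SAD."""
    return min(candidates,
               key=lambda mv: clipped_sad(ref, src, bx, by, mv, size, clip))


# A bright 2x2 patch at the top-left of REF appears shifted by (1, 1) in SRC.
ref = [[9, 9, 0, 0], [9, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
src = [[0, 0, 0, 0], [0, 9, 9, 0], [0, 9, 9, 0], [0, 0, 0, 0]]
best = select_vector(ref, src, bx=0, by=0,
                     candidates=[(0, 0), (1, 0), (1, 1)], size=2, clip=5)
```

In a spatially adaptive scheme, `clip` would be looked up per block from the parameter map M rather than passed as a single constant.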

Besides the SAD clipping value mentioned above, one or more embodiments may use various other parameters as alternative parameters of the adaptive ME algorithm, such as, as a non-limiting example, the size or shape of the SAD window, or the structure (pattern) or orientation of the image windows or blocks (see Fig. 4), in particular the arrangement of the candidate motion vectors (MV). According to various embodiments, the size and shape (aspect ratio of the rectangle) of the motion estimation block windows may be selected from 8×8, 16×16, 16×24, 24×16, 24×24 (N=5) or other pixel counts along the two sides of the rectangle. The value N denotes the cardinality of the set of admissible values of the adapted ME parameter.

In one or more non-limiting embodiments, the ME algorithm parameter may be the structure of the MV candidates, including their number and arrangement. The number may be, for example, 5 or 10 MV candidates per block. In addition, various types of MV candidates may be used: spatial, temporal, randomly updated (RandomUpdate), hierarchical, and so on. The ME algorithm parameter may also be the range of the random number generator (the random search range) for MV candidates, or the priority of the candidates: for example, spatial MV candidates bordering neighboring blocks may be considered first, or MV candidates from neighboring frames (i.e., temporal MV candidates), or candidates from another image scaling level (for example, from a lower-resolution level of the image "pyramid"). The above types of MV candidates may be ranked by priority and considered in different orders.

The type of block-matching loss function may differ between parts of the image, and in one or more embodiments different functions may be used adaptively. According to the invention, the model is configured to determine which function to use in a given block on the basis of a statistical criterion obtained during training of the model. For example, the sum of absolute differences (SAD) function may be used:

In one or more other embodiments, a sum of Euclidean differences function may be used:

Also usable is, for example, the Census loss function (see, for example, https://courses.cs.duke.edu/spring06/cps296.1/handouts/Zabih%20Woodfill%201994.pdf), which summarizes the local structure of the image, for example:

or, as a special case thereof, the Triple Census loss function:
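The formulas for these loss functions are absent from this text. As a hedged illustration only, the sketch below uses textbook definitions: SAD, a sum of squared differences as one possible reading of the "sum of Euclidean differences", and a census signature compared by Hamming distance in the manner of Zabih and Woodfill. These are assumptions rather than the formulas of the description, and the Triple Census variant is not sketched. All function names are hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def ssd(a, b):
    """Sum of squared differences (assumed reading of the Euclidean variant)."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def census_bits(block):
    """Census signature: 1 for every pixel darker than the block center."""
    cy, cx = len(block) // 2, len(block[0]) // 2
    center = block[cy][cx]
    return [
        int(p < center)
        for y, row in enumerate(block)
        for x, p in enumerate(row)
        if (y, x) != (cy, cx)
    ]


def census_cost(a, b):
    """Hamming distance between the census signatures of two blocks."""
    return sum(u != v for u, v in zip(census_bits(a), census_bits(b)))


block_a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
brighter = [[p + 10 for p in row] for row in block_a]
# Census depends only on intensity ordering, so a uniform offset costs nothing.
cost_after_offset = census_cost(block_a, brighter)
```

The census cost ignores a uniform brightness offset that would inflate SAD, which is one reason different functions may suit different image regions.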

In step S8, the block indices x, y are updated so that motion vectors can then be estimated for the next blocks.

In step S9, it is checked whether all blocks have been processed by the algorithm. If not, the process returns to step S7 and is repeated for the blocks with the next values of the indices x, y. If so, the process proceeds to step S10, in which the iteration counter of the motion estimation (ME) algorithm initialized in step S5 is updated (S is incremented by 1), and then to step S11, in which it is checked whether all iterations of the motion estimation loop have been performed. If so, the final motion vector field (MVF) is output in step S12. If not, the process returns to step S6, in which the block indices x, y of the images being processed are initialized.
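The control flow of steps S5-S12 can be summarized as two nested loops. In this sketch the per-block estimator is passed in as a callable, `estimate_block`, which is a hypothetical placeholder for the actual 3DRS block update; the number of iterations is likewise an assumed input, and the map is indexed as `param_map[y][x]` (row first) for convenience.

```python
def run_motion_estimation(param_map, estimate_block, iterations):
    """Iterate a block-wise estimator over the whole field (steps S5-S12).

    estimate_block(x, y, param, mvf) is a hypothetical callable returning the
    updated vector for block (x, y) given its parameter and the current field.
    """
    rows, cols = len(param_map), len(param_map[0])
    mvf = [[(0, 0)] * cols for _ in range(rows)]
    s = 0                                   # S5: initialize iteration counter
    while s < iterations:                   # S11: all iterations done?
        for y in range(rows):               # S6/S8/S9: walk over all blocks
            for x in range(cols):
                # S7: estimate using the spatially varying parameter
                mvf[y][x] = estimate_block(x, y, param_map[y][x], mvf)
        s += 1                              # S10: update iteration counter
    return mvf                              # S12: output the final MVF


# Toy estimator: each pass adds the block's parameter to its x displacement.
def toy_estimator(x, y, param, mvf):
    return (mvf[y][x][0] + param, 0)


field = run_motion_estimation([[1, 2]], toy_estimator, iterations=2)
```

Passing the current field to the estimator reflects the recursive nature of 3DRS, where a block's candidates come from vectors already estimated for its neighbors.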

It should be noted that the method according to the invention is not limited to any of the particular ME algorithm parameters given above or to combinations thereof. Moreover, other ME algorithm parameters and combinations thereof that may also be used in estimating motion vectors according to the invention may be apparent to those skilled in the art without departing from the scope of protection of the invention.

Configuring the model for predicting spatial parameters of the motion estimation algorithm

Fig. 2 shows a flowchart of the process of preparing a training data set for configuring the model for predicting spatial parameters of the motion estimation algorithm in accordance with one or more embodiments of the invention.

In step S11, a synthetic training set of input data comprising K training examples is received as input; each example includes at least one reference frame, one source frame and a ground-truth (GT) motion vector field. In addition, each training example may include one or more additional reference frames, the metadata described above, and data from non-imaging sensors. In step S12, the data set element counter is initialized: k = 0. In step S13, a pair of synthetic frames REF_k (reference frame) and SRC_k (source frame) is obtained from the synthetic data set. In step S14, the set of ME parameters P = {p1, p2, ..pN} and the index i = 1 are initialized.

In step S15, motion estimation (ME) is performed using the parameter p_i. In step S16, the motion estimation result is stored in the motion vector field MVF_{i, k}. In step S17, the value of i is incremented by one. The value of i is then compared with N (where N is the cardinality of the set of admissible ME parameters): if i < N, the process returns to step S15 and ME is performed again using the parameter p_i with the incremented i; if i = N, the process proceeds to step S18, in which the value of k is incremented by one.

The current value of k is then compared with the value K for the synthetic training motion data set in use: if k < K, the process returns to step S13, in which a pair of synthetic frames REF_k and SRC_k is obtained from the synthetic data set, and is repeated; if k = K, the process proceeds to step S20, in which the prepared training data set is passed to the model training system for subsequent training of the model.
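The data preparation loop of Fig. 2 can be sketched as follows: for every training example, motion estimation is run once per admissible parameter value and the results are stored next to the ground truth. The sketch iterates over every parameter in the set; `estimate_motion` is a hypothetical callable standing in for the ME algorithm, and the dictionary layout of the prepared examples is an assumption.

```python
def prepare_training_set(dataset, params, estimate_motion):
    """Attach one motion vector field per parameter value to each example.

    dataset: iterable of (ref, src, gt) triples (K examples).
    params: the admissible ME parameter values p_1..p_N.
    estimate_motion(ref, src, p): hypothetical ME call returning an MVF.
    """
    prepared = []
    for ref, src, gt in dataset:                      # loop over k (S13, S18)
        # Loop over i (S15-S17): MVF_{i,k} for every parameter value.
        mvfs = [estimate_motion(ref, src, p) for p in params]
        prepared.append({"ref": ref, "src": src, "gt": gt, "mvfs": mvfs})
    return prepared


# Stub estimator for illustration: it simply echoes the parameter it was given.
examples = [("ref0", "src0", "gt0"), ("ref1", "src1", "gt1")]
prepared = prepare_training_set(examples, params=[0, 5, 19],
                                estimate_motion=lambda ref, src, p: p)
```

Precomputing the N fields once per example means that training never has to call the ME algorithm itself.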

Training the model for predicting spatial parameters of the motion estimation algorithm

Fig. 3 shows a flowchart of the process of training the model for predicting spatial parameters of the motion estimation algorithm in accordance with one or more embodiments of the invention. As a result of training, the model is able to build a motion vector field from motion vectors obtained with different parameter values, which yields a supervisory signal for deep learning of the model. In one or more non-limiting embodiments, the model for predicting spatial parameters of the motion estimation algorithm has a convolutional neural network architecture. In addition, as noted above, a different discriminative or regression machine learning model may be used in the method according to the invention.

На этапе S21 инициализируют модель Y прогнозирования пространственных параметров алгоритма оценки движения. В одном или более конкретных неограничивающих вариантах выполнения модель может представлять собой сверточную нейронную сеть. В качестве входных данных на этапе инициализации модели используется 2-канальный тензор ([С x F] x 128×128) (где значение F соответствует количеству входных изображений, например F=2 для случая одно опорное изображение плюс одно исходное изображение, а значение С соответствует количеству цветовых каналов во входных изображениях, например С=1 для случая использования одного канала яркости входных изображений). Выходные данные представляют собой тензор [2 x N] х 32×32 (где значение «2» соответствует размерности 2-мерного вектора движения, а N - мощность множества допустимых параметров ME).At step S21, a model Y for predicting spatial parameters of the motion estimation algorithm is initialized. In one or more specific non-limiting embodiments, the model may be a convolutional neural network. The input data at the model initialization step is a 2-channel tensor ([C x F] x 128×128) (where the value of F corresponds to the number of input images, for example F=2 for the case of one reference image plus one original image, and the value of C corresponds to the number of color channels in the input images, for example C=1 for the case of using one brightness channel of the input images). The output data is a tensor [2 x N] x 32×32 (where the value "2" corresponds to the dimension of the 2-dimensional motion vector, and N is the power of the set of admissible ME parameters).

At step S22, the index k of a frame pair is selected.

At step S23, the estimated parameter map is computed:

y = Y(SRC_k, REF_k)

where SRC_k is the source frame and REF_k is the reference frame.

To do this, a random element is selected from the training data set. A 128×128 pixel region of the image is cropped, and the corresponding 32×32 motion vector fields (MVF) are formed from the ground-truth motion vector field (GT) and from MVF_1..N.
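The cropping described above can be sketched as follows. The sketch assumes that every motion vector field is stored at 1/4 of the image resolution (inferred from the 128×128 and 32×32 sizes) and that images and fields are plain nested lists; the function names are hypothetical.

```python
import random

SCALE = 128 // 32  # image pixels per MVF cell, inferred from the quoted sizes


def crop2d(grid, top, left, size):
    """Cut a size x size window out of a 2D list of rows."""
    return [row[left:left + size] for row in grid[top:top + size]]


def random_training_crop(src, ref, gt_mvf, candidate_mvfs, rng, crop=128):
    """Take one aligned crop from both frames and from every motion vector field."""
    cells = crop // SCALE
    mvf_h, mvf_w = len(gt_mvf), len(gt_mvf[0])
    cy = rng.randrange(mvf_h - cells + 1)  # crop origin in MVF cells
    cx = rng.randrange(mvf_w - cells + 1)
    y, x = cy * SCALE, cx * SCALE          # the same origin in image pixels
    return {
        "src": crop2d(src, y, x, crop),
        "ref": crop2d(ref, y, x, crop),
        "gt": crop2d(gt_mvf, cy, cx, cells),
        "mvfs": [crop2d(m, cy, cx, cells) for m in candidate_mvfs],
    }
```

Choosing the crop origin on the MVF grid and scaling it up keeps the image window and the field window aligned, so no interpolation of motion vectors is needed.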

At step S24, the motion vector field MVF_k(y) is constructed from MVF_{i, k}, where i=1..N, with respect to the estimated parameter map y computed at step S23.

This step comprises the so-called forward pass through the model. The model output is passed through a Softmax function along the channel dimension. The output of the Softmax layer is used to compute a weighted sum of MVF_1..N, which estimates the output of the spatially adaptive ME algorithm.
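A minimal sketch of this Softmax-weighted blending is given below. It assumes, for illustration only, that the model yields N scores per MVF cell and that each candidate field stores one (dx, dy) vector per cell; the function names and data layout are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of N scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blend_field(logit_map, candidate_fields):
    """Softmax-weighted sum of N candidate motion vector fields.

    logit_map[y][x] holds N scores; candidate_fields[i][y][x] is a (dx, dy) vector.
    """
    blended = []
    for y, logit_row in enumerate(logit_map):
        row = []
        for x, logits in enumerate(logit_row):
            weights = softmax(logits)
            dx = sum(w * f[y][x][0] for w, f in zip(weights, candidate_fields))
            dy = sum(w * f[y][x][1] for w, f in zip(weights, candidate_fields))
            row.append((dx, dy))
        blended.append(row)
    return blended


# One-cell example: equal scores average the two candidate vectors.
fields = [[[(2.0, 0.0)]], [[(0.0, 4.0)]]]
print(blend_field([[[0.0, 0.0]]], fields))  # [[(1.0, 2.0)]]
```

Because the blend is a smooth function of the scores, the error gradient can flow back to the model, which a hard per-cell selection would not allow.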

Next, at step S25, the loss function (error) L is computed between the vectors of the motion vector field MVF_k(y) and the ground-truth motion vectors GT_k. The weighted sum of MVF_1..N obtained at the previous step is compared with the ground-truth motion vector field using one of the loss functions known in the art (as a non-limiting example, the SmoothL1Loss function).
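For reference, the smooth L1 loss named above can be written out in plain Python. This sketch uses the common definition with threshold beta=1 and mean reduction; those settings, the function name and the nested-list field layout are assumptions, since the description does not fix them.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 loss between two fields of (dx, dy) vectors."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            for a, b in zip(p, t):
                d = abs(a - b)
                # Quadratic near zero, linear for large errors.
                total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
                count += 1
    return total / count


# One cell: component errors 0.5 (quadratic branch) and 3.0 (linear branch).
print(smooth_l1([[(0.5, 3.0)]], [[(0.0, 0.0)]]))  # 1.3125
```

The linear branch limits the influence of large outlier vectors, while the quadratic branch keeps the gradient smooth for small errors.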

At step S26, the model Y for predicting the spatial parameters of the motion estimation algorithm is updated by backpropagating the gradient of the error L. The weights of the model are updated as is known in the art. In one or more embodiments of the invention, the Adam optimization algorithm, in particular, may be used.
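The Adam update mentioned above follows a well-known rule, sketched here for a flat list of weights. The hyperparameter values are the commonly used defaults, not values taken from the description, and the helper names are hypothetical.

```python
import math


def new_state(n):
    """Fresh optimizer state for n weights: step counter and two moment estimates."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n}


def adam_step(w, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update and return the new list of weights."""
    state["t"] += 1
    t = state["t"]
    new_w = []
    for i, (wi, g) in enumerate(zip(w, grad)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        m_hat = state["m"][i] / (1 - beta1 ** t)  # bias-corrected first moment
        v_hat = state["v"][i] / (1 - beta2 ** t)  # bias-corrected second moment
        new_w.append(wi - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_w


# On the first step each weight moves by roughly lr against its gradient sign.
weights, state = [1.0], new_state(1)
weights = adam_step(weights, [2.0], state)
```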

In one or more specific non-limiting embodiments of the invention, the block matching algorithm may be a full-search ME algorithm, a full-search algorithm over the entire image "pyramid", or another suitable algorithm known in the art, for example the PatchMatch algorithm for finding correspondences between patches.
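As an illustration of the simplest of these options, a full-search block matcher with a sum of absolute differences (SAD) cost might look like the sketch below. The block size, search radius, tie-breaking and function names are assumptions made for the example.

```python
def sad(src, ref, sy, sx, ry, rx, block):
    """Sum of absolute differences between a block in src and a block in ref."""
    return sum(abs(src[sy + j][sx + i] - ref[ry + j][rx + i])
               for j in range(block) for i in range(block))


def full_search(src, ref, by, bx, block=4, radius=2):
    """Return ((dx, dy), cost) of the best match for the src block at (by, bx)."""
    h, w = len(ref), len(ref[0])
    best_cost, best_mv = float("inf"), (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = by + dy, bx + dx
            if ry < 0 or rx < 0 or ry + block > h or rx + block > w:
                continue  # candidate block falls outside the reference frame
            cost = sad(src, ref, by, bx, ry, rx, block)
            if cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```

For a reference frame whose content is shifted one pixel right and two pixels down relative to the source, the matcher returns the displacement (1, 2) with zero cost, provided the shift lies within the search radius.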

At step S27, it is determined whether the model Y for predicting the spatial parameters of the motion estimation algorithm has reached the required convergence. If it has, the process proceeds to step S28; if not, the process returns to step S22 (selecting the index k of a frame pair) and repeats.
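The control flow of steps S22 to S27 can be summarized by the skeleton below. The description does not specify a convergence criterion, so the test used here (the mean loss over the last `window` iterations changes by less than `tol`) is purely an assumption, as are all function names; the three callables stand in for the steps described above.

```python
def has_converged(losses, tol=1e-4, window=10):
    """Assumed criterion: the windowed mean loss has stopped changing."""
    if len(losses) < 2 * window:
        return False
    recent = sum(losses[-window:]) / window
    previous = sum(losses[-2 * window:-window]) / window
    return abs(previous - recent) < tol


def train(select_pair, compute_loss, update_model, max_iters=1000):
    """Run the S22..S27 loop and return the loss history."""
    losses = []
    for _ in range(max_iters):
        k = select_pair()          # S22: choose a frame pair index
        loss = compute_loss(k)     # S23 to S25: forward pass and loss
        update_model(loss)         # S26: backpropagation and weight update
        losses.append(loss)
        if has_converged(losses):  # S27: convergence check
            break
    return losses                  # the caller then stores the model (S28)
```

The `max_iters` cap is a practical safeguard added for the sketch so that the loop terminates even if the criterion is never met.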

At step S28, the resulting model Y for predicting the spatial parameters of the motion estimation algorithm is stored in memory. The resulting model is configured to construct a motion vector field from the motion vectors obtained for different values of said parameters, thus providing a supervisory signal for deep learning of the model.

As a non-limiting example, the model Y for predicting the spatial parameters of the motion estimation algorithm may be configured as follows.

First, frame pairs containing motion are generated using any suitable 3D graphics tool (as a non-limiting example, Blender).

The frames must meet the following requirements:

- The images in the frames must contain a high-resolution hemispherical high dynamic range (HDR) background, as well as moving 3D objects;

- The camera motion must be configured in the corresponding software (for example, Blender) so as to simulate the hand shake of a user holding the camera;

- The speed of the objects must likewise be configured in the corresponding software so that it matches the range of typical speeds of real-world objects;

- The frames must be rendered at a resolution appropriate to the target application, as a non-limiting example 3000×4000 at 16 bits per pixel;

- The ground-truth optical field output must be configured in the corresponding software (for example, Blender);

- The rendered frames must undergo inverse processing to simulate a CFA-format image from the camera sensor with the corresponding noise and exposure parameters.
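The last requirement can be illustrated with a very reduced sketch of turning a rendered RGB frame into a CFA-like mosaic. The RGGB layout, the single exposure gain, the additive Gaussian noise model, the 16-bit white level and the function name are all assumptions; the description does not specify the inverse processing chain.

```python
import random


def to_bayer_rggb(rgb, gain=1.0, noise_sigma=0.0, rng=None, white=65535):
    """Sample an RGGB mosaic from rows of (r, g, b) pixels, then add gain and noise."""
    rng = rng or random.Random()
    mosaic = []
    for y, row in enumerate(rgb):
        out_row = []
        for x, (r, g, b) in enumerate(row):
            if y % 2 == 0:
                value = r if x % 2 == 0 else g  # even rows: R G R G ...
            else:
                value = g if x % 2 == 0 else b  # odd rows:  G B G B ...
            value = value * gain + rng.gauss(0.0, noise_sigma)
            out_row.append(min(white, max(0, round(value))))
        mosaic.append(out_row)
    return mosaic


# Noise-free 2x2 example: each site keeps only the channel of its filter color.
flat = [[(10, 20, 30)] * 2 for _ in range(2)]
print(to_bayer_rggb(flat))  # [[10, 20], [20, 30]]
```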

The set of ME parameters in this context should be understood as a set of categories (for example, for the SAD cutoff, P={0, 1, 2, …, 19}, where the numbers are class indices of the parameter value). Motion estimation is performed for each generated frame pair and with each parameter in the set P. The resulting MVF is stored separately for each combination of frame pair and parameter. The final training data set contains the following elements:

- reference frame;

- source frame;

- ground-truth MVF;

- MVF for parameter 1;

- MVF for parameter 2;

...

- MVF for parameter N.

In one or more embodiments of the invention, a set of parameters P={p1, p2, ..., pN} is initialized. The parameters p are categorical. A loop over the entire parameter set P (index i) is initialized. Motion estimation is performed for the frames REF_k, SRC_k using p_i as a global parameter of the ME algorithm. The resulting motion vector field is stored in the tensor MVF_{i, k}. The loops end when i and k reach their final values (N and K, respectively). In this way, at least a training data set of MVF_{i, k} data tensors is formed, where i=1..N; k=1..K.
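The nested loops over frame pairs and parameters can be sketched as follows. `estimate_motion` is a hypothetical stand-in for whatever ME algorithm is used, and the `TrainingElement` container simply mirrors the list of data set elements given above.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingElement:
    ref: object      # reference frame REF_k
    src: object      # source frame SRC_k
    gt_mvf: object   # ground-truth MVF for this frame pair
    mvfs: list = field(default_factory=list)  # mvfs[i-1] is MVF_{i, k}


def build_dataset(frame_pairs, params, estimate_motion):
    """Run ME once per (frame pair, parameter) combination and collect the results."""
    dataset = []
    for ref, src, gt_mvf in frame_pairs:  # outer loop over k = 1..K
        element = TrainingElement(ref, src, gt_mvf)
        for p in params:                  # inner loop over i = 1..N
            element.mvfs.append(estimate_motion(ref, src, p))
        dataset.append(element)
    return dataset
```

Since every MVF_{i, k} is computed once and stored, training itself never needs to run the ME algorithm; it only reads and blends the precomputed fields.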

The motion vector field MVF_k(y) is constructed from MVF_{i, k}, i=1..N, with respect to the parameter map y. To allow the error gradient to be backpropagated during training, MVF_k(y) is constructed as a weighted sum of MVF_{i, k}, where the corresponding value of y is the weight. During testing (application) of the model, MVF_k(y) is constructed by applying the argmax operation over i=1..N.
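The inference-time variant, in which each cell takes its vector from the single best-scoring candidate, can be sketched like this. The data layout (N scores per cell, one field per parameter) and the function name are assumptions chosen for illustration.

```python
def select_field(score_map, candidate_fields):
    """Per cell, copy the vector from the candidate field with the highest score.

    score_map[y][x] holds N scores; candidate_fields[i][y][x] is a (dx, dy) vector.
    """
    selected = []
    for y, score_row in enumerate(score_map):
        row = []
        for x, scores in enumerate(score_row):
            best = max(range(len(scores)), key=scores.__getitem__)  # argmax over i
            row.append(candidate_fields[best][y][x])
        selected.append(row)
    return selected


# Two cells, two candidates: the first cell prefers candidate 1, the second candidate 0.
field_a = [[(1, 1), (2, 2)]]
field_b = [[(3, 3), (4, 4)]]
print(select_field([[[0.1, 0.9], [0.8, 0.2]]], [field_a, field_b]))  # [[(3, 3), (2, 2)]]
```

Unlike the weighted sum used in training, this selection is not differentiable, which is why it is reserved for applying the trained model.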

Fig. 7 schematically shows the structure of the model that generates the map of spatial parameters of the motion estimation algorithm according to at least one embodiment of the invention: two luminance (Y) channels (for the reference and source frames, respectively) are fed to the model input, and the map M of motion estimation algorithm parameters is produced at the output. Fig. 8 schematically shows the structure of the model for predicting the spatial parameters of the motion estimation algorithm according to another non-limiting embodiment of the invention. It differs from the previous embodiment in that a pre-estimated motion vector field MVF is also fed into the model; this field is processed by a series of convolutional filters to a filter block of size 128x32x32 and concatenated with the luminance-channel filter block of the same dimensions.

System for estimating motion in a sequence of images

In a second aspect, the present invention relates to a system for estimating motion in a sequence of images, which is configured to implement the method according to the first aspect of the invention described above. A schematic diagram of the system for estimating motion in a sequence of images is shown in Fig. 5.

Referring to Fig. 5, the system for estimating motion in a sequence of images comprises at least the following elements. Reference numeral 101 denotes the lens of the camera that captures the image for which motion estimation is to be performed (the source image). Reference numeral 102 denotes the image sensor, on which a raw image of a three-dimensional real-world scene is formed from the optical radiation captured by the camera through the lens 101, preferably in CFA format, which corresponds to the image from the image sensor without processing or compression. Reference numeral 103 denotes a CFA buffer, in which the CFA-format images captured by the camera are temporarily stored. Reference numeral 104 denotes an image preprocessing unit, which receives images from the CFA buffer 103 and preprocesses them, in particular by adjusting brightness and/or other image parameters and by converting the images into other formats suitable for further processing, as a non-limiting example into YUV format.

Reference numeral 105 denotes a YUV buffer, in which the images converted by the image preprocessing unit 104 are temporarily stored. Reference numeral 106 denotes a unit for generating the spatial map of motion estimation algorithm parameters, and reference numeral 107 denotes a unit for storing the spatial map of motion estimation algorithm parameters generated by the unit 106.

Reference numeral 108 denotes a motion estimation (ME) unit configured to generate and output the motion vector field as a two-dimensional (2D) array of displacements from the position of each pixel in the first image to at least one position of the corresponding pixel in the second image of said pair of images. Reference numeral 109 denotes a motion vector field (MVF) buffer configured to temporarily store the MVFs generated by the ME unit 108.

The labels 1...8 in Fig. 5 indicate the order of data transfer between the units of the system for estimating motion in a sequence of images in accordance with one or more embodiments of the invention. The label RAM indicates that the units 103, 105, 107 and 109 described above may be implemented by one or more random access memories (RAM). The part labeled CPU in Fig. 5 indicates that the corresponding units 104 and 108 may be implemented by at least one processor, such as a general-purpose central processing unit. The part labeled GPU/NPU/DSP in Fig. 5 indicates that the corresponding unit 106 may be implemented by at least one specialized processor, such as a graphics processing unit (GPU), a neural processing unit (NPU) or a digital signal processor (DSP).

Those skilled in the art should understand that the scope of the invention is not limited to the hardware listed above for implementing the components of the system according to the invention, and that other means may be used to implement the units listed above, such as one or more application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), one or more microprocessors, and the like. In addition, the system may include components other than those listed above, for example at least one read-only memory (ROM), which may be implemented by one or more types of memory known to those skilled in the art. It should also be understood that the units listed above may be implemented in practice as various combinations of hardware and software, as applicable in the context of the present invention.

System for training the model Y for predicting spatial parameters of the motion estimation algorithm

In a further aspect, the present invention may relate to a system for training the model Y for predicting the spatial parameters of the motion estimation algorithm. The system is configured to be used in the method according to the first aspect and/or in the system according to the second aspect of the present invention to train the model Y to form a motion vector field (MVF) in the respective embodiments of the present invention. A schematic diagram of the system for training the model Y according to one or more embodiments of the present invention is shown in Fig. 6.

Referring to Fig. 6, the system for training the model Y for predicting the spatial parameters of the motion estimation algorithm comprises at least the following elements. Reference numeral 201 in Fig. 6 denotes a rendering workstation. Reference numeral 202 denotes a server processor (CPU) node, reference numeral 203 denotes a server processor node with a graphics accelerator (GPU), and reference numeral 204 denotes long-term storage. The labels 1...8, 7’ in Fig. 6 indicate the order of data transfer between the units of the system for training the model Y in accordance with one or more embodiments of the invention.

Reference numeral 210 denotes a rendering unit configured to render frame sequences with a ground-truth motion vector field (GT). Reference numeral 211 denotes a storage unit for the linear TIFF files produced by the rendering software, and reference numeral 212 denotes a storage unit for binary MVF files corresponding to the ground-truth motion vector fields (GT). Reference numeral 221 denotes an inverse processing unit for simulating a CFA-format image. Reference numeral 222 denotes a motion estimation (ME) unit that runs with each parameter set. Reference numeral 223 denotes a storage unit for CFA files (with their inherent noise and exposure parameters), and reference numeral 224 denotes a storage unit for the motion vector fields MVF_{i, k}.

Reference numeral 231 in Fig. 6 denotes a model training unit. Reference numeral 232 denotes a data set generation unit. Reference numeral 233 denotes a storage unit for the weights of the trained model. As with the system according to the second aspect of the present invention described above, those skilled in the art should understand that the scope of the invention is not limited to the hardware listed above for implementing the components of the system according to the invention, and that other means may be used to implement the units listed above, such as one or more application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), one or more microprocessors, and the like. In addition, the system may include components other than those listed above, for example at least one random access memory (RAM), which may be implemented by one or more types of memory known to those skilled in the art. It should also be understood that the units listed above may be implemented in practice as various combinations of hardware and software, as applicable in the context of the present invention.

The system described above for training the model Y for predicting the spatial parameters of the motion estimation algorithm is configured to obtain training examples of ME algorithm runs with different parameter sets. The proposed method, which is implemented using this training system, can be applied to a wide range of block matching algorithms. The method according to the invention and the corresponding systems implementing it do not depend on the nature of the specific parameters. A wide choice of parameters and their combinations, non-limiting examples of which are given above, can be used in a network with unified training.

In specific applications, the invention can provide the following beneficial effects. For example, when reducing noise in video captured in very low light, the invention makes it possible to process 4K video at a frame rate of 30 frames per second (FPS) in real time, with only a slight increase in computational complexity compared with traditional block matching algorithms for motion estimation. This uses approximately 250 thousand weights and a typical convolutional model with a single forward pass on the GPU.
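To put the real-time figure in perspective, the per-frame time budget and pixel throughput can be worked out directly. The sketch assumes that "4K" means 3840×2160 pixels, which the description does not state.

```python
FPS = 30
WIDTH, HEIGHT = 3840, 2160  # assumed meaning of "4K"; not stated in the description

frame_budget_ms = 1000 / FPS              # time available per frame
pixels_per_frame = WIDTH * HEIGHT
pixels_per_second = pixels_per_frame * FPS

print(round(frame_budget_ms, 1))  # 33.3
print(pixels_per_frame)           # 8294400
print(pixels_per_second)          # 248832000
```

Under this assumption, preprocessing, parameter map prediction and motion estimation together have roughly 33 ms per frame.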

In night-time HDR photography, the computational complexity increases only slightly compared with traditional block-matching ME algorithms, and HDR photos can be processed in a time that is reasonable for a commercial product.

Those skilled in the art will appreciate that the invention may be implemented by various combinations of hardware and software, and that none of these specific combinations limits the scope of protection of the present invention. The technology according to the invention may be implemented by one or more computers, by processors (CPU), such as general-purpose processors or specialized processors such as digital signal processors (DSP), or by one or more ASICs, FPGAs, logic elements, and the like. Alternatively, one or more of its elements or method steps may be implemented as software, such as a program or programs, computer program elements or modules that control one or more computers, CPUs, and the like. This software may be embodied on one or more machine-readable media well known to those skilled in the art, may be stored in one or more memory units such as ROM, RAM, flash memory, EEPROM, and the like, or, if necessary, may be transmitted, for example, from remote servers over one or more wired and/or wireless network connections, the Internet, an Ethernet connection, local area networks (LAN), or other local or wide area networks.

Those skilled in the art should understand that the description and drawings above present only some of the possible examples of technologies and hardware by which embodiments of the present invention can be implemented. The detailed description of embodiments given above is not intended to limit or define the scope of legal protection of the present invention.

Industrial applicability

In specific practical implementations, the invention can be used in computer vision and in image processing for improving image quality, image denoising, video compression, reconstruction of three-dimensional (3D) scenes, simultaneous localization and mapping (SLAM), and recognition/detection/segmentation of objects and actions in video. In addition, the invention can be used in virtual reality (VR) or augmented reality (AR) systems, in autonomous vehicle control systems, etc.

In particular, in frame rate up-conversion (FRUC) applications, it is very important to know the true motion vectors in order to interpolate an intermediate frame. In video compression, the better the motion vectors are estimated, the higher the compression ratio.
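The dependence of FRUC on true motion vectors can be illustrated with a minimal sketch. It is not taken from the patent: the function name `interpolate_midframe` is hypothetical, the motion field is assumed to be defined on the grid of the intermediate frame (a bilateral scheme) with even integer components, and the two motion-compensated samples are simply averaged.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale frame as rows of intensities


def interpolate_midframe(prev: Image, nxt: Image,
                         mvf: List[List[Tuple[int, int]]]) -> Image:
    """Interpolate the frame halfway between prev and nxt.

    Assumption: mvf[y][x] = (dy, dx) is the prev-to-nxt motion of the pixel
    that passes through (y, x) of the intermediate frame, with even components.
    """
    height, width = len(prev), len(prev[0])

    def clamp(value: int, upper: int) -> int:
        # Keep sampling positions inside the frame.
        return max(0, min(upper, value))

    mid: Image = []
    for y in range(height):
        row = []
        for x in range(width):
            dy, dx = mvf[y][x]
            # Step half a vector back into prev and half a vector forward into nxt.
            py, px = clamp(y - dy // 2, height - 1), clamp(x - dx // 2, width - 1)
            ny, nx = clamp(y + dy // 2, height - 1), clamp(x + dx // 2, width - 1)
            row.append((prev[py][px] + nxt[ny][nx]) / 2.0)
        mid.append(row)
    return mid
```

With correct vectors a moving object appears once, at its halfway position. With an all-zero field the same call averages the two frames in place, so the object shows up twice at half intensity, which is the ghosting that accurate motion estimation is meant to avoid.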

Moreover, video that contains a significant amount of noise typically requires a higher bit rate than ordinary video, so raising the compression ratio through more accurate motion estimation yields greater bandwidth savings when the video stream is transmitted.

In VR/AR applications, asynchronous reprojection is used to improve the headset's responsiveness to the user's movements. It uses one or more previously rendered frames together with newer motion information from the headset's sensors to extrapolate the previous frame into a prediction of what a normally rendered frame would look like. In robotics, autonomous vehicles and medical devices, the invention can enable simultaneous localization and mapping.

After carefully reading the above description with reference to the accompanying drawings, those skilled in the art may envisage other embodiments within the scope of the present invention, and all such obvious changes, modifications and/or equivalent substitutions are considered to be included within the scope of the present invention. All prior art references cited and discussed herein are hereby incorporated into this description by reference, to the extent applicable.

While the present invention has been described and illustrated with reference to various embodiments thereof, those skilled in the art should understand that various changes in form and specific details may be made without departing from the scope of the present invention, which is defined only by the following claims and their equivalents.

Claims

1. A method for estimating motion in a sequence of images, comprising the steps of:

obtaining at least two images from an input bitstream containing a plurality of images,

performing pre-processing of images, comprising at least brightness adjustment;

estimating a map M of parameters of a motion estimation (ME) algorithm for said at least two images by applying a pre-trained model for predicting spatial parameters of the motion estimation algorithm to said at least two images, wherein the dimensions of the map M of parameters of the motion estimation algorithm depend on the method of dividing the image into blocks;

storing the map M of parameters of the motion estimation algorithm;

performing motion estimation by applying a motion estimation algorithm and parameters from a motion estimation algorithm parameter map M to each block of said at least two images, wherein the motion estimation algorithm parameter map M determines the motion estimation algorithm parameters separately for each block; and

outputting a motion vector field (MVF) in the form of a two-dimensional (2D) array of offsets from the position of each pixel in the first image to at least one position of the corresponding pixel in the second image of the at least two images.
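Read informally, the motion estimation step of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the parameter map M is assumed to hold a single parameter per block, a search radius (called `radius_map` here), the motion estimation algorithm is assumed to be exhaustive block matching with a SAD cost, and the map is supplied by the caller rather than predicted by a model. All names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]   # grayscale image as rows of intensities
Vector = Tuple[int, int]  # (dy, dx) displacement


def sad(src: Image, ref: Image, y: int, x: int,
        dy: int, dx: int, block: int) -> int:
    """Sum of absolute differences between a src block and a shifted ref block."""
    total = 0
    for r in range(block):
        for c in range(block):
            total += abs(src[y + r][x + c] - ref[y + dy + r][x + dx + c])
    return total


def estimate_motion(src: Image, ref: Image,
                    radius_map: List[List[int]], block: int) -> List[List[Vector]]:
    """Exhaustive block matching in which each block uses its own search radius.

    Assumption: radius_map[by][bx] plays the role of the per-block parameter
    taken from the map M; the image is tiled by block x block squares.
    """
    height, width = len(src), len(src[0])
    mvf: List[List[Vector]] = []
    for by, map_row in enumerate(radius_map):
        mvf_row: List[Vector] = []
        for bx, radius in enumerate(map_row):
            y, x = by * block, bx * block
            best, best_cost = (0, 0), sad(src, ref, y, x, 0, 0, block)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Skip candidates whose block would leave the image.
                    if not (0 <= y + dy <= height - block
                            and 0 <= x + dx <= width - block):
                        continue
                    cost = sad(src, ref, y, x, dy, dx, block)
                    if cost < best_cost:
                        best, best_cost = (dy, dx), cost
            mvf_row.append(best)
        mvf.append(mvf_row)
    return mvf
```

The sketch returns one vector per block; copying that vector to every pixel of the block would give a per-pixel two-dimensional array of offsets. A larger radius in a given cell of `radius_map` lets that block find larger displacements at a higher search cost, which is the kind of per-block trade-off a spatial parameter map makes possible.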

2. The method according to claim 1, wherein the pre-processing of said at least two images comprises a step of downscaling the images to form an image pyramid with at least two reduced scales.

3. The method according to claim 2, wherein the downscaling of the images comprises a step of spatially filtering the pixels of the raw image.
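The pyramid of claims 2 and 3 can be illustrated with a short sketch. The claims do not specify the filter or the scale factor, so the 2x2 box filter and the factor of two used here are assumptions, and the function names are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of intensities


def downscale_2x(image: Image) -> Image:
    """Halve each dimension by averaging non-overlapping 2x2 pixel groups.

    The averaging acts as a simple spatial (box) filter before decimation.
    """
    height, width = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
             + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(width)
        ]
        for y in range(height)
    ]


def build_pyramid(image: Image, reduced_levels: int = 2) -> List[Image]:
    """Return [full resolution, 1/2 scale, 1/4 scale, ...].

    reduced_levels is the number of reduced scales; two is the minimum
    named in claim 2.
    """
    pyramid = [image]
    for _ in range(reduced_levels):
        pyramid.append(downscale_2x(pyramid[-1]))
    return pyramid
```

Coarse levels of such a pyramid are commonly used to find large displacements cheaply before refining them at finer levels; whether and how the described method does so is not stated in the claims.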

4. The method according to claim 1, wherein obtaining said at least two images from the bitstream further comprises a step of obtaining at least one of a preliminary MVF estimate, coded metadata related to image capture parameters, and sensor data not related to image formation.

5. The method according to claim 1, wherein obtaining said at least two images from the bitstream comprises a step of obtaining at least one of brightness channel data and three-channel color images in the RGB or YUV color space.

6. The method according to claim 1, wherein the pre-configured model for predicting spatial parameters of the motion estimation algorithm has the architecture of a convolutional neural network or a discriminative or regression machine learning model.

7. The method according to claim 1, wherein the motion estimation algorithm is selected from at least one of: an exhaustive-search block-matching algorithm that minimizes an error functional equal to the sum of the absolute differences (SAD) of the intensity values of the pixels in the block; a 3D recursive search (3DRS) algorithm; a block erosion algorithm with sub-pixel refinement of motion vectors (MV); and an algorithm for sub-pixel discretization of candidate blocks.

8. The method according to claim 1, further comprising a step of training the model for predicting spatial parameters of the motion estimation algorithm, wherein training the model comprises the steps of:

receiving as input a synthetic training data set comprising K training examples, wherein each training example includes at least one reference frame, one source frame and a reference (ground-truth, GT) motion vector field;

obtaining from said synthetic data set a pair of synthetic frames consisting of a reference frame REF_k and a source frame SRC_k;

selecting the index k of the pair of frames;

calculating the estimated parameter map y:

y = Y(SRC_k, REF_k);

forming a motion vector field MVF_k(y) from the fields MVF_{i,k}, where i = 1…N, according to the estimated parameter map y, by calculating a weighted sum of MVF_{i,k} over i = 1…N, so as to estimate the output of the spatially adaptive ME algorithm;

calculating the loss function L between the vectors of the motion vector field MVF_k(y) and the reference motion vectors GT_k;

updating the model for predicting spatial parameters of the motion estimation algorithm by backpropagating the gradient of the loss function L;

determining whether the required convergence of the model for predicting spatial parameters of the motion estimation algorithm has been achieved; and

storing the weight coefficients of the trained model for predicting spatial parameters of the motion estimation algorithm in memory.

9. The method according to claim 8, wherein, in the step of forming the motion vector field, weighting coefficients obtained by applying the Softmax function are used to calculate the weighted sum.
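The weighted sum of claims 8 and 9 can be sketched for a single training pair as follows. This is an illustration under assumptions rather than the claimed training procedure: blocks are stored in a flat list, the model output y is assumed to be one score per block and per parameter variant, and the loss is taken to be a mean L1 distance, which the claims do not specify. All names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = Tuple[float, float]  # (dy, dx) motion vector of one block


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable Softmax: non-negative weights that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def blend_fields(candidates: List[List[Vector]],
                 scores: List[List[float]]) -> List[Vector]:
    """Form MVF_k(y) as a per-block weighted sum of N candidate fields.

    candidates[i][b]: vector of block b estimated with parameter variant i.
    scores[b][i]: assumed model output y for block b and variant i.
    """
    blended: List[Vector] = []
    for b, block_scores in enumerate(scores):
        weights = softmax(block_scores)
        dy = sum(w * candidates[i][b][0] for i, w in enumerate(weights))
        dx = sum(w * candidates[i][b][1] for i, w in enumerate(weights))
        blended.append((dy, dx))
    return blended


def l1_loss(field: List[Vector], reference: List[Vector]) -> float:
    """Mean L1 distance between blended and reference vectors (assumed loss)."""
    total = sum(abs(v[0] - g[0]) + abs(v[1] - g[1])
                for v, g in zip(field, reference))
    return total / len(field)
```

Because the Softmax weights are smooth functions of the scores, the loss is differentiable with respect to the model output, which is what makes the backpropagation step of claim 8 possible even though the motion estimation algorithm itself is run only with the N fixed parameter variants.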

10. A system for estimating motion in a sequence of images, comprising:

an image pre-processing unit configured to perform pre-processing of at least two images obtained from an input bit stream containing a plurality of images, wherein the image pre-processing unit is configured to at least adjust brightness;

a unit for generating a map M of parameters of a motion estimation (ME) algorithm for said at least two images, configured to generate the map M of parameters using a pre-trained model for predicting spatial parameters of the motion estimation algorithm and to store the map M of parameters, wherein the dimensions of the map M of parameters of the motion estimation algorithm depend on the method of dividing the image into blocks;

a motion estimation unit configured to estimate motion on said at least two images by applying a motion estimation algorithm and parameters from a motion estimation algorithm parameter map M to each block of said at least two images, wherein the motion estimation algorithm parameter map M determines the motion estimation algorithm parameters separately for each block; and

wherein the motion estimation unit is further configured to output a motion vector field (MVF) in the form of a two-dimensional (2D) array of offsets from the position of each pixel in the first image to at least one position of a corresponding pixel in the second image of said at least two images.

11. The system according to claim 10, further comprising a unit for training a model for predicting spatial parameters of a motion estimation algorithm, configured to train said model for predicting spatial parameters of a motion estimation algorithm by iteratively executing the parameter estimation algorithm with N parameter variants and predicting the parameter value for each of the image blocks that gives the best result.

12. The system according to claim 11, wherein the value of the parameters for each of the image blocks that gives the best result is determined by calculating the loss function between the vectors of the motion vector field and the reference motion vectors.

13. The system of claim 10, further configured to receive at least one of a preliminary MVF estimate, coded metadata related to image capture parameters, and sensor data not related to image formation.

14. The system according to claim 10, configured to obtain a pair of images in the form of at least one of brightness channel data and three-channel color images in the RGB or YUV color space.

15. The system of claim 10, wherein the pre-configured model for predicting spatial parameters of the motion estimation algorithm has the architecture of a convolutional neural network or a discriminative or regression machine learning model.

16. The system of claim 10, wherein the motion estimation algorithm is selected from at least one of: an exhaustive-search block-matching algorithm that minimizes an error functional equal to the sum of the absolute differences (SAD) of the intensity values of the pixels in the block; a 3D recursive search (3DRS) algorithm; a block erosion algorithm with sub-pixel refinement of motion vectors (MV); and an algorithm for sub-pixel discretization of candidate blocks.

17. The system according to claim 11, wherein the training unit for the model for predicting spatial parameters of the motion estimation algorithm additionally comprises:

a rendering unit configured to render sequences of frames with a reference motion vector (GT) field;

a motion estimation unit (ME) configured to estimate motion for said frame sequences with each set of parameters from the map M of parameters of the motion estimation algorithm;

a motion estimation algorithm spatial parameter prediction model training unit configured to train the motion estimation algorithm spatial parameter prediction model by calculating the loss function L between the vectors of the motion vector field generated by the motion estimation unit and the reference motion vectors, updating the motion estimation algorithm spatial parameter prediction model by backpropagating the gradient of the loss function L, determining whether the required convergence has been achieved in the motion estimation algorithm spatial parameter prediction model; and

a unit for storing the weight coefficients of the trained model for predicting spatial parameters of the motion estimation algorithm.