
TWI884776B - Image data collection system, image model training method, and device for improving image resolution - Google Patents


Info

Publication number: TWI884776B
Application number: TW113115657A
Authority: TW (Taiwan)
Other versions: TW202542833A (zh)
Inventors: 孟忠 黎, 陳亮嘉, 李維耘, 鄭育廷
Original assignee: 台達電子工業股份有限公司
Status: Application granted


Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The embodiments of this application provide an image data collection system, an image model training method, and a device for improving image resolution. In this application, an image capturing device is used to capture an image of an object at different focal lengths to obtain a first image and a second image respectively, and the first image and the second image are processed to obtain a first processed image with high resolution and a second processed image with low resolution. Image alignment is performed on these processed images to obtain a high-resolution and low-resolution image pair. Many high-resolution and low-resolution image pairs are collected as a training image data set to train a model for upgrading low-resolution images to high-resolution images. The trained model can significantly improve the ability to restore image details.

Description

Image data acquisition system, image model training method, and device for improving image resolution

The embodiments of this application relate to image processing technology, and in particular to an image data acquisition system, a training method for an image model, and a device for improving image resolution.

High-resolution (HR) video brings a better visual experience to the audience. With the development of optical imaging, optical couplers, and high-speed communication technology, the capture, storage, and projection of 8K video are now mature technologies. However, high-resolution video requires large storage space and transmission bandwidth, and the related equipment is relatively expensive. Digital super-resolution (SR) is a very popular image processing technique today. Its core idea is to use the spatial-domain or spatial-frequency-domain information of low-resolution (LR) images to perform upsampling that estimates the optical transfer function of the optical system, as shown in Figure 1. This differs from traditional interpolation-based upsampling methods (e.g., nearest neighbor, bilinear, bicubic).
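For concreteness, the interpolation-based upsampling baselines mentioned above (nearest neighbor and bilinear; bicubic is analogous) can be sketched in a few lines of NumPy. This is a generic illustration, not code from the patent:

```python
import numpy as np

def upsample_nearest(img, scale):
    """Nearest-neighbour upsampling: each output pixel copies the closest input pixel."""
    h, w = img.shape
    rows = np.arange(h * scale) // scale
    cols = np.arange(w * scale) // scale
    return img[np.ix_(rows, cols)]

def upsample_bilinear(img, scale):
    """Bilinear upsampling: each output pixel is a weighted mean of its 4 neighbours."""
    h, w = img.shape
    y = np.linspace(0, h - 1, h * scale)
    x = np.linspace(0, w - 1, w * scale)
    y0, x0 = np.floor(y).astype(int), np.floor(x).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (y - y0)[:, None], (x - x0)[None, :]
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy
```

Both functions only combine a handful of neighbouring pixels per output pixel, which is the limitation that learned super-resolution aims to overcome.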

Generally speaking, there are two approaches to super-resolution: optical methods and deep neural network (DNN) methods. Optical methods rely on an understanding of the optical system to improve resolution, while DNN methods use machine learning to learn patterns from data and can adapt to a wider range of situations. Although optical methods achieve higher interpretability, noise may cause errors in the deconvolution process. DNN methods can exploit more complex patterns and, for natural images taken with a camera, can achieve better performance than optical methods.

Super-resolution has emerged in recent years, and most approaches are based on deep learning. Traditional interpolation-based upsampling usually involves only the 4 to 9 pixels around the target pixel, whereas a deep network built from many convolutional layers can analyze features across the image. This means a much larger receptive field and therefore a better nonlinear mapping capability than traditional interpolation, which is what makes super-resolution achievable. A large body of research in recent years has also demonstrated the effectiveness of deep learning methods.
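The receptive-field contrast can be made concrete. For a stack of convolutions, the receptive field grows by the standard recurrence rf ← rf + (k − 1)·jump, jump ← jump·stride. A minimal sketch (generic, not from the patent):

```python
def receptive_field(layers):
    """Receptive field of stacked conv layers.

    `layers` is a list of (kernel_size, stride) tuples; the recurrence is
    rf += (k - 1) * jump, then jump *= stride, starting from rf = jump = 1.
    """
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf
```

Three stacked 3x3 stride-1 convolutions already see a 7x7 input window, compared with the 2x2 or 3x3 neighbourhood used by interpolation.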

High-resolution images provide more detail, but their high pixel density increases transmission bandwidth, video storage costs, and the cost of related products. Although using an HR image sensor is the most direct way to obtain HR images, the manufacturing process and cost of such sensors and optics usually make this approach impractical in many settings or for large-scale deployment. As imaging applications and their accuracy requirements (e.g., image analysis, image display, microscopy) continue to evolve, the demand for higher image resolution keeps increasing. Moreover, with the broad development and application of super-resolution in video and image processing, it has become crucial to develop super-resolution solutions that preserve the temporal and spatial continuity of video content.

Today, most super-resolution datasets consist of "synthetic" data: low-resolution images generated by numerical methods such as bilinear and bicubic downsampling, as shown in Figure 2. Such datasets are easy to construct, but the downsampling process in a real optical system is more complex than these simple models. When operating on real images, DNNs trained on these synthetic datasets are generally unable to super-resolve high-frequency details to the same clarity and sharpness they achieve on synthetic LR images. The use of synthetic training images therefore has this weakness when it comes to producing high-resolution images.
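The synthetic-LR generation of Figure 2 can be sketched as follows; a simple box average stands in for bicubic downsampling here, purely for illustration:

```python
import numpy as np

def synth_lr(hr, scale=2):
    """Create a synthetic LR image by box-averaging scale x scale blocks.

    Real optical downsampling is more complex than this (or than bicubic),
    which is exactly the weakness of synthetic datasets discussed above.
    """
    h, w = hr.shape
    h, w = h - h % scale, w - w % scale   # trim so dimensions divide evenly
    blocks = hr[:h, :w].reshape(h // scale, scale, w // scale, scale)
    return blocks.mean(axis=(1, 3))
```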

The embodiments of this application provide an image data acquisition system, a training method for an image model, and a device for improving image resolution, which can enhance the ability to restore image details. The technical solution is as follows:

According to one aspect of the embodiments of the present application, an image data acquisition system is provided, comprising: an image capture device for capturing an image of an object at a first focal length to obtain a first image, and capturing an image of the object at a second focal length to obtain a second image, wherein the first image and the second image have the same resolution; a storage device for storing the first image and the second image captured by the image capture device; a processing module, which obtains the first image and the second image from the storage device, processes the first image to obtain a first processed image with a first resolution, and processes the second image to obtain a second processed image with a second resolution, the first resolution being greater than the second resolution; and a registration module, which obtains the first processed image and the second processed image from the processing module and performs image alignment on them to obtain a high-resolution and low-resolution image pair.

According to another aspect of the embodiments of the present application, a method for training an image model is provided, comprising: obtaining an image dataset through an optical system, the image dataset including a plurality of high-resolution and low-resolution image pairs, each pair including a first training image with a first resolution obtained based on a first focal length and a second training image with a second resolution obtained based on a second focal length, the first training image and the second training image having the same or corresponding image content, and the first resolution being greater than the second resolution; and inputting the image dataset into a neural network model to train a trained image model, wherein the first training image is used as the input of the neural network model and the second training image is used as the training label.

According to yet another aspect of the embodiments of the present application, a device for improving image resolution is provided, comprising: an input unit for receiving a low-resolution image; a controller coupled to the input unit, wherein an image conversion model is deployed in the controller and is used to convert the low-resolution image into a high-resolution image, the image conversion model being trained with an image dataset that includes a plurality of high-resolution and low-resolution image pairs, each pair including a first training image with a first resolution obtained based on a first focal length and a second training image with a second resolution obtained based on a second focal length, the first training image and the second training image having the same or corresponding image content, and the first resolution being greater than the second resolution; and an output unit coupled to the controller for outputting the high-resolution image, wherein the resolution of the high-resolution image is higher than that of the low-resolution image.

The technical solutions provided by the embodiments of this application may include the following beneficial effects:

In the embodiments of this application, an image capture device captures images of an object at different focal lengths to obtain a first image and a second image, which are processed to obtain a high-resolution first processed image and a low-resolution second processed image; these processed images are then aligned to obtain a high-resolution and low-resolution image pair. Many such pairs are collected as a training dataset to train a model (e.g., a neural network model) for upscaling low-resolution images to high-resolution images, and the trained model can significantly improve the ability to restore image details.

It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present application.

1: Object
2: Curtain
10: Image capture device
20: Storage
30: Computing device
32: Processing module
34: Registration module
35: Cropping module
36: Standard deviation filter
82: Cosine pattern
83: Middle region
85, 85': Spectrum centers
86a, 86b: Peaks
87a, 87b: Peaks
100: Image data acquisition system
1901: UHD8k dataset
1902: Pre-trained model
1903: DSD dataset
1904: Dataset added from video
1905: VSR model
2400: Device for improving image resolution
2410: Input unit
2420: Controller
2422: Image conversion model
2430: Output unit
2500: High-resolution conversion system
Im1: First image
Im2: Second image
R1: First region
R2: Second region
S701~S708: Steps
S1201~S1208: Steps

In order to explain the technical solutions in the embodiments of this application more clearly, the figures needed for describing the embodiments are briefly introduced below. Obviously, the figures described below are only some embodiments of this application; for a person of ordinary skill in the art, other figures can be obtained from these figures without creative effort.

[Figure 1] Schematic diagram of the principle of the optical transfer function in super-resolution.
[Figure 2] Schematic diagram of the generation process of synthetic training images.
[Figure 3] Schematic diagram of the generation process of real training images.
[Figure 4] Schematic diagram of capturing an image of an object with an image capture device according to an embodiment of the present application.
[Figure 5] Block diagram of an image data acquisition system according to an embodiment of the present application.
[Figure 6] Schematic diagram of the processing of object images according to an embodiment of the present application.
[Figure 7] Flow diagram of obtaining a high-resolution and low-resolution image pair according to an embodiment of the present application.
[Figure 8] Schematic diagram of the cosine pattern and its spectrum in the DSD magnification calibration process.
[Figure 9] The process of image calibration in the designed scene dataset (DSD).
[Figure 10] An example of image calibration results in the designed scene dataset (DSD).
[Figure 11] Schematic diagram of obtaining a preferred crop size based on the model training method.
[Figure 12] Flow diagram of the cropping process for producing a synthetic dataset.
[Figure 13] Test results of training three DNN models with datasets of different crop sizes.
[Figure 14] Test results of training the MANtiny model with datasets of different crop sizes.
[Figure 15] Test results of training the RFDN model with datasets of different crop sizes.
[Figure 16] Test results of training the SRCNN model with datasets of different crop sizes.
[Figure 17] Flow diagram of obtaining a preferred crop size based on spectrum analysis.
[Figure 18] Plot of the relationship between spatial frequency and spectral correlation.
[Figure 19] Training flow chart of a video super-resolution model.
[Figure 20] Flow chart for identifying redundant images.
[Figure 21] Gradient histogram of an image.
[Figure 22] Cross-section of the spectrum along the horizontal axis.
[Figure 23] Modulation transfer function (MTF) results obtained by various methods.
[Figure 24] Block diagram of a device for improving image resolution according to an embodiment of the present application.
[Figure 25] Schematic diagram of the configuration of a high-resolution conversion system according to an embodiment of the present application.

The following clearly and completely describes the technical solutions in the embodiments of this application with reference to the figures. Obviously, the described embodiments are only some, not all, of the embodiments of this application. Based on the embodiments of this application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of this application.

In super-resolution (SR), a model trained with synthetic training images usually cannot produce high-resolution (HR) images with the same clarity and sharpness it achieves on synthetic low-resolution (LR) images. In the embodiments of this application, in order to let the model learn the transfer function of a real optical system, a "real" dataset is established, as shown in Figure 3, and the model is trained on this real dataset.

Please refer to Figures 4 to 6. Figure 4 shows capturing an image of an object with an image capture device according to an embodiment of the present application, Figure 5 shows a block diagram of the image data acquisition system, and Figure 6 shows the processing of object images. As shown in Figure 5, the image data acquisition system 100 of this embodiment includes an image capture device (e.g., a camera lens) 10, a storage 20, and a computing device 30; the storage 20 may reside in the image capture device 10 or in the computing device 30. The computing device (e.g., a computer) 30 is configured with a processing module 32 and a registration module 34, and the processing module 32 includes a cropping module 35. Modules 32, 34, and 35 may be implemented in hardware, software, firmware, or a combination of hardware and software.

As shown in Figure 4, the image capture device 10 is used to capture an image of an object 1. During image capture, the object 1 may be placed in front of a solid-color curtain 2 so that the captured image has a solid-color background, which simplifies subsequent image processing. The image capture device 10 includes, or is, a zoom lens that captures an image of the object 1 at a first focal length (e.g., 86 mm) to obtain a first image Im1, and at a second focal length (e.g., 43 mm) to obtain a second image Im2. The first image Im1 and the second image Im2 have the same image size and the same resolution. When the first focal length is greater than the second focal length, the object 1 appears larger in the first image Im1 than in the second image Im2, as shown in Figure 4.

The first image Im1 and the second image Im2 captured by the image capture device 10 may be stored in the device's own storage 20, or transmitted to the computing device 30 and stored in the storage 20 of the computing device 30. The storage 20 may be non-volatile or volatile memory.

The computing device 30 obtains the first image Im1 and the second image Im2 from the storage 20. The processing module 32 of the computing device 30 processes the first image Im1 to obtain a first processed image with a first resolution, and processes the second image Im2 to obtain a second processed image with a second resolution.

The first resolution of the first processed image is greater than the second resolution of the second processed image. An image captured at a longer focal length contains richer local detail, while an image captured at a shorter focal length covers a wider field of view but with less clear detail. The goal is therefore to obtain a high-resolution image at the longer focal length and a low-resolution image at the shorter focal length. If the first focal length is f1 and the second focal length is f2, they satisfy f1 = A × f2, where A > 1. Since a high-resolution image is captured at the long focal length and a low-resolution image at the short focal length, if the first resolution of the first processed image is X, the second resolution of the second processed image can be X/A. In other words, the ratio between the two focal lengths can determine the ratio between the two resolutions, because image magnification is proportional to focal length, and image resolution in this setup is in turn related to magnification. For example, the first resolution is twice the second resolution, e.g., 8K versus 4K. In other embodiments, the two resolutions may have other ratios, not limited to two. If A = 2, the resulting first and second processed images can be used to train a model suited to doubling image resolution.
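The focal-length/resolution arithmetic above can be captured in a small helper (the function name is ours, not from the patent):

```python
def lr_resolution(hr_size, f_long, f_short):
    """Given the HR image side length and the two focal lengths,
    return the matching LR side length: magnification scales with
    focal length, so the LR side is hr_size * f_short / f_long."""
    if f_long <= f_short:
        raise ValueError("expected f_long > f_short")
    lr = hr_size * f_short / f_long
    if lr != int(lr):
        raise ValueError("HR size must be divisible by the focal-length ratio")
    return int(lr)
```

With the example focal lengths of 86 mm and 43 mm (A = 2), a 1500-pixel HR side maps to a 750-pixel LR side.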

Specifically, referring to Figure 6, the cropping module 35 in the processing module 32 crops a first region R1 from the first image Im1 to obtain the first processed image, and crops a second region R2 from the second image Im2 to obtain the second processed image, where the image content of the first region R1 corresponds to that of the second region R2 and the first region R1 is larger than the second region R2. Because the first region R1 is larger than the second region R2, its resolution is also greater. If the first focal length is f1 and the second focal length is f2 with f1 = A × f2 (A > 1), then if the resolution of the first region R1 is X, the resolution of the second region R2 can be X/A. If A = 2, the images of the first region R1 and the second region R2 can be used to train a model suited to doubling image resolution. For example, as shown in Figure 6, the resolution of the first region R1 is 1500×1500 and that of the second region R2 is 750×750. That is, the first resolution of the first processed image corresponding to the first region R1 is greater than the second resolution of the second processed image corresponding to the second region R2.
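The paired cropping of R1 and R2 can be sketched as follows. This is an illustrative helper of our own; it assumes the two images are already globally aligned, so corresponding content sits at coordinates scaled by A:

```python
import numpy as np

def crop_pair(hr_img, lr_img, top, left, hr_size, scale=2):
    """Crop corresponding square regions from an aligned HR/LR image pair.

    (top, left) index the HR image; the matching LR window starts at
    (top // scale, left // scale) and is `scale` times smaller per side.
    """
    r1 = hr_img[top:top + hr_size, left:left + hr_size]
    lr_size = hr_size // scale
    r2 = lr_img[top // scale:top // scale + lr_size,
                left // scale:left // scale + lr_size]
    return r1, r2
```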

The image data acquisition system 100 may further include a standard deviation filter 36, which decides whether to discard or keep a cropped first region R1 based on the standard deviation of grayscale values of the first image Im1 and of the cropped region. For example, if the grayscale standard deviation of the cropped first region R1 is greater than (or equal to) R times that of the first image Im1 (R between 0 and 1, e.g., 0.5), the crop contains detailed content of the first image Im1 and is suitable as a training image. If the crop's grayscale standard deviation is less than R times that of the first image Im1, the crop lacks detail; it may be a featureless background region and is unsuitable as a training image. The standard deviation filter 36 thus keeps suitable crops and discards unsuitable ones, further improving the quality of the training image dataset. If the standard deviation filter 36 decides to keep the cropped first region R1, the region of the second image Im2 corresponding to the first region R1 of the first image Im1, namely the second region R2, is subsequently cropped.
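The standard-deviation criterion above reduces to a one-line predicate; a minimal sketch (function name ours):

```python
import numpy as np

def keep_crop(full_img, crop, r=0.5):
    """Standard-deviation filter: keep a crop only if its grayscale std
    is at least r times the std of the full image (r in (0, 1], e.g. 0.5).
    Low-variance crops are likely featureless background and are dropped."""
    return crop.std() >= r * full_img.std()
```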

The registration module 34 obtains the first processed image and the second processed image from the processing module 32 and performs image alignment on them to obtain a high-resolution and low-resolution image pair. A suitable algorithm may be used for alignment; for example, alignment can be based on the difference between the phase maps of the spectra of the two processed images. The registration module 34 registers the aligned images as an image pair, which consists of a high-resolution image and a low-resolution image and is therefore called a high-resolution and low-resolution image pair. The pair can be stored in the storage 20 or elsewhere and serves as a training image pair for a model (e.g., a neural network model), with the second processed image as the model's input and the first processed image as its output. A dataset composed of many such pairs is called a designed scene dataset (DSD). Introducing the DSD into model training is a valuable enhancement that can significantly improve the model's ability to recover image detail when upscaling low-resolution images to high-resolution images (e.g., 4K to 8K).
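The text only says alignment may use differences between the spectral phase maps; one common concrete instance of that idea is phase correlation, sketched below as an illustration (our choice of algorithm, not necessarily the patented method):

```python
import numpy as np

def phase_correlation_shift(ref, moving):
    """Estimate the integer translation between two same-size grayscale
    images via phase correlation: the phase-only cross-power spectrum
    has a sharp peak at the relative shift. Returns (dy, dx) such that
    np.roll(moving, (dy, dx), axis=(0, 1)) realigns it with ref."""
    F1 = np.fft.fft2(ref)
    F2 = np.fft.fft2(moving)
    cross = F1 * np.conj(F2)
    cross /= np.abs(cross) + 1e-12          # keep phase only
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    if dy > h // 2:                          # map wrap-around indices
        dy -= h                              # to signed shifts
    if dx > w // 2:
        dx -= w
    return dy, dx
```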

In the embodiments of this application, an image capture device captures images of an object at different focal lengths to obtain a first image and a second image, which are processed to obtain a high-resolution first processed image and a low-resolution second processed image; these processed images are then aligned to obtain a high-resolution and low-resolution image pair. Many such pairs are collected as a training dataset to train a model (e.g., a neural network model) for upscaling low-resolution images to high-resolution images, and the trained model can significantly improve the ability to restore image details.

圖7顯示根據本申請實施例的獲取高解析度和低解析度影像對的流程示意圖。如圖7所示,該流程包括放大倍率校正(magnification calibration)(步驟S701)、獲取高解析度圖像(步驟S702)、改變焦距(步驟S703)、獲取低解析度圖像(步驟S704)、裁切流程(步驟S705)、影像註冊(步驟S706)、相似度判斷(步驟S707)和儲存影像對(步驟S708)等步驟。 FIG7 shows a schematic diagram of the process of obtaining a high-resolution and low-resolution image pair according to an embodiment of the present application. As shown in FIG7 , the process includes magnification calibration (step S701), obtaining a high-resolution image (step S702), changing the focal length (step S703), obtaining a low-resolution image (step S704), a cropping process (step S705), image registration (step S706), similarity determination (step S707), and storing image pairs (step S708).

相機拍攝的自然影像具有不確定性,例如環境引起的雜訊、失真、像差和誤差。為了減少這些不確定性,必須在設計的環境中控制數據集的取得。透過改變變焦鏡頭的焦距來擷取高解析度影像和低解析度影像。例如,在擷取到高解析度影像(步驟S702)後,可以改變變焦鏡頭的焦距(步驟S703),將焦距縮短為原來的一半,來獲得低解析度影像(步驟S704)。之後,對獲取的高解析度影像和低解析度影像進行裁切流程(步驟S705)。在裁切過程中,高解析度影像和低解析度影像的影像大小維持一定的比例關係,例如A=2時,如果高解析度影像為1500×1500,則低解析度影像為750×750;如果高解析度影像為400×400,則低解析度影像為200×200。 Natural images captured by a camera have uncertainties, such as noise, distortion, aberration, and error caused by the environment. In order to reduce these uncertainties, the acquisition of data sets must be controlled in a designed environment. High-resolution images and low-resolution images are captured by changing the focal length of the zoom lens. For example, after capturing a high-resolution image (step S702), the focal length of the zoom lens can be changed (step S703), halving the focal length to obtain a low-resolution image (step S704). Afterwards, the obtained high-resolution image and low-resolution image are subjected to a cropping process (step S705). During the cropping process, the image sizes of the high-resolution image and the low-resolution image maintain a certain proportional relationship. For example, when A=2, if the high-resolution image is 1500×1500, the low-resolution image is 750×750; if the high-resolution image is 400×400, the low-resolution image is 200×200.

其中,可以將布幕2設定為白色背景,這有兩個好處。首先,白色背景的灰階範圍較小,便於使用上述的標準偏差過濾器36來確定所裁切的區域影像是否包含圖像細節。其次,高解析度情況下焦深(depth of focus)較小,這意味著背景變得更加模糊。如果將更模糊的HR放入訓練數據集中,模型將學習如何模糊影像,這不是訓練模型的目標。白色背景在聚焦和散焦(defocus)時表現出類似的特性。它可以降低散焦造成的誤差。為了減少失真的影響,可以僅選擇影像的中間區域進行裁剪。 Among them, the curtain 2 can be set to a white background, which has two advantages. First, the grayscale range of the white background is smaller, which is convenient for using the above-mentioned standard deviation filter 36 to determine whether the cropped area image contains image details. Second, the depth of focus is smaller in the case of high resolution, which means that the background becomes more blurred. If a more blurred HR is put into the training data set, the model will learn how to blur the image, which is not the goal of training the model. The white background exhibits similar characteristics when it is focused and defocused. It can reduce the error caused by defocus. In order to reduce the impact of distortion, only the middle area of the image can be selected for cropping.

設計場景數據集(DSD)的準備流程圖如圖7所示。它包含兩個核心步驟:放大倍率校正(步驟S701)和影像註冊(步驟S706)。透過這些步驟S701、S706,可以減少變焦鏡頭調整和相機移位所帶來的不確定性。 The preparation flow chart of the design scene dataset (DSD) is shown in Figure 7. It contains two core steps: magnification correction (step S701) and image registration (step S706). Through these steps S701 and S706, the uncertainty caused by zoom lens adjustment and camera shift can be reduced.

其中,在放大倍率校正(步驟S701)中,可以透過對校正影像(calibration image)進行放大倍率校正,來確定影像擷取裝置的調焦旋鈕分別在第一焦距和第二焦距下的位置,從而可以分別在第一焦距和第二焦距下擷取得出第一影像Im1和第二影像Im2。放大倍率的校正可採用兩種方法:餘弦譜法(cosine pattern spectrum)和傅立葉梅林變換(Fourier Mellin transform)。基於此,加入焦距限制機制,使得每次放大影像時調焦旋鈕可以轉到同一位置。較佳地,以放大兩倍來說,校準結果與兩倍之間的偏差小於0.0025。如圖8所示,校正影像(如,餘弦圖案82)用於校準放大倍率。分別從HR影像和LR影像中的餘弦圖案82裁剪出一塊中間區域83,HR影像的中間區域83的尺寸大小與LR影像的 中間區域83的尺寸大小一樣,對HR影像和LR影像的中間區域83進行快速傅立葉變換(fast Fourier transform)可分別得出其頻譜圖,如圖8底部的兩個圖示所示。針對HR頻譜,兩峰值86a、86b與頻譜中心85的距離相等。類似地,針對LR頻譜,兩峰值87a、87b與頻譜中心85’的距離相等。在高解析度(HR)影像的頻譜中,週期較大頻率較低,兩峰值的位置更接近頻譜中心。利用傅立葉變換的性質,可以透過高解析度影像和低解析度影像的頻譜中的峰值距離來計算放大倍率。例如,若放大倍率為兩倍(即,A=2),LR頻譜中兩峰值87a、87b與頻譜中心85’的距離(或平均距離)會是HR頻譜中兩峰值86a、86b與其頻譜中心85的距離(或平均距離)的兩倍。 Among them, in the magnification calibration (step S701), the magnification calibration can be performed on the calibration image to determine the position of the focus knob of the image capture device at the first focal length and the second focal length, so that the first image Im1 and the second image Im2 can be captured at the first focal length and the second focal length, respectively. The calibration of the magnification can be performed by two methods: cosine pattern spectrum and Fourier Mellin transform. Based on this, a focal length restriction mechanism is added so that the focus knob can be turned to the same position each time the image is magnified. Preferably, for example, the deviation between the calibration result and the double magnification is less than 0.0025. As shown in Figure 8, the calibration image (e.g., cosine pattern 82) is used to calibrate the magnification. A middle region 83 is cropped from the cosine pattern 82 in the HR image and the LR image, respectively. The size of the middle region 83 of the HR image is the same as that of the middle region 83 of the LR image. 
The middle region 83 of the HR image and the LR image are subjected to fast Fourier transform (FFT) to obtain their spectrum diagrams, respectively, as shown in the two diagrams at the bottom of FIG8 . For the HR spectrum, the distances between the two peaks 86a and 86b and the spectrum center 85 are equal. Similarly, for the LR spectrum, the distances between the two peaks 87a and 87b and the spectrum center 85' are equal. In the spectrum of the high-resolution (HR) image, the larger the period and the lower the frequency, the closer the positions of the two peaks are to the spectrum center. By using the properties of Fourier transform, the magnification can be calculated by the peak distance in the spectrum of the high-resolution image and the low-resolution image. For example, if the magnification is two times (i.e., A=2), the distance (or average distance) between the two peaks 87a, 87b in the LR spectrum and the spectrum center 85' will be twice the distance (or average distance) between the two peaks 86a, 86b in the HR spectrum and its spectrum center 85.
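The peak-distance reading behind the magnification calibration of step S701 can be sketched numerically. Below, two synthetic captures of the same cosine target stand in for the HR and LR middle regions (periods of 32 px and 16 px, i.e. A = 2); the magnification is read off as the ratio of the peak-to-center distances in their spectra. The pattern size and periods are illustrative assumptions, not values from the patent:

```python
import numpy as np

def peak_distance(img):
    """Distance from the spectrum centre to the strongest non-DC peak."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    cy, cx = spec.shape[0] // 2, spec.shape[1] // 2
    spec[cy, cx] = 0.0                        # suppress the DC term
    py, px = np.unravel_index(np.argmax(spec), spec.shape)
    return float(np.hypot(py - cy, px - cx))

# Simulated captures of one cosine target seen at two focal lengths.
n = 256
x = np.arange(n)
hr = np.tile(np.cos(2 * np.pi * x / 32), (n, 1))   # HR view: period 32 px
lr = np.tile(np.cos(2 * np.pi * x / 16), (n, 1))   # LR view: period 16 px

magnification = peak_distance(lr) / peak_distance(hr)   # expected ratio A = 2
```

A deviation of the measured ratio from the nominal 2× (the patent targets a deviation below 0.0025) would prompt re-seating the focus knob before capture.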

在DSD中,本申請是透過改變相機的焦距來獲取高解析度影像和低解析度影像,但相機的視野(Field of view,FOV)可能會改變,且直接裁剪影像對可能導致數據集被汙染。在影像註冊(步驟S706)中,將兩張裁切影像放入同一視野中,可以依據這兩張裁切影像的頻譜相位圖之間的差異來進行影像對齊,如圖9所示。亦即,在步驟S707中,如果在此視野中,兩張裁切圖像之間的誤差過大(兩者差異超過一定值,如5%),則重新進行裁切或對位;兩張裁切圖像之間的誤差小(兩者相似度(correlation)大於一定值,如95%),則可將其儲存成影像對(步驟S708)。圖10給出了此影像校準結果的示例。 In DSD, the present application obtains high-resolution images and low-resolution images by changing the focal length of the camera, but the field of view (FOV) of the camera may change, and directly cropping the image pair may cause the data set to be contaminated. In image registration (step S706), the two cropped images are placed in the same field of view, and the image alignment can be performed based on the difference between the spectral phase maps of the two cropped images, as shown in FIG9 . That is, in step S707, if the error between the two cropped images in this field of view is too large (the difference between the two exceeds a certain value, such as 5%), then re-crop or reposition; if the error between the two cropped images is small (the similarity (correlation) between the two is greater than a certain value, such as 95%), then they can be stored as an image pair (step S708). Figure 10 shows an example of this image calibration result.
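The accept/re-crop decision of step S707 amounts to thresholding a similarity score between the two aligned crops. The sketch below uses the Pearson correlation coefficient as that score; the patent names "correlation" without fixing the exact formula, the 95% threshold follows its example value, and since the HR and LR crops differ in pixel count, the smaller crop is assumed to have been resampled to the HR size before scoring:

```python
import numpy as np

def accept_pair(img_a, img_b, threshold=0.95):
    """True when two aligned, equally sized crops agree closely enough
    to be stored as an HR/LR training pair."""
    a = img_a.ravel().astype(float)
    b = img_b.ravel().astype(float)
    corr = np.corrcoef(a, b)[0, 1]        # Pearson correlation in [-1, 1]
    return bool(corr > threshold)
```

Pairs that fail the test are sent back to the cropping/alignment stage instead of being written into the dataset.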

由於計算機可能不具備高解析度影像數據集(如,4k到8k影像組成的數據集)所需的運算能力。為了使數據集變得可訓練,可以將影像裁切為小尺寸。為了確定哪種影像尺寸可以得出令人滿意的訓練表現,以下提出兩種決定較佳影像尺寸的策略。 Since computers may not have the computing power required for high-resolution image datasets (e.g., datasets consisting of 4k to 8k images), in order to make the dataset trainable, the images can be cropped to a smaller size. In order to determine which image size can produce satisfactory training performance, two strategies for determining the optimal image size are proposed below.

模型訓練方法: Model training method:

獲得不影響模型表現的最小或較佳裁切尺寸(cropped size)可以透過以下步驟來實現:獲取高解析度影像和具有與該高解析度影像對應的影像內容的低解析度影像;以多種不同尺寸對該高解析度影像進行裁切,並裁切該低解析度影像中相同的區域,以獲得高解析度和低解析度影像對;基於每一種尺寸下的高解析度和低解析度影像對對模型的訓練結果,決定出裁切尺寸。 Obtaining the minimum or optimal cropped size that does not affect the performance of the model can be achieved through the following steps: obtaining a high-resolution image and a low-resolution image with image content corresponding to the high-resolution image; cropping the high-resolution image at multiple different sizes and cropping the same area in the low-resolution image to obtain high-resolution and low-resolution image pairs; determining the cropping size based on the training results of the model for the high-resolution and low-resolution image pairs at each size.

具體地,此方法使用不同裁切尺寸的訓練數據集來訓練模型。對於訓練數據集,可以從現有的影像數據集(UHD8k數據集)中收集影像,UHD8k數據集提供了2029張8K(7680x4320)影像。雖然數據集中影像解析度很高,但由於電腦硬體和訓練速度的限制,需要將它們裁剪成較小的尺寸。舉例來說,將數據集裁剪為不同的正方形尺寸:2000、1000、800、400、200、100、80、70、60、50、30、20。為了讓每個裁剪尺寸中的特徵具有相同的屬性,小尺寸數據集是從大尺寸數據集中裁剪出來的。為了確保裁切尺寸和訓練表現之間的關係適應不同的CNN模型,這裡選擇三種不同的模型MANtiny、SRCNN和RFDN進行測試。模型訓練方法流程如圖11所示。 Specifically, this method uses training datasets with different cropping sizes to train the model. For the training dataset, images can be collected from an existing image dataset (UHD8k dataset), which provides 2029 8K (7680x4320) images. Although the images in the dataset are of high resolution, they need to be cropped into smaller sizes due to the limitations of computer hardware and training speed. For example, the dataset is cropped into different square sizes: 2000, 1000, 800, 400, 200, 100, 80, 70, 60, 50, 30, 20. In order for the features in each cropping size to have the same properties, the small-size dataset is cropped from the large-size dataset. To ensure that the relationship between cropping size and training performance is applicable to different CNN models, three different models, MANtiny, SRCNN, and RFDN, are selected here for testing. The model training method flow is shown in Figure 11.

以8K影像來說,其裁切的過程如圖12所示的流程。如圖12所示,從8K影像開始,對8K影像進行隨機裁切(步驟S1201)。繼續此過程,直到儲存最小尺寸的影像。在步驟S1202、S1203和S1204中進行標準差測試,標準差測試是為了測試影像是否有一些特徵可以讓模型學習。一般來說,背景是灰階變化較小的色塊。為了確保8k影像中的特徵被裁剪,可以以灰階值標準差作為指標。首先,計算裁切影像的灰階值標準差Stdcr(步驟S1202),並計算8k影像的灰階值標準差Std8k(步驟S1203)。在步驟S1204中進行兩者的比較。例如,如果所裁切的影像的灰階值標準差大於(或等於)8K影像的灰階值標準差的R倍(R為0~1, 如0.5)時,表示所裁切的影像包含了8K影像的細節部分,故適於作為用來訓練模型的訓練影像。如果所裁切的影像的灰階值標準差小於8K影像的灰階值標準差的R倍(R為0~1,如0.5)時,表示所裁切的影像缺乏8K影像的細節部分,所裁切到的區域可能為背景影像等不適於作為用來訓練模型的訓練影像,需回到步驟S1201重新進行裁切。另外,對8k影像進行下採樣(步驟S1205)來得出4k影像(步驟S1206),例如,可以應用Lanczos方法來產生低解析度(如,0.5x解析度)影像。而後,在4k影像中裁切出與從步驟S1201得出的且滿足步驟S1204之條件的裁切影像具有相同或相似影像內容的區域(步驟S1207),從而得出已裁切的數據對(步驟S1208)。 Taking 8K images as an example, the cropping process is as shown in Figure 12. As shown in Figure 12, starting from the 8K image, the 8K image is randomly cropped (step S1201). Continue this process until the image of the minimum size is stored. In steps S1202, S1203 and S1204, a standard deviation test is performed. The standard deviation test is to test whether the image has some features that allow the model to learn. Generally speaking, the background is a color block with a small grayscale change. In order to ensure that the features in the 8k image are cropped, the grayscale value standard deviation can be used as an indicator. First, the grayscale value standard deviation Std cr of the cropped image is calculated (step S1202), and the grayscale value standard deviation Std 8k of the 8k image is calculated (step S1203). A comparison is performed between the two in step S1204. For example, if the grayscale value standard deviation of the cropped image is greater than (or equal to) R times the grayscale value standard deviation of the 8K image (R is 0~1, such as 0.5), it means that the cropped image contains the details of the 8K image and is suitable as a training image for training the model. 
If the grayscale value standard deviation of the cropped image is less than R times the grayscale value standard deviation of the 8K image (R is 0~1, such as 0.5), it means that the cropped image lacks the details of the 8K image, and the cropped area may be a background image, etc., which is not suitable as a training image for training the model, and it is necessary to return to step S1201 for re-cropping. In addition, the 8k image is downsampled (step S1205) to obtain a 4k image (step S1206). For example, the Lanczos method can be applied to generate a low-resolution (e.g., 0.5x resolution) image. Then, a region having the same or similar image content as the cropped image obtained from step S1201 and satisfying the condition of step S1204 is cropped from the 4k image (step S1207), thereby obtaining a cropped data pair (step S1208).
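The random-crop loop of Figure 12 — crop, test the crop's grayscale standard deviation against R times the full image's, and pair the accepted crop with the matching region of the downsampled frame — can be sketched as below. Plain 2×2 box averaging stands in here for the Lanczos downsampling the patent uses, and R = 0.5 follows its example value:

```python
import numpy as np

def crop_with_std_filter(img, size, r=0.5, rng=None, max_tries=200):
    """Randomly crop until the crop's grayscale std reaches r times the
    full image's std, i.e. until the crop contains learnable detail."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = img.shape
    std_full = img.std()
    for _ in range(max_tries):
        y = 2 * int(rng.integers(0, (h - size) // 2 + 1))  # even coordinates so
        x = 2 * int(rng.integers(0, (w - size) // 2 + 1))  # the 0.5x crop aligns
        crop = img[y:y + size, x:x + size]
        if crop.std() >= r * std_full:
            return crop, (y, x)
    raise RuntimeError("no detailed region found")

def make_pair(img, size, rng=None):
    """Return an (HR crop, LR crop) pair; the LR crop is the 0.5x view of
    the same region (box averaging in place of Lanczos)."""
    hr, (y, x) = crop_with_std_filter(img, size, rng=rng)
    h, w = img.shape
    lowres = img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    lr = lowres[y // 2:(y + size) // 2, x // 2:(x + size) // 2]
    return hr, lr
```

Crops landing on flat background fail the standard-deviation test and trigger a re-crop, exactly as the return path to step S1201 describes.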

對於每個模型,當裁剪尺寸達到100x100之後,訓練表現的波動即小於1%,結果如圖13至圖16所示。 For each model, the fluctuation in training performance is less than 1% once the cropped size reaches 100x100, and the results are shown in Figures 13 to 16.

頻譜分析方法: Spectrum analysis method:

獲得不影響模型表現的最小或較佳裁切尺寸可以透過以下步驟來實現:獲取高解析度影像和具有與該高解析度影像對應的影像內容的低解析度影像;對該高解析度影像和該低解析度影像進行快速傅立葉轉換以分別獲得高解析度影像頻譜(spectrum)和低解析度影像頻譜;針對該高解析度影像和該低解析度影像中的低頻區域(low-frequency region),以多種不同尺寸對該高解析度影像和該低解析度影像進行裁切;基於這些尺寸下,該高解析度影像頻譜和該低解析度影像頻譜之間的相關性,從這些尺寸中決定出裁切尺寸。 Obtaining the minimum or optimal cropping size that does not affect the model performance can be achieved through the following steps: obtaining a high-resolution image and a low-resolution image having image content corresponding to the high-resolution image; performing fast Fourier transformation on the high-resolution image and the low-resolution image to obtain a high-resolution image spectrum and a low-resolution image spectrum respectively; cropping the high-resolution image and the low-resolution image at a plurality of different sizes for the low-frequency region in the high-resolution image and the low-resolution image; and determining the cropping size from these sizes based on the correlation between the high-resolution image spectrum and the low-resolution image spectrum at these sizes.

具體地,影像是由許多不同頻率的平面波組成的。影像裁剪的過程可以看作是阻擋裁剪區域之外的訊號。裁剪尺寸越大,則可以包含越低頻率的結構。想要瞭解在什麼頻率下,高解析度影像和低解析度影像在頻譜上開始出現 差異。為了瞭解這點,低頻區域被裁剪成不同的尺寸。整個流程圖如圖17所示。透過計算現有數據集(如,UHD8k數據集)中多張影像的頻譜相關性,可以繪製出相關性與頻率之間的關係,如圖18所示。確定頻率後,可以分析裁剪過程如何影響頻譜。裁剪時尺寸變小,頻譜就會發生畸變,導致資訊遺失。透過以不同大小裁剪模擬訊號,可以確定出哪種裁剪尺寸對於防止資訊遺失是安全的。如圖18所示的結果,在頻率>0.052 1/像素時,4k和8k頻譜之間的差異變得顯著。 Specifically, an image is composed of many plane waves of different frequencies. The process of image cropping can be viewed as blocking the signal outside the cropped region. The larger the crop size, the lower frequency structure can be included. We want to understand at what frequency the high-resolution image and the low-resolution image begin to differ in the spectrum. To understand this, the low-frequency region is cropped into different sizes. The entire flow chart is shown in Figure 17. By calculating the spectral correlation of multiple images in an existing dataset (e.g., UHD8k dataset), the relationship between the correlation and frequency can be plotted, as shown in Figure 18. After determining the frequency, we can analyze how the cropping process affects the spectrum. When cropping, the spectrum is distorted as the size becomes smaller, resulting in information loss. By cropping the analog signal at different sizes, it is possible to determine which crop size is safe for preventing information loss. As shown in the results in Figure 18, the difference between the 4k and 8k spectrum becomes significant at frequencies > 0.052 1/pixel.
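The frequency at which HR and LR spectra begin to diverge can be located by comparing radially averaged magnitude spectra. The sketch below substitutes a random image and a small box blur for real 8k/4k data, and uses a 5% tolerance chosen here for illustration; the 0.052 1/pixel figure in the text comes from the UHD8k measurement, not from this toy example:

```python
import numpy as np

def radial_profile(img):
    """Radially averaged magnitude spectrum vs. frequency (cycles/pixel)."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    h, w = spec.shape
    yy, xx = np.indices(spec.shape)
    r = np.hypot(yy - h // 2, xx - w // 2).astype(int)
    profile = (np.bincount(r.ravel(), weights=spec.ravel())
               / np.bincount(r.ravel()))
    freqs = np.arange(profile.size) / h    # assumes a square image
    return freqs, profile

def divergence_frequency(hr, lr, tol=0.05):
    """First frequency where the LR spectrum falls `tol` below the HR one."""
    f, p_hr = radial_profile(hr)
    _, p_lr = radial_profile(lr)
    below = np.nonzero(p_lr / (p_hr + 1e-12) < 1 - tol)[0]
    return float(f[below[0]]) if below.size else None
```

Running this over many images of a dataset gives the correlation-versus-frequency curve described in the text, from which a safe minimum crop size can be read off.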

從上述兩種策略,可以得出數據集在裁剪尺寸>100x100時就足夠了。然而,為了確保在特殊情況下的穩定性,可以選擇裁剪尺寸=400x400作為標準。 From the above two strategies, it can be concluded that the dataset is sufficient when the crop size > 100x100. However, to ensure stability in special cases, crop size = 400x400 can be chosen as the standard.

本申請實施例並提供一種影像模型的訓練方法,包括:透過一光學系統獲取一影像數據集,該影像數據集包括多個高解析度和低解析度影像對,每一高解析度和低解析度影像對包括基於第一焦距得出的具有第一解析度的第一訓練影像(training image)和基於第二焦距得出的具有第二解析度的第二訓練影像,該第一訓練影像和該第二訓練影像具有相同或相應的影像內容,該第一解析度大於該第二解析度;以及將該影像數據集輸入一神經網路模型中以訓練得出一訓練後的影像模型,其中該第二訓練影像作為該神經網路模型的輸入,該第一訓練影像作為訓練標籤。 The present application embodiment provides a method for training an image model, comprising: obtaining an image data set through an optical system, the image data set comprising a plurality of high-resolution and low-resolution image pairs, each high-resolution and low-resolution image pair comprising a first training image having a first resolution obtained based on a first focal length and a second training image having a second resolution obtained based on a second focal length, the first training image and the second training image having the same or corresponding image content, and the first resolution is greater than the second resolution; and inputting the image data set into a neural network model to train a trained image model, wherein the second training image is used as the input of the neural network model, and the first training image is used as a training label.

圖19顯示一種影片超解析度模型的訓練流程圖。首先,採用UHD8k數據集1901進行訓練,UHD8k數據集是一種合成數據集,可以採用合成數據集先進行預訓練以得出預訓練模型1902。然後,再使用由真實數據組成的設計場景數據集(即,DSD)1903來對該預訓練模型1902進行訓練,以得出最終的視訊超解析度(video super resolution,VSR)模型1905。除了使用DSD外,也可以進一步從影片中獲取更多的高解析度和低解析度影像對作為數據集(此屬於合成數據集)1904來進行模型訓練,得出該VSR模型。視訊超解析度模型可用 來提升視訊/影片的解析度,每一幀都經過超解析度模型,使得解析度提高為例如兩倍。 FIG19 shows a training flow chart of a video super-resolution model. First, the UHD8k dataset 1901 is used for training. The UHD8k dataset is a synthetic dataset. The synthetic dataset can be used for pre-training to obtain a pre-trained model 1902. Then, the pre-trained model 1902 is trained using a designed scene dataset (i.e., DSD) 1903 composed of real data to obtain the final video super-resolution (VSR) model 1905. In addition to using DSD, more high-resolution and low-resolution image pairs can be obtained from the video as a dataset (this belongs to the synthetic dataset) 1904 for model training to obtain the VSR model. The video super-resolution model can be used to upgrade the resolution of a video/movie, where each frame is passed through the super-resolution model, increasing the resolution by, for example, twice.

在從影片中獲取更多的高解析度和低解析度影像對作為數據集來參與模型訓練的過程中,主要從影片中擷取代表性影像,生成代表性影像的高解析度版本和低解析度版本來參與的訓練。為了防止重複影像造成模型訓練結果過度擬合,可以進一步排除影片中的冗餘影像和模糊影像中至少一者,將該影片中剩餘的影像作為代表性影像。具體地,可以透過比較相鄰影像幀之間的相似性來識別冗餘影像,透過評估影像的清晰度來識別模糊影像。 In the process of obtaining more high-resolution and low-resolution image pairs from the film as data sets to participate in model training, representative images are mainly extracted from the film to generate high-resolution and low-resolution versions of the representative images to participate in the training. In order to prevent repeated images from causing over-fitting of the model training results, at least one of the redundant images and blurred images in the film can be further excluded, and the remaining images in the film can be used as representative images. Specifically, redundant images can be identified by comparing the similarity between adjacent image frames, and blurred images can be identified by evaluating the clarity of the image.

在從影片中擷取圖像之前,可以使用雙立方法(Bicubic)對輸入的高解析度(如,8K)影片進行下採樣(如,縮小到原來的1/16),以減少後續步驟的計算時間。在識別出影片中的冗餘影像和模糊影像後,記錄這些不合格幀的幀號,以排除這些幀作為模型訓練的數據集。 Before extracting images from a video, the input high-resolution (e.g., 8K) video can be downsampled (e.g., reduced to 1/16 of the original) using the bicubic method to reduce the computation time of subsequent steps. After identifying redundant and blurred images in the video, the frame numbers of these unqualified frames are recorded to exclude these frames as data sets for model training.

冗餘影像辨識流程如圖20所示。包含三個幀的滑動視窗在影片的各個幀上移動。計算第一幀和第二幀之間的結構相似性指數(SSIM),以及第三幀和第二幀之間的SSIM。如果這些SSIM值都高於冗餘閾值(Tr),則將第二幀識別為冗餘影像。 The redundant image identification process is shown in Figure 20. A sliding window containing three frames moves on each frame of the video. The structural similarity index (SSIM) between the first frame and the second frame, and the SSIM between the third frame and the second frame are calculated. If these SSIM values are both higher than the redundancy threshold ( Tr ), the second frame is identified as a redundant image.

亦即,如果SSIM(f2,f1)>Tr且SSIM(f2,f3)>Tr,則f2為冗餘影像。 That is, if SSIM(f 2 ,f 1 )> Tr and SSIM(f 2 ,f 3 )> Tr , then f 2 is a redundant image.

在後續計算中,如果第二幀被識別為冗餘影像,則保留第一幀。否則,第二幀將成為新的第一幀。 In subsequent calculations, if the second frame is identified as a redundant image, the first frame is retained. Otherwise, the second frame will become the new first frame.
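The three-frame sliding window of Figure 20, including the rule that a dropped second frame leaves the first frame in place as the comparison anchor, can be sketched with a whole-frame SSIM. The patent does not specify which SSIM variant it uses; a single global window is assumed here, and Tr = 0.98 is an illustrative threshold:

```python
import numpy as np

def ssim_global(a, b, L=1.0):
    """SSIM computed over the whole frame as one window (L = dynamic range)."""
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (a.var() + b.var() + c2)
    return num / den

def redundant_frames(frames, t_r=0.98):
    """Indices of redundant frames: drop frame i when it is near-identical
    both to the last kept frame and to the frame after it."""
    redundant, keep = [], 0
    for i in range(1, len(frames) - 1):
        if (ssim_global(frames[i], frames[keep]) > t_r
                and ssim_global(frames[i], frames[i + 1]) > t_r):
            redundant.append(i)        # the kept frame stays the anchor
        else:
            keep = i                   # frame i becomes the new first frame
    return redundant
```

The frame numbers returned here correspond to the "unqualified frames" whose indices are recorded and excluded from the training dataset.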

在模糊檢測中,首先設定閾值(Sintensity),以幀的平均灰階值來檢測淡入和淡出幀。如果相鄰兩幀之間的SSIM小於閾值(Tsc),則表示偵測到場景變化。亦即,如果SSIM(fi,f i-1)<Tsc,則將f i 作為新場景的第一幀。然後,計算該場景中第一幀的清晰度,並將其作為參考清晰度(Sref1,Sref2)。如果下一 幀的清晰度(Si1,Si2)小於模糊閾值(Tbl)與參考清晰度的乘積,則稱該幀為模糊的。亦即,如果Si1<Tbl×Sref1或Si2<Tbl×Sref2,則f i 是模糊的。影像的清晰度可以透過兩種方法計算:基於梯度的方法和基於頻譜的方法。參考清晰度Sref1和Sref2可以是分別經由該兩種方法得出的。Si1和Si2也是分別經由該兩種方法得出的。 In blur detection, first set the threshold (S intensity ) and use the average grayscale value of the frame to detect fade-in and fade-out frames. If the SSIM between two adjacent frames is less than the threshold (T sc ), it means that a scene change is detected. That is, if SSIM ( fi , fi - 1 ) < T sc , fi is taken as the first frame of the new scene. Then, calculate the clarity of the first frame in the scene and use it as the reference clarity (S ref1 , S ref2 ). If the clarity of the next frame (S i1 , S i2 ) is less than the product of the blur threshold (T bl ) and the reference clarity, the frame is called blurred. That is, if Si1 < Tbl × Sref1 or Si2 < Tbl × Sref2 , then fi is blurred. The sharpness of an image can be calculated by two methods: a gradient-based method and a spectrum-based method. The reference sharpness Sref1 and Sref2 can be obtained by the two methods, respectively. Si1 and Si2 are also obtained by the two methods, respectively.

以下介紹基於梯度(Gradient)的方法。如果影像中包含了大量的銳利邊緣,則可以認為該影像是銳利影像,這表示銳利影像的梯度圖的強度會高於模糊影像的梯度圖的強度。可以透過拉普拉斯導數(Laplacian derivatives)來得出影像的梯度圖,並可以直方圖來表示梯度的強度分佈。如圖21所示,圖21左側的(a)為模糊影像的梯度直方圖,而圖21右側的(b)為銳利影像的梯度直方圖。對於梯度的分佈,清晰的影像比模糊的影像具有較大的變異數,這意味著標準偏差也可以作為銳利度的指標。在基於梯度的方法中,可以採用以下兩個指標來計算影像的清晰度: The following introduces the gradient-based method. If an image contains a large number of sharp edges, it can be considered that the image is a sharp image, which means that the intensity of the gradient map of the sharp image will be higher than the intensity of the gradient map of the blurred image. The gradient map of the image can be obtained through Laplacian derivatives, and the gradient intensity distribution can be represented by a histogram. As shown in Figure 21, (a) on the left side of Figure 21 is the gradient histogram of the blurred image, and (b) on the right side of Figure 21 is the gradient histogram of the sharp image. For the distribution of gradients, clear images have a larger variance than blurred images, which means that the standard deviation can also be used as an indicator of sharpness. In the gradient-based method, the following two indicators can be used to calculate the clarity of the image:

強度(SIg):梯度直方圖中梯度按強度排序,選擇前0.1%,然後計算其平均值作為清晰強度。 Intensity (S Ig ): The gradients in the gradient histogram are sorted by intensity, the top 0.1% are selected, and then their average is calculated as the clear intensity.

變異數(SV):梯度直方圖的標準差。 Variance ( SV ): standard deviation of the gradient histogram.
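The two gradient-based indicators above can be sketched with a 4-neighbour Laplacian. The 0.1% fraction follows the text; the circular boundary handling and the synthetic sharp/blurred test images are simplifications introduced here:

```python
import numpy as np

def laplacian(img):
    """4-neighbour Laplacian with circular boundaries (for brevity)."""
    return (np.roll(img, 1, 0) + np.roll(img, -1, 0)
            + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4 * img)

def sharpness_gradient(img):
    """Return (S_Ig, S_V): mean of the strongest 0.1% gradient magnitudes,
    and the standard deviation of the gradient distribution."""
    g = np.abs(laplacian(img)).ravel()
    k = max(1, int(round(g.size * 0.001)))
    s_ig = float(np.sort(g)[-k:].mean())   # top 0.1% by intensity
    s_v = float(g.std())                   # spread of the gradient histogram
    return s_ig, s_v
```

In the blur-detection flow, a frame whose sharpness falls below Tbl times the scene's reference sharpness would then be flagged as blurred.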

以下介紹基於頻譜的方法。傅立葉變換可以展現影像的頻譜。圖22顯示沿著水平軸(如x軸)的頻譜截面圖,圖22左側的(a)為清晰影像的頻譜截面,圖22右側的(b)為模糊影像的頻譜截面。清晰影像的截止頻率比模糊影像的截止頻率大,這是因為高頻訊號在模糊影像中受到抑制。並且,清晰影像的DC區域的頻寬小於模糊影像。清晰影像的低頻區域響應會比模糊影像弱,這是因為強度擴散到高頻區域。在基於頻譜的方法中,可以採用以下兩個指標來計算影像的清晰度: The following introduces the spectrum-based method. Fourier transform can display the spectrum of an image. Figure 22 shows a spectrum cross-section along the horizontal axis (such as the x-axis). (a) on the left side of Figure 22 is a spectrum cross-section of a clear image, and (b) on the right side of Figure 22 is a spectrum cross-section of a blurred image. The cutoff frequency of a clear image is larger than that of a blurred image because high-frequency signals are suppressed in a blurred image. In addition, the bandwidth of the DC region of a clear image is smaller than that of a blurred image. The response of the low-frequency region of a clear image is weaker than that of a blurred image because the intensity diffuses to the high-frequency region. In the spectrum-based method, the following two indicators can be used to calculate the clarity of an image:

變異係數(COV):COV可以透過下式計算,σx和σy分別是頻譜圖中x軸和y軸上強度的標準差。μx和μy分別是x軸和y軸上強度的平均值。 Coefficient of variation (COV): COV can be calculated as follows: σ x and σ y are the standard deviations of the intensity on the x-axis and y-axis of the spectrum graph, respectively. μ x and μ y are the mean values of the intensity on the x-axis and y-axis, respectively.

Figure 113115657-A0305-12-0018-1

強度:先將頻譜強度歸一化到0~1範圍,再透過下式計算頻譜的總和。分別收集頻譜圖中x軸和y軸上的弱響應,即分別為sum(Ix<0.0001)和sum(Iy<0.0001)。wh是頻譜的寬度和高度。模糊影像的強度(SIs)較高,因為其對高頻的響應比清晰影像弱。 Intensity: Normalize the spectrum intensity to the range of 0~1, and then calculate the sum of the spectrum using the following formula. Collect the weak responses on the x-axis and y-axis of the spectrum graph, i.e. sum(I x <0.0001) and sum(I y <0.0001), respectively. w and h are the width and height of the spectrum. The intensity (S Is ) of the blurred image is higher because its response to high frequencies is weaker than that of the clear image.

Figure 113115657-A0305-12-0018-2

這裡使用USAF解析度基準來分析所提出的方法在分辨率能力方面的改進。從SSIM、PSNR和DoM的數值可以看出,使用DSD訓練的MANtiny模型優於使用Lanczos3插值方法產生的數據所訓練的模型,如下表1所示。 The USAF resolution benchmark is used here to analyze the improvement of the proposed method in terms of resolution capability. From the values of SSIM, PSNR, and DoM, it can be seen that the MANtiny model trained with DSD outperforms the model trained with data generated by the Lanczos3 interpolation method, as shown in Table 1 below.

Figure 113115657-A0305-12-0018-3

每種方法的調變傳遞函數(MTF)如圖23所示。x軸是空間頻率,可以看作為圖案的密度。y軸是對比,它代表系統解析密集圖案的能力。當lp/mm<2.24時,本申請提出的方法(即,DSD)具有與8k相同的表現,這意味著本申請的方法增強了該階段的分辨率。使用lanczos3訓練的模型與4k和插值法沒有什麼不同,因為它學習如何解決lanczos3下取樣引起的退化,而不是光學系統中的退化過程。 The modulation transfer function (MTF) of each method is shown in Figure 23. The x-axis is the spatial frequency, which can be regarded as the density of the pattern. The y-axis is the contrast, which represents the ability of the system to resolve dense patterns. When lp/mm<2.24, the method proposed in this application (i.e., DSD) has the same performance as 8k, which means that the method of this application enhances the resolution at this stage. The model trained with lanczos3 is no different from 4k and interpolation methods because it learns how to solve the degradation caused by lanczos3 downsampling, rather than the degradation process in the optical system.

如圖24所示,本申請實施例並提供一種提高影像解析度的裝置2400,其包括:一輸入單元2410、一控制器2420及一輸出單元2430。輸入單元2410用以接收一低解析度影像,輸入單元2410包含但不限於有線或無線輸入界面,有線輸入界面包括USB-C傳輸界面等,無線輸入界面包括WI-FI、藍芽、蜂窩網路傳輸界面等。控制器2420與該輸入單元2410耦接,控制器也可以是具有算數處理邏輯的控制器(例如,中央處理器(CPU)或圖形處理器(GPU))。控制器2420中佈建有一影像轉換模型2422,該影像轉換模型2422用以將該低解析度影像轉換成高解析度影像,其中該影像轉換模型2422是利用一影像數據集訓練得出,該影像數據集包括多個高解析度和低解析度影像對,每一高解析度和低解析度影像對包括基於第一焦距得出的具有第一解析度的第一訓練影像和基於第二焦距得出的具有第二解析度的第二訓練影像,該第一訓練影像和該第二訓練影像具有相同或相應的影像內容,該第一解析度大於該第二解析度。輸出單元2430與控制器2420耦接,輸出單元2430包含但不限於顯示界面,也可以是一個與儲存單元耦接的輸出界面。輸出單元2430用以輸出該高解析度影像,其中該高解析度影像的解析度高於該低解析度影像的解析度。 As shown in FIG. 24 , the present application embodiment provides a device 2400 for improving image resolution, which includes: an input unit 2410, a controller 2420, and an output unit 2430. The input unit 2410 is used to receive a low-resolution image. The input unit 2410 includes but is not limited to a wired or wireless input interface. The wired input interface includes a USB-C transmission interface, etc., and the wireless input interface includes a WI-FI, Bluetooth, cellular network transmission interface, etc. The controller 2420 is coupled to the input unit 2410, and the controller can also be a controller with arithmetic processing logic (for example, a central processing unit (CPU) or a graphics processing unit (GPU)). An image conversion model 2422 is configured in the controller 2420, and the image conversion model 2422 is used to convert the low-resolution image into a high-resolution image, wherein the image conversion model 2422 is trained using an image data set, and the image data set includes a plurality of high-resolution and low-resolution image pairs, each of which includes a first training image with a first resolution obtained based on a first focal length and a second training image with a second resolution obtained based on a second focal length, and the first training image and the second training image have the same or corresponding image content, and the first resolution is greater than the second resolution.
The output unit 2430 is coupled to the controller 2420, and the output unit 2430 includes but is not limited to a display interface, and can also be an output interface coupled to a storage unit. The output unit 2430 is used to output the high-resolution image, wherein the resolution of the high-resolution image is higher than the resolution of the low-resolution image.

圖25顯示根據本申請實施例的高解析度轉換系統2500的設置示意圖。利用本申請實施例所訓練獲得的高解析度轉換模型,可以將輸入的低解析度影像或影片提升為高解析度影像或影片,以作為輸出。例如,可以將4K影像提升為解析度為8K的高品質影像。並且,可以隨時將應用本發明概念而獲得的設計場景數據集(即,DSD)輸入到該高解析度轉換系統2500中已訓練好的高解析度轉換模型(如圖22中所示的傳遞函數模型),以進一步優化該高解析度轉換模型,提升其輸出的影像或影片的品質,並提升該高解析度轉換模型在訓練上的靈活性。另外,該高解析度轉換模型轉換後的影像或影片可以直接進行實時投 影或實時顯示,用戶可以即時觀看到影像轉換後的效果,減少用戶等待轉換過程的時間。 FIG25 shows a schematic diagram of the configuration of a high-resolution conversion system 2500 according to an embodiment of the present application. Using the high-resolution conversion model trained by the embodiment of the present application, the input low-resolution image or video can be upgraded to a high-resolution image or video for output. For example, a 4K image can be upgraded to a high-quality image with a resolution of 8K. Furthermore, the designed scene data set (i.e., DSD) obtained by applying the concept of the present invention can be input into the trained high-resolution conversion model (such as the transfer function model shown in FIG22 ) in the high-resolution conversion system 2500 at any time to further optimize the high-resolution conversion model, improve the quality of the image or video outputted by it, and enhance the flexibility of the high-resolution conversion model in training. In addition, the images or videos converted by the high-resolution conversion model can be directly projected or displayed in real time, so that users can instantly see the effect of the image conversion, reducing the time users have to wait for the conversion process.

雖然本揭示已用較佳實施例揭露如上,然其並非用以限定本揭示,本揭示所屬技術領域中具有通常知識者在不脫離本揭示之精神和範圍內,當可作各種之更動與潤飾,因此本揭示之保護範圍當視後附之申請專利範圍所界定者為準。 Although the present disclosure has been disclosed as above with the preferred embodiment, it is not intended to limit the present disclosure. A person with ordinary knowledge in the technical field to which the present disclosure belongs can make various changes and modifications without departing from the spirit and scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the scope defined by the attached patent application.

10:影像擷取裝置 10: Image capture device

20:儲存器 20: Storage

30:計算設備 30: Computing equipment

32:處理模組 32: Processing module

34:註冊模組 34:Register module

35:裁切模組 35: Cutting module

100:影像數據採集系統 100: Image data acquisition system

Claims (20)

一種影像數據採集系統,包括: 一影像擷取裝置,用以在第一焦距下擷取一物體的影像以得出第一影像,在第二焦距下擷取該物體的影像以得出第二影像,該第一影像和該第二影像具有相同的解析度; 一儲存器,用以儲存該影像擷取裝置擷取得到的該第一影像和該第二影像; 一處理模組,從該儲存器獲取該第一影像和該第二影像,用以處理該第一影像以得出具有一第一解析度的一第一處理後影像,處理該第二影像以得出具有一第二解析度的一第二處理後影像,該第一解析度大於該第二解析度;以及 一註冊(registration)模組,從該處理模組獲取該第一處理後影像和該第二處理後影像,並對該第一處理後影像和該第二處理後影像進行影像對齊(image alignment),以獲得一高解析度和低解析度影像對(image pair), 其中該註冊模組依據該第一處理後影像和該第二處理後影像的頻譜相位圖(phase map of spectrum)之間的差異來進行影像對齊,其中在進行該影像對齊後,如果該第一處理後影像和該第二處理後影像兩者的相似度(correlation)大於一定值,則該註冊模組將該第一處理後影像和該第二處理後影像儲存成該高解析度和低解析度影像對。 An image data acquisition system includes: an image capture device for capturing an image of an object at a first focal length to obtain a first image, and capturing an image of the object at a second focal length to obtain a second image, wherein the first image and the second image have the same resolution; a memory for storing the first image and the second image captured by the image capture device; a processing module for acquiring the first image and the second image from the memory, and processing the first image to obtain a first processed image with a first resolution, and processing the second image to obtain a second processed image with a second resolution, wherein the first resolution is greater than the second resolution; and A registration module obtains the first processed image and the second processed image from the processing module, and performs image alignment on the first processed image and the second processed image to obtain a high-resolution and low-resolution image pair, wherein the registration module performs image alignment based on the difference between the phase map of spectrum of the first processed image and the second processed image, wherein after the image alignment, if the correlation between the first processed image and the second processed image is greater than a certain value, the registration module stores the first processed image and the second processed image as the high-resolution and low-resolution image pair. 
2. The image data collection system of claim 1, wherein the processing module comprises a cropping module configured to crop a first region from the first image to obtain the first processed image and to crop a second region from the second image to obtain the second processed image, the image content of the first region corresponding to the image content of the second region, and the resolution of the first region being greater than the resolution of the second region.

3. The image data collection system of claim 2, wherein the first focal length is f1 and the second focal length is f2, with f1 = A * f2 and A > 1, and wherein the resolution of the first region is X and the resolution of the second region is X/A.

4. The image data collection system of claim 3, wherein A = 2, and the obtained high-resolution and low-resolution image pairs are used to train a model suited to doubling image resolution.

5. The image data collection system of claim 2, wherein the resolution of the first region is greater than or equal to 100 × 100.

6. The image data collection system of claim 2, further comprising a standard deviation filter configured to decide, based on the standard deviation of the grayscale values of the first image and the standard deviation of the grayscale values of the image of the first region, whether to discard or retain the cropped image of the first region.
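The paired cropping of claims 2-3 and the standard deviation filter of claim 6 can be sketched as follows. Since f1 = A * f2, the same field of view spans A times fewer pixels in the short-focal image, so an X-pixel crop pairs with an X/A-pixel crop. The crop-center mapping assumes both shots share the same optical axis, and the 0.3 ratio in the filter is an illustrative assumption, not a value from the claims.

```python
import numpy as np

def crop_pair(hr_img, lr_img, center_hr, size_hr=128, A=2.0):
    """Crop corresponding regions: a size_hr patch from the long-focal
    (f1 = A*f2, high-resolution) image and the matching size_hr/A patch
    from the short-focal image.  Assumes a shared optical axis, so offsets
    from the image center shrink by the factor A in the short-focal shot."""
    cy, cx = center_hr
    h = size_hr // 2
    hr_patch = hr_img[cy - h:cy + h, cx - h:cx + h]
    # Map the crop center through the magnification.
    oy = (cy - hr_img.shape[0] / 2) / A
    ox = (cx - hr_img.shape[1] / 2) / A
    cy_lr = int(lr_img.shape[0] / 2 + oy)
    cx_lr = int(lr_img.shape[1] / 2 + ox)
    hl = int(size_hr / (2 * A))
    lr_patch = lr_img[cy_lr - hl:cy_lr + hl, cx_lr - hl:cx_lr + hl]
    return hr_patch, lr_patch

def keep_patch(full_img, patch, ratio=0.3):
    """Standard-deviation filter in the spirit of claim 6: discard crops
    whose grayscale variation is small relative to the whole image, i.e.
    featureless background that contributes little to training."""
    return float(np.std(patch)) >= ratio * float(np.std(full_img))
```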
7. The image data collection system of claim 1, wherein the first focal length and the second focal length of the image capture device are determined by performing magnification calibration on a calibration image.

8. The image data collection system of claim 7, wherein the magnification calibration of the calibration image is implemented based on a cosine pattern spectrum and a Fourier-Mellin transform.

9. A method for training an image model, comprising:
obtaining an image data set through an optical system, the image data set comprising a plurality of high-resolution and low-resolution image pairs, each high-resolution and low-resolution image pair comprising a first training image having a first resolution obtained at a first focal length and a second training image having a second resolution obtained at a second focal length, the first training image and the second training image having the same or corresponding image content, and the first resolution being greater than the second resolution;
inputting the image data set into a neural network model to train an image model, wherein the first training image serves as the input of the neural network model and the second training image serves as the training label; and
performing image alignment according to a difference between the phase maps of the spectra of the first training image and the second training image, wherein, after the image alignment, if the correlation between the first training image and the second training image is greater than a predetermined value, the first training image and the second training image are stored as the high-resolution and low-resolution image pair.

10. The method of claim 9, further comprising:
cropping a first region from a first image to obtain the first training image; and
cropping a second region from a second image to obtain the second training image,
wherein the image content of the first region corresponds to the image content of the second region, and the resolution of the first region is greater than the resolution of the second region.

11. The method of claim 10, wherein the first focal length is f1 and the second focal length is f2, with f1 = A * f2 and A > 1, and wherein the resolution of the first region is X and the resolution of the second region is X/A.

12. The method of claim 11, wherein A = 2, and the trained image model is a model suited to doubling image resolution.

13. The method of claim 10, further comprising:
deciding, based on the standard deviation of the grayscale values of the first image and the standard deviation of the grayscale values of the image of the first region, whether to discard or retain the cropped image of the first region.
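For the magnification calibration of claims 7-8, a minimal sketch of the cosine-pattern-spectrum half is shown below: imaging a printed cosine pattern at magnification A divides its spatial frequency by A, so the ratio of the spectral peak frequencies of the two shots recovers A. The full Fourier-Mellin variant of claim 8 would instead resample the magnitude spectra to log-polar coordinates so that a magnification becomes a translation recoverable by phase correlation; the 1-D profile form and all names here are illustrative assumptions.

```python
import numpy as np

def peak_frequency(profile):
    """Dominant non-DC frequency (cycles/sample) of a 1-D intensity profile
    taken across a printed cosine calibration pattern."""
    spec = np.abs(np.fft.rfft(profile - profile.mean()))
    return np.argmax(spec) / len(profile)

def magnification_from_cosine_pattern(profile_f1, profile_f2):
    """Estimate the relative magnification A = f1/f2 from two captures of
    the same cosine pattern: the long-focal shot magnifies the pattern by
    A, so its spatial frequency is A times lower than in the short-focal
    shot."""
    return peak_frequency(profile_f2) / peak_frequency(profile_f1)
```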
14. The method of claim 9, further comprising determining a minimum cropped size of the images in the image data set, which comprises:
obtaining a high-resolution image and a low-resolution image having image content corresponding to the high-resolution image;
cropping the high-resolution image at a plurality of different sizes and extracting the same regions from the low-resolution image to obtain high-resolution and low-resolution image pairs; and
determining the minimum cropped size based on the training results of the model on the high-resolution and low-resolution image pairs at each size.

15. The method of claim 9, further comprising determining a minimum cropped size of the images in the image data set, which comprises:
obtaining a high-resolution image and a low-resolution image having image content corresponding to the high-resolution image;
performing a fast Fourier transform on the high-resolution image and the low-resolution image to obtain a high-resolution image spectrum and a low-resolution image spectrum, respectively;
cropping the high-resolution image and the low-resolution image at a plurality of different sizes with respect to the low-frequency regions of the high-resolution image and the low-resolution image; and
determining the minimum cropped size from these sizes based on the correlation between the high-resolution image spectrum and the low-resolution image spectrum at each of these sizes.
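The spectrum-based route of claim 15 can be sketched as follows: for each candidate size, compare the low-frequency band of the HR patch spectrum (the central window, which is the band the LR patch can also represent) against the LR patch spectrum, and take the smallest size whose correlation still clears a threshold. The 0.8 threshold and the corner-anchored crops are illustrative assumptions.

```python
import numpy as np

def spectrum_correlation(hr_patch, lr_patch):
    """Normalized correlation between the magnitude spectra of an HR/LR
    patch pair, restricted to the low-frequency band both share (the
    central lr-sized window of the shifted HR spectrum)."""
    sh = np.abs(np.fft.fftshift(np.fft.fft2(hr_patch)))
    sl = np.abs(np.fft.fftshift(np.fft.fft2(lr_patch)))
    H, W = sl.shape
    ch, cw = sh.shape[0] // 2, sh.shape[1] // 2
    sh_low = sh[ch - H // 2:ch + H // 2, cw - W // 2:cw + W // 2]
    a = sh_low - sh_low.mean()
    b = sl - sl.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def minimum_crop_size(hr_img, lr_img, sizes, threshold=0.8, A=2):
    """Smallest candidate crop size whose HR/LR spectra still correlate
    above the threshold; smaller patches carry too little shared
    structure to pair reliably."""
    for s in sorted(sizes):
        hp = hr_img[:s, :s]
        lp = lr_img[:s // A, :s // A]
        if spectrum_correlation(hp, lp) >= threshold:
            return s
    return None
```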
16. The method of claim 9, further comprising:
extracting representative images from a video; and
generating high-resolution and low-resolution versions of the representative images to participate in the training of the neural network model.

17. The method of claim 16, wherein extracting the representative images from the video comprises:
identifying redundant images in the video and/or identifying blurred images in the video; and
excluding at least one of the redundant images and the blurred images from the video, and taking the remaining images in the video as the representative images.

18. The method of claim 17, wherein identifying the redundant images in the video comprises:
identifying the redundant images by comparing the similarity between adjacent image frames.

19. The method of claim 17, wherein identifying the blurred images in the video comprises:
identifying the blurred images by evaluating the sharpness of the images.
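The frame selection of claims 17-19 can be sketched with two common heuristics: normalized correlation between adjacent frames flags redundant ones (claim 18), and the variance of a Laplacian response scores sharpness to flag blurred ones (claim 19). Both thresholds and the Laplacian-variance choice are illustrative assumptions; the claims do not fix a particular sharpness metric.

```python
import numpy as np

def laplacian_variance(frame):
    """Sharpness score: variance of a 4-neighbour Laplacian response over
    the frame interior.  Low values indicate a blurred frame."""
    lap = (-4 * frame[1:-1, 1:-1] + frame[:-2, 1:-1] + frame[2:, 1:-1]
           + frame[1:-1, :-2] + frame[1:-1, 2:])
    return float(lap.var())

def frame_similarity(a, b):
    """Normalized correlation between two frames of the same size."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def select_representative_frames(frames, sim_thresh=0.98, sharp_thresh=1e-4):
    """Drop frames nearly identical to the previously kept frame
    (redundant) and frames with too little Laplacian variation (blurred);
    the remaining frames are the representative images."""
    kept, prev = [], None
    for f in frames:
        if laplacian_variance(f) < sharp_thresh:
            continue  # blurred frame
        if prev is not None and frame_similarity(prev, f) > sim_thresh:
            continue  # redundant frame
        kept.append(f)
        prev = f
    return kept
```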
20. A device for improving image resolution, comprising:
an input unit configured to receive a low-resolution image;
a controller coupled to the input unit, the controller being deployed with an image conversion model configured to convert the low-resolution image into a high-resolution image, wherein the image conversion model is trained on an image data set comprising a plurality of high-resolution and low-resolution image pairs, each high-resolution and low-resolution image pair comprising a first training image having a first resolution obtained at a first focal length and a second training image having a second resolution obtained at a second focal length, the first training image and the second training image having the same or corresponding image content, the first resolution being greater than the second resolution, wherein the first training image and the second training image are images obtained through image alignment and satisfaction of a similarity condition, the image alignment being performed according to a difference between the phase maps of the spectra of the first training image and the second training image, and wherein, after the image alignment, if the correlation between the first training image and the second training image is greater than a predetermined value, the first training image and the second training image are stored as the high-resolution and low-resolution image pair as satisfying the similarity condition; and
an output unit coupled to the controller and configured to output the high-resolution image, wherein the resolution of the high-resolution image is higher than the resolution of the low-resolution image.
TW113115657A 2024-04-26 2024-04-26 Image data collection system, image model training method, and device for improving image resolution TWI884776B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW113115657A TWI884776B (en) 2024-04-26 2024-04-26 Image data collection system, image model training method, and device for improving image resolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW113115657A TWI884776B (en) 2024-04-26 2024-04-26 Image data collection system, image model training method, and device for improving image resolution

Publications (2)

Publication Number Publication Date
TWI884776B true TWI884776B (en) 2025-05-21
TW202542833A TW202542833A (en) 2025-11-01

Family

ID=96582216

Family Applications (1)

Application Number Title Priority Date Filing Date
TW113115657A TWI884776B (en) 2024-04-26 2024-04-26 Image data collection system, image model training method, and device for improving image resolution

Country Status (1)

Country Link
TW (1) TWI884776B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200943936A (en) * 2008-02-01 2009-10-16 Omnivision Cdm Optics Inc Fusing of images captured by a multi-aperture imaging system
TW202236840A (en) * 2021-01-28 2022-09-16 美商高通公司 Image fusion for scenes with objects at multiple depths
CN116977164A (en) * 2022-04-19 2023-10-31 武汉Tcl集团工业研究院有限公司 Data processing method, device, computer equipment and computer readable storage medium
US20230419437A1 (en) * 2016-09-30 2023-12-28 Qualcomm Incorporated Systems and methods for fusing images


Similar Documents

Publication Publication Date Title
Feng et al. Normalized energy density-based forensic detection of resampled images
US9224189B2 (en) Method and apparatus for combining panoramic image
JP4415188B2 (en) Image shooting device
US8213741B2 (en) Method to generate thumbnails for digital images
CN101605209A (en) Camera head and image-reproducing apparatus
RU2706891C1 (en) Method of generating a common loss function for training a convolutional neural network for converting an image into an image with drawn parts and a system for converting an image into an image with drawn parts
CN112861960B (en) Image tampering detection method, system and storage medium
CN114998261A (en) Double-current U-Net image tampering detection network system and image tampering detection method thereof
CN119540812B (en) A target detection method for surveillance video based on multi-frame high-frequency difference enhancement
CN105631890B (en) Picture quality evaluation method out of focus based on image gradient and phase equalization
Mehta et al. Near-duplicate detection for LCD screen acquired images using edge histogram descriptor
CN117156297A (en) Multi-image fusion processing method and device, electronic equipment and storage medium
TWI884776B (en) Image data collection system, image model training method, and device for improving image resolution
CN116630529A (en) Three-dimensional image acquisition system and reconstruction method
CN112150363B (en) Convolutional neural network-based image night scene processing method, computing module for operating method and readable storage medium
Wang et al. A unified framework of source camera identification based on features
US20250336036A1 (en) Image data collection system, image model training method, and device for improving image resolution
CN116152183B (en) A no-reference image quality assessment method based on distortion prior learning
TW202542833A (en) Image data collection system, image model training method, and device for improving image resolution
Xie et al. Feature dimensionality reduction for example-based image super-resolution
Yu et al. Continuous digital zooming of asymmetric dual camera images using registration and variational image restoration
CN113379608B (en) Image processing method, storage medium and terminal device
JP6063680B2 (en) Image generation apparatus, image generation method, imaging apparatus, and imaging method
CN113935910A (en) A deep learning-based image blur length measurement method
CN120321349B (en) AI-based automatic repair method and equipment for video flickering