This is the official repository of XVFI (eXtreme Video Frame Interpolation)
[ArXiv_ver.] [ICCV2021_ver.] [Supp.] [Demo(YouTube)] [Oral12mins(YouTube)] [Flowframes(GUI)] [Poster]
Last Update: 20211130 - We provide extended input sequences for X-TEST. Please refer to the X4K1000FPS section.
We provide the training and test code along with the trained weights and the dataset (train+test) used for XVFI. If you find this repository useful, please consider citing our paper.
The 4K@30fps input frames are interpolated to 4K@240fps. All results are encoded at 30fps, so they play back as x8 slow motion, and are spatially downscaled to limit file size. All methods are trained on X-TRAIN.
Some examples from the X4K1000FPS dataset, whose frames are captured at 1000 fps in 4K resolution. The dataset contains various scenes with extreme motion. (Displayed as spatiotemporally subsampled .gif files.)
We provide our X4K1000FPS dataset, which consists of X-TEST and X-TRAIN. Please refer to our main and supplementary papers for details of the dataset. You can download the dataset from this dropbox link.
X-TEST consists of 15 video clips, each containing 33 frames at 4K resolution and 1000 fps. It follows the directory format below:
├──── YOUR_DIR/
    ├──── test/
       ├──── Type1/
          ├──── TEST01/
             ├──── 0000.png
             ├──── ...
             └──── 0032.png
          ├──── TEST02/
             ├──── 0000.png
             ├──── ...
             └──── 0032.png
          ├──── ...
       ├──── ...
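Once the frames are decoded, a quick check of the layout can catch incomplete downloads. The following is a minimal sketch and not part of the repository: the function names are hypothetical, and it assumes the directory format shown above (`test/<Type>/<clip>/*.png`) with 33 frames per clip.

```python
from pathlib import Path


def count_frames_per_clip(test_root):
    """Map each clip directory (test_root/<Type>/<clip>) to its number of .png frames."""
    counts = {}
    for type_dir in sorted(Path(test_root).iterdir()):
        if not type_dir.is_dir():
            continue
        for clip_dir in sorted(type_dir.iterdir()):
            if clip_dir.is_dir():
                key = f"{type_dir.name}/{clip_dir.name}"
                counts[key] = len(list(clip_dir.glob("*.png")))
    return counts


def find_incomplete_clips(test_root, expected=33):
    """Return only the clips whose frame count differs from the expected length."""
    counts = count_frames_per_clip(test_root)
    return {clip: n for clip, n in counts.items() if n != expected}
```

Calling `find_incomplete_clips("YOUR_DIR/test")` should return an empty dictionary when every clip holds 33 frames.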
Extended version of X-TEST
issue#9.
As described in our paper, we assume that the number of input frames for VFI is fixed to 2 in X-TEST. However, for VFI methods that require more than 2 input frames, we provide an extended version of X-TEST that contains 8 input frames (at a temporal distance of 32 frames) for each test sequence. The middle two adjacent frames among the 8 are the same input frames as in the original X-TEST. To sort the .png files properly by file name, we added 1000 to the frame indices (e.g., '0000.png' and '0032.png' in the original version of X-TEST correspond to '1000.png' and '1032.png', respectively, in the extended version). Please note that the extended version consists of input frames only, without the ground-truth intermediate frames ('1001.png'~'1031.png'). In addition, for the sequence 'TEST11_078_f4977', '1064.png', '1096.png' and '1128.png' are replicated frames, since '1064.png' is the last frame of the raw video file.
The extended version of X-TEST can be downloaded from the link.
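The index arithmetic described above can be sketched as follows. This is an illustration only: the helper names are hypothetical, and the four-digit zero padding for indices below 1000 (for example '0904.png') is an assumption, since the text only names the frames from '1000.png' upward.

```python
OFFSET = 1000    # added to every frame index in the extended X-TEST
STRIDE = 32      # temporal distance between consecutive input frames
NUM_INPUTS = 8   # input frames per sequence in the extended X-TEST


def extended_name(original_index):
    """File name in the extended X-TEST for a frame index of the original X-TEST."""
    # Assumption: names are zero-padded to four digits.
    return f"{original_index + OFFSET:04d}.png"


def extended_input_names():
    """Names of the 8 input frames; the middle two are the original inputs."""
    # Three frames precede the original '0000.png', so the first index is -96.
    first = -(NUM_INPUTS // 2 - 1) * STRIDE
    return [extended_name(first + k * STRIDE) for k in range(NUM_INPUTS)]
```

Under these assumptions the middle two names are '1000.png' and '1032.png', and the last three are '1064.png', '1096.png' and '1128.png', matching the description above.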
X-TRAIN consists of 4,408 clips from 110 scenes of various types. Each clip contains 65 frames at 1000 fps, and each frame is a 768x768 crop from a 4K frame. It follows the directory format below:
├──── YOUR_DIR/
    ├──── train/
       ├──── 002/
          ├──── occ008.320/
             ├──── 0000.png
             ├──── ...
             └──── 0064.png
          ├──── occ008.322/
             ├──── 0000.png
             ├──── ...
             └──── 0064.png
          ├──── ...
       ├──── ...
After downloading the files from the link, decompress encoded_test.tar.gz and encoded_train.tar.gz. The resulting .mp4 files can be decoded into .png files by running mp4_decoding.py. Please follow the instructions written in mp4_decoding.py.
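The decompression step can also be done from Python with the standard `tarfile` module. This is a minimal sketch under assumptions: `extract_archive` is a hypothetical helper, the output directory is a placeholder, and decoding the extracted .mp4 files is still left to mp4_decoding.py, whose interface is not reproduced here.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, out_dir):
    """Extract a .tar.gz archive into out_dir and return the extracted .mp4 paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Only extract archives obtained from a trusted source.
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(out_dir)
    return sorted(out_dir.rglob("*.mp4"))
```

For example, `extract_archive("encoded_test.tar.gz", "YOUR_DIR")` would unpack the test archive and list the videos to be decoded.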
Our code is implemented in PyTorch 1.7 and was tested under the following settings:
- Python 3.7
- PyTorch 1.7.1
- CUDA 10.2
- cuDNN 7.6.5
- NVIDIA TITAN RTX GPU
- Ubuntu 16.04 LTS
Caution: since "nn.functional.interpolate" and "nn.functional.grid_sample" have an "align_corners" option in PyTorch 1.7, we recommend following our settings. In particular, using other PyTorch versions may yield different performance.
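To see why the "align_corners" convention matters, the sketch below resizes a 1D signal with linear interpolation under both conventions. It is a pure-Python illustration of the commonly documented coordinate mappings, not PyTorch's implementation, and the function names are hypothetical.

```python
def source_coordinate(dst_index, in_size, out_size, align_corners):
    """Input-space coordinate sampled by output position dst_index (1D)."""
    if align_corners:
        # The centers of the first and last samples coincide in input and output.
        if out_size == 1:
            return 0.0
        return dst_index * (in_size - 1) / (out_size - 1)
    # Samples are treated as unit-width cells whose outer edges are aligned.
    return max((dst_index + 0.5) * in_size / out_size - 0.5, 0.0)


def resize_linear(values, out_size, align_corners):
    """1D linear resize of a list of floats under either convention."""
    n = len(values)
    out = []
    for i in range(out_size):
        x = source_coordinate(i, n, out_size, align_corners)
        x0 = min(int(x), n - 1)   # left neighbor
        x1 = min(x0 + 1, n - 1)   # right neighbor, clamped at the border
        w = x - x0
        out.append((1 - w) * values[x0] + w * values[x1])
    return out
```

Resizing `[0.0, 10.0]` to four samples gives interior values of about 3.33 and 6.67 with `align_corners=True`, but 2.5 and 7.5 with `align_corners=False`, so a model trained under one convention sees shifted features under the other.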
- Download the source codes in a directory of your choice <source_path>.
- First download our X-TEST test dataset by following the above section 'X4K1000FPS'.
- Download the pre-trained weights, which were trained on X-TRAIN, from this link and place them in <source_path>/checkpoint_dir/XVFInet_X4K1000FPS_exp1.
XVFI
└── checkpoint_dir
    └── XVFInet_X4K1000FPS_exp1
        ├── XVFInet_X4K1000FPS_exp1_latest.pt
- Run main.py with the following options in parse_args:
python main.py --gpu 0 --phase 'test' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_tst 5 --multiple 8
==> It would yield (PSNR/SSIM/tOF) = (30.12/0.870/2.15).
python main.py --gpu 0 --phase 'test' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_tst 3 --multiple 8
==> It would yield (PSNR/SSIM/tOF) = (28.86/0.858/2.67).
- After running with the above test options, the result images are saved in <source_path>/test_img_dir/XVFInet_X4K1000FPS_exp1, and the PSNR/SSIM/tOF results for each test clip are written to "total_metrics.csv" in the same folder.
- Our proposed XVFI-Net can start from any downscaled input and work upward. '--S_tst' sets the number of scales used for inference and can be adjusted according to the input resolution or the motion magnitude.
- You can get any Multi-Frame Interpolation (x M) result by setting '--multiple'.
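As a small worked example of what x M interpolation means, the sketch below lists the normalized time steps of the frames inserted between two inputs and the resulting frame rate. It assumes evenly spaced intermediate frames; the function names are hypothetical and do not correspond to code in main.py.

```python
def intermediate_times(multiple):
    """Normalized times t in (0, 1) of the frames inserted between two inputs."""
    return [i / multiple for i in range(1, multiple)]


def output_fps(input_fps, multiple):
    """Frame rate after x`multiple` interpolation."""
    return input_fps * multiple
```

With `--multiple 8`, seven intermediate frames are generated per input pair, at t = 0.125, 0.25, ..., 0.875, which turns 30 fps input into 240 fps output.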
- Download the source codes in a directory of your choice <source_path>.
- First download the Vimeo90K dataset from this link (including 'tri_trainlist.txt') and place it in <source_path>/vimeo_triplet.
XVFI
└── vimeo_triplet
    ├── sequences
    readme.txt
    tri_testlist.txt
    tri_trainlist.txt
- Download the pre-trained weights (XVFI-Net_v), which were trained on Vimeo90K, from this link and place them in <source_path>/checkpoint_dir/XVFInet_Vimeo_exp1.
XVFI
└── checkpoint_dir
    └── XVFInet_Vimeo_exp1
        ├── XVFInet_Vimeo_exp1_latest.pt
- Run main.py with the following options in parse_args:
python main.py --gpu 0 --phase 'test' --exp_num 1 --dataset 'Vimeo' --module_scale_factor 2 --S_tst 1 --multiple 2
==> It would yield PSNR = 35.07 on Vimeo90K.
- After running with the above test options, the result images are saved in <source_path>/test_img_dir/XVFInet_Vimeo_exp1.
- There are certain code lines in front of 'def main()' for convenience when running with the Vimeo option.
- The SSIM result of 0.9760 in Fig. 8 was measured with the MATLAB ssim function after running the above guide, for a fair comparison with other SOTA methods that did the same. We also provide the MATLAB file "compare_psnr_ssim.m" to obtain it.
Note that the XVFI paper contains a typo, "S_trn and S_tst are set to 2", which should read 1 (not 2). We apologize for the inconvenience. This has been corrected in the latest arXiv version.
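For readers who want to check a PSNR value on their own outputs, the usual definition is sketched below in plain Python. This is an assumption-laden illustration: `psnr` is a hypothetical helper operating on flat sequences of 8-bit pixel values, and the repository's evaluation code and the MATLAB script may differ in color handling or rounding.

```python
import math


def psnr(reference, prediction, max_value=255.0):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(reference) != len(prediction):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - p) ** 2 for r, p in zip(reference, prediction)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A mean squared error of 1 on 8-bit data corresponds to roughly 48.13 dB, which gives a sense of scale for the values reported in this section.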
- Download the source codes in a directory of your choice <source_path>.
- First prepare your own video datasets in <source_path>/custom_path, following the hierarchy below:
XVFI
└── custom_path
    ├── scene1
        ├── 'xxx.png'
        ├── ...
        └── 'xxx.png'
    ...
    ├── sceneN
        ├── 'xxxxx.png'
        ├── ...
        └── 'xxxxx.png'
- Download the pre-trained weights trained on X-TRAIN or Vimeo90K as described above.
- Run main.py with the following options in parse_args (e.g., x8 Multi-Frame Interpolation):
# For the model trained on X-TRAIN
python main.py --gpu 0 --phase 'test_custom' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_tst 5 --multiple 8 --custom_path './custom_path'
# For the model trained on Vimeo90K
python main.py --gpu 0 --phase 'test_custom' --exp_num 1 --dataset 'Vimeo' --module_scale_factor 2 --S_tst 1 --multiple 8 --custom_path './custom_path'
- Our proposed XVFI-Net can start from any downscaled input and work upward. '--S_tst' sets the number of scales used for inference and can be adjusted according to the input resolution or the motion magnitude.
- You can get any Multi-Frame Interpolation (x M) result by setting '--multiple'.
- Only the '.png' format is supported.
- Since we cannot cover every possible naming rule for custom frames, please make sure your own frames are sorted properly.
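One way to check the frame ordering of a custom scene is sketched below. It is not part of the repository: the helper names are hypothetical, and it assumes that frames named without zero padding (for example 'frame2.png' and 'frame10.png') are the main source of ordering mistakes.

```python
import re
from pathlib import Path


def natural_key(name):
    """Sort key comparing digit runs numerically, so 'f2.png' precedes 'f10.png'."""
    tokens = re.split(r"(\d+)", name)
    return [int(tok) if tok.isdecimal() else tok.lower() for tok in tokens]


def sorted_frames(scene_dir):
    """Return the .png frames of one scene directory in natural (numeric) order."""
    return sorted(Path(scene_dir).glob("*.png"), key=lambda p: natural_key(p.name))


def is_lexicographically_ordered(names):
    """True if plain string sorting already yields the natural frame order."""
    return sorted(names) == sorted(names, key=natural_key)
```

If `is_lexicographically_ordered` returns False for a scene, renaming the frames with zero-padded indices (for example '00002.png', '00010.png') makes both orderings agree.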
- Download the source codes in a directory of your choice <source_path>.
- First download our X-TRAIN train/val/test datasets by following the above section 'X4K1000FPS' and place them as shown below:
XVFI
└── X4K1000FPS
    ├── train
        ├── 002
        ├── ...
        └── 172
    ├── val
        ├── Type1
        ├── Type2
        ├── Type3
    ├── test
        ├── Type1
        ├── Type2
        ├── Type3
- Run main.py with the following options in parse_args:
python main.py --phase 'train' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_trn 3 --S_tst 5
- Download the source codes in a directory of your choice <source_path>.
- First download the Vimeo90K dataset from this link (including 'tri_trainlist.txt') and place it in <source_path>/vimeo_triplet.
XVFI
└── vimeo_triplet
    ├── sequences
    readme.txt
    tri_testlist.txt
    tri_trainlist.txt
- Run main.py with the following options in parse_args:
python main.py --phase 'train' --exp_num 1 --dataset 'Vimeo' --module_scale_factor 2 --S_trn 1 --S_tst 1
- You can freely adjust the other arguments in the parser of main.py (here).
- We also provide all visual results (x8 Multi-Frame Interpolation) on X-TEST for easier comparison, listed below. Each zip file is about 1~1.5GB.
- AdaCoF_o, AdaCoF_f, FeFlow_o, FeFlow_f, DAIN_o, DAIN_f, XVFI-Net (S_tst=3), XVFI-Net (S_tst=5)
- The quantitative comparisons (Table 2 and Figure 5) are attached below for reference.
Hyeonjun Sim*, Jihyong Oh*, and Munchurl Kim "XVFI: eXtreme Video Frame Interpolation", In ICCV, 2021. (* equal contribution)
BibTeX
@inproceedings{sim2021xvfi,
title={XVFI: eXtreme Video Frame Interpolation},
author={Sim, Hyeonjun and Oh, Jihyong and Kim, Munchurl},
booktitle={Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
year={2021}
}
If you have any questions, please send an email to either
[Hyeonjun Sim] - flhy5836@kaist.ac.kr or
[Jihyong Oh] - jhoh94@kaist.ac.kr.
The source code and datasets can be freely used for research and education only. Any commercial use requires formal permission in advance.