Code for paper: Hierarchical Co-attention Propagation Network for Zero-Shot Video Object Segmentation
- Python (3.6.12)
- PyTorch (version: 1.7.0)
- The remaining requirements are listed in requirements.txt.
- Download the DAVIS-2017 dataset from DAVIS.
- Download the YouTube-VOS dataset from YouTube-VOS.
- Download the YouTube-hed and DAVIS-hed datasets from DuBox code: 1gih.
- Download the YouTube-ctr and DAVIS-ctr datasets from GoogleDrive.
- The optical flow files are obtained with RAFT; we provide demo code that can be run directly under the `flow` path. We also provide the optical flow of YouTube-VOS (18G) on DuBox (code: w9yn); the optical flow of DAVIS can be found in the Testing section.
Please ensure the datasets are organized in the following format.
YouTube-VOS
|----train
    |----Annotations
    |----Annotations_ctr
    |----JPEGImages
    |----YouTube-flow
    |----YouTube-hed
    |----meta.json
|----valid
    |----Annotations
    |----JPEGImages
    |----meta.json
DAVIS
|----Annotations
|----Annotations_ctr
|----ImageSets
|----JPEGImages
|----davis-flow
|----davis-hed
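
Before training, it can help to check that the folders match the layout above. The following is a minimal sketch, not part of the released code: the function name `missing_entries` and the `EXPECTED` table are hypothetical, and it assumes that the entries listed after `train` and `valid` are nested inside those folders.

```python
from pathlib import Path

# Expected entries, copied from the layout listed above.
# Assumption: the items listed after "train" and "valid" live inside them.
EXPECTED = {
    "YouTube-VOS": [
        "train/Annotations",
        "train/Annotations_ctr",
        "train/JPEGImages",
        "train/YouTube-flow",
        "train/YouTube-hed",
        "train/meta.json",
        "valid/Annotations",
        "valid/JPEGImages",
        "valid/meta.json",
    ],
    "DAVIS": [
        "Annotations",
        "Annotations_ctr",
        "ImageSets",
        "JPEGImages",
        "davis-flow",
        "davis-hed",
    ],
}


def missing_entries(root, dataset):
    """Return the expected entries that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED[dataset] if not (base / rel).exists()]
```

Calling `missing_entries("/your/path/DAVIS", "DAVIS")` returns an empty list when every expected folder is present; the path shown is a placeholder for your own dataset location.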
Change the dataset paths to your own, then run `python train.py` to train the model.
We also provide multi-GPU parallel code based on apex. Run `CUDA_VISIBLE_DEVICES="0,1,2,3" python -m torch.distributed.launch --nproc_per_node 4 train_apex.py` for distributed training in PyTorch.
Please change the paths in the two configuration files (`libs/utils/config_davis.py` and `libs/utils/config_youtubevos.py`) to your own dataset paths.
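
For context on the distributed command: `torch.distributed.launch` starts one process per GPU and passes each process a `--local_rank` argument. The sketch below shows how a script typically receives that argument. It is an illustration only and is not taken from `train_apex.py`; the function name `parse_launch_args` is hypothetical.

```python
import argparse


def parse_launch_args(argv=None):
    """Parse the argument that torch.distributed.launch adds for each worker."""
    parser = argparse.ArgumentParser(
        description="Hypothetical entry point for a distributed run"
    )
    # The launcher appends --local_rank=<index> to the script arguments,
    # where <index> is the position of this process among the local GPUs.
    parser.add_argument("--local_rank", type=int, default=0)
    return parser.parse_args(argv)


# In a real training script, the parsed value is commonly used like this
# (left as comments here so that the sketch runs without a GPU):
#   torch.cuda.set_device(args.local_rank)
#   torch.distributed.init_process_group(backend="nccl", init_method="env://")
```

With `--nproc_per_node 4`, the four processes would receive local ranks 0 through 3.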
If you want to test the model directly, follow the steps below.
- Download the pretrained model from GoogleDrive and put it into the "model/HCPN" directory.
- Download the optical flow of DAVIS from GoogleDrive.
The code directory structure is as follows.
HCPN
|----libs
|----model
|----apply_densecrf_davis.py
|----args.py
|----train.py
|----test.py
- Change the paths in `test.py` to your own, then run `python test.py`.
- The evaluation code is from DAVIS_Evaluation; a Python version is available at PyDavis16EvalToolbox.
If you are not able to run our code but are interested in our results, the segmentation results can be downloaded from GoogleDrive.
- DAVIS-16:
In the inference stage, we use an input size of 512x512 on DAVIS (480p).
Mean J&F | J score | F score |
---|---|---|
85.6 | 85.8 | 85.4 |
- Youtube-Objects:
Airplane | Bird | Boat | Car | Cat | Cow | Dog | Horse | Motorbike | Train | Mean |
---|---|---|---|---|---|---|---|---|---|---|
84.5 | 79.6 | 67.3 | 87.8 | 74.1 | 71.2 | 76.5 | 66.2 | 65.8 | 59.7 | 73.3 |
- FBMS:
Mean J |
---|
78.3 |
- DAVIS-17:
Mean J&F | J score | F score |
---|---|---|
70.7 | 68.7 | 72.7 |
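
In the DAVIS benchmark, Mean J&F is the average of the region similarity J and the boundary accuracy F. The short sketch below recomputes it from the J and F scores reported in the tables above as a sanity check. It is not the evaluation toolbox, and the function name `mean_jf` is hypothetical.

```python
def mean_jf(j_score, f_score):
    """Average J and F, rounded to one decimal as in the tables above."""
    return round((j_score + f_score) / 2, 1)


# Scores taken from the DAVIS-16 and DAVIS-17 tables above.
davis16 = mean_jf(85.8, 85.4)
davis17 = mean_jf(68.7, 72.7)
print(davis16, davis17)
```

Both values agree with the Mean J&F columns reported for DAVIS-16 and DAVIS-17.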
DAVIS-2017
- Motion-Attentive Transition for Zero-Shot Video Object Segmentation, AAAI 2020 (https://github.com/tfzhou/MATNet)
- Video Object Segmentation Using Space-Time Memory Networks, ICCV 2019 (https://github.com/seoungwugoh/STM)
- See More, Know More: Unsupervised Video Object Segmentation With Co-Attention Siamese Networks, CVPR 2019 (https://github.com/carrierlxk/COSNet)