| RelTR: Relation transformer for scene graph generation Y Cong, MY Yang, B Rosenhahn IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (9), 11169 …, 2023 | 286 | 2023 |
| Spatial-temporal transformer for dynamic scene graph generation Y Cong, W Liao, H Ackermann, B Rosenhahn, MY Yang Proceedings of the IEEE/CVF international conference on computer vision …, 2021 | 218 | 2021 |
| FLATTEN: optical flow-guided attention for consistent text-to-video editing Y Cong, M Xu, C Simon, S Chen, J Ren, Y Xie, JM Perez-Rua, ... arXiv preprint arXiv:2310.05922, 2023 | 144 | 2023 |
| GenTron: Diffusion transformers for image and video generation S Chen, M Xu, J Ren, Y Cong, S He, Y Xie, A Sinha, P Luo, T Xiang, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 114 | 2024 |
| Attribute-centric compositional text-to-image generation Y Cong, MR Min, LE Li, B Rosenhahn, MY Yang International Journal of Computer Vision 133 (7), 4555-4570, 2025 | 24 | 2025 |
| NODIS: Neural ordinary differential scene understanding Y Cong, H Ackermann, W Liao, MY Yang, B Rosenhahn European Conference on Computer Vision, 636-653, 2020 | 22 | 2020 |
| Learning flow fields in attention for controllable person image generation Z Zhou, S Liu, X Han, H Liu, KW Ng, T Xie, Y Cong, H Li, M Xu, ... Proceedings of the Computer Vision and Pattern Recognition Conference, 2491-2501, 2025 | 14 | 2025 |
| SSGVS: Semantic scene graph-to-video synthesis Y Cong, J Yi, B Rosenhahn, MY Yang Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 12 | 2023 |
| WorldAfford: Affordance grounding based on natural language instructions C Chen, Y Cong, Z Kan 2024 IEEE 36th International Conference on Tools with Artificial …, 2024 | 11 | 2024 |
| SPAN: Learning similarity between scene graphs and images with transformers Y Cong, W Liao, B Rosenhahn, MY Yang arXiv preprint arXiv:2304.00590, 2023 | 9 | 2023 |
| Segment any object model (SAOM): Real-to-simulation fine-tuning strategy for multi-class multi-instance segmentation M Khan, Y Qiu, Y Cong, B Rosenhahn, J Abu-Khalaf, D Suter 2024 IEEE International Conference on Image Processing (ICIP), 582-588, 2024 | 6 | 2024 |
| Indoor Scene Change Understanding (SCU): Segment, Describe, and Revert Any Change M Khan, Y Qiu, Y Cong, B Rosenhahn, D Suter, J Abu-Khalaf 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems …, 2024 | 2 | 2024 |
| PanoSCU: A Dataset for Panoramic Indoor Scene Understanding M Khan, Y Qiu, Y Cong, J Abu-Khalaf, D Suter, B Rosenhahn IEEE Access, 2025 | 1 | 2025 |
| HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming H Qiu, S Liu, Z Zhou, Z An, W Ren, Z Liu, J Schult, S He, S Chen, Y Cong, ... arXiv preprint arXiv:2512.21338, 2025 | | 2025 |
| Scaling Zero-Shot Reference-to-Video Generation Z Zhou, S Liu, H Liu, H Qiu, Z An, W Ren, Z Liu, X Huang, KW Ng, T Xie, ... arXiv preprint arXiv:2512.06905, 2025 | | 2025 |
| TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models Z Liu, W Ren, H Liu, Z Zhou, S Chen, H Qiu, X Huang, Z An, F Yang, ... arXiv preprint arXiv:2512.02014, 2025 | | 2025 |
| Mixture of States: Routing Token-Level Dynamics for Multimodal Generation H Liu, D Liu, M Zhuge, Z Zhou, T Xie, S He, Y Yang, S Liu, Y Cong, J Guo, ... arXiv preprint arXiv:2511.12207, 2025 | | 2025 |
| FDSG: Forecasting Dynamic Scene Graphs Y Yang, Y Cong, H Cheng, B Rosenhahn, MY Yang arXiv preprint arXiv:2506.01487, 2025 | | 2025 |
| Holistic scene understanding through image and video scene graphs Y Cong Hannover: Institutionelles Repositorium der Leibniz Universität Hannover, 2024 | | 2024 |