| Qwen3 technical report A Yang, A Li, B Yang, B Zhang, B Hui, B Zheng, B Yu, C Gao, C Huang, ... arXiv preprint arXiv:2505.09388, 2025 | 5454 | 2025 |
| Qwen2.5-vl technical report S Bai, K Chen, X Liu, J Wang, W Ge, S Song, K Dang, P Wang, S Wang, ... arXiv preprint arXiv:2502.13923, 2025 | 3491 | 2025 |
| Qwen2.5-omni technical report J Xu, Z Guo, J He, H Hu, T He, S Bai, K Chen, J Wang, Y Fan, K Dang, ... arXiv preprint arXiv:2503.20215, 2025 | 411 | 2025 |
| Textfield: Learning a deep direction field for irregular scene text detection Y Xu, Y Wang, W Zhou, Y Wang, Z Yang, X Bai IEEE Transactions on Image Processing 28 (11), 5566-5579, 2019 | 372 | 2019 |
| Pan++: Towards efficient and accurate end-to-end spotting of arbitrarily-shaped text W Wang, E Xie, X Li, X Liu, D Liang, Z Yang, T Lu, C Shen IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (9), 5349-5367, 2021 | 216 | 2021 |
| Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping J Tang, Z Yang, Y Wang, Q Zheng, Y Xu, X Bai Pattern recognition 96, 106954, 2019 | 182 | 2019 |
| MOST: A multi-oriented scene text detector with localization refinement M He, M Liao, Z Yang, H Zhong, J Tang, W Cheng, C Yao, Y Wang, X Bai Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021 | 132 | 2021 |
| Parsing table structures in the wild R Long, W Wang, N Xue, F Gao, Z Yang, Y Wang, GS Xia Proceedings of the IEEE/CVF International Conference on Computer Vision, 944-952, 2021 | 89 | 2021 |
| OmniParser: A Unified Framework for Text Spotting Key Information Extraction and Table Recognition J Wan, S Song, W Yu, Y Liu, W Cheng, F Huang, X Bai, C Yao, Z Yang* Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 81 | 2024 |
| ICPR2018 contest on robust reading for multi-type web images M He, Y Liu, Z Yang, S Zhang, C Luo, F Gao, Q Zheng, Y Wang, X Zhang, ... 2018 24th international conference on pattern recognition (ICPR), 7-12, 2018 | 80 | 2018 |
| Vision based hand gesture recognition Y Zhu, Z Yang, B Yuan 2013 international conference on service sciences (ICSS), 260-265, 2013 | 70 | 2013 |
| Arbitrarily-oriented text detection in low light natural scene images M Xue, P Shivakumara, C Zhang, Y Xiao, T Lu, U Pal, D Lopresti, Z Yang IEEE Transactions on Multimedia 23, 2706-2720, 2020 | 56 | 2020 |
| Vision-language pre-training for boosting scene text detectors S Song, J Wan, Z Yang, J Tang, W Cheng, X Bai, C Yao Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022 | 52 | 2022 |
| Revisiting document image dewarping by grid regularization X Jiang, R Long, N Xue, Z Yang, C Yao, GS Xia Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 50 | 2022 |
| Qvq: To see the world with wisdom, December 2024 Q Team URL https://qwenlm.github.io/blog/qvq-72b-preview, 0 | 36 | |
| Fast: Faster arbitrarily-shaped text detector with minimalist kernel representation Z Chen, J Wang, W Wang, G Chen, E Xie, P Luo, T Lu arXiv preprint arXiv:2111.02394, 2021 | 35 | 2021 |
| CC-OCR: A comprehensive and challenging ocr benchmark for evaluating large multimodal models in literacy Z Yang, J Tang, Z Li, P Wang, J Wan, H Zhong, X Liu, M Yang, P Wang, ... arXiv preprint arXiv:2412.02210, 2024 | 32 | 2024 |
| Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting W Wang, X Liu, X Ji, E Xie, D Liang, ZB Yang, T Lu, C Shen, P Luo European Conference on Computer Vision, 457-473, 2020 | 30 | 2020 |
| Qwen3-vl: Sharper vision, deeper thought, broader action Q Team Qwen Blog. Accessed, 10-04, 2025 | 27* | 2025 |
| Omniparser v2: Structured-points-of-thought for unified visual text parsing and its generality to multimodal large language models W Yu, Z Yang, J Wan, S Song, J Tang, W Cheng, Y Liu, X Bai arXiv preprint arXiv:2502.16161, 2025 | 24 | 2025 |