| VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research X Wang, J Wu, J Chen, L Li, YF Wang, WY Wang ICCV 2019, 2019 | 792 | 2019 |
| Phi-4-mini technical report: Compact yet powerful multimodal language models via mixture-of-loras A Abouelenin, A Ashfaq, A Atkinson, H Awadalla, N Bach, J Bao, ... arXiv preprint arXiv:2503.01743, 2025 | 205 | 2025 |
| Meta multi-task learning for sequence modeling J Chen, X Qiu, P Liu, X Huang AAAI 2018, 2018 | 110 | 2018 |
| Fused acoustic and text encoding for multimodal bilingual pretraining and speech translation R Zheng, J Chen, M Ma, L Huang ICML 2021, 2021 | 86 | 2021 |
| A³T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing H Bai, R Zheng, J Chen, M Ma, X Li, L Huang International Conference on Machine Learning, 1399-1411, 2022 | 61 | 2022 |
| DropAttention: A regularization method for fully-connected self-attention networks L Zehui, P Liu, L Huang, J Chen, X Qiu, X Huang arXiv preprint arXiv:1907.11065, 2019 | 58 | 2019 |
| PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit H Zhang, T Yuan, J Chen, X Li, R Zheng, Y Huang, X Chen, E Gong, ... NAACL-2022 Demo Track (Best Demo Award), 2022 | 40 | 2022 |
| SpecRec: An Alternative Solution for Improving End-to-End Speech-to-Text Translation via Spectrogram Reconstruction J Chen, M Ma, R Zheng, L Huang Proc. Interspeech 2021, 2232-2236, 2021 | 39* | 2021 |
| Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR J Chen, M Ma, R Zheng, L Huang Findings of ACL-21 (short), 2021 | 36 | 2021 |
| Same representation, different attentions: Shareable sentence representation learning from multiple tasks R Zheng, J Chen, X Qiu IJCAI 2018, 2018 | 36 | 2018 |
| Improving simultaneous translation by incorporating pseudo-references with fewer reorderings J Chen, R Zheng, A Kita, M Ma, L Huang EMNLP 2021, 2020 | 26 | 2020 |
| Exploring shared structures and hierarchies for multiple NLP tasks J Chen, K Chen, X Chen, X Qiu, X Huang arXiv preprint arXiv:1808.07658, 2018 | 21 | 2018 |
| Token-level serialized output training for joint streaming ASR and ST leveraging textual alignments S Papi, P Wang, J Chen, J Xue, J Li, Y Gaur 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1-8, 2023 | 11 | 2023 |
| Diarist: Streaming speech translation with speaker diarization M Yang, N Kanda, X Wang, J Chen, P Wang, J Xue, J Li, T Yoshioka ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 9 | 2024 |
| Investigating neural audio codecs for speech language model-based speech generation J Li, D Wang, X Wang, Y Qian, L Zhou, S Liu, M Yousefi, C Li, CH Tsai, ... 2024 IEEE Spoken Language Technology Workshop (SLT), 554-561, 2024 | 8 | 2024 |
| Leveraging timestamp information for serialized joint streaming recognition and translation S Papi, P Wang, J Chen, J Xue, N Kanda, J Li, Y Gaur ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 4 | 2024 |
| Improving stability in simultaneous speech translation: A revision-controllable decoding approach J Chen, J Xue, P Wang, J Pan, J Li 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1-7, 2023 | 4 | 2023 |
| ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech X Fan, C Pang, T Yuan, H Bai, R Zheng, P Zhu, S Wang, J Chen, Z Chen, ... arXiv preprint arXiv:2211.03545, 2022 | 2 | 2022 |
| Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation P Wang, N Kanda, J Xue, J Li, X Wang, AS Subramanian, J Chen, ... arXiv preprint arXiv:2502.02683, 2025 | | 2025 |
| Isochrony-Controlled Speech-to-Text Translation: A study on translating from Sino-Tibetan to Indo-European Languages M Yousefi, Y Qian, J Chen, G Wang, Y Liu, D Wang, X Wang, J Xue arXiv preprint arXiv:2411.07387, 2024 | | 2024 |