A Paper List for Speech Translation

🤗 We are hiring interns and full-time employees to work on speech translation research. Please contact me at dongqianqian@bytedance.com

This is a paper list for speech translation.

Keywords: Speech Translation, Spoken Language Processing, Natural Language Processing

Tutorials and Surveys

  • Jan Niehues. Spoken Language Translation, InterSpeech-2019, [video]
  • Matthias Sperber and Matthias Paulik. Speech Translation and the End-to-End Promise: Taking Stock of Where We Are, ACL-2020 theme track, [paper]
  • Umut Sulubacak, Ozan Caglayan, Stig-Arne Grönroos, Aku Rouhe, Desmond Elliott, Lucia Specia, and Jörg Tiedemann. Multimodal Machine Translation through Visuals and Speech, Machine Translation journal-2020 (Springer), [paper]
  • Jan Niehues, Elizabeth Salesky, Marco Turchi, Matteo Negri. Speech Translation Tutorial, EACL-2021, [link], [slides]

Codebase

  • ESPnet-ST: All-in-One Speech Translation Toolkit, ACL-2020 Demo, [paper], [code]
  • FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ, AACL-2020 demo, [paper], [code]
  • NeurST: Neural Speech Translation Toolkit, Arxiv-2020, [paper], [code]

Dataset

  • Construction and Utilization of Bilingual Speech Corpus for Simultaneous Machine Interpretation Research, InterSpeech-2005, [paper]
  • Approach to Corpus-based Interpreting Studies: Developing EPIC (European Parliament Interpreting Corpus), MuTra-2005, [paper]
  • Automatic Translation from Parallel Speech: Simultaneous Interpretation as MT Training Data, ASRU-2009, [paper]
  • The KIT Lecture Corpus for Speech Translation, LREC-2012, [paper]
  • Improved Speech-to-Text Translation with the Fisher and Callhome Spanish–English Speech Translation Corpus, IWSLT-2013, [paper]
  • Collection of a Simultaneous Translation Corpus for Comparative Analysis, LREC-2014, [paper]
  • Microsoft Speech Language Translation (MSLT) Corpus: The IWSLT 2016 release for English, French and German, IWSLT-2016, [paper]
  • The Microsoft Speech Language Translation (MSLT) Corpus for Chinese and Japanese: Conversational Test data for Machine Translation and Speech Recognition, Machine Translation-2017, [paper]
  • Amharic-English Speech Translation in Tourism Domain, SCNLP-2017, [paper]
  • A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiment, LREC-2018, [paper]
  • Augmenting Librispeech with French Translations: A Multimodal Corpus for Direct Speech Translation Evaluation, LREC-2018, [paper]
  • A Small Griko-Italian Speech Translation Corpus, SLTU-2019, [paper]
  • MuST-C: a Multilingual Speech Translation Corpus, NAACL-2019, [paper]
  • MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible, Arxiv-2019, [paper]
  • How2: A Large-scale Dataset for Multimodal Language Understanding, NIPS-2018, [paper]
  • LibriVoxDeEn: A Corpus for German-to-English Speech Translation and Speech Recognition, LREC-2020, [paper]
  • Clotho: An Audio Captioning Dataset, Arxiv-2019, [paper]
  • Europarl-St: A Multilingual Corpus For Speech Translation Of Parliamentary Debates, ICASSP-2020, [paper]
  • CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus, Arxiv-2020, [paper]
  • MuST-Cinema: a Speech-to-Subtitles corpus, Arxiv-2020, [paper]
  • CoVoST 2: A Massively Multilingual Speech-to-Text Translation Corpus, Arxiv-2020, [paper], [code]
  • The Multilingual TEDx Corpus for Speech Recognition and Translation, Arxiv-2021, [paper]
  • mintzai-ST: Corpus and Baselines for Basque-Spanish Speech Translation, IberSPEECH-2021, [paper]
  • BSTC: A Large-Scale Chinese-English Speech Translation Dataset, Arxiv-2021, [paper]
  • MultiSubs: A Large-scale Multimodal and Multilingual Dataset, Arxiv-2021, [paper]
  • Kosp2e: Korean Speech to English Translation Corpus, InterSpeech-2021, [paper]

Paper List

Pipeline ST

  • Phonetically-Oriented Word Error Alignment for Speech Recognition Error Analysis in Speech Translation, ASRU-2015, [paper]
  • Learning a Translation Model from Word Lattices, InterSpeech-2016, [paper]
  • Learning a Lexicon and Translation Model from Phoneme Lattices, EMNLP-2016, [paper]
  • Neural Lattice-to-Sequence Models for Uncertain Inputs, EMNLP-2017, [paper]
  • Using Spoken Word Posterior Features in Neural Machine Translation, IWSLT-2018, [paper]
  • Towards robust neural machine translation, ACL-2018, [paper]
  • Assessing the Tolerance of Neural Machine Translation Systems Against Speech Recognition Errors, InterSpeech-2019, [paper]
  • Lattice Transformer for Speech Translation, ACL-2019, [paper]
  • Self-Attentional Models for Lattice Inputs, ACL-2019, [paper]
  • Breaking the Data Barrier: Towards Robust Speech Translation via Adversarial Stability Training, IWSLT-2019, [paper]
  • Neural machine translation with acoustic embedding, ASRU-2019
  • Machine Translation in Pronunciation Space, Arxiv-2020, [paper]
  • Diversity by Phonetics and its Application in Neural Machine Translation, AAAI-2020, [paper]
  • Robust Neural Machine Translation for Clean and Noisy Speech Transcripts, IWSLT-2019, [paper]
  • ELITR Non-Native Speech Translation at IWSLT 2020, IWSLT-2020, [paper]
  • Subtitles to Segmentation: Improving Low-Resource Speech-to-Text Translation Pipelines, CLSST@LREC 2020, [paper]
  • Cascaded Models With Cyclic Feedback For Direct Speech Translation, Arxiv-2020, [paper]
  • Sentence Boundary Augmentation For Neural Machine Translation Robustness, Arxiv-2020, [paper]
  • A Technical Report: BUT Speech Translation Systems, Arxiv-2020, [paper]
  • Direct Segmentation Models for Streaming Speech Translation, EMNLP-2020, [paper]
  • Lost in Interpreting: Speech Translation from Source or Interpreter?, InterSpeech-2021, [paper]
  • Is “moby dick” a Whale or a Bird? Named Entities and Terminology in Speech Translation, EMNLP-2021, [paper]

End-to-end ST

  • Towards Speech Translation of Non Written Languages, IEEE-2006, [paper]
  • Towards speech-to-text translation without speech recognition, EACL-2017, [paper]
  • Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text Translation, NIPS-2016, [paper]
  • An Attentional Model for Speech Translation Without Transcription, NAACL-2016, [paper]
  • An Unsupervised Probability Model for Speech-to-Translation Alignment of Low-Resource Languages, EMNLP-2016, [paper]
  • A Case Study on Using Speech-to-translation Alignments for Language Documentation, ComputEL-2017, [paper]
  • Spoken Term Discovery for Language Documentation Using Translations, SCNLP-2017, [paper]
  • Sequence-to-sequence Models Can Directly Translate Foreign Speech, InterSpeech-2017, [paper]
  • Structured-based Curriculum Learning for End-to-end English-Japanese Speech Translation, InterSpeech-2017, [paper]
  • End-to-End Speech Translation with the Transformer, IberSPEECH-2018, [paper]
  • Towards Fluent Translations from Disfluent Speech, SLT-2018, [paper]
  • Low-resource Speech-to-text Translation, InterSpeech-2018, [paper]
  • End-to-End Automatic Speech Translation of Audiobooks, ICASSP-2018, [paper]
  • Tied Multitask Learning for Neural Speech Translation, NAACL-2018, [paper]
  • Towards Unsupervised Speech to Text Translation, ICASSP-2019, [paper]
  • Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation, ICASSP-2019, [paper]
  • Towards End-to-end Speech-to-text Translation with Two-pass Decoding, ICASSP-2019, [paper]
  • Attention-Passing Models for Robust and Data-Efficient End-to-End Speech Translation, TACL-2019, [paper]
  • End-to-End Speech Translation with Knowledge Distillation, InterSpeech-2019, [paper]
  • Fluent Translations from Disfluent Speech in End-to-End Speech Translation, NAACL-2019, [paper]
  • Pre-Training On High-Resource Speech Recognition Improves Low-Resource Speech-To-Text Translation, NAACL-2019, [paper]
  • Exploring Phoneme-Level Speech Representations for End-to-End Speech Translation, ACL-2019, [paper]
  • Leveraging Out-of-Task Data for End-to-End Automatic Speech Translation, Arxiv-2019, [paper]
  • Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation, AAAI-2020, [paper]
  • Adapting Transformer to End-to-end Spoken Language Translation, InterSpeech-2019, [paper]
  • Unsupervised phonetic and word level discovery for speech to speech translation for unwritten languages, InterSpeech-2019, [paper]
  • A comparative study on end-to-end speech to text translation, ASRU-2019, [paper]
  • Instance-Based Model Adaptation For Direct Speech Translation, ICASSP-2020, [paper]
  • Analyzing ASR Pretraining For Low-Resource Speech-To-Text Translation, ICASSP-2020, [paper]
  • ON-TRAC Consortium End-to-End Speech Translation Systems for the IWSLT 2019 Shared Task, IWSLT-2019, [paper]
  • Harnessing Indirect Training Data for End-to-End Automatic Speech Translation: Tricks of the Trade, IWSLT-2019, [paper]
  • Data Efficient Direct Speech-to-Text Translation with Modality Agnostic Meta-Learning, ICASSP-2020, [paper]
  • Enhancing Transformer for End-to-end Speech-to-Text Translation, EAMT-2019, [paper]
  • On Using SpecAugment for End-to-End Speech Translation, IWSLT-2019, [paper]
  • Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding, AAAI-2020, [paper]
  • From Speech-To-Speech Translation To Automatic Dubbing, Arxiv-2020, [paper]
  • SkinAugment: Auto-Encoding Speaker Conversions For Automatic Speech Translation, ICASSP-2020, [paper]
  • Curriculum Pre-training for End-to-End Speech Translation, ACL-2020, [paper]
  • Jointly Trained Transformers models for Spoken Language Translation, Arxiv-2020, [paper]
  • Relative Positional Encoding for Speech Recognition and Direct Translation, Arxiv-2020, [paper]
  • Worse WER, but Better BLEU? Leveraging Word Embedding as Intermediate in Multitask End-to-End Speech Translation, ACL-2020, [paper]
  • Phone Features Improve Speech Translation, ACL-2020, [paper]
  • Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection, Arxiv-2020, [paper]
  • End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020, IWSLT-2020, [paper]
  • Self-Training for End-to-End Speech Translation, INTERSPEECH-2020 (submitted), [paper]
  • CSTNet: Contrastive Speech Translation Network for Self-Supervised Speech Representation Learning, INTERSPEECH-2020 (submitted), [paper]
  • Is 42 the Answer to Everything in Subtitling-oriented Speech Translation?, IWSLT-2020, [paper]
  • End-To-End Speech Translation With Self-Contained Vocabulary Manipulation, ICASSP-2020
  • End-to-End Speech Translation With Transcoding by Multi-Task Learning for Distant Language Pairs, TASLP-2020, [paper]
  • UWSpeech: Speech to Speech Translation for Unwritten Languages, Arxiv-2020, [paper]
  • Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus, ACL-2020, [paper]
  • Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation, INTERSPEECH-2020 (submitted), [paper]
  • Self-Supervised Representations Improve End-to-End Speech Translation, Arxiv-2020, [paper]
  • Consistent Transcription and Translation of Speech, TACL-2020, [paper]
  • Contextualized Translation of Automatically Segmented Speech, INTERSPEECH-2020, [paper]
  • On Target Segmentation for Direct Speech Translation, AMTA-2020, [paper]
  • End-to-End Speech Translation with Adversarial Training, WAST-2020, [paper]
  • SDST: Successive Decoding for Speech-to-text Translation, Arxiv-2020, [paper]
  • TED: Triple Supervision Decouples End-to-end Speech-to-text Translation, Arxiv-2020, [paper]
  • Investigating Self-supervised Pre-training for End-to-end Speech Translation, ICML-2020 workshop, [paper], [code]
  • Adaptive Feature Selection for End-to-End Speech Translation, EMNLP2020 Findings, [paper], [code]
  • A General Multi-Task Learning Framework To Leverage Text Data For Speech To Text Tasks, Arxiv-2020, [paper]
  • MAM: Masked Acoustic Modeling for End-to-End Speech-to-Text Translation, Arxiv-2020, [paper]
  • Evaluating Gender Bias In Speech Translation, ICASSP-2021 (submitted), [paper]
  • Bridging the Modality Gap for Speech-to-Text Translation, Arxiv-2020, [paper]
  • Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation, COLING-2020, [paper], [code]
  • Effectively pretraining a speech translation decoder with Machine Translation data, EMNLP-2020, [paper]
  • Tight Integrated End-to-End Training for Cascaded Speech Translation, SLT-2021, [paper]
  • Breeding Gender-aware Direct Speech Translation Systems, COLING-2020, [paper]
  • On Knowledge Distillation for Direct Speech Translation, CLiC-IT-2020, [paper]
  • Streaming Models for Joint Speech Recognition and Translation, EACL-2021, [paper]
  • CTC-based Compression for Direct Speech Translation, EACL-2021, [paper]
  • Fused Acoustic and Text Encoding for Multimodal Bilingual Pretraining and Speech Translation, ICML-2021, [paper]
  • Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation, NAACL-2021, [paper]
  • Large-Scale Self- and Semi-Supervised Learning for Speech Translation, Arxiv-2021, [paper]
  • End-to-end Speech Translation via Cross-modal Progressive Training, InterSpeech-2021, [paper]
  • Beyond Voice Activity Detection: Hybrid Audio Segmentation for Direct Speech Translation, Arxiv-2021, [paper]
  • AlloST: Low-resource Speech Translation without Source Transcription, InterSpeech-2021, [paper]
  • Learning Shared Semantic Space for Speech-to-Text Translation, ACL-2021 Findings, [paper]
  • Stacked Acoustic-and-Textual Encoding: Integrating the Pre-trained Models into Speech Translation Encoders, ACL-2021, [paper]
  • How to Split: the Effect of Word Segmentation on Gender Bias in Speech Translation, ACL-2021 Findings, [paper]
  • Cascade versus Direct Speech Translation: Do the Differences Still Make a Difference?, ACL-2021, [paper]
  • Efficient Transformer for Direct Speech Translation, Arxiv-2021, [paper]
  • Improving Speech Translation by Understanding and Learning from the Auxiliary Text Translation Task, ACL-2021, [paper]
  • Beyond Sentence-Level End-to-End Speech Translation: Context Helps, ACL-2021, [paper]
  • AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation, ACL-2021, [paper]
  • Speechformer: Reducing Information Loss in Direct Speech Translation, EMNLP-2021, [paper]
  • Fast-MD: Fast Multi-Decoder End-To-End Speech Translation With Non-Autoregressive Hidden Intermediates, ASRU-2021, [paper]
  • Mutual-Learning Improves End-to-End Speech Translation, EMNLP-2021, [paper]

End-to-end Streaming ST

  • Simuls2s: End-to-end Simultaneous Speech To Speech Translation, ICLR-2019 (under review), [paper]
  • ON-TRAC Consortium for End-to-End and Simultaneous SpeechTranslation Challenge Tasks at IWSLT 2020, IWSLT-2020, [paper]
  • SimulSpeech: End-to-End Simultaneous Speech to Text Translation, ACL-2020, [paper]
  • Streaming Simultaneous Speech Translation With Augmented Memory Transformer, ICASSP-2021 (submitted), [paper]
  • SimulMT to SimulST: Adapting Simultaneous Text Translation to End-to-End Simultaneous Speech Translation, Arxiv-2020, [paper]
  • Simultaneous Speech-To-Speech Translation System With Neural Incremental ASR, MT, And TTS, Arxiv-2020, [paper]
  • An Empirical Study Of End-To-End Simultaneous Speech Translation Decoding Strategies, ICASSP 2021, [paper]
  • RealTranS: End-to-End Simultaneous Speech Translation with Convolutional Weighted-Shrinking Transformer, ACL-2021 Findings, [paper]
  • Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR, ACL-2021 Findings, [paper]
  • Simultaneous Speech Translation for Live Subtitling: from Delay to Display, Arxiv-2021, [paper]
  • UniST: Unified End-to-end Model for Streaming and Non-streaming Speech Translation, Arxiv-2021, [paper]
  • Decision Attentive Regularization To Improve Simultaneous Speech Translation Systems, ICASSP-2022 (submitted), [paper]

End-to-end Non-autoregressive ST

  • Orthros: Non-Autoregressive End-To-End Speech Translation With Dual-Decoder, Arxiv-2020, [paper]
  • Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation, ACL-2021 Findings, [paper]
  • Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring, Arxiv-2021, [paper]

End-to-end Multilingual ST

  • Multilingual End-To-End Speech Translation, ASRU-2019, [paper]
  • One-To-Many Multilingual End-To-End Speech Translation, ASRU-2019, [paper]
  • Multilingual Speech Translation with Efficient Finetuning of Pretrained Models, ACL-2021, [paper]
  • Lightweight Adapter Tuning for Multilingual Speech Translation, [paper]

End-to-end S2ST

  • Direct speech-to-speech translation with a sequence-to-sequence model, InterSpeech-2019, [paper]
  • Speech-To-Speech Translation Between Untranscribed Unknown Languages, ASRU-2019, [paper]
  • Transformer-Based Direct Speech-To-Speech Translation With Transcoder, SLT-2021, [paper]
  • Direct Speech-To-Speech Translation With Discrete Units, Arxiv-2021, [paper]
  • Translatotron 2: Robust Direct Speech-To-Speech Translation, Arxiv-2021, [paper]
  • Direct Simultaneous Speech To Speech Translation, Arxiv-2021, [paper]

End-to-end Zero-shot ST

  • Zero-shot Speech Translation, Arxiv-2021, [paper]

Multimodal MT

  • Transformer-based Cascaded Multimodal Speech Translation, Arxiv-2019, [paper]
  • Towards Multimodal Simultaneous Neural Machine Translation, Arxiv-2020, [paper]
  • Towards Automatic Face-to-Face Translation, Arxiv-2020, [paper], [code]
  • Keyframe Segmentation and Positional Encoding for Video-guided Machine Translation Challenge 2020, ALVR-2020, [paper]
  • DeepFuse: HKU’s Multimodal Machine Translation System for VMT’20, ALVR-2020, [paper]
  • Team RUC AI·M3 Technical Report at VMT Challenge 2020: Enhancing Neural Machine Translation with Multimodal Rewards, ALVR-2020, [paper]
  • Exploiting Multimodal Reinforcement Learning for Simultaneous Machine Translation, EACL-2021, [paper]
  • Cross-lingual Visual Pre-training for Multimodal Machine Translation, EACL-2021, [paper]
  • Generative Imagination Elevates Machine Translation, NAACL-2021, [https://arxiv.org/abs/2009.09654]
  • Efficient Object-Level Visual Context Modeling for Multimodal Machine Translation: Masking Irrelevant Objects Helps Grounding, AAAI-2021, [paper]
  • Improving Translation Robustness with Visual Cues and Error Correction, Arxiv-2021, [paper]
  • Gumbel-Attention for Multi-modal Machine Translation, Arxiv-2021, [paper]
  • Good for Misconceived Reasons: An Empirical Revisiting on the Need for Visual Context in Multimodal Machine Translation, ACL-2021, [paper]

Streaming MT

  • Simultaneous translation of lectures and speeches, Machine Translation-2007, [paper]
  • Real-time incremental speech-to-speech translation of dialogs, NAACL-2012, [paper]
  • Incremental segmentation and decoding strategies for simultaneous translation, IJCNLP-2013, [paper]
  • Don't Until the Final Verb Wait: Reinforcement learning for simultaneous machine translation, EMNLP-2014, [paper]
  • Segmentation strategies for streaming speech translation, NAACL-2013, [paper]
  • Optimizing segmentation strategies for simultaneous speech translation, ACL-2014, [paper]
  • Syntax-based simultaneous translation through prediction of unseen syntactic constituents, ACL-IJCNLP-2015, [paper]
  • Simultaneous machine translation using deep reinforcement learning, ICML-2016, [paper]
  • Interpretese vs. translationese: The uniqueness of human strategies in simultaneous interpretation, NAACL-2016, [paper]
  • Can neural machine translation do simultaneous translation?, Arxiv-2016, [paper]
  • Learning to translate in real-time with neural machine translation, EACL-2017, [paper]
  • Incremental Decoding and Training Methods for Simultaneous Translation in Neural Machine Translation, NAACL-2018, [paper]
  • Prediction Improves Simultaneous Neural Machine Translation, EMNLP-2018, [paper]
  • STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework, ACL-2019, [paper]
  • Simultaneous Translation with Flexible Policy via Restricted Imitation Learning, ACL-2019, [paper]
  • Monotonic Infinite Lookback Attention for Simultaneous Machine Translation, ACL-2019, [paper]
  • Thinking Slow about Latency Evaluation for Simultaneous Machine Translation, Arxiv-2019, [paper]
  • DuTongChuan: Context-aware Translation Model for Simultaneous Interpreting, Arxiv-2019, [paper]
  • Monotonic Multihead Attention, ICLR-2020 (under review), [paper]
  • How To Do Simultaneous Translation Better With Consecutive Neural Machine Translation, Arxiv-2019, [paper]
  • Simultaneous Neural Machine Translation using Connectionist Temporal Classification, Arxiv-2019, [paper]
  • Re-Translation Strategies For Long Form, Simultaneous, Spoken Language Translation, ICASSP-2020, [paper]
  • Learning Coupled Policies for Simultaneous Machine Translation, Arxiv-2020, [paper]
  • Re-translation versus Streaming for Simultaneous Translation, Arxiv-2020, [paper]
  • Efficient Wait-k Models for Simultaneous Machine Translation, Arxiv-2020, [paper]
  • Opportunistic Decoding with Timely Correction for Simultaneous Translation, ACL-2020, [paper]
  • Neural Simultaneous Speech Translation Using Alignment-Based Chunking, IWSLT2020, [paper]
  • Dynamic Masking for Improved Stability in Spoken Language Translation, Arxiv-2020, [paper]
  • Learn to Use Future Information in Simultaneous Translation, Arxiv-2020, [paper]
  • Presenting Simultaneous Translation in Limited Space, ITAT WAFNL 2020, [paper]
  • Fluent and Low-latency Simultaneous Speech-to-Speech Translation with Self-adaptive Training, EMNLP2020 Findings, [paper]
  • Improving Simultaneous Translation with Pseudo References, Arxiv-2020, [paper]
  • Future-Guided Incremental Transformer for Simultaneous Translation, AAAI-2021, [paper]
  • Faster Re-translation Using Non-Autoregressive Model For Simultaneous Neural Machine Translation, Arxiv-2021, [paper]
  • Learning Coupled Policies for Simultaneous Machine Translation using Imitation Learning, EACL-2021, [paper]
  • Simultaneous Multi-Pivot Neural Machine Translation, Arxiv-2021, [paper]
  • Stream-level Latency Evaluation for Simultaneous Machine Translation, Arxiv-2021, [paper]
  • Impact of Encoding and Segmentation Strategies on End-to-End Simultaneous Speech Translation, Interspeech 2021, [paper]
  • Full-Sentence Models Perform Better in Simultaneous Translation Using the Information Enhanced Decoding Strategy, Arxiv-2021, [paper]
  • Universal Simultaneous Machine Translation with Mixture-of-Experts Wait-k Policy, EMNLP-2021, [paper]

Related Works

Automated Audio Captioning

  • Effects Of Word-Frequency Based Pre- And Post-Processings For Audio Captioning, DCASE-2020, [paper]

Named Entity Recognition

  • End-to-end Named Entity Recognition from English Speech, INTERSPEECH-2020 (submitted), [paper]

Text Normalization

  • A Hybrid Text Normalization System Using Multi-Head Self-Attention For Mandarin, ICASSP-2020, [paper]
  • A Unified Sequence-To-Sequence Front-End Model For Mandarin Text-To-Speech Synthesis, ICASSP-2020, [paper]
  • Naturalization of Text by the Insertion of Pauses and Filler Words, Arxiv-2020, [paper]

Disfluency Detection

  • Semi-Supervised Disfluency Detection, COLING-2018, [paper]
  • Adapting Translation Models for Transcript Disfluency Detection, AAAI-2019, [paper]
  • Giving Attention to the Unexpected: Using Prosody Innovations in Disfluency Detection, Arxiv-2019, [paper]
  • Multi-Task Self-Supervised Learning for Disfluency Detection, AAAI-2020, [paper]
  • Improving Disfluency Detection by Self-Training a Self-Attentive Model, Arxiv-2020, [paper]
  • Combining Self-Training and Self-Supervised Learning for Unsupervised Disfluency Detection, EMNLP-2020, [paper], [code]
  • Auxiliary Sequence Labeling Tasks For Disfluency Detection, Arxiv-2020, [paper]

Punctuation Prediction

  • Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection, ICASSP-2020, [paper]
  • Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?, INTERSPEECH-2020 (submitted), [paper]
  • Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech, INTERSPEECH-2020, [paper]

Workshop

Copyright

Maintained by volunteers from the Institute of Automation, Chinese Academy of Sciences, and the ByteDance AI Lab.

Feel free to open an issue or submit a pull request!