Simultaneous Translation

Simultaneous translation, which performs translation concurrently with the source speech, is widely useful in many scenarios such as international conferences, negotiations, press releases, legal proceedings, and medicine. This problem has long been considered one of the hardest problems in AI and one of its holy grails. Recently, with rapid improvements in machine translation, speech recognition, and speech synthesis, there has been exciting progress towards simultaneous translation. This tutorial will focus on the design and evaluation of policies for simultaneous translation, to leave attendees with a deep technical understanding of the history, the recent advances, and the remaining challenges in this field.


Type of the Tutorial
This is a cutting-edge proposal and the first tutorial on this topic (simultaneous translation) in the history of ACL, EMNLP, NAACL, EACL, COLING, and AACL. Topics covered include:
• Practical issues (segmentation, punctuation, error tolerance)
• Speech-to-text and speech-to-speech systems
• Computer-aided interpretation (CAI)

Breadth
We envision a tutorial that emphasizes interdisciplinary breadth at the beginning and end (roughly one half of the tutorial in total). The beginning section on Human Interpretation will allow us to discuss the strategies and behaviours that enable humans to perform this challenging task, touching on observations from Translation Studies. Meanwhile, the end sections on Practical Issues and Moving Toward Speech to Speech Translation will allow us to discuss issues in incremental Speech Recognition and Text-to-Speech that are otherwise under-represented at a typical *ACL conference.
At most 33% of the covered work is by the presenters, and at least 67% is by other researchers.

Diversity
Simultaneous translation techniques can greatly improve the efficiency of human communication across language barriers. With this technology, you will be able to understand any foreign language by pulling out your smartphone and listening to a machine-generated simultaneous translation in your own language, with less than 3 seconds of delay. When traveling to a remote country, you will also be able to "talk" to the locals using your smartphone and a headset.
Both Mingbo Ma and Naveen Arivazhagan are junior instructors. Colin Cherry works at Google in Montreal, Liang Huang works at Oregon State University in Corvallis, and Zhongjun He works at Baidu in Beijing.

Prerequisites
• Machine Learning: understand the basics of the sequence-to-sequence framework.
• Linguistics: understand basic syntactic structures and appreciate the great diversity of syntactic structures (esp. word order) across human languages.

Small Reading List
Only the last two (33%) were co-authored by the presenters.
• When the source and target languages have drastically different word orders, e.g., translating from a verb-final language (German) into a verb-medial language (English), the final verb is predicted in advance on the source side to avoid long latency.
• A sentence rewriting method is proposed to generate more monotonic translations and thereby improve the speed-accuracy tradeoff. Several grammaticality- and meaning-preserving syntactic transformation rules are applied to paraphrase the reference translations so that their word order is closer to the source-language word order.
• Several waiting criteria are manually designed to serve as translation policies that decide whether to wait (read more source words) or to translate (write target words).
• The authors propose an NMT framework for simultaneous translation with an agent that learns when to translate and when to wait by interacting with a pretrained NMT environment.
• A prefix-to-prefix framework is proposed for simultaneous translation that implicitly learns to anticipate within a single translation model. Within this framework, a "wait-k" policy is trained to generate the target sentence concurrently with the source sentence, lagging k words behind it.

Open Access
All materials (slides, videos, etc.) will be openly available online.