WORD-BASED AND PHRASE-BASED MT
Word-based models: an overly simplistic formulation, used only for _alignment_, not actual _translation_.
Phrase-based MT: treats n-grams as translation units, referred to as ‘phrases’.
Phrase-pairs memorise:
common translation fragments
common reordering patterns
FINDING & SCORING PHRASE PAIRS
- Extract phrase pairs, then estimate ‘probabilities’ from relative counts (see the sketch after this list)
- The phrase table: a massive list containing many millions of pairs
- Decoding: segment F into phrases, translate each, and reorder the translations to produce E
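A minimal sketch of the relative-frequency scoring step above, assuming phrase pairs have already been extracted from word-aligned parallel data (the helper name and toy pairs are illustrative, not from the source):

```python
from collections import Counter

def build_phrase_table(extracted_pairs):
    """Score phrase pairs by relative frequency: phi(f | e) = count(f, e) / count(e)."""
    pair_counts = Counter(extracted_pairs)              # count(f, e)
    e_counts = Counter(e for _, e in extracted_pairs)   # count(e)
    return {(f, e): pair_counts[(f, e)] / e_counts[e]
            for (f, e) in pair_counts}

# Toy extracted pairs (illustrative only)
pairs = [("le chat", "the cat"), ("le chat", "the cat"), ("un chat", "the cat")]
table = build_phrase_table(pairs)
print(table[("le chat", "the cat")])  # 2/3
```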
The score function is the product of:
- the translation “probability” $P(F|E)$, decomposed into phrase-pair scores
- the language model probability $P(E)$
- a distortion cost $d(\mathrm{start}_i, \mathrm{end}_{i-1})$, measuring the amount of reordering between adjacent phrase pairs
- Search problem: find the optimal translation $E^*$
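Written out (a sketch in the standard phrase-based notation, where $\phi$ is the phrase translation probability and $\bar{f}_i, \bar{e}_i$ is the $i$-th phrase pair):

$$E^* = \arg\max_E \; P(E) \prod_{i=1}^{I} \phi(\bar{f}_i \mid \bar{e}_i)\, d(\mathrm{start}_i, \mathrm{end}_{i-1})$$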
TRANSLATION PROCESS:
- segment
- translate
- reorder (dynamic-programming solution)
PHRASE-BASED DECODING
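A toy sketch of beam (stack) decoding over these components, with a hypothetical tiny phrase table and a simple distortion penalty but no language model; real decoders add coverage bitmaps, future-cost estimates, and n-gram LM scores:

```python
# Toy phrase table: source phrase -> [(translation, log prob)], illustrative only
PHRASE_TABLE = {
    ("le",): [("the", -0.1)],
    ("chat",): [("cat", -0.2)],
    ("le", "chat"): [("the cat", -0.15)],
}

def decode(source, beam_size=5):
    n = len(source)
    # hypothesis: (covered positions, end of last phrase, output, log score)
    hyps = [(frozenset(), 0, "", 0.0)]
    for _ in range(n):  # each round covers at least one more source word
        new_hyps = []
        for covered, end_prev, out, score in hyps:
            if len(covered) == n:                     # complete: carry forward
                new_hyps.append((covered, end_prev, out, score))
                continue
            for start in range(n):
                for stop in range(start + 1, n + 1):
                    span = set(range(start, stop))
                    if covered & span:                # overlaps covered words
                        continue
                    phrase = tuple(source[start:stop])
                    for trans, logp in PHRASE_TABLE.get(phrase, []):
                        dist = -abs(start - end_prev)  # toy distortion cost
                        new_hyps.append((covered | span, stop,
                                         (out + " " + trans).strip(),
                                         score + logp + dist))
        # prune: keep only the best `beam_size` hypotheses
        hyps = sorted(new_hyps, key=lambda h: h[3], reverse=True)[:beam_size]
    done = [h for h in hyps if len(h[0]) == n]
    return max(done, key=lambda h: h[3])[2] if done else None

print(decode(["le", "chat"]))  # -> "the cat"
```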
NEURAL MACHINE TRANSLATION
The phrase-based approach is rather complicated.
We instead want to model the probability of a target sequence given a source sentence, $P(y|x)$, directly.
Sequence-to-sequence (seq2seq) models, also called encoder-decoder models:
- _Encoder_: represents the source sentence as a vector or matrix of real values.
- _Decoder_: predicts the word sequence in the target.
- _RNN attention model_: lets the decoder attend over all encoder states instead of a single fixed vector.
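A minimal encoder-decoder sketch in PyTorch (dimensions, names, and the GRU choice are illustrative assumptions; attention is omitted for brevity):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encoder: compress the source sentence into its final hidden state.
        _, h = self.encoder(self.src_emb(src))
        # Decoder: predict target words conditioned on that state.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), h)
        return self.out(dec_out)  # logits over the target vocabulary

model = Seq2Seq(src_vocab=8000, tgt_vocab=8000)
src = torch.randint(0, 8000, (2, 7))   # batch of 2 source sentences
tgt = torch.randint(0, 8000, (2, 9))   # shifted target input
logits = model(src, tgt)               # shape: (2, 9, 8000)
```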
EVALUATION
Human criteria: fluency (is the output well-formed?) and adequacy (does it preserve the source meaning?)
BLEU: measures closeness of translation to one or more references
geometric mean of 1-, 2-, 3- and 4-gram precisions, multiplied by a brevity penalty to hedge against overly short outputs
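A toy single-reference BLEU sketch, assuming whitespace tokenisation and uniform n-gram weights (real implementations such as sacreBLEU add multi-reference clipping and smoothing):

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i+n]) for i in range(len(cand)-n+1))
        ref_ngrams = Counter(tuple(ref[i:i+n]) for i in range(len(ref)-n+1))
        # clipped precision: matches cannot exceed the reference count
        matches = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        if matches == 0:
            return 0.0  # any zero precision zeroes the geometric mean
        log_prec += math.log(matches / total) / max_n
    # brevity penalty: exp(1 - r/c) when the candidate is shorter than the reference
    bp = min(1.0, math.exp(1 - len(ref) / len(cand)))
    return bp * math.exp(log_prec)

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```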