WORD-BASED AND PHRASE-BASED MT
Word-based models: an overly simplistic formulation, used only for _alignment_, not actual _translation_.
Phrase-based MT: treats n-grams as translation units, referred to as ‘phrases’.
Phrase-pairs memorise:
common translation fragments
common reordering patterns
FINDING & SCORING PHRASE PAIRS
- Extract phrase pairs, then estimate ‘probabilities’ from relative counts (see the sketch after this list)
- The phrase table: a massive list containing many millions of pairs
- Decoding: segment F into phrases, translate each, and reorder the translations to produce E
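A minimal sketch of the relative-frequency scoring step above, assuming phrase pairs have already been extracted from word-aligned parallel data (the helper name and toy pairs are illustrative, not from the source):

```python
from collections import Counter

def build_phrase_table(extracted_pairs):
    """Score phrase pairs by relative frequency: phi(f | e) = count(f, e) / count(e)."""
    pair_counts = Counter(extracted_pairs)              # count(f, e)
    e_counts = Counter(e for _, e in extracted_pairs)   # count(e)
    return {(f, e): pair_counts[(f, e)] / e_counts[e]
            for (f, e) in pair_counts}

# Toy extracted pairs (illustrative only)
pairs = [("le chat", "the cat"), ("le chat", "the cat"), ("un chat", "the cat")]
table = build_phrase_table(pairs)
print(table[("le chat", "the cat")])  # 2/3
```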
The score function is the product of:
- the translation “probability” $P(F|E)$, decomposed into phrase-pair scores
- the language model probability $P(E)$
- a distortion cost $d(\mathrm{start}_i, \mathrm{end}_{i-1})$, measuring the amount of reordering between adjacent phrase pairs
- Search problem: find the optimal translation $E^*$
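Written out (a sketch in the standard phrase-based notation, where $\phi$ is the phrase translation probability and $\bar{f}_i, \bar{e}_i$ is the $i$-th phrase pair):

$$E^* = \arg\max_E \; P(E) \prod_{i=1}^{I} \phi(\bar{f}_i \mid \bar{e}_i)\, d(\mathrm{start}_i, \mathrm{end}_{i-1})$$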
TRANSLATION PROCESS:
- segment
- translate
- reorder (dynamic-programming solution)
PHRASE-BASED DECODING
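A toy sketch of beam (stack) decoding over these components, with a hypothetical tiny phrase table and a simple distortion penalty but no language model; real decoders add coverage bitmaps, future-cost estimates, and n-gram LM scores:

```python
# Toy phrase table: source phrase -> [(translation, log prob)], illustrative only
PHRASE_TABLE = {
    ("le",): [("the", -0.1)],
    ("chat",): [("cat", -0.2)],
    ("le", "chat"): [("the cat", -0.15)],
}

def decode(source, beam_size=5):
    n = len(source)
    # hypothesis: (covered positions, end of last phrase, output, log score)
    hyps = [(frozenset(), 0, "", 0.0)]
    for _ in range(n):  # each round covers at least one more source word
        new_hyps = []
        for covered, end_prev, out, score in hyps:
            if len(covered) == n:                     # complete: carry forward
                new_hyps.append((covered, end_prev, out, score))
                continue
            for start in range(n):
                for stop in range(start + 1, n + 1):
                    span = set(range(start, stop))
                    if covered & span:                # overlaps covered words
                        continue
                    phrase = tuple(source[start:stop])
                    for trans, logp in PHRASE_TABLE.get(phrase, []):
                        dist = -abs(start - end_prev)  # toy distortion cost
                        new_hyps.append((covered | span, stop,
                                         (out + " " + trans).strip(),
                                         score + logp + dist))
        # prune: keep only the best `beam_size` hypotheses
        hyps = sorted(new_hyps, key=lambda h: h[3], reverse=True)[:beam_size]
    done = [h for h in hyps if len(h[0]) == n]
    return max(done, key=lambda h: h[3])[2] if done else None

print(decode(["le", "chat"]))  # -> "the cat"
```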
NEURAL MACHINE TRANSLATION
The phrase-based approach is rather complicated.
We instead want to model the probability of a target sequence given a source sentence, $P(y|x)$, directly.
Sequence-to-sequence (seq2seq) models, also called encoder-decoder models:
- _Encoder_: represents the source sentence as a vector or matrix of real values.
- _Decoder_: predicts the word sequence in the target.
- _RNN attention model_: lets the decoder attend over all encoder states instead of a single fixed vector.
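A minimal encoder-decoder sketch in PyTorch (dimensions, names, and the GRU choice are illustrative assumptions; attention is omitted for brevity):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encoder: compress the source sentence into its final hidden state.
        _, h = self.encoder(self.src_emb(src))
        # Decoder: predict target words conditioned on that state.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), h)
        return self.out(dec_out)  # logits over the target vocabulary

model = Seq2Seq(src_vocab=8000, tgt_vocab=8000)
src = torch.randint(0, 8000, (2, 7))   # batch of 2 source sentences
tgt = torch.randint(0, 8000, (2, 9))   # shifted target input
logits = model(src, tgt)               # shape: (2, 9, 8000)
```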
EVALUATION
Human criteria: fluency (is the output well-formed?) and adequacy (does it preserve the source meaning?)
BLEU: measures closeness of translation to one or more references
geometric mean of 1-, 2-, 3- and 4-gram precisions, multiplied by a brevity penalty to hedge against overly short outputs
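A toy single-reference BLEU sketch, assuming whitespace tokenisation and uniform n-gram weights (real implementations such as sacreBLEU add multi-reference clipping and smoothing):

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i+n]) for i in range(len(cand)-n+1))
        ref_ngrams = Counter(tuple(ref[i:i+n]) for i in range(len(ref)-n+1))
        # clipped precision: matches cannot exceed the reference count
        matches = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        if matches == 0:
            return 0.0  # any zero precision zeroes the geometric mean
        log_prec += math.log(matches / total) / max_n
    # brevity penalty: exp(1 - r/c) when the candidate is shorter than the reference
    bp = min(1.0, math.exp(1 - len(ref) / len(cand)))
    return bp * math.exp(log_prec)

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```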