316 MT Flashcards
(10 cards)
translation divergences
- systematic differences between languages that make translation hard
- typology- how languages differ systematically, e.g. in how many affixes a word carries
- structural- the order in which sentences are built, e.g. SVO vs SOV
- lexical divergences- differences in word meaning, ambiguity and grammatical marking
- lexical gap- a word that has no direct equivalent in another language, e.g. rivere
ambiguity resolution
a word that is spelt the same in the source language but maps to two separate words in the target language, so the right one must be picked from context, e.g. "plays" in he plays the guitar vs he plays soccer
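A minimal sketch of the "plays" example, assuming Spanish as the target language (the card does not name one) and a hypothetical hand-written table of object types as the only context clue. Real systems learn this from data; the names `SENSES`, `OBJECT_TYPE` and `resolve` are made up for illustration.

```python
# Hypothetical toy lexicon: English "plays" maps to different Spanish verbs
# depending on what kind of thing is being played (assumed target language).
SENSES = {
    "plays": {"instrument": "toca", "sport": "juega"},
}

# Hypothetical hand-written context clues.
OBJECT_TYPE = {
    "guitar": "instrument",
    "piano": "instrument",
    "soccer": "sport",
    "tennis": "sport",
}


def resolve(word, sentence):
    """Pick the target word for an ambiguous source word from its context."""
    options = SENSES.get(word)
    if options is None:
        return None  # word is not in the toy lexicon
    for token in sentence.lower().split():
        kind = OBJECT_TYPE.get(token)
        if kind in options:
            return options[kind]
    return None  # no context clue found


print(resolve("plays", "he plays the guitar"))  # toca
print(resolve("plays", "he plays soccer"))      # juega
```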
direct translation
stages: morphological analysis, lexical transfer, local reordering, morphological generation.
* word-for-word translation.
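The four stages can be sketched roughly as below. This is a toy under stated assumptions: English to Spanish is an assumed language pair, `LEXICON` is a hypothetical four-word dictionary, plurals are formed by adding "s" only, and the only reordering rule is adjective-noun to noun-adjective.

```python
# Hypothetical toy bilingual dictionary: lemma -> (target lemma, part of speech).
LEXICON = {
    "green": ("verde", "ADJ"),
    "big": ("grande", "ADJ"),
    "house": ("casa", "NOUN"),
    "car": ("coche", "NOUN"),
}


def analyse(word):
    """Morphological analysis: split a surface word into (lemma, is_plural)."""
    if word.endswith("s") and word[:-1] in LEXICON:
        return word[:-1], True
    return word, False


def translate(sentence):
    # 1. morphological analysis
    analysed = [analyse(w) for w in sentence.lower().split()]

    # 2. lexical transfer: word-for-word dictionary lookup
    transferred = []
    for lemma, plural in analysed:
        target, pos = LEXICON[lemma]  # raises KeyError for unknown words
        transferred.append([target, pos, plural])

    # 3. local reordering: ADJ NOUN -> NOUN ADJ
    i = 0
    while i < len(transferred) - 1:
        if transferred[i][1] == "ADJ" and transferred[i + 1][1] == "NOUN":
            transferred[i], transferred[i + 1] = transferred[i + 1], transferred[i]
            # the adjective copies the number (singular/plural) of its noun
            transferred[i + 1][2] = transferred[i][2]
            i += 2
        else:
            i += 1

    # 4. morphological generation: re-attach the plural ending
    return " ".join(t + "s" if plural else t for t, _, plural in transferred)


print(translate("green houses"))  # casas verdes
```

No parse tree is built at any point, which is what makes this "direct": every decision is made on single words or on neighbouring words.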
syntactic transfer
semantic transfer
analysis, shallow analysis
transfer (lexical, syntactic), transfer phase
generation, synthesis phase
requires both syntactic transfer and direct translation
the analysis builds the possible parse trees, then finds the most likely parse tree.
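A rough sketch of the three phases on a parse tree. Assumptions: the candidate parses and their probabilities are made up, the single transfer rule rewrites verb-object order to object-verb (SVO to SOV), and lexical transfer is left out, so the words stay in English to keep the focus on structure.

```python
# A parse tree is (label, child, child, ...); a leaf is (POS, word).


def is_leaf(tree):
    return len(tree) == 2 and isinstance(tree[1], str)


def transfer(tree):
    """Syntactic transfer: rewrite VP -> V NP as VP -> NP V (SVO to SOV)."""
    if is_leaf(tree):
        return tree
    label, *children = tree
    children = [transfer(child) for child in children]
    if label == "VP" and len(children) == 2 and children[0][0] == "V":
        children.reverse()
    return (label, *children)


def generate(tree):
    """Generation: read the leaves of the transferred tree left to right."""
    if is_leaf(tree):
        return [tree[1]]
    words = []
    for child in tree[1:]:
        words.extend(generate(child))
    return words


# Analysis: hypothetical candidate parses with made-up probabilities.
candidates = [
    (0.7, ("S", ("NP", ("PRON", "she")),
                ("VP", ("V", "reads"), ("NP", ("N", "books"))))),
    (0.3, ("S", ("NP", ("PRON", "she"), ("N", "reads")),
                ("NP", ("N", "books")))),
]

# Keep the most likely parse tree, then transfer and generate.
best_prob, best_tree = max(candidates, key=lambda c: c[0])
print(generate(transfer(best_tree)))  # ['she', 'books', 'reads']
```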
a good translation should be:
faithful- correctly conveys information and tone
fluent- grammatically well structured and readable
phrase based translation
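translation model- tells us what a sentence/phrase in the source language most likely translates to (faithfulness)
translation probability- how likely a source phrase is to translate to a given target phrase
distortion probability- how likely a phrase is to be moved a given distance when reordering
language model- tells how likely a given sentence/phrase is in the target language (fluency)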
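A minimal sketch of how these pieces combine when choosing between candidate translations. All numbers are made up, the phrase positions are hypothetical, and the distortion penalty uses the common textbook form alpha ** |start - previous_end - 1| with an assumed alpha of 0.5.

```python
import math


def distortion(start, prev_end, alpha=0.5):
    """Distortion probability: 1.0 when the phrase directly follows the
    previous one, shrinking the further it is moved (alpha is assumed)."""
    return alpha ** abs(start - prev_end - 1)


def score(translation_prob, distortion_prob, language_prob):
    """Log-score: translation model (faithfulness) plus language model (fluency)."""
    return (math.log(translation_prob)
            + math.log(distortion_prob)
            + math.log(language_prob))


# Hypothetical candidates for one source phrase, with made-up probabilities.
candidates = {
    "the green house": score(0.4, distortion(1, 0), 0.05),
    "the house green": score(0.5, distortion(3, 0), 0.001),
}

best = max(candidates, key=candidates.get)
print(best)  # the green house
```

The second candidate has the higher translation probability, but the language model and the distortion penalty together push it below the first one.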
encoder and decoder
encoder is used for language understanding in the source language
decoder is used for language generation in the target language
seq2seq encoder
the encoder usually uses an LSTM to generate a context vector from the input
* input tokens are read:
- one at a time
- in reverse order
* the encoder consists of stacked LSTMs
* column- a timestep
row- a single layer
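A toy sketch of the grid of timesteps (columns) and layers (rows). To stay short and dependency-free, a plain tanh recurrent cell with fixed, made-up weights stands in for a real LSTM cell, so this shows the reading order and the stacking only, not the LSTM gates. `rnn_cell` and `encode` are hypothetical names.

```python
import math


def rnn_cell(x, h, w_x=0.5, w_h=0.5):
    """Simplified tanh recurrent cell standing in for an LSTM cell (assumption)."""
    return [math.tanh(w_x * xi + w_h * hi) for xi, hi in zip(x, h)]


def encode(token_vectors, num_layers=2):
    """Return a context vector for a sequence of token vectors."""
    dim = len(token_vectors[0])
    # One hidden state per layer (one row of the grid each).
    states = [[0.0] * dim for _ in range(num_layers)]

    # Tokens are read one at a time, in reverse order; each is one column.
    for x in reversed(token_vectors):
        layer_input = x
        for layer in range(num_layers):
            states[layer] = rnn_cell(layer_input, states[layer])
            layer_input = states[layer]  # output of this row feeds the row above

    # Context vector: final hidden state of the top layer.
    return states[-1]


context = encode([[1.0, 0.0], [0.0, 1.0]])
print(len(context))  # 2
```

Because the state is carried from one timestep to the next, feeding the same tokens in a different order gives a different context vector.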
human rater evaluations
- fluency
  - asking raters to rate the output in terms of readability, clarity and naturalness
  - cloze test: if a word is blanked out of the output and a reader can accurately guess the word that should be there, then the machine is doing well
- fidelity:
  - adequacy- measures whether a translation contains the info that existed in the source
  - informativeness- task-based evaluation of whether there is sufficient info in the output
BLEU
the closer a machine translation is to a professional human translation, the better it is
- automatic metric, used if results are needed urgently
* two constraints
an n-gram in the reference translation cannot be matched more than once
brevity penalty- very short sentences that would otherwise achieve a 1.0 precision are penalised
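The two constraints can be sketched as below, assuming a single reference translation and whitespace tokenisation. This covers only the clipped n-gram precision and the brevity penalty, not the full BLEU score, which also combines several n-gram orders.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped precision: a reference n-gram cannot be matched more than once."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0


def brevity_penalty(candidate, reference):
    """Penalise candidates that are shorter than the reference."""
    c, r = len(candidate), len(reference)
    if c == 0:
        return 0.0
    return 1.0 if c > r else math.exp(1 - r / c)


reference = "the cat is on the mat".split()

# Repeating "the" seven times only earns credit for the two in the reference.
print(modified_precision("the the the the the the the".split(), reference, 1))

# "the cat" has unigram precision 1.0 but is penalised for being too short.
short = "the cat".split()
print(modified_precision(short, reference, 1), brevity_penalty(short, reference))
```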