SLP Flashcards

(107 cards)

1
Q

What is logistic regression called for more than two classes?

A

multinomial logistic regression

2
Q

What is the most important difference between naive Bayes and logistic regression?

A

logistic regression is a discriminative classifier, naive Bayes is a generative classifier

3
Q

Generative model

A

A model that learns how the data it is classifying could be generated; it models what each class looks like.

4
Q

Discriminative model

A

A model that only tries to learn to distinguish the classes.

5
Q

Naive Bayes assigns a class c to a document d. How does it do this?

A

It computes a likelihood P(d|c) and a prior P(c), and picks the class c that maximizes their product.

6
Q

What are the four components of a machine learning system for classification?

A
  1. feature representation of the input
  2. classification function that computes estimated class
  3. objective function that we want to optimize
  4. algorithm for optimizing the objective function
7
Q

What are the two phases of logistic regression?

A
  1. training
  2. test
8
Q

What is the bias term?

A

A real number that is added to the weighted inputs

9
Q

How does a logistic classifier make a decision on a test instance after learning the weights?

A

It multiplies each feature x_i by its weight w_i, sums the weighted features, and adds the bias term b.

10
Q

What is the formula for z in logistic regression?

A

z = w · x + b

11
Q

What does z express in logistic regression?

A

the weighted sum of the evidence for the class it is computed for

12
Q

What do we do to z in logistic regression to create a probability?

A

Pass it through the sigmoid function (logistic function)

13
Q

exp(x) ==

A

e^x

14
Q

What is the sigmoid function?

A

sigma(z) = 1 / (1 + exp(-z))

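A minimal sketch (not from the cards themselves) putting the last few cards together in Python: compute z = w · x + b, pass it through the sigmoid, and threshold at 0.5. The feature values, weights, and bias are made-up numbers for illustration.

  import math

  def sigmoid(z):
      # Map the real number z into the range (0, 1).
      return 1.0 / (1.0 + math.exp(-z))

  def predict(x, w, b):
      # Multiply each x_i by its weight w_i, sum, and add the bias term.
      z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
      return sigmoid(z)

  x = [3.0, 2.0, 1.0]    # hypothetical feature values
  w = [0.5, -1.2, 0.7]   # hypothetical learned weights
  b = 0.1                # bias term
  p = predict(x, w, b)
  print(p, "-> class 1" if p > 0.5 else "-> class 0")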
15
Q

What does the sigmoid function do to the real number z?

A

it maps it into the range (0,1)

16
Q

logit

A

the input to the sigmoid function:
z

17
Q

Why is the input to the sigmoid function often called logit?

A

Because the logit function is the inverse of the sigmoid: applying logit to the sigmoid's output recovers z.

18
Q

What is the logit function logit(p)?

A

logit(p) = sigma^(-1)(p) = ln(p / (1 - p))

19
Q

Period disambiguation

A

Deciding if a period is the end of a sentence or part of a word.

20
Q

Representation learning

A

Ways to learn features automatically in an unsupervised way from the input

21
Q

What does it mean to standardize input values?

A

Rescale them so they have zero mean and a standard deviation of one.

22
Q

What is the formula to normalize input features values to lie between 0 and 1?

A

(x_i - min(x_i)) / (max(x_i) - min(x_i))

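A small sketch contrasting the two rescaling schemes from the last two cards, on toy data: z-score standardization (zero mean, unit standard deviation) and min-max normalization into [0, 1].

  def standardize(values):
      # Rescale to zero mean and a standard deviation of one.
      mean = sum(values) / len(values)
      std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
      return [(v - mean) / std for v in values]

  def normalize(values):
      # Rescale to lie between 0 and 1: (x_i - min) / (max - min).
      lo, hi = min(values), max(values)
      return [(v - lo) / (hi - lo) for v in values]

  xs = [2.0, 4.0, 6.0, 8.0]
  print(standardize(xs))  # mean 0, std 1
  print(normalize(xs))    # [0.0, 0.333..., 0.666..., 1.0]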
23
Q

What is softmax regression another name for?

A

multinomial logistic regression

24
Q

hard classification

A

when an observation cannot be in multiple classes

25
one-hot vector
a vector with one value = 1 and all other values = 0
26
What does the softmax function do?
It takes a vector z of K arbitrary real values and maps them to a probability distribution: each value lies in (0,1) and all the values sum to 1.
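A minimal softmax sketch matching this card: map a vector z of K arbitrary real values to a probability distribution. Subtracting max(z) before exponentiating is a standard numerical-stability trick, not part of the definition.

  import math

  def softmax(z):
      m = max(z)                           # for numerical stability
      exps = [math.exp(v - m) for v in z]
      total = sum(exps)
      return [e / total for e in exps]

  print(softmax([1.0, 2.0, 3.0]))  # each value in (0, 1), sums to 1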
27
loss function
the distance between the system output and the gold output
28
What type of loss function is commonly used for logistic regression and neural networks?
the cross-entropy loss
29
What algorithm is standardly used for iteratively updating the weights to minimize the loss function?
gradient descent
30
conditional maximum likelihood estimation
a loss function that prefers the correct class labels of the training examples to be more likely.
31
What does conditional maximum likelihood estimation do?
It chooses the parameters w,b that maximize the log probability of the true y labels in the training data given the observations x.
32
Bernoulli distribution
observations where there are only two discrete possible outcomes
33
What is the goal of gradient descent?
to find the optimal weights to minimize the loss function we have defined for the model
34
how do we refer to the set of parameters learned by the model?
theta
35
convex function
has at most one minimum
36
Why is stochastic gradient descent called stochastic?
because it chooses a single random example at a time
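A sketch of one stochastic-gradient-descent step for binary logistic regression with cross-entropy loss, on a single example as the card describes. It uses the standard closed-form gradient (sigma(z) - y) * x_i for weight w_i and (sigma(z) - y) for the bias; the data and learning rate are made up.

  import math

  def sigmoid(z):
      return 1.0 / (1.0 + math.exp(-z))

  def sgd_step(x, y, w, b, lr=0.1):
      # One update of the parameters theta = (w, b) on one example (x, y).
      z = sum(wi * xi for wi, xi in zip(w, x)) + b
      err = sigmoid(z) - y   # derivative of the loss with respect to z
      w = [wi - lr * err * xi for wi, xi in zip(w, x)]
      b = b - lr * err
      return w, b

  w, b = [0.0, 0.0], 0.0
  w, b = sgd_step([3.0, 2.0], 1, w, b)
  print(w, b)  # weights nudged toward making y=1 more likely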
37
L2 regularization
a quadratic function of the weight values
38
L1 regularization
a linear function of the weight values
39
What does L2 regularization prefer?
weight vectors with many small weights
40
What does L1 regularization prefer?
sparse solutions with some larger weights but many more weights set to zero
41
What does L1 regularization lead to?
sparser weight vectors and fewer features
42
derivative of ln(x)
1/x
43
derivative of the sigmoid
sigma(z) * (1-sigma(z))
44
chain rule of derivatives: derivative of u(v(x))
du/dv * (dv/dx)
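A quick numerical check of the two derivative cards above, comparing the closed form sigma(z) * (1 - sigma(z)) against a finite-difference approximation:

  import math

  def sigmoid(z):
      return 1.0 / (1.0 + math.exp(-z))

  z, h = 0.7, 1e-6
  numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
  analytic = sigmoid(z) * (1 - sigmoid(z))
  print(numeric, analytic)  # agree to ~6 decimal places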
45
How is the syntactic structure of a sentence described in dependency formalisms?
In terms of directed binary grammatical relations between the words.
46
Typed dependency structure
A dependency structure where the labels of relations among words are drawn from a fixed inventory of grammatical relations.
47
Why are dependency grammars more common than constituency grammars in NLP, looking at semantics?
The head-dependent relations are a good proxy for the semantic relationship between predicates and their arguments.
48
What are the arguments in a grammatical relation?
a head and a dependent
49
What is the head in a grammatical relation
the central organizing word
50
What is the dependent in a grammatical relation?
A kind of modifier
51
Give examples of grammatical functions
subject, direct object, indirect object...
52
UD (abbreviation)
Universal Dependencies
53
What is the UD project?
An open community effort to annotate dependencies and other aspects of grammar across more than 100 languages.
54
How can we divide the core set of frequently used grammatical relations of UD in two sets?
Clausal relations describe syntactic roles with respect to a predicate, and modifier relations categorize the ways that words can modify their heads.
55
How can we formally represent a dependency structure?
As a directed graph G = (V,A) with vertices V and ordered pairs of vertices A, called arcs.
56
What does the set of arcs (A) in a dependency structure capture?
the head-dependent and grammatical function relationships between the elements in the set of vertices V.
57
A dependency tree is a directed graph that satisfies the following 3 constraints:
  1. it has a single designated root node without incoming arcs
  2. each vertex has exactly one incoming arc (except for the root node)
  3. there is a unique path from the root node to each vertex in V
58
What do the 3 constraints for a dependency tree (=directed graph) ensure?
Each word has a single head, the dependency structure is connected, and there is a single root node from which there is a unique path to each of the words in the sentence.
59
When is an arc from a head to a dependent projective?
If there is a path from the head to every word that lies between the head and the dependent in the sentence.
60
When is a dependency tree said to be projective?
If all the arcs that make it up are projective.
61
How can you detect projectivity in a dependency tree when drawing it?
It is projective if there are no crossing edges.
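A sketch of that "no crossing edges" test in Python, with arcs represented as hypothetical (head index, dependent index) pairs:

  def is_projective(arcs):
      # Treat each arc as a span over word positions; two arcs cross
      # if their spans interleave.
      spans = [tuple(sorted(arc)) for arc in arcs]
      for i in range(len(spans)):
          for j in range(i + 1, len(spans)):
              (a, b), (c, d) = spans[i], spans[j]
              if a < c < b < d or c < a < d < b:
                  return False
      return True

  print(is_projective([(2, 4), (1, 3)]))  # False: the arcs interleave
  print(is_projective([(1, 4), (2, 3)]))  # True: the arcs are nested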
62
Describe transition-based parsing in general:
There is a stack on which we build the parse, a buffer of tokens to be parsed and a parser taking actions on the parse via a predictor called an oracle.
63
What does the parser do in transition-based parsing?
It walks through the sentence left-to-right and shifts items from the buffer onto the stack.
64
What happens to the items that the parser shifts from the buffer onto the stack in transition-based parsing?
At each time point the top two elements on the stack are examined and the oracle makes a decision about what transition to apply to build the parse.
65
What are the possibe transitions that the oracle can do in transition-based parsing?
  1. assign the current word as the head of a previously seen word
  2. assign a previously seen word as the head of the current word
  3. postpone dealing with the current word and store it for later processing
66
What is the LEFTARC transition operator in transition-based parsing?
It asserts a head-dependent relation between the word at the top of the stack and the second word, and removes the second word from the stack.
67
What does the RIGHTARC transition operator do in transition-based parsing?
It asserts a head-dependent relation between the second word on the stack and the word at the top, it then removes the top word from the stack.
68
What does the SHIFT transition operator do in transition-based parsing?
It removes the word from the front of the input buffer and pushes it onto the stack.
69
What operations are called reduce operations in transition-based parsing?
LEFTARC and RIGHTARC, because reducing means combining elements on the stack.
70
When can the LEFTARC operator not be applied in transition-based parsing?
When ROOT is the second element of the stack (because the root node cannot have incoming arcs).
71
What do the LEFTARC and RIGHTARC operators require in transition-based parsing?
That there are two elements on the stack.
72
Arc standard approach to transition-based parsing
Where the transition operators only assert relations between elements at the top of the stack, and once an item has been assigned its head, it is removed from the stack and is unavailable for further processing.
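A minimal Python sketch of the arc-standard operators from the last few cards. A configuration is (stack, buffer, arcs); the "oracle" here is just a hand-written transition sequence for a toy sentence, for illustration only.

  def shift(stack, buffer, arcs):
      stack.append(buffer.pop(0))       # front of buffer -> top of stack

  def leftarc(stack, buffer, arcs):
      second = stack.pop(-2)            # remove the second word...
      arcs.append((stack[-1], second))  # ...as a dependent of the top word

  def rightarc(stack, buffer, arcs):
      top = stack.pop()                 # remove the top word...
      arcs.append((stack[-1], top))     # ...as a dependent of the word below it

  stack, buffer, arcs = ["ROOT"], ["book", "me", "the", "flight"], []
  for op in [shift, shift, rightarc, shift, shift, leftarc, rightarc, rightarc]:
      op(stack, buffer, arcs)
  print(arcs)
  # [('book', 'me'), ('flight', 'the'), ('book', 'flight'), ('ROOT', 'book')]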
73
What is a configuration in transition-based parsing?
The current state of the parse: the stack, an input buffer of words/tokens, and a set of relations representing a dependency tree.
74
What kind of algorithm is transition-based parsing?
A greedy algorithm running in time linear in the length of the sentence.
75
How is the oracle in transition-based parsing generally created?
By supervised machine learning methods, using configurations annotated with the correct transition to make (drawn from dependency trees).
76
In training an oracle, a training parser goes through the sentence while knowing the correct dependency tree. Why is there an extra restriction on using the RIGHTARC operation in training?
To ensure that a word is not popped from the stack and lost for further processing before all its dependents have been assigned to it. (restriction = use only if all of the dependents of the word at the top of the stack have already been assigned)
77
What has the oracle access to during training? Describe formally:
  1. a current configuration with a stack S and a set of dependency relations R_c
  2. a reference parse consisting of a set of vertices V and a set of dependency relations R_p
78
What two classifiers for choosing transitions are introduced for transition-based parsing?
  1. a classic feature-based algorithm
  2. a neural classifier using embedding features
79
What is the feature template in transition-based parsing with feature-based learning?
(s1.w, op), (s2.w, op), (s1.t, op), (s2.t, op), (b1.w, op), (b1.t, op), (s1.wt, op), where s = stack, b = word buffer, w = word form, t = part-of-speech tag, and op = operator.
80
How is the feature for the word form at the top of the stack denoted in the feature template (feature-based learning, transition-based parsing)?
s1.w
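A sketch of instantiating this feature template for one toy configuration, assuming a made-up tag dictionary (w = word form, t = POS tag, op = the transition being scored):

  def extract_features(stack, buffer, tags, op):
      s1, s2, b1 = stack[-1], stack[-2], buffer[0]
      return [
          ("s1.w", s1, op), ("s2.w", s2, op),
          ("s1.t", tags[s1], op), ("s2.t", tags[s2], op),
          ("b1.w", b1, op), ("b1.t", tags[b1], op),
          ("s1.wt", s1 + "/" + tags[s1], op),
      ]

  tags = {"ROOT": "ROOT", "book": "VB", "flight": "NN"}
  for f in extract_features(["ROOT", "book"], ["flight"], tags, "SHIFT"):
      print(f)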
81
What loss function is used in transition-based parsing with a neural classifier?
cross-entropy loss
82
How does transition-based parsing with a neural classifier generally work?
Pass the sentence through an encoder, take the representations of the top two words on the stack and the first word on the buffer, concatenate them, and present the result to a feed-forward network that predicts the transition.
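A toy numpy sketch of that pipeline, with made-up dimensions and random values standing in for the encoder outputs and a trained classifier:

  import numpy as np

  rng = np.random.default_rng(0)
  d = 8                                # hypothetical embedding size
  # Stand-ins for encoder representations of the top two stack words
  # and the first buffer word:
  s1, s2, b1 = (rng.normal(size=d) for _ in range(3))

  x = np.concatenate([s1, s2, b1])     # concatenated input, shape (3d,)
  W1, c1 = rng.normal(size=(16, 3 * d)), np.zeros(16)
  W2, c2 = rng.normal(size=(3, 16)), np.zeros(3)

  h = np.maximum(0, W1 @ x + c1)       # feed-forward hidden layer (ReLU)
  scores = W2 @ h + c2                 # one score per transition
  probs = np.exp(scores - scores.max()); probs /= probs.sum()
  print(["LEFTARC", "RIGHTARC", "SHIFT"][int(np.argmax(probs))])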
83
What kind of search strategy does beam search use?
A breadth-first search strategy with a heuristic filter that makes sure that the search frontier stays within a fixed-size beam width.
84
Why are graph-based parsing methods more accurate than transition-based parsers?
They score entire trees rather than relying on greedy local decisions. They can also produce non-projective trees.
85
What does it mean if a score is edge-factored?
the overall score for a tree is the sum of the scores of the edges that comprise the tree.
86
What two problems do graph-based algorithms have to solve?
  1. assigning a score to each edge
  2. finding the best parse tree given the scores of all potential edges
87
What is the first step in graph-based parsing when given a sentence S?
Create a graph G which is a fully-connected, weighted, directed graph where vertices are input words and the directed edges represent all possible head-dependent assignments.
88
What do the weights of each edge in the initial graph G in graph-based parsing represent?
the score for each possible head-dependent relation assigned by some scoring algorithm.
89
What is finding the best dependency parse for sentence S equivalent to in graph-based parsing?
finding the maximum spanning tree over G
90
Are the absolute values of the edge scores in graph-based parsing critical for determining the maximum spanning tree?
No; only the relative weights of the edges entering each vertex matter.
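A sketch of these graph-based ideas with made-up edge scores: an edge-factored score matrix over a fully-connected directed graph, from which we pick the highest-scoring head for each word. Note this greedy pick can create cycles; a real parser computes the true maximum spanning tree (e.g., with the Chu-Liu/Edmonds algorithm).

  words = ["ROOT", "book", "that", "flight"]
  # scores[h][d] = score of the edge from head h to dependent d
  scores = [
      [None, 12.0,  4.0,  4.0],   # edges out of ROOT
      [None, None,  5.0,  8.0],   # edges out of "book"
      [None,  6.0, None,  7.0],   # edges out of "that"
      [None,  5.0,  7.0, None],   # edges out of "flight"
  ]

  for d in range(1, len(words)):  # ROOT takes no head
      best = max((scores[h][d], h) for h in range(len(words))
                 if h != d and scores[h][d] is not None)
      print(f"{words[best[1]]} -> {words[d]}  (score {best[0]})")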
91
EM metric
exact match metric = how many sentences are parsed completely correctly
92
Labeled attachment
the proper assignment of a word to its head along with the correct dependency relation = LAS
93
Unlabeled attachment
the correctness of the assigned head, ignoring the dependency relation = UAS
94
LS (abbreviation)
label accuracy score
95
label accuracy score
the percentage of tokens with correct labels, ignoring where the relations are coming from.
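A small sketch computing these metrics from gold and predicted (head, relation) pairs for one toy sentence:

  gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
  pred = [(2, "nsubj"), (0, "root"), (2, "nmod")]
  n = len(gold)

  uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n  # heads only
  las = sum(g == p for g, p in zip(gold, pred)) / n        # head + relation
  ls  = sum(g[1] == p[1] for g, p in zip(gold, pred)) / n  # labels only
  em  = 1.0 if gold == pred else 0.0  # exact match, per sentence

  print(f"UAS={uas:.2f} LAS={las:.2f} LS={ls:.2f} EM={em}")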
96
Distributional hypothesis
Aspects of meaning can be learned solely from the texts we encounter over our lives, based on the complex association of words with the words they co-occur with.
97
Pretraining
Learning knowledge about language and the world from vast amounts of text
98
Large Language Models (LLMs)
the pretrained language models resulting from pretraining
99
What does generative AI consist of?
text generation, code generation, and image generation
100
Conditional generation
The task of generating text conditioned on an input piece of text
101
Greedy decoding
Generating the most likely word given the context
102
How is the output yt chosen at each time step t in greedy decoding generation?
By computing the probability of each possible output (every word in the vocabulary) and then choosing the highest-probability word (the argmax).
103
Decoding
the task of choosing a word to generate based on the model's probabilities
104
Autoregressive generation
Repeatedly choosing the next word conditioned on the previous choices
105
Causal language modeling ==
autoregressive generation
106
Sampling from a model's distributions over words means that we...
choose words randomly according to the probability the model assigns them, i.e., iteratively choose each word to generate according to its probability in context
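A sketch contrasting greedy decoding with sampling at a single time step, given a made-up model distribution over a toy vocabulary:

  import random

  vocab = ["the", "a", "flight", "book"]
  probs = [0.5, 0.2, 0.2, 0.1]   # hypothetical model probabilities

  # Greedy decoding: always take the argmax word.
  greedy = vocab[max(range(len(vocab)), key=lambda i: probs[i])]

  # Sampling: choose a word at random according to its probability.
  sampled = random.choices(vocab, weights=probs, k=1)[0]

  print("greedy:", greedy, "| sampled:", sampled)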
107