Lecture 4 Flashcards

Grammar and Parsing

1
Q

Syntactic Level Analysis

A

To analyze how words are put together
to make valid sentences

2
Q

Grammar:

A

Grammar: the kind of implicit
knowledge of your native language that
you had mastered by the time you were
3 or 4 years old without explicit
instruction

3
Q

Chomsky:

A

Syntactic structure can be independent of the meaning of a sentence (e.g., "Colorless green ideas sleep furiously" is grammatical but meaningless)

4
Q

Grammars (and parsing) are key
components in many applications:

A
  • Grammar checkers
  • Dialogue management
  • Question answering
  • Information extraction
  • Machine translation
5
Q

Two types of Grammars

A

  • Context Free Grammar (CFG), also known
    as Phrase Structure Grammar
  • Dependency Grammar

6
Q

Context Free Grammar (CFG)

A

a set of recursive rewriting rules (or productions) used to generate patterns of strings.
CFGs describe the structure of language by capturing
constituency and ordering

7
Q

Constituency

A

How we group words into units and what we say about how the various kinds of units behave

8
Q

Ordering

A

Rules that govern the ordering of words and bigger units in the language

9
Q

Notations of CFG
Non-terminal

A

Non-terminal symbols represent the phrases, the categories of phrases, or the constituents,
e.g., NP, VP

10
Q

Notations of CFG
Terminals

A

Terminal symbols are the words,
e.g., car. They
often come from words in a lexicon

11
Q

Notations of CFG
Rewrite rules / productions

A

rules for replacing nonterminal symbols (on the left
side) with other nonterminal or terminal symbols (on the right side)

12
Q

Notations of CFG
Start symbol:

A

a special nonterminal symbol that appears in the initial string generated by the grammar, e.g., S -> NP VP | VP
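
To make the notation cards above concrete, here is a minimal sketch in Python with NLTK (assumed installed); the toy rules and lexicon are illustrative, not from the lecture.

    import nltk

    # Non-terminals: S (the start symbol), NP, VP, Det, Noun, Verb.
    # Terminals: the quoted words, drawn from a tiny lexicon.
    grammar = nltk.CFG.fromstring("""
        S    -> NP VP | VP
        NP   -> Det Noun
        VP   -> Verb NP | Verb
        Det  -> 'the' | 'a'
        Noun -> 'flight' | 'book'
        Verb -> 'book' | 'list'
    """)

    print(grammar.start())        # the start symbol S
    for production in grammar.productions():
        print(production)         # the rewrite rules / productions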

13
Q

Derivation

A

a sequence of rules applied to a string that accounts for that string
* Covers all the elements in the string
* Covers only the elements in the string

14
Q

Parsing

A

The process of finding a derivation (i.e., a sequence of productions) leading from the START symbol to the
TERMINAL symbols (or from the TERMINALS back up to the START symbol)

15
Q

Challenges for CFG
Agreement

A

In English, subjects and verbs have to agree in person and number; determiners and nouns have to agree in
number. A bare rule such as S -> NP VP cannot enforce these constraints.
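
A minimal illustration with NLTK (assumed installed) of the standard CFG workaround: splitting categories by number, which makes the grammar grow quickly. The toy rules are mine, not the lecture's.

    import nltk

    # Agreement is encoded by multiplying categories (singular vs. plural).
    grammar = nltk.CFG.fromstring("""
        S     -> NP_sg VP_sg | NP_pl VP_pl
        NP_sg -> 'this' 'flight'
        NP_pl -> 'these' 'flights'
        VP_sg -> 'leaves'
        VP_pl -> 'leave'
    """)
    parser = nltk.ChartParser(grammar)

    print(list(parser.parse('these flights leave'.split())))  # one parse
    print(list(parser.parse('these flight leaves'.split())))  # []: agreement violated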

16
Q

Challenges for CFG
Subcategorization

A

expresses the constraints that a particular verb (sometimes called the predicate) places on the number and
syntactic types of arguments it wants to take (occur with)

17
Q

Dependency Grammar

A
  • Dependency grammars offer a different
    way to represent syntactic structure
  • CFGs represent constituents in a parse
    tree that can derive the words of a
    sentence
  • Dependency grammars represent
    syntactic dependency relations between
    words that show the syntactic structure
  • Syntactic structure is the set of relations
    between a word (aka the head word) and
    its dependents.
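
As a sketch of what a dependency analysis looks like in practice, here is a minimal example with spaCy (an assumption: spaCy is installed and the en_core_web_sm model has been downloaded); the sentence is illustrative.

    import spacy

    nlp = spacy.load("en_core_web_sm")          # small English pipeline
    doc = nlp("United canceled the morning flights to Houston")

    # Each word points to its head word via a labeled dependency relation.
    for token in doc:
        print(f"{token.text:10} --{token.dep_}--> {token.head.text}")
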
18
Q

Parsing

A

The process of finding a derivation (i.e., a sequence of productions) leading from the START symbol to the
TERMINAL symbols (or from the TERMINALS back up to the START symbol)

19
Q

Top down parser

A
  • Starts from the grammar rules (the start symbol)
  • Only searches for trees that can be answers
  • But also suggests trees that are not consistent
    with any of the words
20
Q

Bottom up parser

A
  • Starts from the input token list
  • Only forms trees consistent with the words
  • But suggests trees that make no sense globally
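
A minimal contrast of the two strategies using NLTK's demo parsers (assumed installed): RecursiveDescentParser works top-down from the rules, ShiftReduceParser works bottom-up from the tokens. The toy grammar is illustrative.

    import nltk

    grammar = nltk.CFG.fromstring("""
        S    -> NP VP
        NP   -> Det Noun
        VP   -> Verb NP
        Det  -> 'the'
        Noun -> 'dog' | 'cat'
        Verb -> 'saw'
    """)
    tokens = 'the dog saw the cat'.split()

    top_down  = nltk.RecursiveDescentParser(grammar)  # expands rules from S downward
    bottom_up = nltk.ShiftReduceParser(grammar)       # builds constituents up from the words

    print(list(top_down.parse(tokens)))
    print(list(bottom_up.parse(tokens)))
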
21
Q

Solutions to parsing problems (1)

A
Solve the performance problem with chart parsers, which use a special data structure (i.e., the chart) to avoid the repeated work caused by backtracking
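
A sketch of the chart-based fix using NLTK's ChartParser (assumed installed): partial constituents are stored in the chart, so an ambiguous sentence yields all its readings without re-parsing shared pieces. The grammar is a toy example of mine.

    import nltk

    grammar = nltk.CFG.fromstring("""
        S    -> NP VP
        NP   -> Det Noun | Det Noun PP
        VP   -> Verb NP | Verb NP PP
        PP   -> Prep NP
        Det  -> 'the' | 'a'
        Noun -> 'pilot' | 'plane' | 'radar'
        Verb -> 'saw'
        Prep -> 'with'
    """)

    chart_parser = nltk.ChartParser(grammar)
    for tree in chart_parser.parse('the pilot saw a plane with the radar'.split()):
        print(tree)   # both PP-attachment readings come out of a single chart
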
22
Q

Solutions to parsing problems (2)

A
Solve the problem of having to predefine a CFG (or other grammar) by using Treebanks and statistical parsing. The main use of the Treebank is to provide the probabilities that inform the statistical parsers
23
Q

Solutions to parsing problems (3)

A

Partially solve the problem of correctly choosing the best parse tree
by using lexicalization (information about specific words, taken from the Treebank)

24
Q

Probabilistic CFG (PCFG)

A

The parsing task is to generate the parse tree with the highest probability (or the top n parse trees)

25
Q

Attach probabilities to grammar rules

A

The probabilities of the expansions
for a given non-terminal sum to 1:
VP -> Verb        .55
VP -> Verb NP     .40
VP -> Verb NP PP  .05
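
A sketch of the same idea in NLTK's PCFG notation (assumed installed): fromstring rejects a grammar whose expansions for some non-terminal do not sum to 1 (here .55 + .40 + .05 = 1.0 for VP). The lexical rules are placeholders.

    import nltk

    pcfg = nltk.PCFG.fromstring("""
        VP   -> Verb [0.55] | Verb NP [0.40] | Verb NP PP [0.05]
        Verb -> 'book' [1.0]
        NP   -> 'flights' [1.0]
        PP   -> 'tomorrow' [1.0]
    """)
    for production in pcfg.productions():
        print(production, production.prob())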

26
Q

The probability of a parse tree:

A

the product of the
probabilities of the rules used in the derivation
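
A short self-contained sketch (NLTK assumed installed) of the product rule: the ViterbiParser returns the most probable tree, and its probability is the product of the probabilities of the rules used in the derivation.

    import nltk

    pcfg = nltk.PCFG.fromstring("""
        S    -> VP [1.0]
        VP   -> Verb [0.55] | Verb NP [0.40] | Verb NP PP [0.05]
        NP   -> Det Noun [1.0]
        PP   -> Prep NP [1.0]
        Verb -> 'book' [1.0]
        Det  -> 'that' [1.0]
        Noun -> 'flight' [1.0]
        Prep -> 'on' [1.0]
    """)

    for tree in nltk.ViterbiParser(pcfg).parse('book that flight'.split()):
        print(tree, tree.prob())
    # P(tree) = P(S -> VP) * P(VP -> Verb NP) * P(NP -> Det Noun) * (lexical rules)
    #         = 1.0 * 0.40 * 1.0 * 1.0 * 1.0 * 1.0 = 0.40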

27
Q

Word Sense

A

The meaning of a word.
We say that a word has more than
one word sense (meaning) if it has
more than one definition.

28
Q

Word senses may be

A

  • Coarse-grained, if not many distinctions
    are made
  • Fine-grained, if there are many
    distinctions of meaning

29
Q

Polysemy:

A

a word with two or more
related meanings

30
Q

Homonymy:

A

Words spelled (or
pronounced) the same way but
with different, unrelated meanings

31
Q

Hypernymy:

A

A more general term
that encompasses a word

32
Q

Hyponymy:

A

A more specific term
that is encompassed by a more general word
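
A quick sketch of both relations in WordNet via NLTK (assumptions: nltk is installed and the wordnet data has been downloaded with nltk.download('wordnet')).

    from nltk.corpus import wordnet as wn

    dog = wn.synset('dog.n.01')
    print(dog.hypernyms())   # more general terms, e.g. canine, domestic_animal
    print(dog.hyponyms())    # more specific terms, e.g. puppy, poodle, corgi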

33
Q

How Humans Disambiguate

A
  • Local context (e.g., book in a sentence
    that has flight, travel, etc.): the sentence or
    other surrounding text containing the
    ambiguous word restricts its interpretation
  • Domain knowledge (e.g., plant in a
    biology article): the fact that a text is
    concerned with a particular domain
    activates only the sense appropriate to
    that domain
  • Frequency data: the frequency of each
    sense in general usage
34
Q

How Machines Disambiguate

A

Algorithm for simplified Lesk:
1. Retrieve from a machine-readable dictionary
all sense definitions of the word to be
disambiguated
2. Determine the overlap between each
sense definition and the current context
3. Choose the sense with the highest
overlap

35
Q

Example: disambiguate

A

Disambiguate PINE in the context
“Pine cones hanging in a tree”
Senses of PINE:
1. kinds of evergreen tree with needle-shaped leaves
2. waste away through sorrow or illness
Sense 1 is chosen: its definition overlaps with the
context (tree), while sense 2 has no overlap.
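
A minimal sketch of simplified Lesk on this example, using WordNet through NLTK as the machine-readable dictionary (assumptions: nltk installed, wordnet data downloaded; WordNet's glosses differ slightly from the two definitions above).

    from nltk.corpus import wordnet as wn

    def simplified_lesk(word, context_sentence):
        """Choose the sense whose definition overlaps most with the context."""
        context = set(context_sentence.lower().split())
        best_sense, best_overlap = None, -1
        for sense in wn.synsets(word):                       # 1. all sense definitions
            signature = set(sense.definition().lower().split())
            overlap = len(signature & context)               # 2. overlap with the context
            if overlap > best_overlap:                       # 3. keep the highest overlap
                best_sense, best_overlap = sense, overlap
        return best_sense

    sense = simplified_lesk('pine', 'pine cones hanging in a tree')
    print(sense, sense.definition())   # expected: the evergreen-tree sense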

36
Q

WSD

A

Word Sense Disambiguation

37
Q

Classifier Approach to WSD -1

A

Train a classification algorithm that
can label each (open-class) word with
the correct sense, given the context of
the word

38
Q

Classifier Approach to WSD -2

A

Training set is the hand-labeled corpus
of senses

39
Q

Classifier Approach to WSD -3

A

Result of training is a model that is
used by the classification algorithm to
label words in the test set, and
ultimately, in new text examples
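
A toy sketch of the classifier approach with scikit-learn (assumed installed); the hand-labeled contexts and sense labels below are invented for illustration and stand in for a real sense-tagged corpus.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Tiny hand-labeled "corpus": contexts of the ambiguous word "bank".
    train_contexts = [
        "deposited the check at the bank downtown",
        "the bank raised its interest rates",
        "fishing from the muddy bank of the river",
        "we walked along the river bank at dusk",
    ]
    train_senses = ["finance", "finance", "river", "river"]

    classifier = make_pipeline(CountVectorizer(), LogisticRegression())
    classifier.fit(train_contexts, train_senses)     # train on the labeled senses

    print(classifier.predict(["opened an account at the bank"]))  # expected: ['finance']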

40
Q

Word Similarity Features:

A
  • For each word in the context, compute
    a similarity measure between that
    word and the words in the sense definitions
    of the word to be disambiguated
  • Similarity measures can be defined
    from a semantic relation lexicon, such
    as WordNet (hypernym, hyponym)
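
A sketch of one such similarity measure from WordNet via NLTK (assumptions: nltk installed, wordnet data downloaded): path_similarity scores how close two senses sit in the hypernym/hyponym hierarchy, so context words can vote for the nearer sense.

    from nltk.corpus import wordnet as wn

    plant_organism = wn.synset('plant.n.02')   # the living-organism sense
    plant_factory  = wn.synset('plant.n.01')   # the industrial-building sense
    tree = wn.synset('tree.n.01')              # a context word's sense

    print(tree.path_similarity(plant_organism))  # higher: close in the hierarchy
    print(tree.path_similarity(plant_factory))   # lower: far apart
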
41
Q

Syntactic features (relationship between the word and the other parts of the sentence)

A

Predicate-argument relations: Verb-object,
subject-verb

Heads of Noun and Verb Phrases

42
Q

Collocational features:

A

Information about words in specific positions relative to the target word (e.g., the previous word)
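
A minimal sketch of collocational features in plain Python (the function name and feature keys are mine): the words at fixed offsets around the target word.

    def collocational_features(tokens, target_index):
        """Words in specific positions around the target (here -1 and +1)."""
        prev_word = tokens[target_index - 1] if target_index > 0 else "<s>"
        next_word = tokens[target_index + 1] if target_index < len(tokens) - 1 else "</s>"
        return {"word-1": prev_word, "word+1": next_word}

    tokens = "he sat on the river bank fishing".split()
    print(collocational_features(tokens, tokens.index("bank")))
    # {'word-1': 'river', 'word+1': 'fishing'}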

43
Q

Associated words features -1

A

For each word to be disambiguated, collect a small number of frequently-used
context words.

44
Q

Associated words features -2

A

Represent these words as a set-of-words (bag-of-words) feature
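
A minimal sketch of the set-of-words representation in plain Python (the helper name and window size are illustrative): the context is reduced to the set of words occurring near the ambiguous word.

    def associated_words_feature(tokens, target_index, window=3):
        """Set of context words within +/- `window` positions of the target."""
        start = max(0, target_index - window)
        end = min(len(tokens), target_index + window + 1)
        return set(tokens[start:target_index] + tokens[target_index + 1:end])

    tokens = "book a cheap flight and hotel for the trip".split()
    print(associated_words_feature(tokens, tokens.index("flight")))
    # {'book', 'a', 'cheap', 'and', 'hotel', 'for'}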