CHAPTER 6 Flashcards

(30 cards)

1
Q

…………… is a classification process.

It involves the automatic assignment of descriptions to tokens.

The descriptions are called ………………

A

Tagging, tags

2
Q

T/F Tags can represent part-of-speech, semantic
information, and other attributes.

A

T

3
Q

…………… is the process of labeling each word in a text (corpus) with its part of speech.

A

POS tagging

4
Q

T/F The goal of POS tagging is to assign each word to a specific part of speech (e.g., noun, verb, adjective, etc.).

A

T

5
Q

T/F Words can have multiple POS depending on their
context.

Example:
* VERB: (Book that flight).
* NOUN: (Hand me that book).

A

T

6
Q

Why do we use POS tagging in NLP?

A

To understand the grammatical structure of a sentence
To disambiguate words that have more than one possible part of speech
To improve the accuracy of NLP tasks
To facilitate linguistic research

7
Q

Steps involved in POS tagging

A

1- Collect a dataset of annotated text
2- Pre-process the text
3- Divide the Dataset into Training and Testing Sets.
4- Train the POS tagger
5- Test the POS tagger
6- Fine-tune the POS tagger
7- Use the POS tagger

8
Q

Collect a Dataset of Annotated Text:

A
  • Gather a dataset where each word is labeled with its correct POS tag.
  • This dataset will be used for training and testing the POS
    tagger.
9
Q

Pre-process the Text:

A
  • Tasks include tokenization (splitting the text into individual words), lowercasing, and removing punctuation to standardize the text before training.
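
The pre-processing tasks listed above can be sketched in a few lines of Python. This is a minimal illustration, not a production tokenizer: the function name `preprocess` and the regular expression are assumptions made for this example.

```python
import re

def preprocess(text):
    """Lowercase the text and return word tokens with punctuation dropped."""
    # \w+ matches runs of letters, digits, and underscores; the optional
    # group keeps simple contractions such as "don't" as a single token.
    return re.findall(r"\w+(?:'\w+)?", text.lower())

tokens = preprocess("Book that flight, please!")
print(tokens)
```

Lowercasing discards capitalization, which can be a useful cue for proper nouns, so some taggers skip that step.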
10
Q

Divide the Dataset into Training and Testing Sets:

A
  • Training Set: Used to train the POS tagger.
  • Testing Set: Used to evaluate the tagger’s performance.
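
A minimal sketch of such a split, assuming the dataset is a list of tagged sentences. The helper name `split_dataset`, the 80/20 ratio, and the toy data are assumptions for illustration only.

```python
import random

def split_dataset(tagged_sentences, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data and return (training_set, testing_set)."""
    shuffled = list(tagged_sentences)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = max(1, int(len(shuffled) * test_fraction))  # keep at least one test item
    return shuffled[n_test:], shuffled[:n_test]

# Ten toy "sentences", each a list of (word, tag) pairs.
data = [[(f"word{i}", "NN")] for i in range(10)]
train_set, test_set = split_dataset(data)
print(len(train_set), len(test_set))
```

Shuffling before splitting avoids a test set made up only of the last sentences in the corpus, which may differ in topic or style from the rest.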
11
Q

Train the POS Tagger:

A

  • Build a statistical model (e.g., a Hidden Markov Model, HMM) or define a set of rules for a rule-based or transformation-based tagger.
  • Train the model or rules on the annotated training set.
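
For the statistical case, "training" an HMM tagger largely means counting. The sketch below estimates tag-transition and word-emission probabilities by relative frequency from a tiny annotated corpus. The function name `train_hmm`, the `<s>` start marker, and the two example sentences are assumptions for illustration, and no smoothing is applied.

```python
from collections import Counter, defaultdict

def train_hmm(tagged_sentences):
    """Estimate HMM transition and emission probabilities by counting."""
    transitions = defaultdict(Counter)  # previous tag -> counts of the next tag
    emissions = defaultdict(Counter)    # tag -> counts of the words it emits
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start marker
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word.lower()] += 1
            prev = tag

    def normalize(table):
        # Turn each row of counts into relative frequencies.
        return {
            key: {item: n / sum(counts.values()) for item, n in counts.items()}
            for key, counts in table.items()
        }

    return normalize(transitions), normalize(emissions)

corpus = [
    [("Book", "VB"), ("that", "DT"), ("flight", "NN")],
    [("Hand", "VB"), ("me", "PRP"), ("that", "DT"), ("book", "NN")],
]
trans, emit = train_hmm(corpus)
print(trans["VB"])  # probabilities of the tag that follows a verb
print(emit["NN"])   # probabilities of words emitted by the noun tag
```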
12
Q

Test the POS Tagger

A
  • Use the trained model or rules to predict the POS tags for words in the testing set.
  • Compare predicted tags to true tags to evaluate the tagger’s performance.
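
One common way to compare predicted and true tags is token-level accuracy: the fraction of tokens whose predicted tag matches the gold tag. A minimal sketch follows; the function name `accuracy` and the example tags are assumptions for illustration.

```python
def accuracy(predicted, gold):
    """Return the fraction of positions where the predicted tag equals the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0  # no tokens to score
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

gold_tags = ["VB", "DT", "NN", "VB"]
predicted_tags = ["NN", "DT", "NN", "VB"]  # first token is tagged incorrectly
print(accuracy(predicted_tags, gold_tags))
```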
13
Q

Fine-tune the POS Tagger:

A
  • If the performance is unsatisfactory, adjust the model or rules, and repeat the training and testing process until achieving the desired accuracy.
14
Q

Use the POS Tagger

A
  • Once trained and tested, the POS tagger can be applied to new, unseen text.
  • Preprocess the text and input it into the model or rules to get predicted POS tags for each word.
15
Q

Types of POS Tagging

A

1- Rule-based POS tagging
2- Stochastic POS tagging
3- Transformation-based POS tagging

16
Q

…………… is one of the oldest tagging techniques.
⚫ It assigns tags based on a set of predefined linguistic rules.

A

Rule-based POS tagging

17
Q

Rule-based POS tagging architecture (two-stage):

A
  • First Stage:
    A dictionary is used to assign each word a list of potential parts-of-speech based on its form and context.
  • Second Stage:
    Hand-written disambiguation rules are applied to narrow down the
    list, selecting a single part-of-speech for each word, resolving
    ambiguities.
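
The two stages can be sketched as a dictionary lookup followed by hand-written rules. Everything named here is a toy assumption for illustration: the `LEXICON` contents, the two disambiguation rules, and the default tag for unknown words.

```python
# Stage 1 resource: a tiny hand-made lexicon mapping words to candidate tags.
LEXICON = {
    "book": ["NN", "VB"],
    "hand": ["NN", "VB"],
    "that": ["DT"],
    "flight": ["NN"],
    "me": ["PRP"],
}

def rule_based_tag(tokens):
    """Tag tokens with a dictionary lookup followed by disambiguation rules."""
    # Stage 1: list the potential tags of every word (unknown words default to NN).
    candidates = [LEXICON.get(tok.lower(), ["NN"]) for tok in tokens]

    # Stage 2: apply hand-written rules to pick a single tag per word.
    tags = []
    for i, options in enumerate(candidates):
        if len(options) == 1:
            tags.append(options[0])            # unambiguous word
        elif i > 0 and tags[i - 1] == "DT" and "NN" in options:
            tags.append("NN")                  # rule: noun after a determiner
        elif i == 0 and "VB" in options:
            tags.append("VB")                  # rule: sentence-initial imperative verb
        else:
            tags.append(options[0])            # fallback: first listed tag
    return list(zip(tokens, tags))

print(rule_based_tag(["Book", "that", "flight"]))
print(rule_based_tag(["Hand", "me", "that", "book"]))
```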
18
Q

Advantages and disadvantages of rule-based POS tagging

A

⚫ Advantages:
* Transparent and interpretable.
* Works well for languages with clear grammatical rules.
⚫ Disadvantages:
* Time-consuming rule creation.
* May struggle with ambiguity or unseen words.

19
Q

…………… uses probabilistic methods to assign POS tags based on statistical models.

A

Stochastic POS tagging

20
Q

The simplest stochastic tagger applies the following approaches for POS tagging:

A

– Word Frequency Approach
– Tag Sequence Probabilities

21
Q

……………: In this approach, stochastic taggers use the probability that a word occurs with a particular tag to disambiguate words.
➢ Assign the most frequent tag seen with a word in the training set to ambiguous instances of that word.

A

Word Frequency Approach
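
A minimal sketch of the word frequency approach, assuming a tiny hand-made training corpus. The function names, the default tag for unseen words, and the three sentences are assumptions for illustration.

```python
from collections import Counter, defaultdict

def train_most_frequent_tag(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(tokens, model, default="NN"):
    """Tag every token with its most frequent training tag; unseen words get the default."""
    return [(tok, model.get(tok.lower(), default)) for tok in tokens]

corpus = [
    [("Book", "VB"), ("that", "DT"), ("flight", "NN")],
    [("Hand", "VB"), ("me", "PRP"), ("that", "DT"), ("book", "NN")],
    [("Read", "VB"), ("the", "DT"), ("book", "NN")],
]
model = train_most_frequent_tag(corpus)
print(tag(["Book", "that", "flight"], model))
```

In this toy corpus "book" appears twice as a noun and once as a verb, so the imperative "Book that flight" is tagged with NN: the tagger looks only at the word itself, never at its neighbors.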

22
Q

The main issue with the Word Frequency Approach (stochastic POS tagging):

A

It may generate incorrect tag sequences because it ignores context.

23
Q

……………: In this approach, stochastic taggers calculate the probability of a sequence of tags occurring, rather than focusing on individual word-tag pairs.

A

Tag Sequence Probabilities
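
As a rough illustration of scoring whole tag sequences, the sketch below estimates bigram tag probabilities from a few training sequences and multiplies them along a candidate sequence. The function names, the `<s>` start marker, and the training sequences are assumptions for this example. No smoothing is used, so any unseen tag pair gives a probability of zero.

```python
from collections import Counter

def bigram_tag_model(tag_sequences):
    """Estimate P(tag | previous tag) by relative frequency."""
    pair_counts = Counter()
    context_counts = Counter()
    for tags in tag_sequences:
        padded = ["<s>"] + list(tags)  # "<s>" marks the sentence start
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    return {pair: n / context_counts[pair[0]] for pair, n in pair_counts.items()}

def sequence_probability(tags, model):
    """Multiply the bigram probabilities along a tag sequence."""
    prob = 1.0
    padded = ["<s>"] + list(tags)
    for prev, cur in zip(padded, padded[1:]):
        prob *= model.get((prev, cur), 0.0)  # unseen pairs count as impossible
    return prob

training_tags = [
    ["VB", "DT", "NN"],
    ["VB", "PRP", "DT", "NN"],
    ["DT", "NN", "VB"],
]
model = bigram_tag_model(training_tags)

# Two candidate taggings for "Book that flight".
p_verb_first = sequence_probability(["VB", "DT", "NN"], model)
p_noun_first = sequence_probability(["NN", "DT", "NN"], model)
print(p_verb_first > p_noun_first)
```

A full stochastic tagger would typically combine these tag-sequence scores with word-tag probabilities and search over all candidate sequences rather than compare two by hand.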

24
Q

T/F In the Tag Sequence Probabilities approach, the model uses a probability distribution over sequences of n tags (unigram, bigram, trigram, etc.) to determine the best tag for a word.

25
Q

T/F Transformation-based tagging is also called Brill tagging. It is a hybrid method combining rule-based and statistical approaches to refine POS tagging.

A

T

26
Q

…………… allows the model to apply transformation rules that change one tagging state to another, using context and linguistic knowledge to improve accuracy.

A

TBL (Transformation-based Learning)

27
Q

READ IT
Consider the following steps to understand the working of TBL:
⚫ Start with an initial solution: The process begins with an initial tagging solution, often provided by a simple method like default tagging (e.g., assigning the most common tag to each word).
⚫ Choose the most beneficial transformation: In each cycle, TBL selects the most beneficial transformation rule based on the current tags. These transformations are based on patterns observed in the data, such as changing a tag if certain contextual clues are present.
⚫ Apply the transformation to the problem: The transformation chosen in the previous step is applied to the existing tags, adjusting them to be more accurate.
The process is repeated in cycles, refining the tagging after each transformation.

A

DONE

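
The three TBL steps can be sketched as a small greedy loop. This is a simplified illustration under stated assumptions, not Brill's full algorithm: it uses a single rule template ("change tag A to B when the previous tag is C"), treats the corpus as one flat tag sequence that ignores sentence boundaries, and all function names and example tags are made up for this example.

```python
def apply_rule(tags, rule):
    """Change from_tag to to_tag wherever the previous tag equals prev_tag."""
    from_tag, to_tag, prev_tag = rule
    new_tags = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            new_tags[i] = to_tag
    return new_tags

def count_errors(tags, gold):
    return sum(t != g for t, g in zip(tags, gold))

def learn_rules(initial_tags, gold_tags, max_rules=10):
    """Greedily learn the rules that most reduce the number of tagging errors."""
    current = list(initial_tags)      # step 1: start with an initial solution
    learned = []
    tagset = sorted(set(gold_tags))
    for _ in range(max_rules):
        best_rule, best_tags = None, current
        best_errors = count_errors(current, gold_tags)
        # Step 2: choose the most beneficial transformation.
        for from_tag in tagset:
            for to_tag in tagset:
                if from_tag == to_tag:
                    continue
                for prev_tag in tagset:
                    rule = (from_tag, to_tag, prev_tag)
                    candidate = apply_rule(current, rule)
                    errors = count_errors(candidate, gold_tags)
                    if errors < best_errors:
                        best_rule, best_tags, best_errors = rule, candidate, errors
        if best_rule is None:
            break                     # no rule improves the tagging any further
        # Step 3: apply the chosen transformation, then repeat the cycle.
        learned.append(best_rule)
        current = best_tags
    return learned, current

# Tokens: hand me that book read the book
initial = ["VB", "PRP", "DT", "VB", "VB", "DT", "VB"]  # "book" initially tagged VB
gold = ["VB", "PRP", "DT", "NN", "VB", "DT", "NN"]
rules, final_tags = learn_rules(initial, gold)
print(rules)
```

On this toy data the loop learns one rule, "change VB to NN when the previous tag is DT", and then stops because no further rule reduces the error count.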
28
Q

T/F An example TBL transformation rule: if the word "book" is tagged as a verb (VB) and is preceded by a determiner (DT), change the tag to noun (NN).

A

T

29
Q

Advantages and disadvantages of TBL

A

⚫ Advantages:
* Combines strengths of both rule-based and probabilistic methods.
* Can improve tagging accuracy over other methods.
⚫ Disadvantages:
* Requires initial tagging (which could be incorrect).
* The rule learning and application can be computationally expensive, especially with large datasets.

30
Q

Applications of POS Tagging

A

Information Extraction
Text Classification
Machine Translation
Natural Language Generation
Named Entity Recognition (NER)