CHAPTER 6 Flashcards by Basel Altasan

………….is a classification process.

It involves the automatic assignment of descriptions to tokens.

The descriptions are called ………………

Tagging, tags

T/F Tags can represent part-of-speech, semantic information, and other attributes.

…………..is the process of labeling words in a text (corpus).

POS tagging

T/F The goal of POS tagging is to assign each word to a specific part of speech (e.g., noun, verb, adjective).

T/F Words can have multiple POS depending on their context.

Example:
* VERB: (Book that flight).
* NOUN: (Hand me that book).

Why do we use POS tagging in NLP?

To understand the grammatical structure of a sentence
To disambiguate words that have more than one possible POS
To improve the accuracy of downstream NLP tasks
To facilitate linguistic research

Steps Involved in POS Tagging

1- Collect a dataset of annotated text
2- Pre-process the text
3- Divide the dataset into training and testing sets
4- Train the POS tagger
5- Test the POS tagger
6- Fine-tune the POS tagger
7- Use the POS tagger

Collect a Dataset of Annotated Text:

Gather a dataset where each word is labeled with its correct POS tag.
This dataset will be used for training and testing the POS tagger.
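
A minimal sketch of how such an annotated dataset might be represented in Python. It assumes sentences are stored one per line in a word/TAG format; that format, the helper name parse_tagged_sentence, and the sample sentences are illustrative assumptions, not something the deck specifies.

```python
def parse_tagged_sentence(line):
    """Turn 'Book/VB that/DT flight/NN' into [('Book', 'VB'), ...]."""
    pairs = []
    for item in line.split():
        # Split on the LAST slash so words that contain '/' stay intact.
        word, sep, tag = item.rpartition("/")
        if not sep or not word or not tag:
            raise ValueError(f"expected word/TAG, got {item!r}")
        pairs.append((word, tag))
    return pairs


# Hypothetical annotated lines, reusing the deck's example sentences.
raw_lines = [
    "Book/VB that/DT flight/NN",
    "Hand/VB me/PRP that/DT book/NN",
]

# The dataset is a list of sentences; each sentence is a list of (word, tag).
dataset = [parse_tagged_sentence(line) for line in raw_lines]
print(dataset[0])
```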

Pre-process the Text:

Tasks include tokenization (splitting the text into individual words), lowercasing, and removing punctuation to standardize the text before training.
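
The three pre-processing tasks named above could look roughly like this. It is a sketch that assumes simple whitespace tokenization; the function name preprocess is made up for illustration, and real tokenizers handle many more cases.

```python
import string


def preprocess(text):
    """Tokenize on whitespace, lowercase, and remove punctuation."""
    tokens = []
    for raw in text.split():
        # Strip leading/trailing punctuation, then lowercase the token.
        token = raw.strip(string.punctuation).lower()
        if token:  # drop tokens that consisted only of punctuation
            tokens.append(token)
    return tokens


print(preprocess("Book that flight, please!"))
# ['book', 'that', 'flight', 'please']
```

Lowercasing discards capitalization cues (for example, for proper nouns), so whether to apply it depends on the tagger being trained.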

Divide the Dataset into Training and Testing Sets:

Training Set: Used to train the POS tagger.
Testing Set: Used to evaluate the tagger’s performance.
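
One way to make such a split, sketched under the assumption that the dataset is a list of tagged sentences. The 80/20 ratio, the fixed seed, and the name split_dataset are illustrative choices, not prescribed by the deck.

```python
import random


def split_dataset(sentences, test_ratio=0.2, seed=0):
    """Shuffle a copy of the sentences and hold out a fraction for testing."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Toy dataset of five tagged sentences (made up for illustration).
sentences = [
    [("book", "VB"), ("that", "DT"), ("flight", "NN")],
    [("hand", "VB"), ("me", "PRP"), ("that", "DT"), ("book", "NN")],
    [("read", "VB"), ("the", "DT"), ("book", "NN")],
    [("the", "DT"), ("flight", "NN"), ("left", "VBD")],
    [("i", "PRP"), ("booked", "VBD"), ("a", "DT"), ("flight", "NN")],
]

train_set, test_set = split_dataset(sentences)
print(len(train_set), len(test_set))  # 4 1
```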

Train the POS Tagger:

Build a statistical model (e.g., a Hidden Markov Model, HMM) or define a set of rules for a rule-based or transformation-based tagger.

Train the model or rules on the annotated training set.
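
For the statistical case, training an HMM tagger largely amounts to counting. The sketch below estimates transition probabilities (tag given previous tag) and emission probabilities (word given tag) by relative frequency, with no smoothing. The function name train_hmm, the "<s>" start marker, and the toy sentences are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def normalize(table):
    """Convert nested counts into relative frequencies."""
    result = {}
    for key, counter in table.items():
        total = sum(counter.values())
        result[key] = {item: count / total for item, count in counter.items()}
    return result


def train_hmm(tagged_sentences):
    """Estimate HMM transition and emission probabilities by counting."""
    transitions = defaultdict(Counter)  # previous tag -> counts of next tag
    emissions = defaultdict(Counter)    # tag -> counts of emitted word
    for sentence in tagged_sentences:
        prev = "<s>"  # assumed sentence-start marker
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            prev = tag
    return normalize(transitions), normalize(emissions)


train_set = [
    [("book", "VB"), ("that", "DT"), ("flight", "NN")],
    [("hand", "VB"), ("me", "PRP"), ("that", "DT"), ("book", "NN")],
]
trans, emit = train_hmm(train_set)
print(trans["DT"])  # {'NN': 1.0}
print(emit["NN"])   # {'flight': 0.5, 'book': 0.5}
```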

Test the POS Tagger:

Use the trained model or rules to predict the POS tags for words in the testing set.
Compare predicted tags to true tags to evaluate the tagger’s performance.
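
Comparing predicted tags to true tags is often summarized as token-level accuracy. A minimal sketch follows; the function name accuracy and the sample tags are illustrative, and other metrics such as per-tag precision and recall are also used in practice.

```python
def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag equals the true tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty test set")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


gold_tags = ["VB", "DT", "NN"]       # true tags for "book that flight"
predicted_tags = ["NN", "DT", "NN"]  # a tagger that mistook "book" for a noun
score = accuracy(predicted_tags, gold_tags)
print(f"{score:.2f}")  # 0.67
```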

Fine-tune the POS Tagger:

If the performance is unsatisfactory, adjust the model or rules, and repeat the training and testing process until achieving the desired accuracy.

Use the POS Tagger:

Once trained and tested, the POS tagger can be applied to new, unseen text.
Preprocess the text and input it into the model or rules to get predicted POS tags for each word.

Types of POS Tagging

1- Rule-based POS tagging
2- Stochastic POS tagging
3- Transformation-based POS tagging

…………….. is one of the oldest tagging techniques.
⚫ It assigns tags based on a set of predefined linguistic rules.

Rule-based POS tagging

Rule-based POS tagging architecture (two-stage architecture):

First Stage:
A dictionary is used to assign each word a list of potential parts of speech based on its form and context.
Second Stage:
Hand-written disambiguation rules are applied to narrow down the list, selecting a single part of speech for each word and resolving ambiguities.
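
The two stages can be sketched as a dictionary lookup followed by hand-written rules. The tiny LEXICON, the two rules, and the default tag for unknown words are toy assumptions chosen to fit the deck's "book" example; a real rule-based tagger uses far larger resources.

```python
# Stage 1 resource: a hypothetical dictionary of possible tags per word.
LEXICON = {
    "book": ["NN", "VB"],
    "hand": ["NN", "VB"],
    "that": ["DT"],
    "me": ["PRP"],
    "flight": ["NN"],
}


def disambiguate(candidates, prev_tag):
    """Stage 2: hand-written rules that pick one tag from the candidates."""
    if len(candidates) == 1:
        return candidates[0]
    # Toy rule 1: after a determiner, prefer the noun reading.
    if prev_tag == "DT" and "NN" in candidates:
        return "NN"
    # Toy rule 2: at the start of a sentence, prefer the (imperative) verb.
    if prev_tag is None and "VB" in candidates:
        return "VB"
    return candidates[0]  # fall back to the first listed tag


def rule_based_tag(tokens):
    tagged, prev_tag = [], None
    for token in tokens:
        candidates = LEXICON.get(token, ["NN"])  # assumed default for unknown words
        tag = disambiguate(candidates, prev_tag)
        tagged.append((token, tag))
        prev_tag = tag
    return tagged


print(rule_based_tag(["book", "that", "flight"]))
# [('book', 'VB'), ('that', 'DT'), ('flight', 'NN')]
print(rule_based_tag(["hand", "me", "that", "book"]))
# [('hand', 'VB'), ('me', 'PRP'), ('that', 'DT'), ('book', 'NN')]
```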

Advantages and disadvantages of rule-based POS tagging

⚫ Advantages:
* Transparent and interpretable.
* Works well for languages with clear grammatical rules.
⚫ Disadvantages:
* Time-consuming rule creation.
* May struggle with ambiguity or unseen words.

……………uses probabilistic methods to assign POS tags based on statistical models.

Stochastic POS tagging

The simplest stochastic tagger applies the following approaches for POS tagging:

– Word Frequency Approach
– Tag Sequence Probabilities

…………………… In this approach, stochastic taggers use the probability that a word occurs with a particular tag to disambiguate words.
➢ Assign the most frequent tag seen with a word in the training set to ambiguous instances of that word.

Word Frequency Approach

The main issue with the word frequency approach (stochastic POS tagging):

It may generate incorrect tag sequences because it ignores context.
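
A small sketch of the word frequency approach and of the issue just described. The training sentences are invented so that "book" is seen more often as a noun; the function names and the default tag for unseen words are assumptions.

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(tagged_sentences):
    """Map each word to the tag it carried most often in training."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag(tokens, model, default="NN"):
    """Tag each token independently; unseen words get an assumed default."""
    return [(token, model.get(token, default)) for token in tokens]


train_set = [
    [("hand", "VB"), ("me", "PRP"), ("that", "DT"), ("book", "NN")],
    [("read", "VB"), ("the", "DT"), ("book", "NN")],
    [("book", "VB"), ("that", "DT"), ("flight", "NN")],
]
model = train_most_frequent_tag(train_set)

# "book" was a noun twice and a verb once, so it is always tagged NN,
# even in the imperative "book that flight", where it is a verb.
print(tag(["book", "that", "flight"], model))
# [('book', 'NN'), ('that', 'DT'), ('flight', 'NN')]
```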

………….. In this approach, stochastic taggers calculate the probability of a sequence of tags occurring, rather than focusing on individual word-tag pairs.

Tag Sequence Probabilities

T/F In the tag sequence probabilities approach, the model uses a probability distribution over sequences of n tags (unigram, bigram, trigram, etc.) to determine the best tag for a word.
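
For the bigram case (n = 2), the idea can be sketched as follows: estimate how likely each tag is to follow the previous tag, multiply those probabilities along a candidate tag sequence, and keep the most probable candidate. The training tag sequences, the "<s>" start marker, and the brute-force search over candidates are illustrative assumptions; practical taggers use smoothing and dynamic programming instead.

```python
from collections import Counter, defaultdict
from itertools import product


def train_bigrams(tag_sequences):
    """Count how often each tag follows the previous tag."""
    counts = defaultdict(Counter)
    for tags in tag_sequences:
        prev = "<s>"  # assumed sentence-start marker
        for tag in tags:
            counts[prev][tag] += 1
            prev = tag
    return counts


def sequence_probability(tags, counts):
    """Product of P(tag | previous tag) over the whole sequence."""
    prob, prev = 1.0, "<s>"
    for tag in tags:
        total = sum(counts[prev].values())
        if total == 0:
            return 0.0  # previous tag never seen in training
        prob *= counts[prev][tag] / total
        prev = tag
    return prob


def best_sequence(candidates_per_word, counts):
    """Try every combination of candidate tags and keep the most probable."""
    return max(product(*candidates_per_word),
               key=lambda seq: sequence_probability(seq, counts))


counts = train_bigrams([
    ["VB", "DT", "NN"],
    ["VB", "PRP", "DT", "NN"],
    ["DT", "NN", "VB"],
])

# Candidate tags for "book that flight": "book" may be a noun or a verb.
candidates = [["NN", "VB"], ["DT"], ["NN"]]
print(best_sequence(candidates, counts))  # ('VB', 'DT', 'NN')
```

In this toy data no sentence starts with a noun, so the sequence beginning with NN gets probability 0 and the verb reading of "book" wins.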

T/F Transformation-based tagging is also called Brill tagging. It is a hybrid method combining rule-based and statistical approaches to refine POS tagging.

…………… allows the model to apply transformation rules that change one tagging state to another, using context and linguistic knowledge to improve accuracy.

TBL (Transformation-based Learning)

READ IT
Consider the following steps to understand the working of TBL:
⚫ Start with an initial solution: The process begins with an initial tagging solution, often provided by a simple method like default tagging (e.g., assigning the most common tag to each word).
⚫ Choose the most beneficial transformation: In each cycle, TBL selects the most beneficial transformation rule based on the current tags. These transformations are based on patterns observed in the data, such as changing a tag if certain contextual clues are present.
⚫ Apply the transformation to the problem: The transformation chosen in the previous step is applied to the existing tags, adjusting them to be more accurate. The process is repeated in cycles, refining the tagging after each transformation.

DONE

T/F If the word "book" is tagged as a verb (VB) and is preceded by a determiner (DT), change the tag to noun (NN).
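
The TBL cycle and a rule like the one above can be sketched in a few lines. The sketch simplifies the card's rule to operate on tags only (change VB to NN after DT, regardless of the word), and the rule format, candidate rules, and function names are assumptions for illustration; Brill's actual tagger uses a richer set of rule templates.

```python
def apply_rule(tags, rule):
    """rule = (from_tag, to_tag, prev_tag): retag when the previous tag matches."""
    from_tag, to_tag, prev_tag = rule
    new_tags = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            new_tags[i] = to_tag
    return new_tags


def count_errors(tags, gold):
    return sum(t != g for t, g in zip(tags, gold))


def learn_rules(initial, gold, candidate_rules, max_rules=5):
    """Repeatedly pick the rule that removes the most errors, then apply it."""
    current, learned = list(initial), []
    for _ in range(max_rules):
        best = min(candidate_rules,
                   key=lambda r: count_errors(apply_rule(current, r), gold))
        improved = apply_rule(current, best)
        if count_errors(improved, gold) >= count_errors(current, gold):
            break  # no remaining rule helps, so stop
        current = improved
        learned.append(best)
    return learned, current


# "hand me that book": the initial tagging wrongly marks "book" as a verb.
initial_tags = ["VB", "PRP", "DT", "VB"]
gold_tags = ["VB", "PRP", "DT", "NN"]
candidate_rules = [
    ("VB", "NN", "DT"),   # verb after a determiner becomes a noun
    ("VB", "NN", "PRP"),  # verb after a pronoun becomes a noun
    ("DT", "NN", "PRP"),  # determiner after a pronoun becomes a noun
]

learned, final_tags = learn_rules(initial_tags, gold_tags, candidate_rules)
print(learned)     # [('VB', 'NN', 'DT')]
print(final_tags)  # ['VB', 'PRP', 'DT', 'NN']
```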

Advantages and disadvantages of TBL

⚫ Advantages:
* Combines strengths of both rule-based and probabilistic methods.
* Can improve tagging accuracy over other methods.
⚫ Disadvantages:
* Requires initial tagging (which could be incorrect).
* The rule learning and application can be computationally expensive, especially with large datasets.

Applications of POS Tagging

Information Extraction
Text Classification
Machine Translation
Natural Language Generation
Named Entity Recognition (NER)

CHAPTER 6 Flashcards

(30 cards)