CHAPTER 6 Flashcards
(30 cards)
………….is a classification process.
It involves the automatic assignment of descriptions to tokens.
The descriptions are called ………………
Tagging, tags
T/F Tags can represent part-of-speech, semantic
information, and other attributes.
T
…………..is the process of labeling words in a text (corpus) with their parts of speech.
POS tagging
T/F The goal of POS tagging is to assign each word to a specific part of speech (e.g., noun, verb, adjective).
T
T/F Words can have multiple POS depending on their
context.
Example:
* VERB: (Book that flight).
* NOUN: (Hand me that book).
T
Why do we use POS tagging in NLP?
To understand the grammatical structure
To disambiguate words
To improve the accuracy of downstream NLP tasks
To facilitate research
Steps involved in POS tagging
1- Collect a dataset of annotated text
2- Pre-process the text
3- Divide the Dataset into Training and Testing Sets.
4- Train the POS tagger
5- Test the POS tagger
6- Fine-tune the POS tagger
7- Use the POS tagger
Collect a Dataset of Annotated Text:
- Gather a dataset where each word is labeled with its correct POS tag.
- This dataset will be used for training and testing the POS
tagger.
Pre-process the Text:
- Tasks include tokenization (splitting the text into individual words), lowercasing, and removing punctuation to standardize the text before training.
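A minimal sketch of this pre-processing step, assuming simple whitespace tokenization. The function name `preprocess` is illustrative and does not come from the chapter; real taggers usually rely on a dedicated tokenizer.

```python
import string

def preprocess(text):
    """Split on whitespace, lowercase, and strip surrounding punctuation."""
    tokens = []
    for raw in text.split():
        token = raw.lower().strip(string.punctuation)
        if token:  # skip tokens that were punctuation only
            tokens.append(token)
    return tokens

print(preprocess("Book that flight, please!"))
# ['book', 'that', 'flight', 'please']
```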
Divide the Dataset into Training and Testing Sets:
- Training Set: Used to train the POS tagger.
- Testing Set: Used to evaluate the tagger’s performance.
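One possible way to make the split, sketched with the standard library only. The 80/20 ratio, the fixed seed, and the helper name `train_test_split` are assumptions for illustration, not values given in the chapter.

```python
import random

def train_test_split(sentences, test_ratio=0.2, seed=0):
    """Shuffle annotated sentences and split them into (train, test)."""
    shuffled = list(sentences)              # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)   # seeded for a reproducible split
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

# Stand-in corpus: ten placeholder "sentences"
corpus = [f"sentence {i}" for i in range(10)]
train, test = train_test_split(corpus)
print(len(train), len(test))  # 8 2
```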
Train the POS Tagger:
- Build a statistical model (e.g., a Hidden Markov Model, HMM) or define a set of rules for a rule-based or transformation-based tagger.
- Train the model or rules on the annotated training set.
Test the POS Tagger:
- Use the trained model or rules to predict the POS tags for words in the testing set.
- Compare predicted tags to true tags to evaluate the tagger’s performance.
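Comparing predicted tags to true tags is commonly summarized as token-level accuracy. A small sketch follows; the function name `tagging_accuracy` and the tag labels are assumed for the example.

```python
def tagging_accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the true tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

gold = ["VERB", "DET", "NOUN"]        # true tags for "Book that flight"
predicted = ["NOUN", "DET", "NOUN"]   # the tagger got "Book" wrong
print(tagging_accuracy(predicted, gold))  # 2 of 3 tags are correct
```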
Fine-tune the POS Tagger:
- If the performance is unsatisfactory, adjust the model or rules, and repeat the training and testing process until achieving the desired accuracy.
Use the POS Tagger:
- Once trained and tested, the POS tagger can be applied to new, unseen text.
- Pre-process the text and input it into the model or rules to get predicted POS tags for each word.
Types of POS Tagging
1- Rule-based POS tagging
2- Stochastic POS tagging
3- Transformation-based POS tagging
……………..is one of the oldest techniques of tagging.
⚫ It assigns tags based on a set of predefined linguistic rules.
Rule-based POS tagging
POS Tagging Architecture (two-stage architecture):
- First Stage:
A dictionary is used to assign each word a list of potential parts of speech based on its form and context.
- Second Stage:
Hand-written disambiguation rules are applied to narrow down the list, selecting a single part of speech for each word and resolving ambiguities.
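The two stages can be sketched as a toy tagger. Everything here is an assumption made for illustration: the tiny `LEXICON`, the two hand-written rules, the NOUN fallback for unknown words, and the tag names. A real rule-based tagger uses a much larger dictionary and rule set.

```python
# Stage 1: dictionary of candidate tags per word (toy, hypothetical lexicon)
LEXICON = {
    "book": ["NOUN", "VERB"],
    "hand": ["NOUN", "VERB"],
    "that": ["DET"],
    "flight": ["NOUN"],
    "me": ["PRON"],
}

def candidates(word):
    """Look up the possible tags; unknown words default to NOUN (assumption)."""
    return LEXICON.get(word.lower(), ["NOUN"])

# Stage 2: hand-written rules pick one tag from the candidate list
def rule_based_tag(words):
    tags = []
    for i, word in enumerate(words):
        options = candidates(word)
        if len(options) == 1:
            tags.append(options[0])          # unambiguous word
        elif i > 0 and tags[i - 1] == "DET" and "NOUN" in options:
            tags.append("NOUN")              # rule 1: noun after a determiner
        elif i == 0 and "VERB" in options:
            tags.append("VERB")              # rule 2: sentence-initial imperative
        else:
            tags.append(options[0])          # fallback: first listed tag
    return tags

print(rule_based_tag(["Book", "that", "flight"]))       # ['VERB', 'DET', 'NOUN']
print(rule_based_tag(["Hand", "me", "that", "book"]))   # ['VERB', 'PRON', 'DET', 'NOUN']
```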
Advantages and disadvantages of rule-based POS tagging
⚫ Advantages:
* Transparent and interpretable.
* Works well for languages with clear grammatical rules.
⚫ Disadvantages:
* Time-consuming rule creation.
* May struggle with ambiguity or unseen words.
……………uses probabilistic methods to assign POS tags based on statistical models.
Stochastic POS tagging
The simplest stochastic taggers apply the following approaches to POS tagging:
– Word Frequency Approach
– Tag Sequence Probabilities
……………………In this approach, stochastic taggers use the probability that a word occurs with a particular tag to disambiguate words.
➢ Assign the most frequent tag seen with a word in the training set to ambiguous instances of that word.
Word Frequency Approach
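A minimal sketch of the most-frequent-tag idea, assuming a tiny hand-made training set. The function names, the training sentences, and the NOUN default for unseen words are illustrative only.

```python
from collections import Counter, defaultdict

def train_most_frequent_tag(tagged_sentences):
    """Map each word to the tag it was seen with most often in training."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def tag_words(words, model, default="NOUN"):
    """Tag each word independently; unseen words get the default tag."""
    return [model.get(w.lower(), default) for w in words]

training = [
    [("hand", "VERB"), ("me", "PRON"), ("that", "DET"), ("book", "NOUN")],
    [("the", "DET"), ("book", "NOUN"), ("is", "VERB"), ("long", "ADJ")],
    [("book", "VERB"), ("that", "DET"), ("flight", "NOUN")],
]
model = train_most_frequent_tag(training)

# "book" was seen twice as NOUN and once as VERB, so it is always tagged NOUN,
# even in the imperative "Book that flight".
print(tag_words(["Book", "that", "flight"], model))  # ['NOUN', 'DET', 'NOUN']
```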
The main issue with the Word Frequency Approach (Stochastic POS Tagging):
It may generate incorrect tag sequences because it ignores context.
…………..In this approach, stochastic taggers calculate the probability of a sequence of tags occurring, rather than focusing on individual word-tag pairs.
Tag Sequence Probabilities
T/F In the Tag Sequence Probabilities approach, the model uses a probability distribution over sequences of n tags (unigram, bigram, trigram, etc.) to determine the best tag for a word.
T
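For the bigram case (n = 2), tag sequence probabilities can be estimated from counts of adjacent tag pairs. The sketch below is illustrative: the function names, the `<s>` start marker, and the toy tag sequences are assumptions, and no smoothing is applied, so unseen tag pairs get probability 0. It scores tag sequences only; a full tagger such as an HMM would combine these with word-tag probabilities.

```python
from collections import Counter

def train_bigram_transitions(tag_sequences):
    """Estimate P(tag | previous tag) from counts of adjacent tag pairs."""
    bigrams = Counter()
    contexts = Counter()
    for tags in tag_sequences:
        padded = ["<s>"] + list(tags)        # <s> marks the sentence start
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return {pair: n / contexts[pair[0]] for pair, n in bigrams.items()}

def sequence_probability(tags, transitions):
    """Multiply the bigram probabilities along a tag sequence."""
    prob = 1.0
    prev = "<s>"
    for cur in tags:
        prob *= transitions.get((prev, cur), 0.0)  # unseen pair: probability 0
        prev = cur
    return prob

tag_sequences = [
    ["VERB", "DET", "NOUN"],
    ["VERB", "PRON", "DET", "NOUN"],
    ["DET", "NOUN", "VERB", "ADJ"],
]
transitions = train_bigram_transitions(tag_sequences)

# Two candidate tag sequences for "Book that flight"
p_verb_first = sequence_probability(["VERB", "DET", "NOUN"], transitions)
p_noun_first = sequence_probability(["NOUN", "DET", "NOUN"], transitions)
print(p_verb_first > p_noun_first)  # True: the VERB-first sequence is preferred
```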