CHAPTER 5 Flashcards
(34 cards)
………………..is the core component of modern
Natural Language Processing (NLP)
language model
T/F A language model is a probabilistic statistical model that determines the probability of a given sequence of words occurring in a sentence based on the previous words.
T
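As a worked equation (standard chain-rule notation, not printed on the card), the probability of a whole word sequence factorizes over the previous words:

P(w_1, w_2, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})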
T/F
⚫ It helps to predict which word is more likely to appear next in the sentence.
⚫ It's a tool that analyzes the patterns of human language to predict words.
⚫ Language models analyze bodies of text data to provide a basis for their word predictions.
⚫ Widely used in NLP applications like chatbots and search engines.
T
How does a Language Model work?
⚫ Language models determine the probability of the next word by analyzing the text in their training data.
⚫ These models interpret the data by feeding it through algorithms.
⚫ The algorithms are responsible for creating rules for context in natural language.
⚫ The models are prepared for the prediction of words by learning the features and characteristics of a language.
⚫ With this learning, the model prepares itself to understand phrases and predict the next words in sentences.
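A minimal Python sketch of this counting idea, assuming a toy corpus and whitespace tokenization (the corpus and function names are illustrative, not from the lecture):

```python
from collections import Counter, defaultdict

corpus = "I like learning and practice NLP in this lecture".split()

# Count how often each word follows each preceding word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the training text."""
    counts = follows[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("this"))  # -> 'lecture'
```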
Types of Language Models
- Statistical Language Models
- Neural Language Models
T/F Statistical models predict the next word based on previous words.
T
T/F Statistical language models use probabilistic techniques to analyze text patterns.
T
Popular Statistical Models
1- N-Gram Model – Uses a fixed-length sequence of words.
2- Bidirectional Model – Considers both past and future words.
3- Exponential Model – Assigns probabilities based on exponential functions.
4- Continuous Space Model – Represents words in a continuous vector space.
5- Neural Language Model – Uses neural networks.
…………creates a probability distribution for a sequence of ‘n’ tokens (words).
This is one of the simplest approaches to language modelling.
N-Gram model
T/F There are different types of N-Gram models, such as bigrams and trigrams.
T
N-GRAM EXAMPLES:
Example Sentence:
“I like learning and practice NLP in this lecture”
– Unigrams:
“I”, “like”, “learning”, “and”, “practice”, “NLP”, “in”, “this”, “lecture”
– Bigram Example:
(“I”, “like”), (“like”, “learning”), (“learning”, “and”), (“and”, “practice”),
(“practice”, “NLP”), (“NLP”, “in”), (“in”, “this”), (“this”, “lecture”)
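A small Python sketch that reproduces the unigram and bigram lists above (a generic helper; the function name is illustrative):

```python
def ngrams(tokens, n):
    """Return all n-grams (as tuples) from a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I like learning and practice NLP in this lecture".split()
print(ngrams(tokens, 1))  # unigrams
print(ngrams(tokens, 2))  # bigrams: ('I', 'like'), ('like', 'learning'), ...
```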
READ IT AGAIN
…………..is a collection of text data consisting of the proceedings of the European Parliament from 1996 to 2012
The Europarl Corpus
T/F The Markov assumption simplifies language modeling by stating that only the most recent words in a sentence matter when predicting the next word.
For a bigram model, the prediction of the next word depends only on the previous word.
For an n-gram model, only the preceding (n-1) words are considered.
T
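In formula form (standard notation, not printed on the card), the Markov assumption for an n-gram model is:

P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \dots, w_{i-1})

and a bigram model estimates that probability from counts:

P(w_i \mid w_{i-1}) = \frac{\text{count}(w_{i-1}, w_i)}{\text{count}(w_{i-1})}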
Problem with N-Grams
⚫ One problem with N-Gram models is data sparsity.
⚫ This occurs when the model encounters word sequences (N-Grams) that were not seen during training. As a result, the model assigns them a zero probability.
⚫ Techniques to solve this problem include smoothing, backoff, and interpolation.
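A minimal sketch of add-one (Laplace) smoothing for bigram probabilities (the counts and vocabulary size below are illustrative, taken from the example sentence):

```python
bigram_counts = {("I", "like"): 1}   # toy counts from training
unigram_counts = {"I": 1}
V = 9  # vocabulary size of the example sentence

def laplace_prob(prev, word):
    """Add-one smoothed P(word | prev): unseen bigrams get a small
    non-zero probability instead of zero."""
    return (bigram_counts.get((prev, word), 0) + 1) / (unigram_counts.get(prev, 0) + V)

print(laplace_prob("I", "like"))     # seen bigram: (1 + 1) / (1 + 9) = 0.2
print(laplace_prob("I", "lecture"))  # unseen bigram: (0 + 1) / (1 + 9) = 0.1
```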
…………….is a sequence of two words coming together to form a meaning
bigram
“I like”, “like learning”, “learning and”, “and practice”, “practice NLP”, “NLP in”, “in this”, “this lecture”.
…………is a sequence of three words coming together to form a meaning
“I like learning”, “like learning and”, “learning and practice”, “and practice NLP”, “practice NLP in”, “NLP in this”, “in this lecture”
trigram
T/F Bidirectional model: unlike n-gram models, which analyze text in one direction (backwards), bidirectional models analyze text in both directions (backwards and forwards).
T
T/F Examining text bidirectionally increases result accuracy.
This type is often utilized in machine learning and speech generation applications.
Example: Google uses a bidirectional model to process search queries.
T
………..This type of statistical model evaluates text by using an equation that is a combination of n-grams and feature functions.
Here the features and parameters of the desired results are already specified.
This model has fewer statistical assumptions, which means the chances of having accurate results are higher.
Exponential
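As a worked equation (the standard maximum-entropy form this card alludes to; the \lambda_j are learned weights, the f_j are feature functions, and h is the word history):

P(w \mid h) = \frac{\exp\left(\sum_j \lambda_j f_j(w, h)\right)}{\sum_{w'} \exp\left(\sum_j \lambda_j f_j(w', h)\right)}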
The process of assigning weight to a word is known as
…………………..
word embedding
……………In this type of statistical model, words are arranged as a non-linear combination of weights in a neural network.
This type of model proves helpful in scenarios where the data set of words continues to grow and includes unique words.
Continuous Space
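A minimal sketch of the continuous-space idea with hand-made toy vectors (real models learn these weights during training; the numbers are illustrative only):

```python
import math

# Each word is a point in a continuous vector space.
vectors = {
    "like":    [0.9, 0.1],
    "enjoy":   [0.8, 0.2],
    "lecture": [0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity: nearby vectors suggest related words."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms

print(cosine(vectors["like"], vectors["enjoy"]))    # high: similar words
print(cosine(vectors["like"], vectors["lecture"]))  # low: unrelated words
```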
T/F Neural Language Models
These language models are based on neural networks and are often considered an advanced approach to executing NLP tasks.
Neural language models overcome the shortcomings of classical models such as n-grams and are used for complex tasks such as speech recognition or machine translation.
T
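A minimal PyTorch sketch of a neural bigram language model (an illustrative toy, not the lecture's model: an embedding layer feeding a linear layer that scores the next word):

```python
import torch
import torch.nn as nn

class BigramNeuralLM(nn.Module):
    def __init__(self, vocab_size, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)  # word id -> vector
        self.out = nn.Linear(dim, vocab_size)       # vector -> next-word scores

    def forward(self, prev_word_ids):
        return self.out(self.embed(prev_word_ids))  # logits over the vocabulary

vocab = "I like learning and practice NLP in this lecture".split()
model = BigramNeuralLM(len(vocab))
logits = model(torch.tensor([vocab.index("this")]))
probs = torch.softmax(logits, dim=-1)  # untrained, so roughly uniform
```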
…………… is the process of analyzing the structure of a sentence based on grammatical rules.
parsing
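A minimal sketch using NLTK's chart parser with a toy grammar (the grammar rules here are invented for illustration):

```python
import nltk

grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    NP -> 'I' | Det N
    VP -> V NP
    Det -> 'this'
    N  -> 'lecture'
    V  -> 'like'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I like this lecture".split()):
    print(tree)  # (S (NP I) (VP (V like) (NP (Det this) (N lecture))))
```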