NLP _ 01 Flashcards
(92 cards)
What is the most likely the first step of NLP?
cutting board
Text preprocessing
What is the most likely the first step of NLP?
cutting board
Text preprocessing
what is Noise removal?
front of fridge
stripping text of formatting.(e.g. HTML tags.)
What is Tokenization?
under sink door
breaking text into individual words
What is normalization?
Cleaning text data in any other way than Noise removal and tokenization
What is stemming?
it is a blunt axt to shop off word prefexes ans suffixes.
What is lemmatization?
coat closet
It is a scalpel to bring words down to their root forms
What would I import to use regex?
import re
What python package could I use for NLP?
import nltk
what method of nltk would I use to tokenize text?
from nltk.tokenize import word_tokenize
Give an example of a list comprehension :
lemmatized = [lemmatizer.lemmatize(token) for token in tokenized]
How would you import WordNetLemmatizer?
from nltk.stem import WordNetLemmatizer
How would you import PorterStemmer?
from nltk.stem import PorterStemmer?
By default lemmatize() treat every word as a…?
Noun
Language models are probabilistic machine models of …?
language used for NLP comprehension tasks
Language models learn a …?
probability of word occurrence over a sequence of words and use it to estimate the relative likelihood of different phrases.
Common language models include:
Statistical models:
- bag of words (unigram model)
- n-gram models
Neural Language Modeling(NLM)
What is Text simlarity in NLP?
Text similarity is a facet of NLP concerned with the similarity between texts.
What are two popular text similarity metrics?
- Levenshtein distance
- cosine similarity
How would you describe the metric : Levenshtein distance
it is defined as the minimum number of edit operations( deletions,insertions, or substitutions) required to transform a text into another
Define the metric : Cosine similarity
It is defined as teh cosine of the angle between two vectors. To determine the cosine similarity, text documents need to be converted into vectors.
**What are common forms of language prediction?
- **Auto-suggest **and suggested replies
Natural Language processing is concerned with …?
enabling computers to interpret, analyze, and **approximate **the generation of human speech.
What is Parsing w.r.t NLP?
it is the process concerned with segmenting text based on syntax
` from a string?
This is a paragraph
" result = re.sub(r'<.?p>', '', text) print(result) ` This is a paragraph