Natural Language Processing Flashcards
Why does Speech Recognition fall under NLP?
Application 2:
- A speech recognizer normally takes the input signal as audio, converts it to text, and recognizes what was spoken.
- A speech recognizer includes a language model. When two words sound the same, the language model tells which one the person probably said: given the sequence of words spoken so far, one word is more probable than the other.
Application 3: Image Captioning - You give the computer an image, and it generates a caption for the image.
How does this fall under NLP?
An image is 2-dimensional data. A CNN (convolutional neural network) encodes the image. Once the image is encoded, an NLP model generates the caption text from that encoding.
Any product that can convert text from one language to another also falls under NLP, and such products are built on very deep neural networks.
Machine Translation: converting text from one language to another, e.g. from English to French.
How is this done?
Neural Machine Translation - one of the most fundamental products and most researched areas in NLP.
______ is a language to specify the rules for the set of possible strings that you want to search for in a corpus (a large body) of text.
a. regular expression
b. natural expression
c. language
d. none of the above
a. regular expression
regular expression - a language to specify the rules for the set of possible strings that you want to search for in a corpus (a large body) of text.
Ex: “Your password must have at least 8 characters, at least 1 upper case letter, at least 1 lower case letter, at least 1 digit (number), and at least 1 symbol from the special symbols.”
This is a set of rules that specifies which strings are acceptable.
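As a minimal sketch, the password rules above can be written as one Python regular expression. It assumes a “special symbol” is any character that is not a letter or digit, since the rule does not list the symbols; the names PASSWORD_RULES and is_valid_password are hypothetical.

```python
import re

# Each (?=...) lookahead checks one rule without consuming characters;
# .{8,} then requires at least 8 characters in total.
# Assumption: a "special symbol" is any character that is not a letter or digit.
PASSWORD_RULES = re.compile(
    r"(?=.*[A-Z])"          # at least 1 upper case letter
    r"(?=.*[a-z])"          # at least 1 lower case letter
    r"(?=.*[0-9])"          # at least 1 digit
    r"(?=.*[^A-Za-z0-9])"   # at least 1 special symbol
    r".{8,}"                # at least 8 characters
)

def is_valid_password(password):
    # fullmatch requires the whole password to satisfy the pattern
    return PASSWORD_RULES.fullmatch(password) is not None

print(is_valid_password("Secret#42"))  # True
print(is_valid_password("secret42"))   # False: no upper case letter, no symbol
```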
Why not write your own Python program for searching instead of using the regular expression module?
a. it is slower
b. Regular expressions use a C wrapper, so it is not needed
c. Not wise; the regular expression package is already included.
d. only b and c
d. only b and c
Do not write your own Python program for regular expression searching because:
b. Regular expressions use a C wrapper, so it is not needed
c. Not wise; the regular expression package is already included.
Why use regular expressions instead of string functions?
a. not customized
b. regular expressions are much more abstract and faster
c. regular expressions are not language-specific and can express almost any search pattern
d. only b and c
d. only b and c
Use regular expressions instead of string functions because:
b. regular expressions are much more abstract and faster
c. regular expressions are not language-specific and can express almost any search pattern
______ is a simple program that uses pattern matching to recognize phrases like “I need X” and translate them into suitable outputs like “What would it mean to you if you got X?”
a. Regular Expression
b. ELIZA chatbot
c. ChatGPT
d. None of the Above
b. ELIZA chatbot
ELIZA chatbot - a simple program that uses pattern matching to recognize phrases like “I need X” and translate them into suitable outputs like “What would it mean to you if you got X?”
Example:
User: I am unhappy
ELIZA: Do you think coming here will help you not be unhappy?
User: I need some help, that much seems certain. (matches the pattern “I need X”)
ELIZA: What would it mean to you if you got some help? (the generated output)
User: Perhaps I could learn to get along with my mother.
ELIZA: Tell me more about your family.
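A tiny ELIZA-style responder can be sketched with Python's re module. The two rules, the default reply, and the name eliza_reply are hypothetical illustrations modeled on the dialogue above, not Weizenbaum's original script.

```python
import re

# Hypothetical rule list: (pattern, response template).
# \1 in a template refers to the text captured by (.*?), i.e. the X in "I need X".
RULES = [
    (r".*\bI need (.*?)[.,!]?$", r"What would it mean to you if you got \1?"),
    (r".*\bmy (mother|father)\b.*", "Tell me more about your family."),
]

def eliza_reply(sentence):
    for pattern, template in RULES:
        match = re.match(pattern, sentence, flags=re.IGNORECASE)
        if match:
            # expand() fills \1 in the template with the captured group
            return match.expand(template)
    return "Please go on."  # default reply when no rule matches

print(eliza_reply("I need some help."))
print(eliza_reply("Perhaps I could learn to get along with my mother."))
```

The rules are tried in order, so more specific patterns should come first.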
What does import re mean?
a. import regular expression
b. import regular eliza
c. import return expression
d. import return eliza
import re means:
a. import regular expression
What does the following code mean?
re.findall('abc', 'askdfj;askabcdfahgfa;ghabc;hgabchkg;a')
a. regular expression, find all
b. find all, look for string abc
c. regular expression, find all, look for string abc in string 'askdfj;askabcdfahgfa;ghabc;hgabchkg;a'.
The following code:
re.findall('abc', 'askdfj;askabcdfahgfa;ghabc;hgabchkg;a')
c. regular expression, find all, look for string abc in string 'askdfj;askabcdfahgfa;ghabc;hgabchkg;a'.
re = regular expression
findall = find all
'abc' = what we are trying to look for
'askdfj;askabcdfahgfa;ghabc;hgabchkg;a' = string we are looking in
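Running the call from this card shows what findall returns: a list with one entry per non-overlapping match.

```python
import re

text = 'askdfj;askabcdfahgfa;ghabc;hgabchkg;a'
# findall scans left to right and returns every non-overlapping match of 'abc'
matches = re.findall('abc', text)
print(matches)       # ['abc', 'abc', 'abc']
print(len(matches))  # 3
```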
Which of the following are the metacharacters with special meaning?
a. . ^ $ * + ?
b. . ^ $ * + ? { } [ ]
c. . ^ $ * + ? { } [ ] \ |
d. . ^ $ * + ? { } [ ] \ | ( )
The metacharacters with special meaning are:
d. . ^ $ * + ? { } [ ] \ | ( )
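Because these characters have special meaning, matching one of them literally requires escaping it with a backslash (or using re.escape). A short sketch with a made-up string:

```python
import re

text = 'a.b?c'
print(re.findall('.', text))    # '.' matches ANY character: ['a', '.', 'b', '?', 'c']
print(re.findall(r'\.', text))  # '\.' matches only a literal period: ['.']
print(re.escape(text))          # escapes the metacharacters: a\.b\?c
```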
_____ is used for specifying a class, which is a set of characters that you wish to match. Characters can be listed individually, like [abcdef], or as a range, like [a-f].
a. [ ]
b. { }
c. \
d. ( )
a. [ ]
[ ] This metacharacter is used for specifying a class, which is a set of characters that you wish to match. Characters can be listed individually, like [abcdef], or as a range, like [a-f].
EXAMPLE:
re.findall('[abcd]', 'kasdf')
output: ['a', 'd']
k in kasdf is not matched because it is not in the class [abcd].
a is matched and appears in the output because it is in the class [abcd].
s is not matched because it is not in the class [abcd].
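Listing characters individually and writing them as a range produce the same class, as this short check on the same string shows:

```python
import re

text = 'kasdf'
print(re.findall('[abcd]', text))    # ['a', 'd']: only characters in the class
print(re.findall('[abcdef]', text))  # ['a', 'd', 'f']: characters listed individually
print(re.findall('[a-f]', text))     # ['a', 'd', 'f']: same class written as a range
```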
Using [ ], count the number of digits below:
'2319ab4621acdz+*!'
i. # define your string
ii. # define a character class, [0-9]
iii. # find all occurrences of this class using re.findall
iv. # use len() to count how many digits there are
a. 4,3,2,1
b. 1,2,3,4
c. 2,3,4,1
d. 3,2,4,1
import re
s = '2319ab4621acdz+*!'       # define your string
L = re.findall('[0-9]', s)    # character class [0-9]; find all occurrences
print(len(L))                 # number of digits found: 8
b. 1,2,3,4
1) # define your string
2) # define a character class, [0-9]
3) # find all occurrences of this class using re.findall
4) # use len() to count how many digits there are
Check whether 4 digits appear consecutively in a string.
import re
if len(re.findall('[0-9][0-9][0-9][0-9]', 'asdfja;_+);1kj2306kjl891')) > 0:
    print("Found")
else:
    print("Not Found")
1) if len(re.findall('[0-9][0-9][0-9][0-9]',
* Finds every run of 4 consecutive digits; that is why there are four [0-9] classes in a row.
2) 'asdfja;_+);1kj2306kjl891')) > 0:
* This is our string. If the list of matches is not empty (length > 0), a run was found.
3) print("Found")
else:
    print("Not Found")
* This says “Based on the result, print Found or Not Found”. Here it prints Found because of 2306.
_____ is the metacharacter used in regular expressions to complement a set (when it is the first character inside [ ]).
a. ^
b. [ ]
c. ( )
d. $
a. ^
^ is the metacharacter used in regular expressions to complement a set (when it is the first character inside [ ]).
In [^023abf], what is the metacharacter ^ doing?
a. Set complement, meaning everything except 0, 2, 3, a, b, f
b. includes all 023abf
c. exponent
d. Ordinary character (a literal ^)
a. Set complement, meaning everything except 0, 2, 3, a, b, f
In [023abf^], what is the metacharacter ^ doing?
a. Set complement, meaning everything except 0, 2, 3, a, b, f
b. includes all 023abf
c. exponent
d. Ordinary character (a literal ^)
d. Ordinary character (a literal ^)
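The position of ^ inside the brackets changes its meaning. A sketch on a made-up string:

```python
import re

text = '0a9^z'
# ^ first inside [ ]: complement, i.e. every character EXCEPT 0, 2, 3, a, b, f
print(re.findall('[^023abf]', text))  # ['9', '^', 'z']
# ^ anywhere else inside [ ]: an ordinary literal '^'
print(re.findall('[023abf^]', text))  # ['0', 'a', '^']
```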
_______ is known as the “zero or more” quantifier. It indicates that the preceding character or expression can occur zero or more times in the input text. Here’s what it signifies:
Zero Occurrences: The character or expression preceding “*” may not occur at all in the text, and the pattern will still match.
Multiple Occurrences: If the character or expression occurs, it can occur any number of times.
a. $
b. *
c. ^
d. @
b. *
* is known as the “zero or more” quantifier. It indicates that the preceding character or expression can occur zero or more times in the input text. Here’s what it signifies:
Zero Occurrences: The character or expression preceding “*” may not occur at all in the text, and the pattern will still match.
Multiple Occurrences: If the character or expression occurs, it can occur any number of times.
Which metacharacter does the following example show?
the regular expression “cats*” matches both “cat” and “cats” in the input text.
a. $
b. *
c. ^
d. @
b. *
In this example, the regular expression “cats*” matches both “cat” and “cats” in the input text. The “*” allows for zero or more occurrences of the character “s”.
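Trying “cats*” on a made-up string shows that the s may appear zero, one, or several times:

```python
import re

text = 'cat cats catss ca'
# 's*' lets the 's' appear zero or more times after 'cat'; 'ca' has no 't', so it fails
print(re.findall('cats*', text))  # ['cat', 'cats', 'catss']
```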
_____ is a name that does not start with a digit, does not contain any special characters other than the underscore, and can have an arbitrary number of characters.
a. proper variable name
b. regular expression
c. *
d. special sequence
a. proper variable name
A Proper Variable Name - a name that does not start with a digit, does not contain any special characters other than the underscore, and can have an arbitrary number of characters.
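One way to express this rule as a regular expression, assuming ASCII letters only (Python itself also accepts some Unicode letters in names, which str.isidentifier() covers). The names VARIABLE_NAME and is_proper_variable_name are hypothetical.

```python
import re

# First character: a letter or underscore (not a digit);
# then any number of letters, digits, or underscores.
VARIABLE_NAME = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')

def is_proper_variable_name(name):
    # fullmatch: the whole name must follow the rule, not just a prefix
    return VARIABLE_NAME.fullmatch(name) is not None

for name in ['total_count', '_tmp', '2nd_value', 'price$']:
    print(name, is_proper_variable_name(name))
```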
____ applies a repetitive pattern as many times as it can; it is the default behavior in most regex engines.
a. special sequence
b. proper variable name
c. greedy matching
d. *
c. greedy matching
Greedy Matching
1. applies a repetitive pattern as many times as it can.
2. is the default behavior in most regex engines.
These are the steps to do Greedy Matching
Example:
I have a cat named Saturn, and another cat named Saturnalia.
- Define Your Pattern: Start by defining the pattern you want to match in your regular expression. This pattern may include characters, groups, and quantifiers that specify how many times a character or group should be matched.
Our pattern will be “cat.*\.”, which means we’re looking for the word “cat” followed by any characters (.*) and ending with a literal period (\., escaped because a bare . matches any character).
- Apply Quantifiers: Use quantifiers like “*”, “+”, “{n,}”, etc., to specify how many times a character or group should be matched. These quantifiers determine the greediness of the matching.
The .* part of the pattern is a greedy quantifier. It means that it will try to match as many characters as possible before satisfying the next part of the pattern (the period).
- Apply the Regular Expression: Apply your regular expression pattern to the input text you want to search through.
- Find Matches: Use a function like findall() (in Python’s re module) to find all matches of the pattern in the input text.
Here, re.findall() returns all matches of the pattern in the example sentence.
- Greedily Match: The regex engine will attempt to match as much of the input text as possible while still satisfying the overall pattern. This means it will try to match as many repetitions of the quantified elements as it can.
- Keep Matching Until Satisfied: The regex engine will keep trying to match more characters until it cannot match anymore without violating the pattern.
The regex engine will start by finding the first occurrence of “cat” and then match as many characters as possible, as long as the match can still end with a period.
- Backtrack if Necessary: If greediness causes the pattern to fail, the regex engine will backtrack and try different possibilities until it finds a match. This may involve matching fewer repetitions of a quantified element or taking a different path through the input text.
Here .* first runs to the end of the sentence; the engine then backtracks one character at a time until a period follows, which satisfies the pattern “cat.*\.”.
- Retrieve Matches: Once all matches are found, retrieve and process them as needed for your application.
Because .* is greedy, the match does not stop at the first chance to finish; it takes the longest match that still ends in a period. So the match that starts at the first “cat” also swallows the second “cat”, and findall() returns a single match: “cat named Saturn, and another cat named Saturnalia.”
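The example can be run with Python's re module. The first pattern is the one from the steps; the second pair is an added illustration contrasting greedy .* with its non-greedy form .*?, which stops at the shortest possible match.

```python
import re

text = 'I have a cat named Saturn, and another cat named Saturnalia.'

# The pattern from the steps: greedy .* followed by a literal (escaped) period
print(re.findall(r'cat.*\.', text))
# ['cat named Saturn, and another cat named Saturnalia.']

# Greedy .* runs to the LAST 'Saturn' it can reach, so both cats end up in one match
greedy = re.findall(r'cat.*Saturn', text)
# Non-greedy .*? stops at the FIRST 'Saturn', giving one match per cat
lazy = re.findall(r'cat.*?Saturn', text)
print(greedy)  # ['cat named Saturn, and another cat named Saturn']
print(lazy)    # ['cat named Saturn', 'cat named Saturn']
```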
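___ A metacharacter that requires the preceding entry to occur at least once (one or more times).
example:
import re
doc = 'YahooYahoooYahooooYahooooYaho'
regExp = 'Yahoo+'
rs = re.findall(regExp, doc)
print(rs)
output: ['Yahoo', 'Yahooo', 'Yahoooo', 'Yahoooo'] (the final 'Yaho' has only one o, so it does not match)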
a. *
b. +
c. ?
b. +
The difference between + and * is that + requires the preceding pattern at least once (with no upper limit), while * also accepts zero occurrences.
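Running both quantifiers on the doc string from the example above shows the difference:

```python
import re

doc = 'YahooYahoooYahooooYahooooYaho'
plus = re.findall('Yahoo+', doc)  # +: at least one more 'o' after 'Yaho'
star = re.findall('Yahoo*', doc)  # *: zero or more, so the final 'Yaho' also matches
print(plus)  # ['Yahoo', 'Yahooo', 'Yahoooo', 'Yahoooo']
print(star)  # ['Yahoo', 'Yahooo', 'Yahoooo', 'Yahoooo', 'Yaho']
```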
What does this mean?
import re
doc = 'YahooYaooYahoooo'
regExp = 'Yah?oo'
rs = re.findall(regExp, doc)
print(rs)
- doc = the string to search
- regExp = the regular expression; h? means the h is optional (it may appear zero or one time)
- rs = re.findall(regExp, doc) finds all matches of regExp in doc
output: ['Yahoo', 'Yaoo', 'Yahoo']
_______ A metacharacter in regular expressions that specifies repeated patterns and also defines lower and upper limits
a. {,}
b. (,)
c. +
d. \
a. {,}
{ } A metacharacter in regular expressions that specifies repeated patterns and also defines lower and upper limits
[a-z]{2,5} means at least 2 and at most 5 lowercase letters (when the whole string must match):
- ab (2 characters): valid
- abcd (4 characters): valid
- dldgkd (6 characters): not valid
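A sketch of the three cases. fullmatch checks the whole string; findall, by contrast, only looks for matching pieces, so it still finds 5 letters inside the 6-letter string.

```python
import re

# fullmatch: the WHOLE string must be 2 to 5 lowercase letters
for s in ['ab', 'abcd', 'dldgkd']:
    print(s, re.fullmatch('[a-z]{2,5}', s) is not None)

# findall takes the longest allowed piece (5 letters); the leftover 'd' is too short
print(re.findall('[a-z]{2,5}', 'dldgkd'))  # ['dldgk']
```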
{0,} is the same as what metacharacter?
a. {,}
b. *
c. +
d. \
b. *
{0,} = *
{1,} is the same as what metacharacter?
a. {,}
b. *
c. +
d. \
c. +
{1,} = +
{0,1} is the same as what metacharacter?
a. ?
b. *
c. +
d. \
a. ?
{0,1} = ?
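A quick check of which counted-repetition forms are interchangeable with *, + and ?, using a made-up test string. The open-ended forms {0,} and {1,} are the equivalents; {0} on its own means exactly zero times.

```python
import re

text = 'ac abc abbc'
# Each pair of patterns below finds exactly the same matches
assert re.findall('ab{0,}c', text) == re.findall('ab*c', text)   # ['ac', 'abc', 'abbc']
assert re.findall('ab{1,}c', text) == re.findall('ab+c', text)   # ['abc', 'abbc']
assert re.findall('ab{0,1}c', text) == re.findall('ab?c', text)  # ['ac', 'abc']
# By contrast, {0} alone means "exactly zero times": only 'ac' matches
print(re.findall('ab{0}c', text))  # ['ac']
```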
What does match() do with a regular expression?
a. determines if the RE matches at the beginning of the string
b. scans through a string, looking for any location where this RE matches
c. neither
match()
a. determines if the RE matches at the beginning of the string
What does search() do with a regular expression?
a. determines if the RE matches at the beginning of the string
b. scans through a string, looking for any location where this RE matches
c. neither
search()
b. scans through a string, looking for any location where this RE matches
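The difference shows up as soon as the pattern is not at the start of the string. A sketch with a made-up string:

```python
import re

text = 'say hello'
print(re.match('hello', text))          # None: 'hello' is not at the beginning
print(re.search('hello', text).span())  # (4, 9): search scans and finds it later
print(re.match('say', text).group())    # 'say': match succeeds at the beginning
```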
______ Which regular expression metacharacter means “logical or” and is used to join alternatives?
a. +
b. ()
c. |
d. *
c. |
| means “logical or”; it joins alternatives, and the expression matches if either side matches.
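A one-line sketch with a made-up sentence: the pattern matches wherever either alternative occurs.

```python
import re

text = 'I have a cat and a dog, not a bird.'
print(re.findall('cat|dog', text))  # ['cat', 'dog']
```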
______ Which regular expression metacharacter matches at the end of a string, or at any location followed by a newline character?
a. +
b. $
c. |
d. *
b. $
$ A regular expression metacharacter that matches at the end of a string, or at any location followed by a newline character (in Python's re, matching before every newline requires the MULTILINE flag).
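A sketch showing the effect of the MULTILINE flag on $ with a made-up two-line string:

```python
import re

text = 'first line\nsecond line'
print(re.findall(r'\w+$', text))                # ['line']: only at the end of the string
print(re.findall(r'\w+$', text, re.MULTILINE))  # ['line', 'line']: also before each newline
```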
______ Which regular expression metacharacter makes a group of characters be treated just like a single character?
ex. to find a repetition such as thethethethe
p = re.compile('(the)+')
a. +
b. $
c. |
d. ()
d. ()
() is a regular expression metacharacter that makes a group of characters be treated just like a single character.
The () makes a group. To find a run of repeated “the”, group it, apply +, and then search in doc:
p = re.compile('(the)+')
m = p.search(doc)
m.span() and m.group() give: (7, 22) thethethethethe
What does this mean?
(7, 22) is the beginning and end of the match:
it starts at index 7 and ends just before index 22 (the end index is exclusive).
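A runnable version. The doc string is a made-up example, chosen so that the run of “the” spans indices 7 to 22:

```python
import re

doc = 'I said thethethethethe and stopped'  # hypothetical input text
p = re.compile('(the)+')  # the group (the) is repeated one or more times
m = p.search(doc)
print(m.span())    # (7, 22): start index, end index (exclusive)
print(m.group())   # 'thethethethethe': the whole repeated run
print(m.group(1))  # 'the': the group itself holds the last repetition
```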
______ splits the string into a list, splitting it wherever the RE matches.
a. split()
b. sub()
c. subn()
a. split()
split() - splits the string into a list, splitting it wherever the RE matches.
ex: 'abc, f12, 1349,a'
splitting it wherever a comma (plus any spaces after it) matches gives
output: ['abc', 'f12', '1349', 'a']
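The split from this example, assuming the separator is a comma followed by optional whitespace:

```python
import re

text = 'abc, f12, 1349,a'
# split wherever a comma (followed by any number of spaces) matches
print(re.split(r',\s*', text))  # ['abc', 'f12', '1349', 'a']
```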
______ finds all substrings where RE matches and replaces them with a different string.
a. split()
b. sub()
c. subn()
b. sub()
(short for substitute)
sub() - finds all substrings where RE matches and replaces them with a different string.
______ Does the same thing as sub() but returns the new string together with the number of replacements.
a. split()
b. sub()
c. subn()
c. subn()
Does the same thing as sub() but returns a tuple: the new string and the number of replacements.
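sub() and subn() side by side, masking every digit in a made-up string:

```python
import re

text = 'Call 555-1234 or 555-9876'
masked = re.sub(r'[0-9]', '#', text)       # replace every digit with '#'
masked_n = re.subn(r'[0-9]', '#', text)    # same, plus the number of replacements
print(masked)    # Call ###-#### or ###-####
print(masked_n)  # ('Call ###-#### or ###-####', 14)
```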
p = re.compile(r'\W+')
This is an example of a
a. word tokenizer
b. word spacer
c. w+
d. word compiler
a. word tokenizer
p = re.compile(r'\W+')  # \W+ matches runs of non-word characters, so p.split(text) breaks text into words
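\W matches any non-word character (anything except letters, digits, and underscore), so splitting on \W+ leaves the words. A minimal sketch with a made-up sentence; a trailing separator produces an empty string at the end, which is filtered out here.

```python
import re

p = re.compile(r'\W+')  # one or more non-word characters act as the separator
text = 'This is a test, short and sweet.'
tokens = [t for t in p.split(text) if t]  # drop the empty string left by the final '.'
print(tokens)  # ['This', 'is', 'a', 'test', 'short', 'and', 'sweet']
```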
_____ a field that focuses on software’s ability to understand and process human languages
a. NLP (natural language processing)
b. language compiler
c. word tokenizer
a. NLP (natural language processing)
-a field that focuses on software’s ability to understand and process human languages
_____ Splitting the text into tokens, the minimal meaningful units. This can mean splitting text into sentences, or sentences into words.
a. tokenization
b. parts of speech
c. stemming
a. tokenization
Splitting the text into tokens, the minimal meaningful units. This can mean splitting text into sentences, or sentences into words.
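A naive regex-based sketch of both kinds of tokenization. Libraries such as NLTK or spaCy handle harder cases (abbreviations like “Dr.”, contractions) that these two simple patterns ignore.

```python
import re

text = 'NLP is fun. Tokenization comes first! Does it work?'
# Naive sentence tokenizer: split on whitespace that follows ., ! or ?
sentences = re.split(r'(?<=[.!?])\s+', text)
# Naive word tokenizer: runs of word characters, or single punctuation marks
words = [re.findall(r'\w+|[^\w\s]', s) for s in sentences]
print(sentences)  # ['NLP is fun.', 'Tokenization comes first!', 'Does it work?']
print(words[0])   # ['NLP', 'is', 'fun', '.']
```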
______ Assigning parts of speech to the words of a text
ex. noun, verb, etc.
a. tokenization
b. parts of speech
c. stemming
b. parts of speech
Assigning parts of speech to the words of a text
ex. noun, verb, etc.
______ process of reducing words to their stem.
ex. walking -> walk
a. tokenization
b. parts of speech
c. stemming
c. stemming
process of reducing words to their stem.
ex. walking -> walk
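A toy suffix stripper shows the idea of stemming. It is an illustrative assumption, not the Porter stemmer or any library's algorithm, and it mishandles many words; the name naive_stem is hypothetical.

```python
import re

def naive_stem(word):
    # Toy rule: remove one ending (ing, ed, or s) if at least 3 characters remain
    return re.sub(r'^(\w{3,}?)(ing|ed|s)$', r'\1', word)

for w in ['walking', 'walked', 'walks', 'walk']:
    print(w, '->', naive_stem(w))  # every form reduces to 'walk'
```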
____ similar to stemming, but takes the word's context into account and reduces words to their dictionary form (lemma), e.g. it can map “better” to “good”
a. tokenization
b. parts of speech
c. stemming
d. lemmatization
d. lemmatization
similar to stemming, but takes the word's context into account and reduces words to their dictionary form (lemma), e.g. it can map “better” to “good”
_____ named entity recognition, labels sequences of words that are the names of things.
ex. person, company, or street
a. tokenization
b. NER
c. stemming
b. NER
named entity recognition, labels sequences of words that are the names of things.
ex. person, company, or street
____ analyze the grammar of the text to extract its grammatical structure
a. tokenization
b. NER
c. stemming
d. parsing
d. parsing
analyze the grammar of the text to extract its grammatical structure
spaCy - NER, tokenization, etc.
CoreNLP -
gensim - semantic analysis, clarity, efficiency
NLTK - Natural Language Toolkit (mother of all NLP libraries)
____ used for filtering information in web search. Helps avoid spam emails by classifying them.
a. text classification
b. classification
c. nlp classification
a. text classification
text classification - used for filtering information in web search. Helps avoid spam emails by classifying them.
____ identify opinions and sentiments of the audience. Understand emotions of audience via social media.
a. sentiment analysis
b. chatbots
c. classification
d. advertisement
a. sentiment analysis
Sentiment Analysis- identify opinions and sentiments of the audience. Understand emotions of audience via social media.
____ handles low-priority customer support and assistance tasks. Also used in HR systems, e.g. answering how many vacation days an employee has left.
a. chatbots
b. customer service
c. sentiment analysis
a. chatbots
handles low-priority customer support and assistance tasks. Also used in HR systems, e.g. answering how many vacation days an employee has left.