Natural Language Processing Flashcards

1
Q

Why does Speech Recognition fall under NLP?

Application 2:

-A speech recognizer takes an audio signal as input, converts it to text, and recognizes what was spoken.

-A speech recognizer includes a language model, which decides that the speaker probably said one word rather than another even when the two sound the same: given the sequence of words spoken, one candidate is more probable than the other.

A
2
Q

Application 3: Image Captioning - You give the computer an image and want it to write a caption for the image.

How does this fall under NLP?

An image is 2-dimensional data. A CNN is used to encode the image. Once the image is encoded, an NLP model is needed to generate the caption text.

Products that convert text from one language to another also fall under NLP and typically rely on very deep neural networks.

A
3
Q

Machine Translation: We want to convert text from English to French.

How is this done?

Neural Machine Translation - one of the most fundamental products and most researched areas in NLP.

A
4
Q

______ a language to specify the rules for the set of possible strings that you want to search in a corpus (large body) of text.

a. regular expression
b. natural expression
c. language
d. none of the above

A

a. regular expression

regular expression - a language for specifying rules that describe the set of possible strings you want to search for in a corpus (large body) of text.

Ex: “Your password must have at least 8 characters, at least 1 upper case letter, at least 1 lower case letter, at least 1 digit (number), and at least 1 symbol from the special symbols.”

This is a set of rules that defines a set of valid strings, which is exactly what a regular expression expresses.
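
The password rule above could be written as a regular expression roughly as follows. This is a minimal sketch, not part of the original card: the set of special symbols (!@#$%^&*) and the names PASSWORD_RE and is_valid_password are assumptions made for illustration.

```python
import re

# Each (?=...) lookahead checks one rule without consuming characters.
# The special-symbol set below is an assumption; the card does not list it.
PASSWORD_RE = re.compile(
    r"(?=.*[A-Z])"        # at least 1 upper case letter
    r"(?=.*[a-z])"        # at least 1 lower case letter
    r"(?=.*[0-9])"        # at least 1 digit
    r"(?=.*[!@#$%^&*])"   # at least 1 special symbol (assumed set)
    r".{8,}"              # at least 8 characters in total
)


def is_valid_password(candidate):
    """Return True if the whole candidate string satisfies every rule."""
    return PASSWORD_RE.fullmatch(candidate) is not None


print(is_valid_password("Abcdef1!"))  # True
print(is_valid_password("abcdefgh"))  # False
```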

5
Q

With regular expressions available, why not write your own Python program for searching?

a. it is slower
b. The regular expression module wraps a matching engine written in C, so a hand-written search is not needed
c. Not wise, a regular expression package (re) is already included.
d. only b and c

A

d. only b and c

Do not write your own Python program for searching in place of regular expressions because:

b. The regular expression module wraps a matching engine written in C, so a hand-written search is not needed
c. Not wise, a regular expression package (re) is already included.

6
Q

Why use Regular Expressions instead of String Functions?

a. not customized
b. regular expressions are much more abstract and often faster than hand-written string handling for complex patterns
c. regular expressions are not a language specific thing, can do almost anything
d. only b and c

A

d. only b and c

Use Regular Expressions instead of String Functions because:
b. regular expressions are much more abstract and often faster than hand-written string handling for complex patterns
c. regular expressions are not a language specific thing, can do almost anything

7
Q

______ is a simple program that uses pattern matching to recognize phrases like “I need X” and translate them into suitable outputs like “What would it mean to you if you got X?”

a. Regular Expression
b. ELIZA chatbot
c. ChatGPT
d. None of the Above

A

b. ELIZA chatbot

ELIZA chatbot - a simple program that uses pattern matching to recognize phrases like “I need X” and translate them into suitable outputs like “What would it mean to you if you got X?”

Example:
User: I am unhappy.
ELIZA: Do you think coming here will help you not to be unhappy?

User: I need some help, that much seems certain. (matches the pattern “I need X”)
ELIZA: What would it mean to you if you got some help? (the matched X is reused in the output string)

User: Perhaps I could learn to get along with my mother.
ELIZA: Tell me more about your family.
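
The “I need X” rule can be sketched with Python's re module. This is an illustrative toy, not the original ELIZA program: the rule table RULES, the function respond, and the fallback reply are all hypothetical.

```python
import re

# Hypothetical rule table: (pattern, response template).
# \1 in a template is replaced by whatever the first group captured.
RULES = [
    (re.compile(r"\bI need ([^.,!?]+)", re.IGNORECASE),
     r"What would it mean to you if you got \1?"),
    (re.compile(r"\b(?:mother|father|family)\b", re.IGNORECASE),
     "Tell me more about your family."),
]


def respond(utterance):
    """Return the response of the first rule whose pattern matches."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return match.expand(template)
    return "Please go on."  # assumed fallback when no rule matches


print(respond("I need some help."))
print(respond("Perhaps I could learn to get along with my mother?"))
```

The captured text stops at the first punctuation mark, so only “some help” is echoed back.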

8
Q

What does import re mean?

a. import regular expression
b. import regular eliza
c. import return expression
d. import return eliza

A

a. import regular expression

import re loads Python's regular expression module, re.

9
Q

What does the following code mean?
re.findall('abc', 'askdfj;askabcdfahgfa;ghabc;hgabchkg;a')

a. regular expression, find all
b. find all, look for string abc
c. regular expression, find all, look for the string abc in the string askdfj;askabcdfahgfa;ghabc;hgabchkg;a

A

c. regular expression, find all, look for the string abc in the string askdfj;askabcdfahgfa;ghabc;hgabchkg;a

re = the regular expression module
findall = find all matches
'abc' = the pattern we are looking for
'askdfj;askabcdfahgfa;ghabc;hgabchkg;a' = the string we are searching in
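
Run as a complete script, the call from this card returns a list with one entry per match. The variable names text and matches are chosen here for illustration.

```python
import re

text = "askdfj;askabcdfahgfa;ghabc;hgabchkg;a"

# findall returns every non-overlapping match of the pattern as a list.
matches = re.findall("abc", text)

print(matches)       # ['abc', 'abc', 'abc']
print(len(matches))  # 3
```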

10
Q

Which are the metacharacters with special meaning?

a. . ^ $ * + ?
b. . ^ $ * + ? { } [ ]
c. . ^ $ * + ? { } [ ] \ |
d. . ^ $ * + ? { } [ ] \ | ( )

A

d. . ^ $ * + ? { } [ ] \ | ( )

These are the metacharacters with special meaning.

11
Q

_____ is used for specifying a character class, which is a set of characters that you wish to match. Characters can be listed individually, like [abcdef], or as a range, like [a-f].

a. [ ]
b. { }
c. \
d. ( )

A

a. [ ]

[ ] This metacharacter is used for specifying a character class, which is a set of characters that you wish to match. Characters can be listed individually, like [abcdef], or as a range, like [a-f].

EXAMPLE:
re.findall('[abcd]', 'kasdf')

output: ['a', 'd']
k is not in the output because it is not in the class [abcd].
a is in the output because it is in the class [abcd].
s and f are not in the output because they are not in the class [abcd].
d is in the output because it is in the class [abcd].

12
Q

Using [ ], count the number of digits in the string below. Put the steps in the right order:

'2319ab4621acdz+*!'

i. # define your string
ii. # define a character class, [0-9]
iii. # find all occurrences of this class using re.findall
iv. # use len() to count how many digits there are

a. 4,3,2,1
b. 1,2,3,4
c. 2,3,4,1
d. 3,2,4,1

A

import re
s = '2319ab4621acdz+*!'      # define your string
L = re.findall('[0-9]', s)   # find every character in the class [0-9]
print(len(L))                # prints 8

b. 1,2,3,4

1) # define your string
2) # define a character class, [0-9]
3) # find all occurrences of this class using re.findall
4) # use len() to count how many digits there are

13
Q

Verify that the string contains 4 consecutive digits.

if len(re.findall('[0-9][0-9][0-9][0-9]', 'asdfja;_+);1kj2306kjl891')) > 0:
    print("Found")
else:
    print("Not Found")

A

1) if len(re.findall('[0-9][0-9][0-9][0-9]',

*Finds every run of 4 consecutive digits, which is why [0-9] is written 4 times.

2) 'asdfja;_+);1kj2306kjl891')) > 0:

*This is the string being searched. It contains 2306, so the list of matches is not empty.

3) print("Found")
else:
    print("Not Found")

*This prints Found or Not Found based on the result. Here it prints Found.

14
Q

_____ is the metacharacter used in regular expressions to form set complements.

a. ^
b. [ ]
c. ( )
d. $

A

a. ^
^ is the metacharacter used as the first character inside [ ] to form the complement of a set.

15
Q

In [^023abf], what is the metacharacter ^ doing?

a. Set complement: matches everything except 0, 2, 3, a, b, f
b. includes all of 023abf
c. exponent
d. Ordinary character

A

a. Set complement: matches everything except 0, 2, 3, a, b, f

16
Q

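In [023abf^], what is the metacharacter ^ doing?

a. Set complement: matches everything except 0, 2, 3, a, b, f
b. includes all of 023abf
c. exponent
d. Ordinary character

A

d. Ordinary character

When ^ is not the first character inside [ ], it has no special meaning and simply matches a literal ^.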
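
The two positions of ^ inside a character class can be compared directly. This is a small sketch; the sample string "a0^z" is made up for illustration.

```python
import re

text = "a0^z"  # illustrative sample string

# ^ as the first character: complement of the set {0, 2, 3, a, b, f}.
complement = re.findall("[^023abf]", text)

# ^ anywhere else in the class: just another literal member of the set.
literal = re.findall("[023abf^]", text)

print(complement)  # ['^', 'z']
print(literal)     # ['a', '0', '^']
```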

17
Q

_______ is known as the “zero or more” quantifier. It indicates that the preceding character or expression can occur zero or more times in the input text. Here’s what it signifies:

Zero Occurrences: The character or expression preceding “*” may not occur at all in the text, and the pattern will still match.

Multiple Occurrences: If the character or expression occurs, it can occur any number of times.

a. $
b. *
c. ^
d. @

A

b. *

* is known as the “zero or more” quantifier. It indicates that the preceding character or expression can occur zero or more times in the input text. Here’s what it signifies:

Zero Occurrences: The character or expression preceding “*” may not occur at all in the text, and the pattern will still match.

Multiple Occurrences: If the character or expression occurs, it can occur any number of times.

18
Q

Which metacharacter does the following example illustrate?

The regular expression “cats*” matches both “cat” and “cats” in the input text.

a. $
b. *
c. ^
d. @

A

b. *

In this example, the regular expression “cats*” matches both “cat” and “cats” in the input text. The “*” allows for zero or more occurrences of the character “s”.
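
A quick check of the “zero or more” behavior, using a made-up sample string:

```python
import re

text = "cat cats catsss"  # illustrative sample string

# s* allows zero, one, or many trailing "s" characters after "cat".
matches = re.findall("cats*", text)

print(matches)  # ['cat', 'cats', 'catsss']
```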

19
Q

_____ is defined as a name that does not start with a digit, contains no special characters other than the underscore, and can have an arbitrary number of characters.

a. proper variable name
b. regular expression
c. *
d. special sequence

A

a. proper variable name

A Proper Variable Name - is defined as a name that does not start with a digit, contains no special characters other than the underscore, and can have an arbitrary number of characters.
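
That definition translates almost word for word into a pattern. This is a sketch assuming ASCII-only names; NAME_RE and is_proper_name are hypothetical helper names.

```python
import re

# First character: a letter or underscore (so never a digit).
# Remaining characters: any number of letters, digits, or underscores.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_proper_name(name):
    """Return True if the whole string is a proper variable name."""
    return NAME_RE.fullmatch(name) is not None


for candidate in ["total_1", "_tmp", "2fast", "my-var"]:
    print(candidate, is_proper_name(candidate))
```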

20
Q

____ applies a repeated pattern as far as it can go. It is the default behavior in most regex engines.

a. special sequence
b. proper variable name
c. greedy matching
d. *

A

c. greedy matching

Greedy Matching
1. applies a repeated pattern as far as it can go.
2. is the default behavior in most regex engines.

21
Q

What are the steps of Greedy Matching?

Example:
I have a cat named Saturn, and another cat named Saturnalia.

A

These are the steps of Greedy Matching.

Example:
I have a cat named Saturn, and another cat named Saturnalia.

  1. Define Your Pattern: Start by defining the pattern you want to match in your regular expression. This pattern may include characters, groups, and quantifiers that specify how many times a character or group should be matched.

Our pattern will be “cat.*\.”, which means we’re looking for the word “cat” followed by any characters (.*) and ending with a literal period (\.).

  2. Apply Quantifiers: Use quantifiers like “*”, “+”, “{n,}”, etc., to specify how many times a character or group should be matched. These quantifiers are greedy by default.

The .* part of the pattern is a greedy quantifier. It matches as many characters as possible while still letting the next part of the pattern (the period) match.

  3. Apply the Regular Expression: Apply your regular expression pattern to the input text you want to search through.
  4. Find Matches: Use a function like findall() (in Python’s re module) to find all matches of the pattern in the input text.
  5. Greedily Match: The regex engine will attempt to match as much of the input text as possible while still satisfying the overall pattern. This means it will try to match as many repetitions of the quantified elements as it can.
  6. Keep Matching Until Satisfied: The regex engine will keep trying to match more characters until it cannot match any more without violating the pattern.

The regex engine starts at the first occurrence of “cat”, and .* then consumes every remaining character up to the end of the text.

  7. Backtrack if Necessary: If greediness causes the pattern to fail, the regex engine will backtrack and try different possibilities until it finds a match. This may involve matching fewer repetitions of a quantified element or taking a different path through the input text.

After .* has consumed the rest of the text, nothing is left for the period, so the engine gives characters back one at a time until the final period can match “\.”.

  8. Retrieve Matches: Once all matches are found, retrieve and process them as needed for your application.

Because the pattern is greedy, the match runs from the first “cat” to the last period in the text. The second “cat” falls inside that match, so findall() returns a single match for this example rather than one match per occurrence of “cat”.
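
The effect of greediness on this example sentence can be checked directly. This is a sketch assuming the intended pattern is cat.*\. (with a greedy .* and an escaped period); the second pair of patterns, ending in the letter a, is an extra illustration that is not in the original card and contrasts greedy .* with the non-greedy form .*?.

```python
import re

text = "I have a cat named Saturn, and another cat named Saturnalia."

# Greedy: .* runs to the end, then backs up to the last period.
greedy = re.findall(r"cat.*\.", text)
print(greedy)  # ['cat named Saturn, and another cat named Saturnalia.']

# Contrast (illustrative): stop at the letter "a" instead of a period.
greedy_a = re.findall(r"cat.*a", text)   # runs to the LAST "a"
lazy_a = re.findall(r"cat.*?a", text)    # stops at the FIRST "a" each time
print(greedy_a)  # ['cat named Saturn, and another cat named Saturnalia']
print(lazy_a)    # ['cat na', 'cat na']
```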

22
Q

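___ A metacharacter that requires the preceding item to occur at least 1 time.

example:
doc = 'YahooYahoooYahooooYahooooYaho'
regExp = 'Yahoo+'
rs = re.findall(regExp, doc)
print(rs)

output: ['Yahoo', 'Yahooo', 'Yahoooo', 'Yahoooo']

a. *
b. +
c. ?

A

b. +

The difference between + and * is that + requires the preceding pattern to occur at least 1 time, with no upper limit, while * also allows 0 times. In the example, the final 'Yaho' is not matched because it is not followed by another 'o'.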

23
Q

What does this mean?

doc = 'YahooYaooYahoooo'
regExp = 'Yah?oo'
rs = re.findall(regExp, doc)
print(rs)

A
  1. doc = the string to search
  2. regExp = the regular expression; h? says that the h in Yahoo is optional
  3. rs = re.findall(regExp, doc)
    Means: find all matches of the regular expression in doc. Here the result is ['Yahoo', 'Yaoo', 'Yahoo'].
24
Q

_______A metacharacter in regular expressions that specifies repeated patterns and also defines lower and upper limits

a. {,}
b. (,)
c. +
d. \

A

a. {,}
{ } A metacharacter in regular expressions that specifies repeated patterns and also defines lower and upper limits

[a-z]{2,5} means at least 2 and at most 5 lowercase letters. When the whole string must match:

  1. ab (2 characters) so valid
  2. abcd (4 characters) so valid
  3. dldgkd (6 characters) not valid
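
The three cases can be checked with re.fullmatch, which requires the entire string to match. This is a sketch; the variable names are chosen for illustration. Note that findall or search would still find a 5-letter match inside the 6-letter string, which is why fullmatch is used here.

```python
import re

pattern = re.compile("[a-z]{2,5}")  # 2 to 5 lowercase letters

results = {}
for candidate in ["ab", "abcd", "dldgkd"]:
    # fullmatch succeeds only if the whole candidate fits the pattern.
    results[candidate] = pattern.fullmatch(candidate) is not None

print(results)  # {'ab': True, 'abcd': True, 'dldgkd': False}
```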
25
Q

{0,} is the same as what metacharacter?

a. {,}
b. *
c. +
d. \

A

b. *

{0,} = * (zero or more times)

26
Q

{1,} is the same as what metacharacter?

a. {,}
b. *
c. +
d. \

A

c. +

{1,} = + (one or more times)

27
Q

{0,1} is the same as what metacharacter?

a. ?
b. *
c. +
d. \

A

a. ?

{0,1} = ?
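
In Python's re module, * behaves like {0,}, + like {1,}, and ? like {0,1}. The equivalences can be confirmed on a small made-up string:

```python
import re

text = "b ab aab aaab"  # illustrative sample string

star = re.findall("a*b", text)
plus = re.findall("a+b", text)
optional = re.findall("a?b", text)

# Each shorthand gives the same matches as its {m,n} spelling.
print(star == re.findall("a{0,}b", text))       # True
print(plus == re.findall("a{1,}b", text))       # True
print(optional == re.findall("a{0,1}b", text))  # True

print(star)      # ['b', 'ab', 'aab', 'aaab']
print(plus)      # ['ab', 'aab', 'aaab']
print(optional)  # ['b', 'ab', 'ab', 'ab']
```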

28
Q

What does match() do in regular expressions?

a. determines if the RE matches at the beginning of the string
b. scans through a string, looking for any location where this RE matches
c. neither

A

match()

a. determines if the RE matches at the beginning of the string

29
Q

What does search() do in regular expressions?

a. determines if the RE matches at the beginning of the string
b. scans through a string, looking for any location where this RE matches
c. neither

A

search()

b. scans through a string, looking for any location where this RE matches
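
The difference between match() and search() shows up as soon as the pattern does not occur at the very start of the string. This is a small sketch with made-up sample strings.

```python
import re

pattern = re.compile("[0-9]+")

# match() only tries at the beginning of the string.
at_start = pattern.match("abc123")
print(at_start)  # None, because the string starts with letters

# search() scans forward until it finds a location that matches.
anywhere = pattern.search("abc123")
print(anywhere.group())  # 123

# match() succeeds when the digits come first.
print(pattern.match("123abc").group())  # 123
```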

30
Q

______ which regular expression metacharacter means “logical or” and is used to join alternatives?

a. +
b. ()
c. |
d. *

A

c. |

| means logical OR. It joins alternatives, so A|B matches either A or B.

31
Q

______ which regular expression metacharacter matches at the end of a string or, in MULTILINE mode, at any location followed by a newline character?

a. +
b. $
c. |
d. *

A

b. $

$ matches at the end of a string or, in MULTILINE mode, at any location followed by a newline character.

32
Q

______ which regular expression construct makes a group of characters be treated just like a single character?

ex. we want to find thethethethe, i.e. a repetition of 'the'

p = re.compile('(the)+')

a. +
b. $
c. |
d. ()

A

d. ()

() makes a group of characters be treated just like a single character.

The () makes a group. To find repeated occurrences of 'the', group it and apply + to the group, then search in doc:

p = re.compile('(the)+')
m = p.search(doc)

33
Q

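(7, 22) thethethethethe

What does this mean?

A

It is the start and end position of the match, as returned by m.span().

ex. the match 'thethethethethe' starts at index 7 and ends at index 22 (the end index is exclusive, so the match is 15 characters long)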
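
A (7, 22) result can be reproduced with a grouped pattern. This is a sketch: the sentence in doc is invented so that the repeated 'the' happens to start at index 7; the original card does not show its doc string.

```python
import re

# Hypothetical document: "I said " is 7 characters long,
# so the run of 'the' starts at index 7.
doc = "I said thethethethethe loudly"

p = re.compile("(the)+")  # the group (the) repeated one or more times
m = p.search(doc)

print(m.span())   # (7, 22)
print(m.group())  # thethethethethe
```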

34
Q

______ splits the string into a list, splitting it wherever the RE matches.

a. split()
b. sub()
c. subn()

A

a. split()

split()- splits the string into a list, splitting it wherever the RE matches.

ex: 'abc, f12, 1349,a'
if we split wherever the RE ',\s*' (a comma followed by optional whitespace) matches
output: ['abc', 'f12', '1349', 'a']

35
Q

______ finds all substrings where RE matches and replaces them with a different string.

a. split()
b. sub()
c. subn()

A

b. sub()
(also known as substitute)

sub() - finds all substrings where RE matches and replaces them with a different string.

36
Q

______ Does the same thing as sub() but returns a tuple containing the new string and the number of replacements.

a. split()
b. sub()
c. subn()

A

c. subn()

Does the same thing as sub() but returns a tuple containing the new string and the number of replacements.
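
The three functions split(), sub(), and subn() can be compared on one string. This is a sketch; the sample string and the replacement character "#" are chosen for illustration.

```python
import re

text = "abc, f12, 1349,a"  # illustrative sample string

# split(): cut the string wherever the pattern matches.
parts = re.split(r",\s*", text)
print(parts)  # ['abc', 'f12', '1349', 'a']

# sub(): replace every match with a different string.
replaced = re.sub("[0-9]+", "#", text)
print(replaced)  # abc, f#, #,a

# subn(): same replacement, but also report how many were made.
replaced_n = re.subn("[0-9]+", "#", text)
print(replaced_n)  # ('abc, f#, #,a', 2)
```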

37
Q

p = re.compile(r'\W+')

This is an example of a

a. word tokenizer
b. word spacer
c. w+
d. word compiler

A

a. word tokenizer

p = re.compile(r'\W+') matches runs of non-word characters, so p.split(text) breaks the text into words.
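
Used with split(), the compiled pattern acts as a simple word tokenizer. This is a sketch; the sample sentence is made up, and empty strings are filtered out because split() leaves one behind when the text ends with punctuation.

```python
import re

p = re.compile(r"\W+")  # one or more non-word characters

text = "NLP is fun, isn't it?"  # illustrative sample sentence

# Split on every run of non-word characters and drop empty strings.
tokens = [token for token in p.split(text) if token]

print(tokens)  # ['NLP', 'is', 'fun', 'isn', 't', 'it']
```

The apostrophe counts as a non-word character, so "isn't" is split in two. Dedicated tokenizers handle such cases more carefully.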

38
Q

_____ a field that focuses on software’s ability to understand and process human languages

a. NLP (natural language processing)
b. language compiler
c. word tokenizer

A

a. NLP (natural language processing)

-a field that focuses on software’s ability to understand and process human languages

39
Q

_____ Splitting text into tokens, the minimal meaningful units. This can mean splitting text into words or sentences, or sentences into words.

a. tokenization
b. parts of speech
c. stemming

A

a. tokenization

Splitting text into tokens, the minimal meaningful units. This can mean splitting text into words or sentences, or sentences into words.

40
Q

______ Assigning parts of speech to text

ex. noun, pronoun, etc.

a. tokenization
b. parts of speech
c. stemming

A

b. parts of speech

Assigning parts of speech to text

ex. noun, pronoun, etc.

41
Q

______ process of reducing words to their stem.

ex. walking -> walk

a. tokenization
b. parts of speech
c. stemming

A

c. stemming

process of reducing words to their stem.

ex. walking -> walk

42
Q

____ similar to stemming, but takes word context into account and maps a word to its dictionary form (lemma), ex. “better” -> “good”

a. tokenization
b. parts of speech
c. stemming
d. lemmatization

A

d. lemmatization

similar to stemming, but takes word context into account and maps a word to its dictionary form (lemma), ex. “better” -> “good”

43
Q

_____ named entity recognition, labels sequences of words that are the names of things.

ex. person, company, or street

a. tokenization
b. NER
c. stemming

A

b. NER

named entity recognition, labels sequences of words that are the names of things.

ex. person, company, or street

44
Q

____ analyzes the grammar of the text to extract its syntactic structure

a. tokenization
b. NER
c. stemming
d. parsing

A

d. parsing

analyzes the grammar of the text to extract its syntactic structure

45
Q

spaCy - NER, tokenization, etc.
CoreNLP -
gensim - semantic analysis, with a focus on clarity and efficiency
NLTK - Natural Language Toolkit (mother of all NLP libraries)

A
46
Q

____ used for filtering information in web search. Helps avoid SPAM emails by classification.

a. text classification
b. classification
c. nlp classification

A

a. text classification

text classification - used for filtering information in web search. Helps avoid SPAM emails by classification.

47
Q

____ identify opinions and sentiments of the audience. Understand emotions of audience via social media.

a. sentiment analysis
b. chatbots
c. classification
d. advertisement

A

a. sentiment analysis

Sentiment Analysis- identify opinions and sentiments of the audience. Understand emotions of audience via social media.

48
Q

____ help with customer support and assistance by handling low priority tasks. Also used in HR systems, ex. to answer how many vacation days are left.

a. chatbots
b. customer service
c. sentiment analysis

A

a. chatbots

help with customer support and assistance by handling low priority tasks. Also used in HR systems, ex. to answer how many vacation days are left.

49
Q

_______ offers insights into audience preferences and helps improve customer satisfaction

a. customer service
b. chatbots
c. sentiment analysis

A

a. customer service

offers insights into audience preferences and helps improve customer satisfaction

50
Q

____ document summarization, machine translation, and speech recognition.

a. customer service
b. chatbots
c. sentiment analysis
d. natural language processing

A

d. natural language processing

offers document summarization, machine translation, and speech recognition.

51
Q

Natural Language is any language that has evolved naturally through use and repetition without conscious planning or premeditation

TRUE

FALSE

A

TRUE

Natural Language is any language that has evolved naturally through use and repetition without conscious planning or premeditation

Natural Language is what humans use to communicate and it has evolved with human evolution

52
Q

NLP is a science that focuses on:

A. Software’s ability to understand and process human language

B. Speech Recognition

C. Grammar

D. Translation

A

A. Software’s ability to understand and process human language

NLP has evolved as a science to build programs or software capable of understanding human language

53
Q

NLTK stands for Natural Language Tool Kit

TRUE

FALSE

A

TRUE

NLTK stands for Natural Language Tool Kit

54
Q

_____ process of breaking up text into smaller pieces (tokens)

a. tokenization
b. NER
c. stemming
d. parsing

A

a. tokenization

TOKENIZATION process of breaking up text into smaller pieces (tokens)

55
Q

____ words that are commonly used. Language specific also. ex: ‘a’, ‘an’, ‘the’

a. stop words
b. tokenization
c. stemming
d. parsing

A

a. stop words
Stop words - words that are commonly used. Language specific also.

56
Q

Tokenizing a sentence is assigning ids to each word

FALSE

TRUE

A

FALSE

Tokenizing a sentence is splitting it into tokens (words)

57
Q

POS tagging is the process of assigning tags to tokens (words) like nouns, verbs …

TRUE

FALSE

A

TRUE

POS tagging is the process of assigning part-of-speech tags to tokens (words).
Tags include noun, verb, adjective, etc.

58
Q

TF-IDF is used to :

A. Extract keywords or features from a Text

B. Find synonyms

C. Extract the root or lemma of a word

A

A. Extract keywords or features from a Text

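TF-IDF (term frequency-inverse document frequency) is a technique used to find the dominant words or keywords in a text: a word scores highly when it is frequent in that text but rare in the other documents.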
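
A minimal sketch of the calculation, assuming one common variant of the formula (term frequency times the natural log of N / document frequency). Libraries such as scikit-learn use slightly different smoothing, and the three toy documents and the function name tf_idf are made up for illustration.

```python
import math

# Toy corpus (made up for illustration), already lowercased.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]
tokenized = [doc.split() for doc in docs]


def tf_idf(term, doc_tokens, corpus):
    """Score a term for one document: frequent here, rare elsewhere."""
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for tokens in corpus if term in tokens)
    idf = math.log(len(corpus) / df)
    return tf * idf


# Score every distinct word of the first document.
scores = {term: tf_idf(term, tokenized[0], tokenized)
          for term in set(tokenized[0])}

for term, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{term}: {score:.3f}")
```

Here "cat" and "mat" outscore "the": "the" appears twice in the first document, but it also appears in a second document, which lowers its weight.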

59
Q

______ Process of computationally classifying and categorizing opinions expressed in a piece of text.

Helps understand the writer's opinion about a topic, event, product, etc.

A. Sentiment Analysis

B. Find synonyms

C. Extract the root or lemma of a word

A

A. Sentiment Analysis

Process of computationally classifying and categorizing opinions expressed in a piece of text.

60
Q

________ is typically the first layer of a neural network for text.

______ allows words with similar meanings to have similar representations.

A. Sentiment Analysis

B. Find synonyms

C. Extract the root or lemma of a word

D. Word Embeddings

A

D. Word Embeddings

A word embedding layer is typically the first layer of a neural network for text.

Word embeddings allow words with similar meanings to have similar representations.

61
Q

What does this code mean?

network = Sequential()

A

This shows that you are building a neural network (a Keras Sequential model) and will add layers to it in sequence.

62
Q

The closer the sentiment analysis score is to 0, what does that mean?

a. more positive result
b. more negative result

A

b. more negative result

A sentiment analysis score closer to 0 is a more negative result.

63
Q

The closer the sentiment analysis score is to 1, what does that mean?

a. more positive result
b. more negative result

A

a. more positive result

A sentiment analysis score closer to 1 is a more positive result.

64
Q

Sentiment Analysis is a process to classify text into topics

FALSE

TRUE

A

FALSE

Sentiment Analysis is used to classify text based on the opinion or the sentiment of the writer: Negative or Positive.

65
Q

It is a best practice to use the entire dataset to train models

FALSE

TRUE

A

FALSE

The dataset must be split into training and test data. Otherwise the measured performance is misleading, because the model is evaluated on data it has already seen.
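
A train/test split can be sketched in a few lines of plain Python. The helper name train_test_split, the 20% test fraction, and the toy data are assumptions for illustration; libraries such as scikit-learn provide their own version of this function.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and hold out a fraction for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy labeled dataset: (text, label) pairs, made up for illustration.
data = [(f"review {i}", i % 2) for i in range(10)]

train, test = train_test_split(data)
print(len(train), len(test))  # 8 2
```

The model is fitted on train only, and test is used once at the end to estimate how it performs on unseen data.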

66
Q

True or False:

The basic mechanics of machine learning is to make computers act without being explicitly programmed to do so?

A

True

The basic mechanics of machine learning is to make computers act without being explicitly programmed to do so.

67
Q

_____ has given us:

-Fraud detection
-Web search
-Self-Driving cars
-Online shopping recommendations

A. Machine Learning
B. Deep Learning
C. Reinforcement Learning
d. None of above

A

A. Machine Learning

Machine Learning has given us:

-Fraud detection
-Web search
-Self-Driving cars
-Online shopping recommendations

68
Q

______ helps us take a picture of what someone else wrote on a board and convert it into text.

ex. scan a document and convert it to a Word document

A. OCR (Optical Character Recognition)

B. Machine Learning

C. Deep Learning

D. Reinforcement Learning

A

A. OCR (Optical Character Recognition)

OCR (Optical Character Recognition) - helps us take a picture of what someone else wrote on a board and convert it into text.

ex. scan a document and convert it to a Word document

69
Q

_____ has applications including:
-Facebook news feed
-Self-Driving cars
-Virtual personal assistant
-Email spam filtering
-Online customer support

A. OCR (Optical Character Recognition)

B. Machine Learning

C. Deep Learning

D. Reinforcement Learning

A

B. Machine Learning

Machine Learning applications include:
-Facebook news feed
-Self-Driving cars
-Virtual personal assistant
-Email spam filtering
-Online customer support

70
Q

______ Mainly used for classification problems. Repeatedly picks the most significant attribute and splits the data on it, creating a tree-like structure.

a. Decision Tree
b. Logistic Regression
c. Linear Regression
d. Naive Bayes

A

a. Decision Tree

Decision Trees - Mainly used for classification problems. Repeatedly picks the most significant attribute and splits the data on it, creating a tree-like structure.

71
Q

______ Uses what was learned from past (labeled) data and applies it to new data. The flow is: dataset -> train -> model.

The output is also compared with the correct answer so the model can improve.

a. supervised
b. unsupervised learning

A

a. supervised - Uses what was learned from past (labeled) data and applies it to new data. The flow is: dataset -> train -> model.

72
Q

_________ used if the dataset is not labeled, categorized, or classified. Finds a hidden pattern or structure in unlabeled data based on similarities.

a. supervised
b. unsupervised learning

A

b. unsupervised learning - used if the dataset is not labeled, categorized, or classified. Finds a hidden pattern or structure in unlabeled data based on similarities.

73
Q

_______ statistical approach to find the relationship between variables. Predicts an outcome from an input based on the relationship between variables obtained from the dataset.

a. linear regression
b. logistic regression

A

a. linear regression

Linear Regression - statistical approach to find the relationship between variables. Predicts an outcome from an input based on the relationship between variables obtained from the dataset.

74
Q

________ also a statistical method, used to predict a binary outcome (Yes / No, 0 / 1, True / False) given independent variables. Used when the outcome variable is categorical.

Ex. whether a transaction is spam or not.

a. linear regression
b. logistic regression

A

b. logistic regression

Logistic Regression - also a statistical method, used to predict a binary outcome (Yes / No, 0 / 1, True / False) given independent variables. Used when the outcome variable is categorical.

Ex. whether a transaction is spam or not.

75
Q

________ Useful for large datasets; can outperform even highly sophisticated classification methods. Forms a family of simple probabilistic classifiers that assume all attributes are independent of each other.

ex. An orange is round, of a certain size, and of a certain color. Each property is treated as contributing independently to the probability that the fruit is an orange, rather than all being considered together.

A. Decision tree
b Naive Bayes

A

b Naive Bayes

Naive Bayes - Useful for large datasets; can outperform even highly sophisticated classification methods. Forms a family of simple probabilistic classifiers that assume all attributes are independent of each other.

ex. An orange is round, of a certain size, and of a certain color. Each property is treated as contributing independently to the probability that the fruit is an orange, rather than all being considered together.

76
Q

_____ - process of predicting the class or category of a given input / data. A program learns from the training dataset. It can be bi-class (ex. male or female). A sentiment analyzer is an example of a bi-class classifier.

ex. whether a person is male or female
or whether an email is spam or not.

a. classification
b. decision tree
c. naive bayes

A

a. classification

CLASSIFICATION - process of predicting the class or category of a given input / data. A program learns from the training dataset. It can be bi-class (ex. male or female). A sentiment analyzer is an example of a bi-class classifier.

77
Q

True or False:

Classification can be used in
A. Bi-Class (ex. male or female)
B. Multi-Class (ex. what type of fruit is in a picture,
or what topic an article is about)

A

True:

Classification can be used in
A. Bi-Class
B. Multi-Class

78
Q

_______ classifies textual information into categories. We want to know what people are talking about and what their opinions are.

a. classification
b. decision tree
c. naive bayes
d. text classification

A

d. text classification

Text classification - classifies textual information into categories. We want to know what people are talking about and what their opinions are.

79
Q

____ used to organize, structure, and categorize text into classes.

a. classification
b. decision tree
c. naive bayes
d. text classification

A

d. text classification

Text Classification - used to organize, structure, and categorize text into classes.

80
Q

Provide steps in Classification

i. Feature extraction (ex. a sentiment analyzer in Keras): transform the text into a mathematical representation in the form of vectors.

ii. Labels: (ex. sunny, machine, learning) represented as 1, 1, 0

iii. The features and labels go into training, where the text is analyzed.

iv. The model is created and tested.

a. 1,2,3,4
b. 4,3,2,1
c. 2,1,3,4

A

a. 1,2,3,4

Classification Steps:

i. Feature extraction (ex. a sentiment analyzer in Keras): transform the text into a mathematical representation in the form of vectors.

ii. Labels: (ex. sunny, machine, learning) represented as 1, 1, 0

iii. The features and labels go into training, where the text is analyzed.

iv. The model is created and tested.

81
Q

Steps to Pre-Process Dataset for Classification

i. pre-process the data to get the dataset
(use scikit-learn)

ii. get the training and test data subsets

iii. check the category names

iv. print a single post

v. extract features

vi. calculate TF-IDF

a. 1,2,3,4,5,6
b. 2,4,6,1,3,5
c. 6,5,4,3,2,1
d. none of above

A

a. 1,2,3,4,5,6

Steps to Pre-Process Dataset for Classification

i. pre-process the data to get the dataset
(use scikit-learn)

ii. get the training and test data subsets

iii. check the category names

iv. print a single post

v. extract features

vi. calculate TF-IDF
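
The last two steps (extracting features, then calculating TF-IDF) can be sketched in plain Python. This assumes the textbook weighting tf × log(N / df) and whitespace tokenization; scikit-learn's own vectorizers use a smoothed and normalized variant, so its numbers will differ. The documents and the name `tf_idf` are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(N / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = ["the cat sat", "the dog barked", "the cat purred"]
weights = tf_idf(docs)
```

A term that occurs in every document (here "the") gets weight zero, while a term unique to one document gets the highest weight in it.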

82
Q

________ has multinomial and Gaussian variants. The multinomial variant is for multinomial data and is used for text classification (ex. word counts for text classification).

a. naive bayes
b. SVM
c. Multinomial

A

a. naive bayes

Naive Bayes has multinomial and Gaussian variants. The multinomial variant is for multinomial data and is used for text classification (ex. word counts for text classification).

83
Q

_____ for multinomial data; used for text classification (ex. word counts for text classification)

a. naive bayes
b. SVM
c. Multinomial

A

c. Multinomial

Multinomial is a Naive Bayes classifier. It is used for multinomial data, such as word counts in text classification.
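
A small from-scratch sketch of multinomial Naive Bayes on word counts, assuming whitespace tokenization and add-one (Laplace) smoothing, neither of which the card specifies. The training messages and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(labeled_docs):
    """labeled_docs: list of (text, label) pairs. Returns the fitted counts."""
    class_docs = Counter()              # documents per class
    word_counts = defaultdict(Counter)  # word counts per class
    vocab = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def classify(text, class_docs, word_counts, vocab):
    """Pick the class with the highest log prior + summed log likelihoods."""
    total_docs = sum(class_docs.values())
    best_label, best_score = None, float("-inf")
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (word, class) pairs above zero.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting at noon", "ham"),
    ("lunch meeting tomorrow", "ham"),
]
model = train_nb(training)
```

Each word contributes its own factor to the score, which is the independence assumption that gives the classifier its name.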

84
Q

_____ A supervised learning algorithm used for classification or regression problems. It separates classes with a hyperplane. It is a discriminative classifier: given labeled training data, it outputs an optimal hyperplane that separates the input data and is used to categorize new points.

a. naive bayes
b. SVM (support vector machines)
c. Multinomial

A

b. SVM (support vector machines)

SVM - A supervised learning algorithm used for classification or regression problems. It separates classes with a hyperplane. It is a discriminative classifier: given labeled training data, it outputs an optimal hyperplane that separates the input data and is used to categorize new points.

85
Q

Machine Learning is used in :

A. Recommendation engines

B. Self-Driving cars

C. Fraud Detection

D. All the above

A

D. All the above

Machine Learning is used in :

-Recommendation engines

-Self-Driving cars

-Fraud Detection

86
Q

Machine learning can be supervised or unsupervised

TRUE

FALSE

A

TRUE

Machine Learning is divided into two categories : Supervised and Unsupervised

87
Q

Text Classification is used to correct the grammar mistakes in a text

FALSE

TRUE

A

FALSE

Text Classification is used to classify text content, for example by topic or sentiment.

88
Q

_____ An Artificial Intelligence computer program that can hold a conversation with a human using natural language

ex. C3PO in Star Wars

A. Chatbots
B. AI
C. Deeplearning
D. NLP

A

A. Chatbots

An Artificial Intelligence computer program that can hold a conversation with a human using natural language

ex. C3PO in Star Wars

EX: How is the weather going to be tomorrow

89
Q

___ a library used to build chatbots. It is an ML conversational dialogue engine built in Python that provides automated responses to queries. It is easy to use and lets you create a chatbot quickly. It uses ML algorithms to produce responses. It is multilingual and open source, available on GitHub.

A. Chatbots
B. AI
C. Deeplearning
D. Chatterbot

A

D. Chatterbot

CHATTERBOT - a library used to build chatbots. It is an ML conversational dialogue engine built in Python that provides automated responses to queries. It is easy to use and lets you create a chatbot quickly. It uses ML algorithms to produce responses. It is multilingual and open source, available on GitHub.

90
Q

What is the correct Chatterbot Flow?

i. Input
ii. Process and Apply Adapters
iii. Response

a. 1,2,3
b. 3,2,1
c. 2,1,3

A

a. 1,2,3

i. Input
ii. Process and Apply Adapters
iii. Response

91
Q

A chatbot is:

A. A program that can hold a conversation

B. HumanDroid

C. A dating app

A

A chatbot is:

A. A program that can hold a conversation

A chatbot is a computer program that can hold a human-like conversation.

92
Q

We use ChatterBot preprocessors to modify the input statement that a chatbot receives

TRUE

FALSE

A

TRUE

Preprocessors modify the input statement before the chatbot processes it, for example ‘chatterbot.preprocessors.clean_whitespace’, which removes extra whitespace.

93
Q

______ are used to modify the input statement before the chatbot processes it, for example ‘chatterbot.preprocessors.clean_whitespace’, which removes extra whitespace.

A. pre-processors
b. GPU
c. classification
d. NLTK

A

A. pre-processors

Preprocessors modify the input statement before the chatbot processes it, for example ‘chatterbot.preprocessors.clean_whitespace’, which removes extra whitespace.
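
To make the idea concrete, here is a plain-Python stand-in for a whitespace-cleaning preprocessor and a loop that applies a list of preprocessors in order. It does not import ChatterBot and is not its source code; `apply_preprocessors` is a hypothetical helper, and the real library works on statement objects rather than bare strings.

```python
import re


def clean_whitespace(text):
    """Trim the ends and collapse runs of whitespace into single spaces."""
    text = text.replace("\n", " ").replace("\r", " ").replace("\t", " ")
    text = text.strip()
    return re.sub(r" +", " ", text)


def apply_preprocessors(text, preprocessors):
    """Run the input through each preprocessor in turn (hypothetical helper)."""
    for preprocess in preprocessors:
        text = preprocess(text)
    return text


raw_input_text = "  How   is the\tweather \n"
cleaned = apply_preprocessors(raw_input_text, [clean_whitespace])
```

The cleaned text is what the rest of the chatbot would see, so matching does not depend on stray spaces or line breaks.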

94
Q

Chatterbot supports only the English Language

FALSE

TRUE

A

FALSE

One of ChatterBot's advantages is that it supports multiple languages.

95
Q

How many words in the sentence?

“I always uh do the main um processing, I mean, the uh um data-processing.”

a. 15
b. 10
c. 11

A

a. 15

uh and um are called “disfluencies” (fillers).

Whether they count as words depends on the application you are working on.

In this specific case, we count every unit, fillers included, with data-processing counted as two words.

96
Q

____ this is task dependent and language dependent

a. word
b. vocabulary
c. correctors

A

a. word

Words are task dependent and language dependent

97
Q

_____ set of unique words (word types)

a. word
b. vocabulary
c. correctors

A

b. vocabulary

Vocabulary - the set of unique words (word types). Punctuation marks are not words.

I always uh do the main um processing, I mean, the uh um data, processing.

98
Q

In an NLP vocabulary, what is not counted?

a. punctuation
b. stop words
c. can’t

A

a. punctuation

In an NLP vocabulary, PUNCTUATION is NOT COUNTED.

99
Q

What is the vocabulary outcome of the following:

“I always uh do the main um processing, I mean, the uh um data-processing”

A

{I, always, uh, do, the, main, um, processing, mean, data}

A vocabulary is a set of unique word types, so repeated words (I, the, uh, um, processing) appear only once.

100
Q

______ a large body of text containing all the available documents.

a. corpus
b. vocab
c. stop words

A

a. corpus

CORPUS - a large body of text containing all the available documents.

101
Q

__________ the list of all words in a document, with every occurrence counted.

a. corpus
b. vocab
c. stop words
d. tokens

A

d. tokens

Tokens - the list of all words in a document, with every occurrence counted.

102
Q

How many tokens are in the following:

“I always uh do the main um processing, I mean, the uh um data-processing?”

a. 11
b. 15
c. 12

A

b. 15

Every word is a token. Every occurrence of every word is counted, repeats included.
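
The difference between tokens (every occurrence) and vocabulary (unique word types) can be checked with a few lines of Python. The sketch assumes that a word is any run of letters, so punctuation is dropped and data-processing splits into two words; other tokenizers make different choices.

```python
import re

sentence = "I always uh do the main um processing, I mean, the uh um data-processing?"

# Every run of letters is one token; punctuation and the hyphen are dropped.
tokens = re.findall(r"[A-Za-z]+", sentence)

# The vocabulary keeps each word type once.
vocabulary = set(tokens)

# Some applications drop fillers such as "uh" and "um" before counting.
fillers = {"uh", "um"}
content_tokens = [t for t in tokens if t not in fillers]
```

Under these assumptions the sentence has 15 tokens and 10 word types, and 11 tokens remain once the fillers are removed.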

103
Q

_______ an up-to-date package for NLP processing. It implements many modern NLP techniques.

a. Python
b. spaCy
c. sci-kit learn

A

b. spaCy

spaCy - an up-to-date package for NLP processing. It implements many modern NLP techniques.

104
Q

The Text Processing Flow

i. build the vocabulary (from the corpus; recognize the needed words)

ii. represent different words by word encodings (also called word embeddings)

iii. classification pipeline

a. 1,2,3
b. 3,2,1
c. 2,1,3

A

a. 1,2,3

The Text Processing Flow

i. build the vocabulary (from the corpus; recognize the needed words)

ii. represent different words by word encodings (also called word embeddings)

iii. classification pipeline

105
Q

True or False:

Every NLP task requires text normalization:

  1. Tokenizing words
  2. Normalizing word formats
  3. Segmenting sentences
A

True

Every NLP task requires text normalization:

  1. Tokenizing words
  2. Normalizing word formats
  3. Segmenting sentences
106
Q

True or False:

Space-based tokenization does not work for every language:
many languages (like Chinese, Japanese, Thai) DO NOT use spaces to separate words.

A

True

“Space-based tokenization” does not work for every language:
many languages (like Chinese, Japanese, Thai) DO NOT use spaces to separate words.

107
Q

i. receive the data

ii. learn from the data which units should be treated as tokens, whether or not they are whole words.

overall: USE THE DATA to tell us HOW TO TOKENIZE

a. Data Driven Approach
b. Data Tokenization
c. Subword Tokenization

A

a. Data Driven Approach

i. receive the data

ii. learn from the data which units should be treated as tokens, whether or not they are whole words.

overall: USE THE DATA to tell us HOW TO TOKENIZE

108
Q

______ Rather than using whole words, you build the initial vocabulary from individual characters (all the letters in the corpus). Then you repeatedly merge the most frequent pair of adjacent symbols and add the merged symbol to the vocabulary.

a. Data Driven Approach
b. Data Tokenization
c. Subword Tokenization
d. Byte Pair Encoding (BPE)

Visual: (A,B,C,D,…a,b,c,d,…)
If (A,B) is the most frequent pair, it is merged into ‘AB’, which is added to the vocabulary, and every adjacent A B in the corpus is replaced by AB.

(A,B,C,D,…a,b,c,d,…,AB)

Keep doing this until you have done “k merges”; frequent words end up as single symbols.

A

d. Byte Pair Encoding (BPE)

Rather than using whole words, you build the initial vocabulary from individual characters (all the letters in the corpus). Then you repeatedly merge the most frequent pair of adjacent symbols and add the merged symbol to the vocabulary.

Visual: (A,B,C,D,…a,b,c,d,…)
If (A,B) is the most frequent pair, it is merged into ‘AB’, which is added to the vocabulary, and every adjacent A B in the corpus is replaced by AB.

(A,B,C,D,…a,b,c,d,…,AB)

Keep doing this until you have done “k merges”; frequent words end up as single symbols.
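
The merge loop can be sketched as follows. This is a simplified learner, assuming the corpus is given as a word-to-frequency table and that no end-of-word marker is used (many real BPE implementations add one). The toy frequencies and the names `learn_bpe` and `merge_pair` are made up for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` by one symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_freqs, k):
    """Learn up to k merges from a {word: frequency} table."""
    # Each word starts as a tuple of single characters.
    corpus = {tuple(word): freq for word, freq in word_freqs.items()}
    vocab = {ch for symbols in corpus for ch in symbols}
    merges = []
    for _ in range(k):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # nothing left to merge
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        vocab.add(best[0] + best[1])
        corpus = {merge_pair(s, best): f for s, f in corpus.items()}
    return merges, vocab


word_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, vocab = learn_bpe(word_freqs, k=2)
```

With this toy table, two merges are enough for the frequent ending "est" to become a single vocabulary symbol, while the original single characters stay in the vocabulary.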

109
Q

_______________ smallest meaning-bearing unit of a language

a. morpheme
b. byte pair encoding (BPE)
c. Data Driven Approach
d. Data Tokenization

A

a. morpheme

morpheme- smallest meaning- bearing unit of a language

110
Q

_________ the core meaning bearing units

a. stems
b. affixes
c. morpheme
d. word normalization

A

a. stems

STEMS- the core meaning bearing units

111
Q

___________ the parts that attach to stems, often with grammatical functions.

ex. ING -> λ (the empty string, i.e. the suffix is removed)
ex. SSES -> SS
ex. ATIONAL -> ATE (relational -> relate)

a. stems
b. affixes
c. morpheme
d. word normalization

A

b. affixes

AFFIXES - the parts that attach to stems, often with grammatical functions.

ex. ING -> λ (the empty string, i.e. the suffix is removed)
ex. SSES -> SS
ex. ATIONAL -> ATE (relational -> relate)
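
The three rewrite rules in this card can be tried out with a naive suffix stripper. This is only a sketch: it applies the first matching rule unconditionally, whereas real stemmers such as the Porter stemmer attach extra conditions to each rule (for example, so that "sing" is not cut down to "s"). `RULES` and `strip_suffix` are illustrative names.

```python
# (suffix, replacement) pairs from the card; "" stands for the empty string.
RULES = [("ational", "ate"), ("sses", "ss"), ("ing", "")]


def strip_suffix(word):
    """Apply the first rule whose suffix matches; otherwise return the word."""
    lowered = word.lower()
    for suffix, replacement in RULES:
        if lowered.endswith(suffix):
            return lowered[: len(lowered) - len(suffix)] + replacement
    return lowered


examples = ["relational", "caresses", "walking", "cat"]
stems = [strip_suffix(word) for word in examples]
```

Removing or rewriting the affix leaves the stem, the core meaning-bearing part of the word.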

112
Q

__________ very useful in preprocessing. This is critical in chatbot systems and speech recognition systems.

a. sentence segmentation
b. word normalization
c. stemming
d. morpheme

A

a. sentence segmentation

SENTENCE SEGMENTATION - very useful in preprocessing. This is critical in chatbot systems and speech recognition systems.

113
Q

Flow of Sentence Segmentation

i. tokenize first

ii. Use rules or ML to classify each period (.) as either (1) part of a word or (2) a sentence boundary

a. 1,2
b. 2,1

A

a. 1,2

Flow of Sentence Segmentation

i. tokenize first

ii. Use rules or ML to classify each period (.) as either (1) part of a word or (2) a sentence boundary
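
A rule-based version of this flow might look like the sketch below. It assumes whitespace tokenization and a small hand-written abbreviation list; the list and the name `split_sentences` are illustrative, and an ML classifier could replace the rule in step ii.

```python
# Illustrative list of tokens whose final period belongs to the word.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "inc.", "e.g.", "i.e."}


def split_sentences(text):
    """Split text into sentences using the two-step flow."""
    tokens = text.split()  # step i: tokenize first
    sentences, current = [], []
    for token in tokens:
        current.append(token)
        # step ii: a token-final mark ends the sentence unless the
        # period is part of the word (a known abbreviation).
        ends_with_mark = token.endswith((".", "!", "?"))
        is_abbreviation = token.lower() in ABBREVIATIONS
        if ends_with_mark and not is_abbreviation:
            sentences.append(" ".join(current))
            current = []
    if current:  # keep trailing text that has no closing mark
        sentences.append(" ".join(current))
    return sentences


text = "Dr. Smith arrived at 4.30 today. He was late."
sentences = split_sentences(text)
```

The periods in "Dr." and "4.30" are treated as part of the word, so only the two real boundaries split the text.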

114
Q

A model that assigns a probability to a sentence. Also known as a probabilistic language model.

a. Language modeling
b. probabilistic modeling
c. machine translation
d. speech recognition

A

a. Language modeling

A model that assigns a probability to a sentence. Also known as a probabilistic language model.

115
Q

P(high winds tonight) > P(large wind tonight)

this is an ex of
a. speech recognition
b. spelling correction
c. machine translation

A

c. machine translation

The model judges “high winds tonight” to be more probable than “large wind tonight”, so it picks “high” as the better wording.

116
Q

P(I saw a van) >> P(eyes awe of an)

this is an ex of

a. speech recognition
b. spelling correction
c. machine translation

A

a. speech recognition

Both word sequences sound the same; they differ only in the words and spelling. The language model picks the far more probable one.

117
Q

p(about fifteen minutes from) >

p(about fifteen minuets from)

this is an ex of

a. speech recognition
b. spelling correction
c. machine translation

A

b. spelling correction

Because “minutes” is misspelled as “minuets” in the second sequence.

p(about fifteen minutes from) >

p(about fifteen minuets from)

118
Q

P(w1,w2,w3,w4)

is an example of

a. probability of a sentence
b. probability of a next word

A

a. probability of a sentence

P(w1,w2,w3,w4)

119
Q

P(wn| w1,w2…wn-1)

is an example of

a. probability of a sentence
b. probability of a next word

A

b. probability of a next word

P(wn| w1,w2…wn-1)
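
Both quantities, the probability of the next word and the probability of a whole sentence, can be estimated from counts. The sketch below makes a simplifying assumption that is not in the cards: it conditions each word only on the previous word (a bigram model) instead of the full history, and it uses unsmoothed maximum likelihood estimates. The three-sentence corpus and the function names are invented.

```python
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams, with <s> and </s> as boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(words[:-1])            # every word used as a context
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def next_word_prob(word, prev, unigrams, bigrams):
    """P(word | prev) by maximum likelihood; 0.0 if prev was never seen."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]


def sentence_prob(sentence, unigrams, bigrams):
    """Chain rule: multiply the next-word probabilities along the sentence."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= next_word_prob(word, prev, unigrams, bigrams)
    return prob


corpus = ["high winds tonight", "high winds expected", "large crowd tonight"]
unigrams, bigrams = train_bigram(corpus)
p_high = sentence_prob("high winds tonight", unigrams, bigrams)
p_large = sentence_prob("large wind tonight", unigrams, bigrams)
```

On this toy corpus `p_high` is larger than `p_large`, mirroring the P(high winds tonight) > P(large wind tonight) comparison; a practical model would add smoothing so that unseen word pairs do not force the probability to zero.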

120
Q
A