Building Features from Text Data in Microsoft Azure Flashcards

1
Q

Which of the following must be downloaded into your Natural Language Toolkit (NLTK) environment to use the NLTK-defined stopwords?

stem

stopwords

punkt

lemma

A

stopwords
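
As a minimal sketch (the sample tokens are illustrative), downloading the stopwords corpus makes the NLTK-defined stopword lists available:

    import nltk
    nltk.download('stopwords')

    from nltk.corpus import stopwords

    stop_words = set(stopwords.words('english'))
    tokens = ["this", "is", "a", "simple", "example"]
    # drop tokens that appear in the English stopword list
    print([t for t in tokens if t not in stop_words])  # ['simple', 'example']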

2
Q

Which class provides a powerful (but simple) means of analyzing frequency distributions in words?

PunktSentenceTokenizer

nltk.probability.FreqDist

pandas.DataFrame

numpy.array

A

nltk.probability.FreqDist
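
A minimal sketch (the sample sentence is illustrative) of building a frequency distribution with nltk.probability.FreqDist:

    from nltk.probability import FreqDist

    tokens = "the cat sat on the mat and the dog sat too".split()
    fdist = FreqDist(tokens)
    print(fdist['the'])           # 3
    print(fdist.most_common(2))   # [('the', 3), ('sat', 2)]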

3
Q

Which of the following can remove words based on how often they occur in a document or corpus?

Lemmatization

Frequency filtering

Stopword removal

Tokenization

A

Frequency filtering
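
A minimal sketch (the threshold of 2 and the sample text are illustrative) of frequency filtering, dropping tokens that occur fewer times than a chosen count:

    from nltk.probability import FreqDist

    tokens = "to be or not to be that is the question".split()
    fdist = FreqDist(tokens)
    min_count = 2  # keep only tokens seen at least twice
    print([t for t in tokens if fdist[t] >= min_count])  # ['to', 'be', 'to', 'be']

In scikit-learn, CountVectorizer applies the same idea at the corpus level through its min_df and max_df parameters.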

4
Q

Which Natural Language Toolkit (NLTK) component must you download before using the WordNetLemmatizer class?

punkt

tagsets

wordnet

stopwords

A

wordnet
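
A minimal sketch (the sample words are illustrative) of lemmatizing with WordNetLemmatizer after downloading the wordnet corpus:

    import nltk
    nltk.download('wordnet')  # some newer NLTK releases also need: nltk.download('omw-1.4')

    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    print(lemmatizer.lemmatize("mice"))              # 'mouse'
    print(lemmatizer.lemmatize("running", pos="v"))  # 'run'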

5
Q

Which of the following represents the complete set of words represented in an encoding?

Vocabulary

Document

Feature

Corpus

A

Vocabulary
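
A minimal sketch (the two documents are illustrative) using scikit-learn's CountVectorizer, whose vocabulary_ attribute holds the complete set of words represented in the encoding:

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["the cat sat", "the dog barked"]
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(docs)
    print(sorted(vectorizer.vocabulary_))  # ['barked', 'cat', 'dog', 'sat', 'the']
    print(X.shape)                         # (2, 5) -- one column per vocabulary word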

6
Q

HashingVectorizer builds on FeatureHasher by providing which capability?

Tokenization of documents

Word embeddings

Parts-of-speech tagging

Locality-sensitive hashing

A

Tokenization of documents
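
A minimal sketch (n_features=16 and the sample documents are illustrative) contrasting the two: FeatureHasher expects pre-tokenized input, while HashingVectorizer tokenizes raw documents itself before applying the hashing trick:

    from sklearn.feature_extraction import FeatureHasher
    from sklearn.feature_extraction.text import HashingVectorizer

    docs = ["the cat sat on the mat", "the dog sat on the log"]

    # FeatureHasher: we must tokenize the documents ourselves
    hasher = FeatureHasher(n_features=16, input_type="string")
    X1 = hasher.transform(doc.split() for doc in docs)

    # HashingVectorizer: accepts raw text and tokenizes for us
    vectorizer = HashingVectorizer(n_features=16)
    X2 = vectorizer.fit_transform(docs)

    print(X1.shape, X2.shape)  # (2, 16) (2, 16)

In both cases the hashed features cannot be mapped back to the original words.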

7
Q

What is the process of breaking or splitting text into smaller meaningful components?

Stemming

Tokenization

Stopword removal

Lemmatization

A

Tokenization
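
A minimal sketch (the sample sentence is illustrative) of word-level tokenization with NLTK:

    import nltk
    nltk.download('punkt')

    from nltk.tokenize import word_tokenize

    print(word_tokenize("Tokenization splits text into smaller, meaningful pieces."))
    # ['Tokenization', 'splits', 'text', 'into', 'smaller', ',', 'meaningful', 'pieces', '.']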

8
Q

Which tokenizer in Natural Language Toolkit (NLTK) can convert text into a sequence of sentences?

RegexTokenizer

PunktSentenceTokenizer

TreebankWordTokenizer

WhitespaceTokenizer

A

PunktSentenceTokenizer
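
A minimal sketch (the sample text is illustrative) of splitting text into sentences; sent_tokenize uses the pre-trained Punkt model under the hood:

    import nltk
    nltk.download('punkt')  # newer NLTK releases may ask for 'punkt_tab' instead

    from nltk.tokenize import sent_tokenize

    text = "Dr. Smith went to Seattle. She gave a talk on Azure ML."
    print(sent_tokenize(text))
    # ['Dr. Smith went to Seattle.', 'She gave a talk on Azure ML.']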

9
Q

Which of the following attempts to reduce a word to its base form by removing inflection, but might produce a nonsense word?

Stemming

Lemmatization

Stopword removal

Tokenization

A

Stemming
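
A minimal sketch (the sample words are illustrative) using the Porter stemmer, which can produce stems that are not real words:

    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()
    for word in ["studies", "studying", "caresses"]:
        print(word, "->", stemmer.stem(word))
    # studies  -> studi    (not a dictionary word)
    # studying -> studi
    # caresses -> caress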

10
Q

To use sent_tokenize or PunktSentenceTokenizer, which of the following must be downloaded into your Natural Language Toolkit (NLTK) environment?

stem

lemma

stopwords

punkt

A

punkt
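
A minimal sketch, assuming an NLTK version where the punkt pickle is still shipped, that loads the pre-trained PunktSentenceTokenizer directly after the download:

    import nltk
    nltk.download('punkt')

    tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
    print(tokenizer.tokenize("First sentence. Second sentence."))
    # ['First sentence.', 'Second sentence.']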
