Building Features from Text Data in Microsoft Azure Flashcards Preview

Flashcards in Building Features from Text Data in Microsoft Azure Deck (10)
1

Which of the following must be downloaded into your Natural Language Toolkit (NLTK) environment to use the NLTK defined stopwords?

stem

stopwords

punkt

lemma

stopwords
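
A minimal sketch of how the downloaded resource might be used. The helper names `load_english_stopwords` and `remove_stopwords` are hypothetical, chosen only for illustration; `nltk.download` and `nltk.corpus.stopwords` are the NLTK calls involved.

```python
import nltk
from nltk.corpus import stopwords


def load_english_stopwords():
    """Fetch the 'stopwords' resource if it is missing, then return it as a set."""
    nltk.download("stopwords", quiet=True)  # needs network access on first use
    return set(stopwords.words("english"))


def remove_stopwords(tokens, stop_set):
    """Keep only the tokens whose lowercase form is not in stop_set."""
    return [t for t in tokens if t.lower() not in stop_set]
```

Calling `remove_stopwords(tokens, load_english_stopwords())` filters a token list against the NLTK English list; any other set of words can be passed in the same way.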

2

Which class provides a simple but powerful way to analyze the frequency distribution of words?

PunktSentenceTokenizer

nltk.probability.FreqDist

pandas.DataFrame

numpy.array

nltk.probability.FreqDist
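
As an illustration, `FreqDist` can be built directly from a list of tokens. The sample sentence is made up for this sketch, and the tokens come from a plain `str.split` rather than an NLTK tokenizer.

```python
from nltk.probability import FreqDist

# Made-up sample text, split on whitespace for simplicity
tokens = "the cat sat on the mat and the dog sat too".split()
freq = FreqDist(tokens)

print(freq["the"])          # count for a single word
print(freq.most_common(2))  # the two most frequent (word, count) pairs
print(freq.N())             # total number of tokens counted
```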

3

Which technique removes words based on how often they occur in a document or corpus?

Lemmatization

Frequency filtering

Stopword removal

Tokenization

Frequency filtering
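
One possible way to sketch frequency filtering uses `collections.Counter` from the standard library. The function name `filter_by_frequency` and its `min_count` and `max_count` thresholds are assumptions made for this example, not part of any library.

```python
from collections import Counter


def filter_by_frequency(tokens, min_count=1, max_count=None):
    """Keep tokens whose corpus count lies within [min_count, max_count]."""
    counts = Counter(tokens)
    return [
        t for t in tokens
        if counts[t] >= min_count and (max_count is None or counts[t] <= max_count)
    ]


tokens = "the cat sat on the mat and the dog sat too".split()

# Drop rare words: keep only tokens that occur at least twice
frequent = filter_by_frequency(tokens, min_count=2)

# Drop very common words: keep only tokens that occur at most twice
not_too_common = filter_by_frequency(tokens, max_count=2)
```

Here `frequent` keeps only "the" and "sat", while `not_too_common` drops "the" because it occurs three times.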

4

To use the WordNetLemmatizer class, make sure that you have downloaded which Natural Language Toolkit (NLTK) component?

punkt

tagsets

wordnet

stopwords

wordnet

5

Which of the following is the complete set of words represented in an encoding?

Vocabulary

Document

Feature

Corpus

Vocabulary
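
To make the terms concrete, here is a small sketch using scikit-learn's `CountVectorizer`, which is an assumption of this example since the card does not name a library. The corpus is two made-up documents, the vocabulary is the set of distinct words learned from them, and each vocabulary word becomes one feature column.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Corpus: a collection of documents (made up for this sketch)
corpus = ["the cat sat", "the dog sat down"]

vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform(corpus)

# Vocabulary: every distinct word the encoding can represent
vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)

# One row per document, one column (feature) per vocabulary word
print(matrix.shape)
```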

6

HashingVectorizer builds on FeatureHasher by providing which capability?

Tokenization of documents

Word embeddings

Parts-of-speech tagging

Locality-sensitive hashing

Tokenization of documents
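
The difference can be sketched with scikit-learn: `FeatureHasher` expects input that is already split into tokens, while `HashingVectorizer` accepts raw strings and tokenizes them itself. The documents and the choice of `n_features=16` are arbitrary values for illustration.

```python
from sklearn.feature_extraction import FeatureHasher
from sklearn.feature_extraction.text import HashingVectorizer

docs = ["the cat sat", "the dog sat down"]

# FeatureHasher: the caller must tokenize each document first
hasher = FeatureHasher(n_features=16, input_type="string")
hashed_tokens = hasher.transform([doc.split() for doc in docs])

# HashingVectorizer: raw documents go in, tokenization happens internally
vectorizer = HashingVectorizer(n_features=16)
hashed_docs = vectorizer.transform(docs)

print(hashed_tokens.shape, hashed_docs.shape)
```

Both results have one row per document and 16 hashed columns. The values themselves are not expected to match, because `HashingVectorizer` also applies its own preprocessing and normalization by default.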

7

What is the process of breaking or splitting text into smaller meaningful components?

Stemming

Tokenization

Stopword removal

Lemmatization

Tokenization
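
A short sketch of word-level tokenization with two NLTK tokenizers that are regular-expression based and, as far as this example assumes, need no downloaded resources. The sample text is made up.

```python
from nltk.tokenize import WhitespaceTokenizer, wordpunct_tokenize

text = "Hello, world! Tokens here."

# Split on whitespace only: punctuation stays attached to words
whitespace_tokens = WhitespaceTokenizer().tokenize(text)

# Split into runs of word characters and runs of punctuation
wordpunct_tokens = wordpunct_tokenize(text)

print(whitespace_tokens)
print(wordpunct_tokens)
```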

8

Which tokenizer in Natural Language Toolkit (NLTK) can convert text into a sequence of sentences?

RegexpTokenizer

PunktSentenceTokenizer

TreebankWordTokenizer

WhitespaceTokenizer

PunktSentenceTokenizer
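
A minimal sketch of sentence splitting. It assumes the tokenizer is instantiated without training text, so it runs with default parameters instead of the pretrained English model that `sent_tokenize` loads. With default parameters it knows no abbreviations, so text such as "Dr. Smith" may be split incorrectly.

```python
from nltk.tokenize.punkt import PunktSentenceTokenizer

text = "Hello world. This is NLTK."

# Untrained tokenizer: default parameters, no abbreviation knowledge
tokenizer = PunktSentenceTokenizer()
sentences = tokenizer.tokenize(text)

for sentence in sentences:
    print(sentence)
```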

9

Which process attempts to reduce a word to its base form by stripping inflections, even though the result might not be a real word?

Stemming

Lemmatization

Stopword removal

Tokenization

Stemming
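
As an illustration, NLTK's `PorterStemmer` shows both outcomes: a stem that is a real word and a stem that is not. The choice of the Porter algorithm and the sample words are assumptions of this sketch; other stemmers may give different results.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["running", "cats", "studies"]

# Map each word to its stem
stems = {word: stemmer.stem(word) for word in words}

for word, stem in stems.items():
    print(word, "->", stem)
```

"running" and "cats" reduce to the real words "run" and "cat", while "studies" becomes "studi", which is not a dictionary word.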

10

To use sent_tokenize or PunktSentenceTokenizer, which of the following must be downloaded into your Natural Language Toolkit (NLTK) environment?

stem

lemma

stopwords

punkt

punkt