Sample Practice Questions for Final Exam 2 Flashcards Preview

Flashcards in Sample Practice Questions for Final Exam 2 Deck (14):
1

In the R statement below, df is a _____
df

Data Frame

2

In R, you can reference variable X2 in data frame df as what?

df$X2
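
As a minimal sketch of cards 1 and 2, the snippet below builds a small data frame and pulls out one column with `$`. The column X1 and all values are made up for illustration; only the names df and X2 come from the cards.

```r
# Hypothetical data frame; the values are invented for illustration.
df <- data.frame(X1 = c(1, 2, 3), X2 = c(10, 20, 30))

class(df)              # "data.frame"
x2     <- df$X2        # extract column X2 as a plain vector
x2_alt <- df[["X2"]]   # equivalent list-style access
```

Both forms return the same numeric vector; `$` is simply the shorter notation when the column name is known in advance.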

3

What does the R statement c(4, 7, 9) return when executed?

4 7 9
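
A short sketch of what `c()` does: it combines its arguments into a single vector. The variable name v is an assumption used only for illustration.

```r
v <- c(4, 7, 9)   # combine three values into one numeric vector
print(v)          # [1] 4 7 9
length(v)         # 3 elements
is.numeric(v)     # TRUE
```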

4

The R statement plot(countries$PCGDP, countries$PiracyRate) produces what?

A scatterplot of variable PiracyRate versus variable PCGDP

5

In text analytics, the Vector Space Model refers to what?

A term frequency matrix where documents are represented by term dimensions
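
The idea can be sketched in base R as follows. The three toy documents and the names docs, tokens, terms, and tf are assumptions for illustration; in practice a package such as tm would usually build the matrix.

```r
# Hypothetical toy corpus; the documents are invented for illustration.
docs <- c(d1 = "good steak good service",
          d2 = "bad service",
          d3 = "good wine")

# Parse each document into lowercase words.
tokens <- strsplit(tolower(docs), "[^a-z]+")

# Vocabulary: every distinct term becomes one dimension.
terms <- sort(unique(unlist(tokens)))

# Count each term per document, then put documents in rows.
tf <- t(vapply(tokens,
               function(w) as.integer(table(factor(w, levels = terms))),
               integer(length(terms))))
colnames(tf) <- terms
tf   # each row is one document expressed as a vector of term counts
```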

6

True or False: In R, to produce a word cloud from a corpus of unstructured documents, we first need to parse the corpus into words and then create a term-document matrix.

True

7

A word cloud produced by R library wordcloud displays terms with font sizes that are what?

Proportional to their frequency

8

What is the purpose of singular value decomposition of a term frequency matrix?

The extraction of latent semantic dimensions
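
A minimal sketch of this with base R's `svd()`, assuming a tiny made-up document-by-term matrix in which two documents are about steak and grilling and two are about wine. The matrix, its labels, and the choice k = 2 are illustrative assumptions.

```r
# Hypothetical document-by-term counts (rows = documents, columns = terms).
tf <- matrix(c(2, 1, 0, 0,
               1, 2, 0, 0,
               0, 0, 1, 2,
               0, 0, 2, 1),
             nrow = 4, byrow = TRUE,
             dimnames = list(paste0("d", 1:4),
                             c("steak", "grill", "wine", "cellar")))

s <- svd(tf)   # tf = U %*% diag(d) %*% t(V)
s$d            # singular values, largest first

# Keep the k strongest latent dimensions.
k <- 2
doc_coords  <- s$u[, 1:k] %*% diag(s$d[1:k])   # documents in latent space
term_coords <- s$v[, 1:k]                      # term loadings per dimension

# Rank-k approximation of the original matrix.
tf_k <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])
```

In the rank-2 approximation, terms that occur in the same documents receive similar weights, which is the sense in which the retained dimensions are "latent semantic".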

9

Do you see anything wrong in the SAS Enterprise Miner analysis diagram below?
Steakhouse 3546P --> Text Filter --> Text Parsing --> Text Topic

Text Parsing needs to be performed before Text Filter

10

What is the term that describes collections of documents with similar characteristics?

Text Clusters

11

You would like to design a business process where incoming documents containing unstructured text are routed to treatment A or treatment B, depending on their content. Treatments A and B are done in a parallel configuration as shown on the diagram below. A good text analytic choice would be to make the routing decision based on _____.
-------> A ------->
------------>
-------> B ------->

The text cluster to which the document belongs

12

If you want to adjust employee salary data for employee experience, a way to do that is to:

Fit a regression model using Y=salary and X=experience and save the residuals. The residuals represent salary adjusted for experience
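
A minimal sketch of this adjustment in R. The data frame emp and its values are invented for illustration; only the roles Y = salary and X = experience come from the card.

```r
# Hypothetical employee data, invented for illustration.
emp <- data.frame(
  experience = c(1, 3, 5, 7, 9, 11),       # years
  salary     = c(42, 50, 55, 66, 70, 80)   # in thousands
)

fit <- lm(salary ~ experience, data = emp)  # Y = salary, X = experience

# Residual = actual salary minus the salary predicted from experience.
emp$salary_adj <- residuals(fit)
emp
```

A positive residual marks an employee paid more than experience alone would predict, and a negative one marks an employee paid less.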

13

As a hiring manager, you are about to make a job offer to an applicant. Your company has branches in 3 different cities and the applicant is willing to relocate. Should you offer a different salary in different cities? Using your HR database, you collect sample salary data from the 3 cities. What kind of analysis would answer your question?

ANOVA using Y=salary and X=city. If the overall model F test is significant, the average salary is not the same in all cities, so at least one city differs from the others.
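
A minimal sketch of a one-way ANOVA in R. The data frame pay, the city labels, and the salary values are invented for illustration.

```r
# Hypothetical salary sample (in thousands) from three cities.
pay <- data.frame(
  city   = factor(rep(c("A", "B", "C"), each = 5)),
  salary = c(50, 52, 49, 51, 53,
             60, 62, 59, 61, 63,
             50, 51, 52, 49, 53)
)

fit <- aov(salary ~ city, data = pay)   # Y = salary, X = city
tab <- summary(fit)[[1]]                # ANOVA table
f_stat  <- tab[["F value"]][1]          # overall model F statistic
p_value <- tab[["Pr(>F)"]][1]           # its p-value

# If the F test is significant, check which pairs of cities differ.
TukeyHSD(fit)
```

The F test only says that the city means are not all equal; a follow-up such as `TukeyHSD()` is one way to see which cities actually differ.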

14

Again you want to compare salaries across 3 cities. You do not have access to a salary database, so you survey 300 volunteers. For privacy reasons, instead of asking their exact salary, you ask the participants to report their salary range. Then you compile a city-by-salary range cross-tabulation. The chi-square test is significant. This implies that:

The three cities have different salary distributions across the salary ranges
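
A minimal sketch of the test in R, assuming a made-up 3-by-3 cross-tabulation of 300 respondents. The counts and the labels for cities and salary ranges are illustrative only.

```r
# Hypothetical cross-tabulation of 300 respondents (counts invented).
tab <- matrix(c(50, 30, 20,
                20, 30, 50,
                35, 35, 30),
              nrow = 3, byrow = TRUE,
              dimnames = list(city  = c("A", "B", "C"),
                              range = c("low", "mid", "high")))

test <- chisq.test(tab)   # H0: salary range is independent of city
test$statistic            # chi-square statistic
test$parameter            # degrees of freedom: (3 - 1) * (3 - 1) = 4
test$p.value              # a small value rejects independence
test$expected             # counts expected under independence
```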