Big Data Analytics, Final Exam 2: Sample Practice Questions (Flashcards)

Flashcards in Sample Practice Questions for Final Exam 2 Deck (14):

In the R statement below, df is a _____

Data Frame


In R, you can reference variable X2 in data frame df as what?

df$X2


The R statement c(4, 7, 9), when executed, returns what?

4 7 9
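
A minimal R sketch of the three cards above. The data frame and its column names X1 and X2 are made up here for illustration; the deck does not show the original statement.

```r
# c() combines its arguments into a vector
x <- c(4, 7, 9)
print(x)   # prints: [1] 4 7 9

# Hypothetical data frame; X1 and X2 are illustrative column names
df <- data.frame(X1 = c(1, 2, 3), X2 = x)

# The $ operator extracts one column (variable) by name
df$X2
```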


The R statement plot(countries$PCGDP, countries$PiracyRate) produces what?

A scatterplot of variable PiracyRate versus variable PCGDP
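
A short sketch of the same call. The values in this countries data frame are invented for illustration and are not real statistics; only the column names come from the question.

```r
# Made-up values for illustration only
countries <- data.frame(
  PCGDP      = c(1, 5, 10, 20, 40),
  PiracyRate = c(90, 75, 60, 40, 25)
)

# plot(x, y): the first argument goes on the horizontal axis,
# the second on the vertical axis, so this is PiracyRate versus PCGDP
plot(countries$PCGDP, countries$PiracyRate,
     xlab = "PCGDP", ylab = "PiracyRate")
```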


In text analytics, the Vector Space Model refers to what?

A term frequency matrix where documents are represented by term dimensions
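
One way to picture the Vector Space Model is to build a small term frequency matrix by hand in base R. The three documents below are toy examples, and the tokenizer (lowercase, split on non-letters) is a simplifying assumption.

```r
# Toy corpus; document names and contents are illustrative
docs <- c(d1 = "big data needs big storage",
          d2 = "text data is unstructured data",
          d3 = "storage is cheap")

# Parse each document into lowercase word tokens
tokens <- strsplit(tolower(docs), "[^a-z]+")

# The vocabulary defines the term dimensions
terms <- sort(unique(unlist(tokens)))

# Count each term in each document (terms x documents), then transpose
counts <- vapply(tokens,
                 function(tok) as.integer(table(factor(tok, levels = terms))),
                 integer(length(terms)))
dtm <- t(counts)
dimnames(dtm) <- list(names(docs), terms)

dtm   # one row per document, one column per term
```

Each row of dtm is a document expressed as a vector of term counts.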


True or False: In R, to produce a word cloud from a corpus of unstructured documents, we first need to parse the corpus into words and then create a term document matrix.

True


A word cloud produced by R library wordcloud displays terms with font sizes that are what?

Proportional to their frequency
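
A rough sketch of the parse, count, and plot sequence. The word list is a toy example, and the plotting step assumes the wordcloud package is installed; it is skipped otherwise.

```r
# Toy token list standing in for a parsed corpus
words <- c("data", "text", "data", "cloud", "data", "text")

# Term frequencies, most frequent first
freq_table <- sort(table(words), decreasing = TRUE)
terms <- names(freq_table)
freqs <- as.integer(freq_table)

# Draw the cloud only if the wordcloud package is available;
# more frequent terms are drawn in larger fonts
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = terms, freq = freqs, min.freq = 1)
}
```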


What is the purpose of singular value decomposition of a term frequency matrix?

The extraction of latent semantic dimensions
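
As an illustration, base R's svd() can decompose a small term frequency matrix. The matrix and its term names are invented, and keeping k = 2 dimensions is an arbitrary choice for this sketch.

```r
# Toy document-by-term count matrix (values are illustrative)
tf <- matrix(c(2, 1, 0, 0,
               1, 2, 0, 0,
               0, 0, 1, 2,
               0, 0, 2, 1),
             nrow = 4, byrow = TRUE,
             dimnames = list(paste0("doc", 1:4),
                             c("steak", "grill", "salad", "vegan")))

# tf = U %*% diag(d) %*% t(V)
s <- svd(tf)

# Keep the k largest singular values as latent dimensions
k <- 2
doc_coords <- s$u[, 1:k] %*% diag(s$d[1:k])   # documents in latent space

# Using all singular values reproduces the original matrix
recon <- s$u %*% diag(s$d) %*% t(s$v)
```

Documents that use related terms end up with similar coordinates in doc_coords, even when they share few exact words.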


Do you see anything wrong in the SAS Enterprise Miner analysis diagram below?
Steakhouse 3546P --> Text Filter --> Text Parsing --> Text Topic

Text Parsing needs to be performed before Text Filter


What is the term that describes collections of documents with similar characteristics?

Text Clusters


You would like to design a business process where incoming documents containing unstructured text are routed to treatment A or treatment B, depending on their content. Treatments A and B are done in a parallel configuration as shown on the diagram below. A good text analytic choice would be to make the routing decision based on _____.
-------> A ------->
-------> B ------->

The text cluster to which the document belongs
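
A minimal sketch of cluster-based routing in base R. The term counts, the choice of hierarchical clustering, and the mapping of clusters to treatments A and B are all assumptions made for illustration.

```r
# Toy document-by-term counts (illustrative values and term names)
dtm <- matrix(c(3, 2, 0, 0,
                2, 3, 0, 0,
                0, 0, 3, 2,
                0, 0, 2, 3),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("doc", 1:4),
                              c("refund", "invoice", "crash", "login")))

# Group documents with similar term profiles into two clusters
tree <- hclust(dist(dtm))
cluster <- cutree(tree, k = 2)

# Hypothetical routing rule: doc1's cluster goes to A, the other to B
treatment <- ifelse(cluster == cluster[["doc1"]], "A", "B")
treatment
```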


If you want to adjust employee salary data for employee experience, a way to do that is to:

Fit a regression model using Y=salary and X=experience and save the residuals. The residuals represent salary adjusted for experience, that is, the part of salary not explained by experience.
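
A short sketch of this adjustment with lm(). The salary and experience values are invented for illustration.

```r
# Made-up HR data: years of experience and salary (in thousands)
hr <- data.frame(
  experience = c(1, 3, 5, 7, 10),
  salary     = c(42, 50, 61, 66, 83)
)

# Regress salary on experience
fit <- lm(salary ~ experience, data = hr)

# Residual = actual salary minus the salary predicted from experience
hr$salary_adj <- resid(fit)
hr
```

A positive salary_adj means the employee earns more than the fitted line predicts for that level of experience.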


As a hiring manager, you are about to make a job offer to an applicant. Your company has branches in 3 different cities and the applicant is willing to relocate. Should you offer a different salary at different cities? Using your HR database, you collect sample salary data from the 3 cities. What kind of analysis would answer your question?

One-way ANOVA using Y=salary and X=city. If the overall model F test is significant, at least one city's average salary differs from the others, which supports offering different salaries in different cities.
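
A minimal sketch of this analysis with aov(). The city labels and salary values are invented, and the 0.05 cutoff is the conventional choice, not something the deck specifies.

```r
# Made-up sample: four salaries (in thousands) from each of three cities
hr <- data.frame(
  city   = factor(rep(c("CityA", "CityB", "CityC"), each = 4)),
  salary = c(50, 52, 49, 51,  60, 61, 59, 62,  55, 54, 56, 53)
)

# One-way ANOVA: does mean salary differ by city?
fit <- aov(salary ~ city, data = hr)
tab <- summary(fit)[[1]]

# p-value of the overall F test (first row is the city effect)
p_value <- tab[["Pr(>F)"]][1]
p_value < 0.05   # TRUE suggests at least one city mean differs
```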


Again you want to compare salaries across 3 cities. You do not have access to a salary database, so you survey 300 volunteers. For privacy reasons, instead of asking their exact salary, you ask the participants to report their salary range. Then you compile a city-by-salary range cross-tabulation. The chi-square test is significant. This implies that:

Salary range and city are not independent: the three cities have different salary distributions across the salary ranges.
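
A sketch of the test with chisq.test(). The counts are invented so that they total 300 respondents; the city labels and salary range labels are placeholders.

```r
# Made-up city-by-salary-range cross-tabulation (300 respondents)
tab <- matrix(c(50, 30, 20,
                30, 40, 30,
                20, 30, 50),
              nrow = 3, byrow = TRUE,
              dimnames = list(city  = c("CityA", "CityB", "CityC"),
                              range = c("low", "mid", "high")))

# Chi-square test of independence between city and salary range
test <- chisq.test(tab)
test$parameter   # degrees of freedom: (3 - 1) * (3 - 1)
test$p.value     # a small p-value suggests the distributions differ by city
```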