Exam 2 Flashcards
What does the straight line on the lift chart represent?
the expected cumulative number of positives we would capture if we used the naive model (i.e., picked records at random)
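A minimal Python sketch of how the two lines on a cumulative lift (gains) chart could be computed. It assumes a binary target and made-up model scores. The model curve counts positives captured as records are taken in score order. The naive straight line grows by the overall positive rate per record. The function name `cumulative_gains` and the data are hypothetical.

```python
def cumulative_gains(scores, actuals):
    """Return (model_curve, naive_line) for a binary target.

    scores  -- model scores, higher = more likely positive (hypothetical input)
    actuals -- 1 for a positive record, 0 otherwise
    """
    n = len(actuals)
    total_pos = sum(actuals)
    # Rank records from most to least likely positive according to the model.
    ranked = [a for _, a in sorted(zip(scores, actuals), key=lambda p: p[0], reverse=True)]
    model_curve, running = [], 0
    for a in ranked:
        running += a
        model_curve.append(running)
    # Naive model: after k records we expect k * (overall positive rate) positives,
    # which is the straight line on the chart.
    naive_line = [k * total_pos / n for k in range(1, n + 1)]
    return model_curve, naive_line

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actuals = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
model, naive = cumulative_gains(scores, actuals)
print(model)  # [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
print(naive)  # [0.4, 0.8, 1.2, 1.6, 2.0, 2.4, 2.8, 3.2, 3.6, 4.0]
```

Lift at any point is the model curve divided by the naive line. Here the model finds 2 positives in the first 2 records where the naive model expects only 0.8.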
What is a validation set used to do?
compare models and pick the best one
Estimating a model that explains the training set data points perfectly, leaving little error, but that is unlikely to predict accurately is called
overfitting
With most data mining techniques, why do we partition the data?
in order to judge how our model will do when we apply it to new data
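A small Python sketch of partitioning plus validation, using made-up data and models. The records are split 70/30 into training and validation sets. Two hypothetical candidate models, a constant mean and a least-squares line, are fit on the training part only. The one with the lower validation error is kept.

```python
import random

random.seed(0)
xs = list(range(40))
ys = [2.0 * x + 5.0 + random.gauss(0, 3) for x in xs]  # noisy linear data (made up)

# Partition: shuffle the record indices, 70% training, 30% validation.
idx = list(range(len(xs)))
random.shuffle(idx)
cut = int(0.7 * len(idx))
train, valid = idx[:cut], idx[cut:]

def fit_mean(ids):
    # Candidate 1: always predict the training mean.
    m = sum(ys[i] for i in ids) / len(ids)
    return lambda x: m

def fit_line(ids):
    # Candidate 2: ordinary least squares for y = a + b*x.
    n = len(ids)
    mx = sum(xs[i] for i in ids) / n
    my = sum(ys[i] for i in ids) / n
    sxy = sum((xs[i] - mx) * (ys[i] - my) for i in ids)
    sxx = sum((xs[i] - mx) ** 2 for i in ids)
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def mse(model, ids):
    # Mean squared error of a fitted model on the given records.
    return sum((ys[i] - model(xs[i])) ** 2 for i in ids) / len(ids)

candidates = {"mean": fit_mean(train), "line": fit_line(train)}
val_errors = {name: mse(m, valid) for name, m in candidates.items()}
best = min(val_errors, key=val_errors.get)
print(val_errors, best)
```

The validation records never touch the fitting step. Their error therefore estimates how each model would do on new data, which training error alone cannot show when a model overfits.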
What data mining technique groups objects together based on maximizing intraclass similarity and minimizing interclass similarity?
clustering
inter- = among
intra- = within
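There are many clustering methods. This minimal k-means sketch on 1-D points is one concrete illustration, not necessarily the method the course uses. Each pass assigns every point to its nearest center, then moves each center to the mean of its points. Repeating this tightens within-cluster (intraclass) similarity. The starting centers are picked by hand, which is an assumption.

```python
def kmeans_1d(points, centers, iterations=10):
    """Minimal k-means on 1-D points; returns (centers, clusters)."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centers, clusters = kmeans_1d(points, centers=[1.0, 9.0])
print(centers)   # [1.5, 8.5]
print(clusters)  # [[1.0, 1.5, 2.0], [8.0, 8.5, 9.0]]
```

Points inside each group are close together (high intraclass similarity), while the two groups sit far apart (low interclass similarity).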
What tools and techniques are used in the large-scale or big data arena?
data mining
What new mindset is needed to begin data mining using big data?
we will need to be open to finding relationships and patterns we never imagined existed in the data we are about to examine
In “The Big Data Future Has Arrived” by Michael Malone, the statement is made that:
- metadata is more important than the big data itself
- the major challenge to big data analysis will be overcome because the fruits of big data are too valuable
- discovery of this “metadata” may prove to be the undoing of big data analysis
- privacy issues will prevent big data analysis from advancing beyond what we've already seen
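2: the major challenge to big data analysis will be overcome because the fruits of big data are too valuable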
What are the four categories of analytic tools available in data mining?
prediction, classification, clustering, association
What data mining tool allows us to predict the class of objects whose label is unknown to us?
prediction
A forecasting model in statistics is called what in data mining?
algorithm
The data mining term “score” is known as ___ in statistics.
forecast
What statistics term is referred to as a record in data mining terminology?
observation
What are the 5 steps identified by SAS for the data mining process?
sample, explore, modify, model, and assess
The data mining process step that involves creating, selecting, or transforming data is called
modify
The data mining process step that involves data cleansing is called
explore
In “The Invisible Digital Hand,” the replacement of the visible hand in competition by the digitized hand…
- is usually accompanied by fewer firms in the marketplace
- could result in less price comparison and more impulse buying
- can give rise to anticompetitive behavior
- does not give rise to “frenemy” relationships
3: can give rise to anticompetitive behavior
Most economic time series are integrated of what order?
one
An ARMA model can’t be fit directly to a series with a trend; the series must first be differenced (the “integrated” step, usually taking first differences). Most economic time series are integrated of order one.
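A minimal Python sketch of first differencing, y'_t = y_t - y_{t-1}, on a made-up trending series. The series has a trend of +3 per period plus a small alternating wiggle. After one difference the trend is gone, and the values just fluctuate around 3. That is what d = 1 does in ARIMA(p, d, q). The helper name `difference` is hypothetical.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing: y'_t = y_t - y_{t-1}."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

# Made-up series: linear trend (+3 per period) plus an alternating +/-1 wiggle.
trend = [10 + 3 * t + (1 if t % 2 else -1) for t in range(8)]
print(trend)              # [9, 14, 15, 20, 21, 26, 27, 32]
print(difference(trend))  # [5, 1, 5, 1, 5, 1, 5]
```

Each round of differencing shortens the series by one observation. This is one reason to difference only as many times as needed (usually once for economic series).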
Which of the following models utilizes a transformed series to induce a stationary series?
- ARIMA(1,0,1)
- ARIMA(1,0,0)
- ARIMA(1,1,1)
- ARIMA(0,0,1)
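3: ARIMA(1,1,1); the middle (I, or d) term must be at least 1 for the series to be differenced (transformed) into a stationary one, and only this option has d = 1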
Which of the following is NOT a characteristic of a time series best represented as an ARIMA(3,0,1)?
- original series is stationary
- autocorrelation function has one dominant spike
- the partial autocorrelation function has one dominant spike
- the partial autocorrelation function has 3 spikes
- none are correct
the partial autocorrelation function has one dominant spike (an AR(3) component implies three PACF spikes)
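The heuristic behind this card is that an AR(p) component shows p dominant spikes in the PACF, and an MA(q) component shows q dominant spikes in the ACF. The sketch below makes that checkable in Python. It assumes the usual sample ACF and the Durbin-Levinson recursion for the PACF; the function names are hypothetical. Fed the theoretical ACF of an AR(1) process with coefficient 0.5 (r_k = 0.5^k), the PACF has one spike at lag 1 and zeros after it. An AR(3) component would show three.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]

def pacf_from_acf(r):
    """Partial autocorrelations via the Durbin-Levinson recursion.

    r[0] is the lag-1 autocorrelation, r[1] lag 2, and so on.
    """
    pacf = []
    phi = []  # phi[j] holds phi_{k-1, j+1} from the previous step
    for k in range(1, len(r) + 1):
        if k == 1:
            phi_kk = r[0]
            new_phi = [phi_kk]
        else:
            num = r[k - 1] - sum(phi[j] * r[k - 2 - j] for j in range(k - 1))
            den = 1 - sum(phi[j] * r[j] for j in range(k - 1))
            phi_kk = num / den
            new_phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        phi = new_phi
        pacf.append(phi_kk)
    return pacf

print(pacf_from_acf([0.5, 0.25, 0.125]))  # [0.5, 0.0, 0.0]
print(sample_acf([1, 2, 3, 4, 5], 1))     # [0.4]
```

On real data the sample PACF beyond the true order is not exactly zero. In practice a spike is usually judged "dominant" against approximate bounds of about ±2/√n.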
Which of the following is NOT a first step in the ARIMA model selection process?
- examine the ACF of the raw series
- examine the PACF of the raw series
- test the data for stationarity
- estimate an ARIMA (1,1,1) model for reference purposes
- all of the options are correct
4: estimating an ARIMA(1,1,1) model for reference purposes is not a first step
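What is the Q statistic based on?
the estimated (sample) autocorrelation function
What is the Q statistic used to test?
whether or not a series is white noise
T/F: the Q statistic follows the chi-squared distribution.
T (approximately, under the null hypothesis that the series is white noise)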
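A minimal Python sketch of one common form of the Q statistic, the Ljung-Box version: Q = n(n+2) Σ r_k²/(n-k) over lags 1..m. The course may instead use the Box-Pierce form, Q = n Σ r_k². Under the white-noise null hypothesis, Q is approximately chi-squared with m degrees of freedom. Use fewer degrees of freedom when testing residuals of a fitted ARIMA model. The trending series is made up. The 5% critical value for 5 degrees of freedom (11.070) comes from a standard chi-squared table and is hard-coded as an assumption.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]

def ljung_box_q(series, m):
    """Ljung-Box Q statistic built from the sample ACF at lags 1..m."""
    n = len(series)
    r = sample_acf(series, m)
    return n * (n + 2) * sum(r[k - 1] ** 2 / (n - k) for k in range(1, m + 1))

CHI2_CRIT_5DF_5PCT = 11.070  # upper 5% point of chi-squared with 5 df (standard table)

trending = list(range(20))   # made-up series with a strong trend
q = ljung_box_q(trending, m=5)
print(q > CHI2_CRIT_5DF_5PCT)  # True: reject the white-noise hypothesis
```

A large Q means the autocorrelations are jointly too big for white noise. A Q below the critical value means the series (or a model's residuals) is consistent with white noise.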