Exam 2 Flashcards

Question

What test determines whether the residual autocorrelations, as a set, are significantly different from zero?

Answer 1

stationary

Answer 2

when we don't know the predictors of the variable to be forecast

Answer 3

that the series we are observing started as white noise and was transformed by the black box process into the observed series

Answer 4

that the correct black box could have produced such a series from white noise

Answer 5

white noise → black box → observed time series
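A minimal sketch of the white noise → black box → observed series idea. It assumes Python's standard `random` module as the white noise source and uses a hypothetical MA(1) filter as the black box; the `theta` value is illustrative, and some texts write the MA term with a minus sign instead of a plus sign.

```python
import random

def white_noise(n, sigma=1.0, seed=0):
    """Draw n independent, normally distributed values with mean 0."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]

def ma1_black_box(noise, theta=0.6):
    """Hypothetical black box: an MA(1) filter, y_t = e_t + theta * e_(t-1)."""
    observed = []
    previous_shock = 0.0          # no shock before the first period
    for shock in noise:
        observed.append(shock + theta * previous_shock)
        previous_shock = shock
    return observed

e = white_noise(200)
y = ma1_black_box(e)              # the "observed" time series
```

Box-Jenkins modeling works in the opposite direction: given only `y`, it looks for a black box that could have produced the series from white noise.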

Answer 6

white noise

Answer 7

the dependent variable depends on its own previous values rather than on the white noise series or residuals

Answer 8

because a series needs to be stationary before you can identify the correct model

Answer 9

differencing

Answer 10

stationary

Answer 11

the set of autocorrelations is jointly equal to zero

Answer 12

extremely high order AR and MA processes

Answer 13

deseasonalized data

Answer 14

use of explanatory variables (ARIMA doesn't use any)

Answer 15

measurement of the very long-term movement of the data, which is often thought of as waves

Answer 16

measurement of the irregular movement or random variations in the series

Answer 17

constant amplitude

Answer 18

the series that remains after the seasonality and irregular components have been smoothed out by using moving averages
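A minimal sketch of the centered moving average (CMA) for quarterly data, assuming a 4-period moving average whose adjacent pairs are averaged so that each result lines up with an actual quarter. The function name and the sales figures are hypothetical.

```python
def centered_moving_average(y, period=4):
    """Centered moving average for an even seasonal period (e.g., 4 for quarterly data).
    Returns a list as long as y, with None where the CMA is undefined (the ends)."""
    if period % 2 != 0:
        raise ValueError("this sketch only handles even periods")
    n = len(y)
    # ordinary moving averages: ma[i] is the mean of y[i] .. y[i + period - 1]
    ma = [sum(y[i:i + period]) / period for i in range(n - period + 1)]
    half = period // 2
    cma = [None] * n
    # each ordinary MA sits between two periods; averaging adjacent pairs centers it
    for t in range(half, n - half):
        cma[t] = (ma[t - half] + ma[t - half + 1]) / 2
    return cma

sales = [12, 8, 10, 14, 13, 9, 11, 15]   # hypothetical quarterly values
print(centered_moving_average(sales))
# [None, None, 11.125, 11.375, 11.625, 11.875, None, None]
```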

Answer 19

when we don't know the predictors of the variable to be forecast

Answer 20

This statement is accurate. A classification model uses its validation data to test the model's accuracy, or predictive ability. Therefore, the misclassification rate on the validation data is a better indicator of a model's predictive ability than the misclassification rate on the training data, because we want to see how well the model functions on unseen data.
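A small illustration of the misclassification rate (errors divided by records), computed separately for training and validation data. The labels are hypothetical and chosen to show a model that looks perfect on the data it was built from but makes errors on unseen data.

```python
def misclassification_rate(actual, predicted):
    """Share of records whose predicted class differs from the actual class."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    errors = sum(1 for a, p in zip(actual, predicted) if a != p)
    return errors / len(actual)

# Hypothetical labels, for illustration only
train_actual = [1, 0, 0, 1, 0, 1, 0, 0]
train_pred   = [1, 0, 0, 1, 0, 1, 0, 0]   # perfect on the data the model was fit to
valid_actual = [1, 0, 1, 0, 0, 1, 0, 0]
valid_pred   = [1, 0, 0, 0, 1, 1, 0, 0]   # two errors on unseen data

print(misclassification_rate(train_actual, train_pred))   # 0.0
print(misclassification_rate(valid_actual, valid_pred))   # 0.25
```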

Answer 21

Both SAS and IBM recommend sampling as the first step, since we need a training data set to build the model and a validation data set to test the model's accuracy. The risk in ignoring this step is bias. If a data scientist uses the same data both to build and to test the model, and that model is overfit, then the test results will most likely be overfit as well, making the model look more accurate than it really is. We need to see how the model will work both on the data we have and in the real world.
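A minimal sketch of the sampling step: randomly partitioning records into training and validation sets. The 60/40 split and the seed are assumptions for illustration, not a recommendation from SAS or IBM.

```python
import random

def partition(records, train_share=0.6, seed=1):
    """Randomly split records into a training set and a validation set."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)                       # random order avoids systematic bias
    cut = int(round(len(shuffled) * train_share))
    return shuffled[:cut], shuffled[cut:]

train, valid = partition(range(100))
print(len(train), len(valid))   # 60 40
```

The model is built on `train` only; `valid` is held back so its accuracy reflects performance on data the model has not seen.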

Answer 22

Structured data: data that has a predefined data model
Unstructured data: data that does not have a predefined data model
Unstructured data is the more prevalent form of data because it comes in many different forms that we are exposed to daily.
- Excel spreadsheet: structured
- A thousand text files: unstructured
- A thousand video images: unstructured
- A thousand audio files: unstructured

Answer 23

Overfitting: putting too many attributes in a model (or trying to account for too many patterns), including some that are unrelated to the target. A data scientist who overfits will wrongly explain some variation in the data that is nothing more than chance variation. In other words, they will have mislabeled the noise in the data as part of the “true signal.” If you overfit, you model the noise in the data, and when the model replicates that noise it will fit the training data very well but have low accuracy on new data.

Answer 24

- different forecast methods
- different forecasters
- different sources of data

Answer 25

consistently overestimate or underestimate the actual values

Answer 26

Because with data mining we have access to information and tools that can help us do better than predicting correctly 90% of the time. In this scenario, we could use our lift chart to find the customers with the highest probability of accepting a personal loan and market to them, giving us a better chance of finding people who will accept the loan.

Answer 27

Nonrivalry: the characteristic that one person's use of a good to create value does not diminish the value another person can extract from it. It's important to realize that data has this characteristic because it means many researchers and data scientists can use the same data set, and each use can produce different results. Every researcher can use a data set for a different purpose and reach different conclusions.

Answer 28

Confusion matrix: shows model performance. There is a confusion matrix for both the training data and the validation data. Most often, the validation results are the most relevant, since they show how the model performed on unseen data: the validation confusion matrix shows classification performance on data that was not used to build the model. It gives the number of correct classifications and the number of misclassifications.

Lift chart: the standard for accuracy in data mining. Lift charts show how effectively the model can reorder the data set, placing the individuals with the highest probability of success at the top and those with the lowest probability of success at the bottom. By looking at the chart, you can determine how well your model is doing compared to a naïve model.

- confusion matrix: what you misclassified vs. what you classified correctly
- lift chart: for each % of the data, how many you got right or wrong
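The data behind a cumulative lift chart can be sketched as follows: sort records by predicted probability (highest first), then compare the successes the model finds in each top slice with the successes a naïve random ordering would be expected to find. The probabilities and outcomes are hypothetical, and the function name is illustrative.

```python
def cumulative_gains(probabilities, actual):
    """For each prefix of the records sorted by predicted probability (highest first),
    return (records taken, successes found by the model, successes expected by chance)."""
    ranked = sorted(zip(probabilities, actual), key=lambda pair: pair[0], reverse=True)
    overall_rate = sum(actual) / len(actual)    # success rate of the naive model
    rows = []
    found = 0
    for k, (_, outcome) in enumerate(ranked, start=1):
        found += outcome
        rows.append((k, found, k * overall_rate))
    return rows

probs  = [0.9, 0.2, 0.8, 0.1, 0.7, 0.3, 0.05, 0.6]   # hypothetical model scores
actual = [1,   0,   1,   0,   0,   1,   0,    0]     # 1 = accepted the offer
for k, found, naive in cumulative_gains(probs, actual):
    print(k, found, round(naive, 3))
```

Here the top 2 records contain 2 successes versus 0.75 expected by chance, a lift of about 2.67 for that slice.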

Answer 29

- make sure the number of rows of historical data exceeds the number of forecast values
- consider how the data should be set up

Answer 30

slope is equal to one | intercept is equal to zero
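A minimal sketch of the bias check: regress the actual values on the forecast values by ordinary least squares and see whether the intercept is near 0 and the slope near 1. The data are hypothetical, and the sketch computes only the point estimates; a formal test of intercept = 0 and slope = 1 (for example an F-test) is not shown.

```python
def ols_line(x, y):
    """Least-squares fit y = a + b*x; returns (intercept a, slope b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

forecast = [100, 110, 120, 130, 140]   # hypothetical forecasts (independent variable)
actual   = [102, 108, 121, 129, 140]   # actual values (dependent variable)
intercept, slope = ols_line(forecast, actual)
print(round(intercept, 2), round(slope, 2))   # close to 0 and 1 suggests no bias
```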

Answer 31

forecast values above the perfect forecast line

Answer 32

by combining forecasts from diff models

Answer 33

preconceived notions of the forecaster

Answer 34

1. First, consider how the data should be set up.
2. Regress the actual values of the variable to be forecast on the two forecast results for the historical period.
3. When there's no bias, run the same regression but force the constant to be 0.
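Step 3 of the procedure can be sketched as a least-squares regression through the origin with two regressors, solved here with the 2×2 normal equations (Cramer's rule); the fitted coefficients act as the weights on each forecast. The function name is hypothetical, the bias check from step 2 is not shown, and the data are constructed so that actual = 0.7 × forecast 1 + 0.3 × forecast 2, so the weights should come back as 0.7 and 0.3.

```python
def combine_weights_no_constant(f1, f2, actual):
    """Least-squares weights (w1, w2) for actual ≈ w1*f1 + w2*f2 with the constant forced to 0."""
    s11 = sum(a * a for a in f1)
    s22 = sum(b * b for b in f2)
    s12 = sum(a * b for a, b in zip(f1, f2))
    s1y = sum(a * y for a, y in zip(f1, actual))
    s2y = sum(b * y for b, y in zip(f2, actual))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("the two forecasts are perfectly collinear")
    w1 = (s1y * s22 - s2y * s12) / det
    w2 = (s2y * s11 - s1y * s12) / det
    return w1, w2

f1 = [10, 12, 14, 13, 15]                 # hypothetical forecast from method 1
f2 = [11, 11, 15, 12, 16]                 # hypothetical forecast from method 2
actual = [0.7 * a + 0.3 * b for a, b in zip(f1, f2)]
w1, w2 = combine_weights_no_constant(f1, f2, actual)
combined = [w1 * a + w2 * b for a, b in zip(f1, f2)]   # the combined forecast
```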

Answer 35

to show that the forecast would not have bias

Answer 36

- leading economic index: leads turning points in economic activity (e.g., stock prices, average weekly manufacturing hours, ISM new orders index)
- coincident index: coincides with turning points in economic activity (e.g., personal income, industrial production)
- lagging index: lags turning points in economic activity (e.g., average prime rate, average duration of unemployment, labor cost per unit of output)

Answer 37

a test of the combined forecast model's bias can be performed

Answer 38

it minimizes the total error giving you more predictive accuracy

Answer 39

constant periodicity

Answer 40

Centered moving average

Answer 41

constant amplitude

Answer 42

by the cycle factor (CF = CMA/CMAT) | if CF > 1, the deseasonalized value for that period is above the long-term trend of the data
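A minimal sketch of the cycle factor CF = CMA / CMAT. It assumes the CMAT is a straight-line trend fitted to the CMA by least squares against period numbers 1, 2, 3, ...; that linear form is one common choice, not something this card specifies. The CMA values are hypothetical.

```python
def cycle_factors(cma):
    """Fit a least-squares linear trend (the CMAT) to the defined CMA values and
    return CF_t = CMA_t / CMAT_t, with None where the CMA is undefined."""
    points = [(t, v) for t, v in enumerate(cma, start=1) if v is not None]
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    slope = (sum((t - mean_t) * (v - mean_v) for t, v in points)
             / sum((t - mean_t) ** 2 for t, _ in points))
    intercept = mean_v - slope * mean_t
    factors = []
    for t, v in enumerate(cma, start=1):
        cmat = intercept + slope * t            # trend value for period t
        factors.append(None if v is None else v / cmat)
    return factors

cma = [None, 10.0, 12.0, 11.0, 13.0, None]      # hypothetical CMA values
for period, cf in enumerate(cycle_factors(cma), start=1):
    if cf is not None:
        print(period, round(cf, 3), "above trend" if cf > 1 else "at or below trend")
```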

Answer 43

cycle factor

Answer 44

cycle factor

Answer 45

identify the long-term trend, seasonal fluctuations, cyclical movements, and irregular fluctuations; then break the series into these component parts and reassemble the parts to construct a forecast

Answer 46

Y = T × S × C × I (Trend × Seasonality × Cyclicality × Irregular variations)
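Reassembling a forecast from the multiplicative model is a single product. This sketch assumes the irregular component is set to 1 (a common choice, since it cannot be predicted), and the component values are hypothetical.

```python
def decomposition_forecast(trend, seasonal_index, cycle_factor, irregular=1.0):
    """Multiplicative model Y = T x S x C x I; I defaults to 1 (assumed unpredictable)."""
    return trend * seasonal_index * cycle_factor * irregular

# Hypothetical component values for one future quarter
forecast = decomposition_forecast(trend=250.0, seasonal_index=1.08, cycle_factor=0.97)
print(round(forecast, 2))
```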

Answer 47

T: CMAT | S: seasonal indices | C: cycle factor

Answer 48

it allows us to better see the underlying pattern in the data and provides a measure of the extent of seasonality in the form of seasonal indices
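Deseasonalizing divides each observation by the seasonal index for its season. This sketch assumes quarterly data starting in Q1; the values and indices are hypothetical (the indices average 1).

```python
def deseasonalize(values, seasonal_indices):
    """Divide each observation by the index for its season.
    seasonal_indices[k] is the index for season k; the series is assumed to start in season 0."""
    m = len(seasonal_indices)
    return [v / seasonal_indices[i % m] for i, v in enumerate(values)]

quarterly = [125, 78, 104, 102]          # hypothetical Q1..Q4 values
indices = [1.25, 0.75, 1.0, 1.0]         # hypothetical seasonal indices
print(deseasonalize(quarterly, indices)) # [100.0, 104.0, 104.0, 102.0]
```

The raw values swing between 78 and 125, but once the seasonal pattern is divided out, the underlying level stays between 100 and 104.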

Answer 49

expansion phase

Answer 50

year that is centered on that MA

Answer 51

constant periodicity, constant amplitude, and regularity

Answer 52

centered MA

Answer 53

compare the actual value with the deseasonalized value

Answer 54

indicates a period where the value is greater than the quarterly average for the year
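A minimal sketch of seasonal factors (SF = actual / CMA) and seasonal indices. It assumes each quarter's index is the mean of its seasonal factors, rescaled so the four indices average 1; some texts use the median or skip the rescaling. The CMA values are those of a 4-quarter centered moving average of the hypothetical sales series.

```python
def seasonal_indices(values, cma, period=4):
    """Seasonal factor SF_t = Y_t / CMA_t. The index for each season is the mean of its
    factors, rescaled so the indices average exactly 1. Each season needs >= 1 factor."""
    sums = [0.0] * period
    counts = [0] * period
    for t, (y, c) in enumerate(zip(values, cma)):
        if c is not None:                       # CMA is undefined at the ends of the series
            sums[t % period] += y / c
            counts[t % period] += 1
    raw = [s / k for s, k in zip(sums, counts)]
    scale = period / sum(raw)
    return [r * scale for r in raw]

sales = [12, 8, 10, 14, 13, 9, 11, 15]          # hypothetical quarterly values, Q1 first
cma = [None, None, 11.125, 11.375, 11.625, 11.875, None, None]
idx = seasonal_indices(sales, cma)
print([round(s, 3) for s in idx])               # > 1 means above the average for the year
```

With only two years of data each quarter contributes a single factor; a real analysis would average over more years.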

Answer 55

leading index

Answer 56

lagging index

Answer 57

past pattern

Answer 58

recent observation | recent

Answer 59

- numbers are normally and independently distributed
- a purely random series of numbers
- it is assumed the observed time series started as white noise

Answer 60

partial autocorrelation coefficient | autocorrelation coefficient
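A minimal sketch of the sample autocorrelation (ACF) and partial autocorrelation (PACF) coefficients. The PACF is computed from the ACF with the Durbin-Levinson recursion, which is one standard method; the cards do not say how their software computes it. The series is hypothetical.

```python
def acf(y, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag (r[k-1] is the lag-k value)."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]

def pacf(y, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(y, max_lag)
    result = []
    prev = []                      # phi_{k-1, j} for j = 1 .. k-1
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[0]
            current = [phi_kk]
        else:
            num = r[k - 1] - sum(prev[j - 1] * r[k - j - 1] for j in range(1, k))
            den = 1 - sum(prev[j - 1] * r[j - 1] for j in range(1, k))
            phi_kk = num / den
            current = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)]
            current.append(phi_kk)
        result.append(phi_kk)
        prev = current
    return result

series = [3, 5, 4, 6, 8, 7, 9, 11, 10, 12]   # hypothetical data
print([round(v, 3) for v in acf(series, 3)])
print([round(v, 3) for v in pacf(series, 3)])
```

At lag 1 the two coefficients are identical; from lag 2 on, the PACF removes the influence of the intervening lags.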

Answer 61

- simple models are the best
- it is possible for two or more models to be very similar in their fit of the data

Answer 62

- allows for greater flexibility in the choice of the correct model
- extracts a great deal of information from the time series
- encourages examination of a wide variety of models in search of an acceptable one

Answer 63

- allows for greater flexibility in the choice of the correct model
- extracts a great deal of information from the time series
- encourages examination of a wide variety of models in search of an acceptable one

Answer 64

- take logs of the original time series to transform a trend in the variance into a trend in the mean
- difference the time series to remove a trend
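The two transformations can be sketched on a hypothetical series that grows 10% per period, so its swings widen as its level rises. Taking logs turns the growth into a straight-line trend in the mean, and first differencing then removes that trend.

```python
import math

def log_transform(y):
    """Natural log of each value (requires every value > 0)."""
    return [math.log(v) for v in y]

def difference(y, lag=1):
    """Differences y_t - y_(t-lag); first differences remove a linear trend."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

y = [100 * 1.1 ** t for t in range(6)]      # hypothetical series growing 10% per period
stationary = difference(log_transform(y))
print([round(d, 4) for d in stationary])    # every value ≈ log(1.1) ≈ 0.0953
```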

Answer 65

more positive correlation

Answer 66

if neither function falls off abruptly, but both decline toward zero in some fashion, the appropriate model is an ARMA(p,q) type

Answer 67

if neither function falls off abruptly, but both decline toward zero in some fashion, the appropriate model is an ARMA(p,q) type

Answer 68

autoregressive model

Answer 69

if the remaining series is not white noise, pass it through another black box

Answer 70

estimate parameters of the tentative model

Answer 71

whether the residual autocorrelations as a set are significantly different from zero
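One standard test of this kind is the Ljung-Box Q statistic, sketched below under the assumption that the cards refer to it. Q is compared with a chi-square critical value whose degrees of freedom are usually the number of lags minus the number of estimated AR and MA parameters; here no model is fitted, so 10 lags give 10 degrees of freedom (5% critical value 18.307). The "residuals" are simulated white noise standing in for real model residuals.

```python
import random

def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q = n(n+2) * sum_{k=1..m} r_k^2 / (n-k), where r_k is the lag-k
    autocorrelation of the residuals and m = max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total

rng = random.Random(7)
resid = [rng.gauss(0.0, 1.0) for _ in range(300)]   # stand-in for model residuals
q = ljung_box_q(resid, max_lag=10)
# 18.307 is the 5% chi-square critical value for 10 degrees of freedom
print(round(q, 2), "reject white noise" if q > 18.307 else "consistent with white noise")
```

A Q above the critical value means the residual autocorrelations are jointly significant, so the remaining series is not white noise and the model needs revising.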

Answer 72

the autoregressive model is similar to the MA model except that the dependent variable depends on its own previous values
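A minimal AR(1) sketch, y_t = const + phi × y_(t-1) + e_t, where e_t is white noise. The function names and parameter values are hypothetical; the simulation starts at the process mean, const / (1 − phi), which assumes a stationary process (|phi| < 1).

```python
import random

def simulate_ar1(phi, n, const=0.0, sigma=1.0, seed=3):
    """Generate y_t = const + phi * y_(t-1) + e_t with normally distributed white noise e_t."""
    if abs(phi) >= 1:
        raise ValueError("this sketch assumes a stationary AR(1), |phi| < 1")
    rng = random.Random(seed)
    y = []
    previous = const / (1 - phi)        # start at the process mean
    for _ in range(n):
        previous = const + phi * previous + rng.gauss(0.0, sigma)
        y.append(previous)
    return y

def ar1_forecast(last_value, phi, const=0.0):
    """One-step-ahead forecast: depends only on the series' own previous value."""
    return const + phi * last_value

series = simulate_ar1(phi=0.5, n=50, const=2.0)
print(round(ar1_forecast(series[-1], phi=0.5, const=2.0), 3))
```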

Answer 73

autoregressive terms

Answer 74

nonstationary

Answer 75

creating, selecting, or transforming the data

Answer 76

not affect

Answer 77

holdout periods in standard data forecasting models

Answer 78

False; it is not needed

Answer 79

- both measure the certainty or trustworthiness associated with the patterns discovered
- in data mining, you simultaneously search for different kinds of patterns in parallel
- in business forecasting, you search for set patterns
- in business forecasting, the expectation is that the data will contain some level of variation
- in data mining, patterns are not prespecified

Answer 80

autoregressive terms

Answer 81

moving average terms

(127 cards)