Chapter 2 Quiz Flashcards by Elizabeth Burnham

methods are trained on a set of training data and then their performance is evaluated on a separate set of validation data

data partitioning

tasks of classification and prediction as well as pattern discovery

predictive analytics

trying to predict value of categorical variable

classification

trying to predict value of numerical variable

prediction

finding general association patterns between items in large databases, expressed as rules that apply to an entire population

association rules/affinity analysis

method that uses individual users’ preferences and tastes given their historic purchases or measurable behavior indicative of preference

collaborative filtering

consolidating a large number of records into a smaller set

data reduction

methods for reducing the number of cases

clustering

reduction of the number of variables

dimension reduction

exploration by creating charts and dashboards

data visualization/visual analytics

used in classification and prediction; require data in which the value of the outcome of interest is known

supervised learning algorithms

data from which a classification or prediction algorithm learns

training data

sample of data where the outcome is known, used to compare the performance of different models

validation data

sample of data where the outcome is known, used to estimate how well the chosen model will do on new data

test data
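
A minimal sketch of splitting records into training, validation, and test sets, as described in the three cards above. The function name `partition`, the 60/20/20 proportions, and the fixed seed are illustrative assumptions, not something the cards specify.

```python
import random

def partition(records, train_frac=0.6, valid_frac=0.2, seed=1):
    """Shuffle records and split them into training, validation, and test lists.

    The fractions and the seed are assumptions chosen for illustration.
    """
    shuffled = list(records)               # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # reproducible random order
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]                    # the algorithm learns from these
    valid = shuffled[n_train:n_train + n_valid]   # used to compare models
    test = shuffled[n_train + n_valid:]           # held back for the final estimate
    return train, valid, test

train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))
```

Shuffling before slicing keeps each partition a random sample; every record lands in exactly one of the three sets.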

there is no outcome variable to predict or classify

unsupervised learning algorithm

steps in machine learning

understand project
obtain data
preprocess data
reduce dimensions (if necessary)
determine ML task
partition data
choose ML technique
perform task
interpret results
deploy model

SEMMA

sample, explore, modify, model, assess

character, integer, categorical

types of variables

unordered categorical variables

nominal variables

ordered categorical variables

ordinal variables

categorical variables decomposed into a series of binary variables

dummy variables

creating a separate binary dummy variable for each category of a categorical variable

one-hot encoding
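
A small sketch of how a categorical variable can be decomposed into binary dummy variables. The helper `one_hot` and the `is_` column prefix are hypothetical names used only for this illustration.

```python
def one_hot(values):
    """Return one dict of 0/1 dummy variables per input value."""
    categories = sorted(set(values))  # one dummy column per distinct category
    return [
        {f"is_{c}": int(v == c) for c in categories}
        for v in values
    ]

rows = one_hot(["red", "green", "red", "blue"])
for row in rows:
    print(row)
```

Each record gets exactly one dummy set to 1. Some models need only all-but-one of the dummies, since the last one is implied by the others.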

values that lie far away from the bulk of the data

outliers
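
The card does not say how far is "far away". One common rule of thumb, assumed here, flags values more than three standard deviations from the mean. The function name `flag_outliers` and the sample data are made up for illustration.

```python
import statistics

def flag_outliers(values, threshold=3.0):
    """Return values lying more than `threshold` standard deviations from the mean.

    The 3-standard-deviation cutoff is an assumed rule of thumb, not a fixed definition.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation; needs at least 2 values
    return [v for v in values if abs(v - mean) > threshold * sd]

data = [9, 10, 11, 10, 9, 10, 11, 10, 9, 10,
        11, 10, 9, 10, 11, 10, 9, 10, 11, 100]
print(flag_outliers(data))
```

A flagged value is only a candidate: whether it is a data error or a legitimate extreme case is a judgment call.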

knowledge of the particular application being considered

domain knowledge

how to standardize a value

subtract mean from each value and divide by standard deviation

how to normalize a value to [0,1]

subtract minimum value and divide by range
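
The two rescaling recipes above can be sketched as follows. The function names are illustrative, and using the sample standard deviation (`statistics.stdev`) rather than the population version is an assumption.

```python
import statistics

def standardize(values):
    """Subtract the mean from each value and divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (an assumption)
    return [(v - mean) / sd for v in values]

def normalize(values):
    """Subtract the minimum and divide by the range, giving values in [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

data = [2.0, 4.0, 6.0, 8.0]
print(standardize(data))
print(normalize(data))
```

Both functions divide by zero if every value is identical, so a constant column would need special handling.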

creating a model so complex that it fits the training data too closely and cannot be applied to other data

overfitting
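
A toy illustration of overfitting, with made-up numbers. The "memorizer" stores every training outcome exactly, so its training error is zero, while a simple fitted line has some training error but does better on separate validation data. Both models and both data sets are assumptions chosen only to make the effect visible.

```python
def mse(model, data):
    """Mean squared error of model(x) against the known outcomes y."""
    return sum((y - model(x)) ** 2 for x, y in data) / len(data)

# Hypothetical noisy data following roughly y = 2x.
train = [(1, 2.5), (2, 3.4), (3, 6.6), (4, 7.5)]
valid = [(1, 1.8), (2, 4.3), (3, 5.7), (4, 8.2)]

# Overly flexible model: a lookup table that reproduces the training data, noise included.
table = dict(train)

def memorizer(x):
    return table[x]

# Simple model: a line through the origin fitted by least squares.
slope = sum(x * y for x, y in train) / sum(x * x for x, _ in train)

def line(x):
    return slope * x

for name, model in [("memorizer", memorizer), ("line", line)]:
    print(name, "train MSE:", round(mse(model, train), 3),
          "validation MSE:", round(mse(model, valid), 3))
```

Judging the memorizer on training data alone would make it look perfect, which is why performance is checked on data the model has not seen.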

adds up the squared errors, so positive and negative errors contribute equally

sum of squared errors

average of the squared errors

mean squared error

square root of the MSE, which gives an idea of the typical error on the same scale as the outcome variable

root mean squared error

average of the absolute values of the errors

mean absolute deviation

what do SSE, MSE, RMSE, and MAD measure?

prediction error
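
The four measures can be computed side by side from a list of actual and predicted values. This is a minimal sketch; the function name `error_metrics` and the sample numbers are illustrative assumptions.

```python
import math

def error_metrics(actual, predicted):
    """Compute SSE, MSE, RMSE, and MAD for paired actual and predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    sse = sum(e ** 2 for e in errors)       # sum of squared errors
    mse = sse / n                           # mean squared error
    rmse = math.sqrt(mse)                   # same scale as the outcome variable
    mad = sum(abs(e) for e in errors) / n   # mean absolute deviation
    return {"SSE": sse, "MSE": mse, "RMSE": rmse, "MAD": mad}

metrics = error_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 9.0, 9.0])
print(metrics)
```

Here the errors are 1, 0, -2, and 0, so squaring makes the single error of -2 dominate SSE, MSE, and RMSE more than it dominates MAD.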

(32 cards)