Week 2 Flashcards

Question 1

Q

Clustering is useful for

Answer

A

1) Targeted marketing, customer segmentation
2) Personalized medicine
3) Locating facilities
4) Image analysis
5) Data investigation

Question 2

Q

Infinity norm

Answer

A

The largest absolute value among a set of numbers: ||x||∞ = max_i |x_i|
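
The infinity norm is the limit of the p-norm (sum of |x_i|^p)^(1/p) as p grows, because the largest |x_i| ends up dominating the sum. A small plain-Python sketch; the function names `p_norm` and `infinity_norm` are illustrative, not from any library.

```python
def p_norm(x, p):
    """p-norm of vector x for a finite p >= 1: (sum of |x_i|^p)^(1/p)."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def infinity_norm(x):
    """Infinity norm: the largest absolute value among the entries of x."""
    return max(abs(v) for v in x)


x = [3, -7, 2]
print(infinity_norm(x))  # 7
for p in (1, 2, 10, 50):
    # As p grows, the largest |x_i| dominates and the p-norm approaches 7.
    print(p, round(p_norm(x, p), 4))
```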

Question 3

Q

Data has two types of patterns

Answer

A

real effect - a real relationship between the attributes and the response
random effect - a pattern that is just random but looks like a real effect; a model fit to it will not carry over to new data

Question 4

Q

Flow chart for validation

Answer

A

1) Build models using the training set of data
2) Are you choosing between models?
3) If yes, choose the best model using the validation set of data and estimate its quality using the test set of data. If no, estimate the model's quality using the test set of data.

Question 5

Q

When working with one model, what is the rule of thumb for splitting the data into two sets?

Answer

A

70-90% training, 10-30% test
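
A minimal sketch of a random two-way split following this rule of thumb. The function name `train_test_split` and its parameters are hypothetical here (not a specific library's API), and the 80% default is just one point inside the 70-90% range.

```python
import random


def train_test_split(data, train_frac=0.8, seed=0):
    """Randomly split data into a training set and a test set.

    The rule of thumb puts train_frac between 0.7 and 0.9.
    """
    rows = list(data)
    random.Random(seed).shuffle(rows)  # shuffle a copy; the input is untouched
    n_train = round(len(rows) * train_frac)
    return rows[:n_train], rows[n_train:]


train, test = train_test_split(range(100), train_frac=0.8)
print(len(train), len(test))  # 80 20
```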

Question 6

Q

For validation (training, validation, and test sets), what is the rule of thumb for splitting?

Answer

A

50-70% training
Split the rest evenly between validation and test
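
The three-way split can be sketched the same way, assuming a random split; `train_val_test_split` is a hypothetical helper, and sending any odd leftover point to the test set is an arbitrary choice of this sketch.

```python
import random


def train_val_test_split(data, train_frac=0.6, seed=0):
    """Randomly split data into training, validation, and test sets.

    train_frac follows the 50-70% rule of thumb; the remainder is split
    evenly, with any odd leftover point going to the test set.
    """
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train_frac)
    n_val = (len(rows) - n_train) // 2
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100), train_frac=0.6)
print(len(train), len(val), len(test))  # 60 20 20
```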

Question 7

Q

Approaches to splitting data

Answer

A

Random: randomly choose data points for training, then randomly choose points for validation and test
Rotation: take turns assigning consecutive data points to the training, validation, and test sets
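
Rotation can be sketched by cycling through a fixed assignment pattern. The pattern below (three training points, then one validation point, then one test point) is an illustrative assumption that gives a 60/20/20 split; `rotation_split` is a hypothetical name.

```python
def rotation_split(data, pattern=("train", "train", "train", "val", "test")):
    """Assign points to sets by cycling through a fixed pattern, in order."""
    sets = {"train": [], "val": [], "test": []}
    for i, point in enumerate(data):
        sets[pattern[i % len(pattern)]].append(point)
    return sets["train"], sets["val"], sets["test"]


train, val, test = rotation_split(range(10))
print(train)  # [0, 1, 2, 5, 6, 7]
print(val)    # [3, 8]
print(test)   # [4, 9]
```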

Question 8

Q

Advantage of rotation

Answer

A

Ensures each part of the data (for example, each time period) is equally represented in the training, validation, and test sets

Question 9

Q

Disadvantage of rotation

Answer

A

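Bias could be introduced if the rotation lines up with a pattern in the data. For example, with a five-way rotation over weekday data, all observations from one weekday could land in the same set.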
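
A small demonstration of that bias, using hypothetical daily weekday observations stored in time order and a five-step rotation pattern (both assumptions for illustration): because the rotation period matches the weekly cycle, each set sees only certain weekdays.

```python
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
observations = [days[i % 5] for i in range(20)]  # 4 weeks, in time order
pattern = ["train", "train", "train", "val", "test"]

# Record which weekdays end up in each set.
split_days = {"train": set(), "val": set(), "test": set()}
for i, day in enumerate(observations):
    split_days[pattern[i % len(pattern)]].add(day)

print(split_days["test"])  # {'Fri'}
print(split_days["val"])   # {'Thu'}
```

Here the test set contains only Fridays, so a quality estimate from it says nothing about the other weekdays.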

Question 10

Q

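True or false: after cross-validation, we train the model again using all of the data.

Answer

A

True. Cross-validation is used to choose the model; the chosen model is then retrained on all of the data.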
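
A minimal sketch of k-fold cross-validation used to choose between two models, followed by retraining the chosen model on all of the data. The two candidate models (a constant mean predictor and a least-squares line), the data, and the fold scheme (fold f holds out every k-th point starting at f) are all hypothetical choices for illustration.

```python
def fit_mean(xs, ys):
    """Baseline model: always predict the mean response."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def cv_error(fit, xs, ys, k=5):
    """Mean squared error of held-out predictions under k-fold cross-validation."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th point, starting at fold
        train = [i for i in range(n) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum((model(xs[i]) - ys[i]) ** 2 for i in held_out)
    return total / n  # every point is held out exactly once


# Hypothetical data: a straight line plus alternating +/-0.5 "noise".
xs = list(range(20))
ys = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]

candidates = {"mean": fit_mean, "line": fit_line}
errors = {name: cv_error(fit, xs, ys) for name, fit in candidates.items()}
best = min(errors, key=errors.get)

# Cross-validation only chose the model; now retrain it on ALL of the data.
final_model = candidates[best](xs, ys)
print(best, round(final_model(10), 3))
```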

Question 11

Q

k-means algorithm

Answer

A

Pick k cluster centers within the range of the data
Assign each data point to the nearest cluster center
Recalculate the cluster centers (centroids)
Repeat the last two steps until no assignments change
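
A plain-Python sketch of these steps. As an assumption of this sketch, the starting centers are k distinct data points chosen at random (one simple way to start within the range of the data); `kmeans` and its helpers are hypothetical names, not a library API.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(members) for coords in zip(*members))


def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Pick k distinct data points as the starting centers.
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Stop once no assignment changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Recalculate each center as the centroid of its cluster.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep an empty cluster's old center
                centers[c] = centroid(members)
    return centers, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
print(labels)
```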

Question 12

Q

Heuristic

Answer

A

Fast and usually good, but not guaranteed to find the absolute best solution. k-means is an example of a heuristic.
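
A small illustration of why k-means is only a heuristic, using four hypothetical points at the corners of a wide rectangle and two hand-picked sets of starting centers: the same assign-and-recompute loop settles on a good clustering from one start and a much worse one from the other. Running from several starting points and keeping the lowest-cost result is a common remedy, but it is still not guaranteed to find the best solution.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def run_kmeans(points, centers, max_iter=100):
    """Plain assign-and-recompute loop started from the given centers."""
    centers = [tuple(float(v) for v in c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: squared_distance(p, centers[c]))
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(coords) / len(cl) for coords in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Total within-cluster squared distance: lower is better.
    cost = sum(squared_distance(p, centers[i])
               for i, cl in enumerate(clusters) for p in cl)
    return centers, cost


points = [(0, 0), (0, 1), (4, 0), (4, 1)]

_, cost_good = run_kmeans(points, [(0, 0.5), (4, 0.5)])  # left vs. right
_, cost_bad = run_kmeans(points, [(2, 0), (2, 1)])       # bottom vs. top
print(cost_good, cost_bad)  # 1.0 16.0
```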

Question 13

Q

True or false: k-means is an example of an expectation-maximization (EM) algorithm.

Answer

A

True

Question 14

Q

Expectation step of k-means

Answer

A

Find the cluster centers (each center is the mean, i.e. the expected value, of the points in its cluster)

Question 15

Q

Maximization step of k-means

Answer

A

Assign each point to its nearest cluster center (equivalently, maximize the negative of the distance)

(15 cards)