Week 2 Flashcards

(15 cards)

1
Q

Clustering is useful for

A

1) Targeted marketing, customer segmentation
2) Personalized medicine
3) Locating facilities
4) Image analysis
5) Data investigation

2
Q

Infinity norm

A

The largest absolute value among a set of numbers; for a vector, the maximum of |x_i| over its components
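
As a small illustration of the definition above, here is a minimal sketch in plain Python. The function name `infinity_norm` and the sample values are made up for this example.

```python
def infinity_norm(values):
    """Return the largest absolute value in a non-empty sequence of numbers."""
    return max(abs(v) for v in values)


# Hypothetical sample vector; the absolute values are 3, 7, and 2.5.
example = [3, -7, 2.5]
norm = infinity_norm(example)
print(norm)  # 7
```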

3
Q

Data has two types of patterns

A

Real effect: a real relationship between attributes and response
Random effect: random noise that happens to look like a real effect

4
Q

Flow chart for validation

A

1) Build models using the training set
2) Are we choosing between models?
3) If yes, choose the best model using the validation set, then estimate its quality using the test set
4) If no, estimate the model's quality using the test set
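
The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two toy models (`fit_mean`, `fit_line`), the synthetic data, and the 60/20/20 split are all invented for the example and are not part of the card.

```python
import random
from statistics import mean


def fit_mean(train):
    """Toy model 1: always predict the mean training response."""
    m = mean(y for _, y in train)
    return lambda x: m


def fit_line(train):
    """Toy model 2: least-squares line y = a + b * x."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in train) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    return lambda x: a + b * x


def mse(model, data):
    """Mean squared error of a model on (x, y) pairs."""
    return mean((model(x) - y) ** 2 for x, y in data)


# Synthetic data (assumption): y is roughly 2x plus noise.
rng = random.Random(0)
data = [(x, 2.0 * x + rng.gauss(0, 1)) for x in range(100)]
rng.shuffle(data)
train, validation, test = data[:60], data[60:80], data[80:]

# 1) Build every candidate model on the training set only.
models = {"mean": fit_mean(train), "line": fit_line(train)}

# 2-3) We are choosing between models, so pick the best on the validation set...
best_name = min(models, key=lambda name: mse(models[name], validation))

# ...and estimate its quality on the untouched test set.
test_error = mse(models[best_name], test)
print(best_name, test_error)
```

The validation error of the chosen model tends to be optimistic, because the model was picked for doing well on that set; that is why the quality estimate comes from the separate test set.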

5
Q

When working with one model, what is the rule of thumb for splitting the data into two sets?

A

70-90% training, 10-30% test

6
Q

For validation, what is the rule of thumb for splitting?

A

50-70% training
Split the rest evenly between validation and test

7
Q

Splitting data approaches

A

Random: Randomly choose data points for training, then randomly choose points for validation and test
Rotation: Take turns assigning points to training, validation, and test in a fixed repeating sequence
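
A minimal sketch of both approaches, assuming a 60/20/20 split and a five-point rotation pattern (train, val, train, test, train). The function names, the pattern, and the fractions are illustrative choices, not something the card prescribes.

```python
import random


def random_split(points, train_frac=0.6, seed=0):
    """Shuffle a copy, then cut it into training, validation, and test parts."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = (len(shuffled) - n_train) // 2  # split the rest evenly
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def rotation_split(points, pattern=("train", "val", "train", "test", "train")):
    """Assign points to sets by cycling through a fixed pattern in order."""
    sets = {"train": [], "val": [], "test": []}
    for i, p in enumerate(points):
        sets[pattern[i % len(pattern)]].append(p)
    return sets["train"], sets["val"], sets["test"]


points = list(range(20))
r_train, r_val, r_test = random_split(points)
t_train, t_val, t_test = rotation_split(points)
print(t_val)   # [1, 6, 11, 16]
print(t_test)  # [3, 8, 13, 18]
```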

8
Q

Advantage to rotation

A

Ensures each part of the data is spread evenly across the training, validation, and test sets

9
Q

Disadvantage to rotation

A

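Bias could be introduced if the rotation period lines up with a pattern in the data (for example, a 5-point rotation on workday data puts every Monday in the same set).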
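
To see how this bias can arise, here is a small sketch that applies a hypothetical five-point rotation pattern to four weeks of workday records. The pattern and the data are assumptions made for the illustration.

```python
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
pattern = ["train", "val", "train", "test", "train"]  # hypothetical rotation

# Four weeks of workday records, in calendar order.
records = [days[i % 5] for i in range(20)]

# Collect which weekdays end up in each set.
assignment = {}
for i, day in enumerate(records):
    assignment.setdefault(pattern[i % len(pattern)], set()).add(day)

print(sorted(assignment["train"]))  # ['Fri', 'Mon', 'Wed']
print(sorted(assignment["val"]))    # ['Tue']
print(sorted(assignment["test"]))   # ['Thu']
```

Because the rotation length matches the five-day week, the validation set sees only Tuesdays and the test set only Thursdays, so any weekday effect is confounded with the split.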

10
Q

After cross-validation, we train the model again using all of the data. T/F

A

T
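
A minimal sketch of the idea, assuming 3-fold cross-validation and a toy model that always predicts the mean. The helper names (`k_fold_indices`, `fit_mean`) and the data are invented for this example.

```python
from statistics import mean


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs; fold j holds indices j, j + k, j + 2k, ..."""
    for j in range(k):
        val = [i for i in range(n) if i % k == j]
        train = [i for i in range(n) if i % k != j]
        yield train, val


def fit_mean(ys):
    """Toy model: always predict the mean of the responses it was trained on."""
    m = mean(ys)
    return lambda: m


ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]  # hypothetical responses

# Cross-validation: each fold takes one turn as the validation set.
fold_errors = []
for train_idx, val_idx in k_fold_indices(len(ys), k=3):
    model = fit_mean([ys[i] for i in train_idx])
    fold_errors.append(mean((model() - ys[i]) ** 2 for i in val_idx))
cv_error = mean(fold_errors)  # quality estimate for this type of model

# Afterwards, train once more on all of the data to get the model we keep.
final_model = fit_mean(ys)
print(cv_error, final_model())  # 15.0 7.0
```

The fold models are used only to estimate quality; none of them is the final model.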

11
Q

k-means algorithm

A

  1. Pick k cluster centers within the range of the data
  2. Assign each data point to the nearest cluster center
  3. Recalculate the cluster centers (centroids)
  4. Repeat steps 2 and 3 until the assignments no longer change

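
The four steps above can be sketched in plain Python as follows. This is a minimal illustration, not a production implementation: it assumes squared Euclidean distance, starts from k randomly chosen data points (one way to stay within the range of the data), and the function name, `seed`, and `max_iter` parameters are choices made for the example.

```python
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster a list of numeric tuples; return (centers, assignments)."""
    rng = random.Random(seed)
    # Step 1: pick k starting centers (here: k distinct data points).
    centers = rng.sample(points, k)
    assignments = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center.
        new_assignments = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 3: move each center to the mean (centroid) of its points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's center where it is
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assignments


# Two well-separated hypothetical groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assignments = kmeans(points, k=2)
print(assignments)
print(sorted(centers))
```

Which cluster gets label 0 or 1 depends on the random start, so only the grouping is meaningful, not the label values.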
12
Q

Heuristic

A

A method that is fast and usually gives a good solution, but is not guaranteed to find the absolute best one. K-means is an example of a heuristic.

13
Q

K-means is an example of an EM (expectation-maximization) algorithm. T/F

A

T

14
Q

Expectation step of k-means

A

Find cluster centers (recompute each centroid as the mean of its assigned points)

15
Q

Maximization step of k-means

A

Assign each point to its nearest cluster center
