Week 1 Flashcards

(28 cards)

1
Q

Structured Data:

A

Data that can be stored in a structured way (e.g. as rows and columns in a table).

2
Q

Unstructured Data:

A

Data not easily stored or described in a structured way (e.g. text from social media).

3
Q

Quantitative Data:

A

Numbers with a meaning (e.g. 3 baseballs).

4
Q

Categorical Data:

A

Numbers or labels that identify a category rather than measure a quantity (e.g. an area code or country of origin).

5
Q

Binary Data:

A

Data that takes one of two values (e.g. yes or no).

6
Q

Unrelated Data:

A

No relationship between data points (e.g. players on different teams).

7
Q

Time Series Data:

A

The same data recorded over time (e.g. an athlete’s performance over time).

8
Q

Scaling Data:

A

Transforming your data so that features are within a specific range (e.g. 0 to 1).
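
As a minimal sketch of scaling to a range, the function below applies min-max scaling to one feature. The function name `min_max_scale` and the sample values are illustrative assumptions, not part of the original cards.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


# Hypothetical feature with a wide range, rescaled to 0-1.
scaled = min_max_scale([1000, 1250, 1500, 2000])
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

Each feature would be scaled separately, using its own minimum and maximum.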

9
Q

Standardizing Data:

A

Rescaling your observations to a mean of 0 and a standard deviation of 1, so they are on the scale of a standard normal distribution.
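
A minimal sketch of standardization (subtract the mean, divide by the standard deviation) is shown below. The function name `standardize`, the sample values, and the choice of the population standard deviation (`pstdev`) rather than the sample one are assumptions made for illustration.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        # No spread in the data, so every z-score is 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical sample with mean 5 and standard deviation 2.
z = standardize([2, 4, 4, 4, 5, 5, 7, 9])
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Note that this changes the location and spread of the data but not the shape of its distribution.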

10
Q

Validation:

A

Verifying that models are performing as intended

11
Q

Hard Classifiers:

A

A classifier that separates the data into groups perfectly, with no misclassified points.

12
Q

Soft Classifiers:

A

A classifier that gives as good a separation as possible when the groups cannot be separated perfectly.

13
Q

SVM

A

Support vector machines are supervised machine learning models used for classification.

14
Q

Support Vector

A

The name comes from the idea of a line that touches the edge of a shape (or ‘supports’ it); the data points that such a line touches are called support vectors.

15
Q

True or false: The support vector machine automatically (hence ‘machine’) determines the support vectors, i.e. the points supporting the shape on the parallel lines.

A

True

16
Q

Goal of SVM

A

The goal is to maximize the margin (the space between the parallel lines through the support vectors) while minimizing classification errors between the classes.

17
Q

Lambda in SVM

A

Lambda controls the relative weight of margin versus error: as it grows, the margin outweighs any error, and as it approaches zero, minimizing mistakes becomes much more important. We can also add a multiplier m_j per error to weight the errors, so that an error with a larger multiplier matters more than one with a smaller multiplier.
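
The effect of lambda can be sketched numerically. The code below assumes a linear classifier with coefficients `a` and intercept `a0`, labels of +1 or -1, a hinge-style error per point, and the sum of squared coefficients as the margin term (smaller coefficients mean a wider margin). These names, the error form, and the sample data are assumptions for illustration only.

```python
def svm_objective(a, a0, X, y, lam, m=None):
    """Weighted classification error plus lam times the margin term."""
    m = m or [1.0] * len(X)  # per-point error multipliers m_j (default: all equal)
    total_error = 0.0
    for x_j, y_j, m_j in zip(X, y, m):
        score = sum(a_i * x_i for a_i, x_i in zip(a, x_j)) + a0
        # Error is zero when the point is on the correct side, outside the margin.
        total_error += m_j * max(0.0, 1.0 - y_j * score)
    margin_term = sum(a_i ** 2 for a_i in a)  # small value = wide margin
    return total_error + lam * margin_term


# Hypothetical one-feature data; the last point sits on the wrong side.
X = [(1.0,), (2.0,), (-1.0,), (0.5,)]
y = [1, 1, -1, -1]

narrow = (1.0,)  # fewer errors, narrower margin
wide = (0.1,)    # more errors, wider margin

for lam in (0.0, 10.0):
    print(lam, svm_objective(narrow, 0.0, X, y, lam), svm_objective(wide, 0.0, X, y, lam))
```

With lambda at 0 the narrow-margin candidate scores lower (better) because only errors count; with lambda at 10 the wide-margin candidate scores lower because the margin term dominates.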

18
Q

Equation for minimizing error and maximizing margin
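
One common way to write the soft-margin SVM trade-off between error and margin is sketched below. The notation is an assumption, not taken from these cards: x_{ij} is the value of attribute i for data point j, y_j is the label (+1 or -1), a_0 through a_n are the classifier's coefficients, N is the number of data points, and lambda weights the margin term.

```latex
\min_{a_0,\dots,a_n} \;
\sum_{j=1}^{N} \max\Bigl\{ 0,\; 1 - \Bigl( \sum_{i=1}^{n} a_i x_{ij} + a_0 \Bigr) y_j \Bigr\}
\;+\; \lambda \sum_{i=1}^{n} a_i^2
```

The first sum is the total classification error and the second is the margin term; multiplying each error by m_j gives the weighted version.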

19
Q

What happens to our SVM if we have data that varies widely in range?

A

Our SVM model may be thrown off if we have data that varies widely in range. Remember that SVM’s goal is to maximize the distance between the separating plane and the support vectors. If one feature is much bigger than another (e.g. X1 is 0.3-0.6 and X2 is 1000-2000), the feature with the large range will dominate the model and throw off our results.
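
To illustrate the point, the sketch below measures how much of the squared distance between two points comes from X1, before and after scaling each feature to 0-1. The helper names and the three sample points are hypothetical, chosen to match the ranges in the example above.

```python
def x1_share(p, q):
    """Fraction of the squared Euclidean distance between p and q due to the first feature."""
    squares = [(a - b) ** 2 for a, b in zip(p, q)]
    return squares[0] / sum(squares)


def min_max_columns(rows):
    """Scale each column (feature) of rows to the range 0-1 independently."""
    scaled_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0  # avoid dividing by zero for constant columns
        scaled_cols.append([(v - lo) / span for v in col])
    return [list(row) for row in zip(*scaled_cols)]


# Hypothetical points: X1 in 0.3-0.6, X2 in 1000-2000.
points = [(0.3, 1000.0), (0.6, 1010.0), (0.3, 2000.0)]
scaled = min_max_columns(points)

raw_share = x1_share(points[0], points[1])
scaled_share = x1_share(scaled[0], scaled[1])
print(raw_share, scaled_share)
```

The first two points differ by the full range of X1 but only 1% of the range of X2, yet in the raw data X1 contributes almost nothing to the distance. After scaling, X1 accounts for nearly all of it.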

20
Q

What is the most common scaling?

A

Between 0 and 1

21
Q

How do you scale to a normal distribution?

A

You scale the data to a mean of 0 and a standard deviation of 1.

22
Q

You use scaling (or normalizing) when you’re working with data of what kind?

A

Data in a bounded range, for example:

batting average
SAT scores

23
Q

What kinds of models do you use standardization with?

A

PCA
Clustering

24
Q

How does KNN classify data?

A

Rather than using a line to separate data into classes, the KNN algorithm classifies data by looking at a data point’s “nearest neighbors.”
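
A minimal sketch of the idea: find the k closest training points and take a majority vote of their labels. The function name `knn_classify`, the value k=3, straight-line distance, and the sample points are assumptions for illustration.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """train is a list of (features, label) pairs; return the majority label of the k nearest."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training data with three classes.
train = [
    ((1.0, 1.0), "red"), ((1.2, 0.8), "red"), ((0.9, 1.1), "red"),
    ((5.0, 5.0), "green"), ((5.2, 4.9), "green"), ((4.8, 5.1), "green"),
    ((9.0, 1.0), "blue"), ((9.1, 0.8), "blue"), ((8.9, 1.2), "blue"),
]

print(knn_classify(train, (5.1, 5.0)))  # green
print(knn_classify(train, (8.5, 1.5)))  # blue
```

Because the vote is over labels, the same code handles two classes or many.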

25
Q

True or false: KNN can only classify a single class.

A

False. It can be used to classify multiple classes.

26
Q

Three important things to note re: KNN

A

1) There’s more than one way to measure distance (straight line is the most common, but there are others as well)
2) Some attributes might be more important than others in classification
3) Unimportant attributes can be removed

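
These three points can be sketched with a single weighted distance function. The name `weighted_distance`, the `power` parameter, and the sample points are assumptions: power 2 gives straight-line distance, power 1 gives a city-block distance, larger weights make an attribute count more, and a weight of 0 removes it.

```python
def weighted_distance(p, q, power=2, weights=None):
    """Weighted Minkowski distance between points p and q."""
    weights = weights or [1.0] * len(p)
    total = sum(w * abs(a - b) ** power for w, a, b in zip(weights, p, q))
    return total ** (1.0 / power)


p, q = (0, 0, 0), (3, 4, 100)

# A weight of 0 on the third attribute removes it from the distance.
straight_line = weighted_distance(p, q, power=2, weights=[1.0, 1.0, 0.0])
city_block = weighted_distance(p, q, power=1, weights=[1.0, 1.0, 0.0])
print(straight_line, city_block)  # 5.0 7.0
```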
27
Q

Two ways to adjust data

A

1) Scaling down to the same interval (between 0 and 1)
2) Standardization

28
Q

Which method to use: scaling or standardization?

A

Scaling: data in a bounded range. Examples: neural networks, optimization models that need bounded data.
Standardization: Examples: principal component analysis (PCA), clustering.