Week 1 Flashcards by Daniel Conner

Structured Data:

Data that can be stored in a structured way (like in the table above).

Unstructured Data:

Data not easily stored or described (e.g., text from social media)

Quantitative Data:

Numbers with a numeric meaning (e.g., 3 baseballs)

Categorical Data:

Numbers without numeric meaning, or non-numeric categories (e.g., an area code or country of origin)

Binary Data:

Data that takes one of two values (e.g., yes or no)

Unrelated Data:

No relationship between data points (e.g., players on different teams)

Time Series Data:

The same data recorded repeatedly over time (e.g., an athlete’s performance over time)

Scaling Data:

Transforming your data so that features are within a specific range (e.g., 0 to 1)

Standardizing Data:

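Transforming your observations so that each feature has a mean of 0 and a standard deviation of 1 (this puts the values on the scale of a standard normal distribution, but it does not make the data normally distributed)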

Validation:

Verifying that models are performing as intended

Hard Classifiers:

Separates the groups perfectly (only possible when the classes can be perfectly separated)

Soft Classifiers:

Gives as good a separation as possible when the groups cannot be perfectly separated, allowing some points to be misclassified

SVM

Support vector machines are supervised machine learning models used for classification.

Support Vector

The name comes from the idea of a line that touches the edge of a group of points (or ‘supports’ it); the points lying on these supporting lines are called support vectors.

True or false: The support vector machine automatically determines the support vectors, or the points supporting the shape on parallel lines (the automatic part is the ‘machine’).

True

Goal of SVM

The goal is to maximize the margin (the space between the parallel lines through the support vectors) while minimizing classification errors between the classes.
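A minimal sketch of what a linear SVM does once its separating line is found, assuming the line is written as a1*x1 + a2*x2 + a0 = 0 with the two support lines at a·x + a0 = 1 and a·x + a0 = -1. The coefficients and the helper names `classify` and `margin_width` are made up for illustration, not learned from data. Under this convention the gap between the support lines is 2 divided by the length of the coefficient vector, so maximizing the margin amounts to keeping the a_i small.

```python
import math

# Hypothetical coefficients for the line a1*x1 + a2*x2 + a0 = 0
# (illustrative values only; a real SVM would learn them from training data).
a = [1.0, -1.0]
a0 = 0.5

def classify(x):
    """Return +1 or -1 depending on which side of the line the point falls."""
    score = sum(ai * xi for ai, xi in zip(a, x)) + a0
    return 1 if score >= 0 else -1

def margin_width(coeffs):
    """Distance between the support lines a.x + a0 = 1 and a.x + a0 = -1."""
    return 2 / math.sqrt(sum(c * c for c in coeffs))

print(classify([2.0, 1.0]))  # 2 - 1 + 0.5 = 1.5, so 1
print(classify([0.0, 3.0]))  # 0 - 3 + 0.5 = -2.5, so -1
print(margin_width(a))       # 2 / sqrt(2), about 1.414
```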

Lambda in SVM

Lambda (λ) controls the trade-off between the margin and the errors: as it grows, the margin outweighs any error, and as it approaches zero, minimizing mistakes becomes much more important. We can also add a multiplier m_j to each error to weight the errors, so an error with a larger multiplier counts more than one with a smaller multiplier.
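To see the trade-off in numbers, here is a small sketch assuming the common hinge-loss form of the soft-margin objective: each point's error is max(0, 1 - y_j(a·x_j + a0)), optionally multiplied by m_j, and lambda multiplies the margin term, the sum of the squared a_i. The helper `soft_margin_objective`, the data, and the two candidate lines are hypothetical. Line A classifies every point with room to spare but has a narrow margin; line B has a margin twice as wide but lets two points fall inside it.

```python
def soft_margin_objective(a, a0, X, y, lam, m=None):
    """Hypothetical helper: weighted hinge-loss error plus lam * sum(a_i ** 2)."""
    if m is None:
        m = [1.0] * len(X)  # every error weighted equally by default
    error = 0.0
    for xj, yj, mj in zip(X, y, m):
        score = sum(ai * xij for ai, xij in zip(a, xj)) + a0
        # Zero when the point is on the correct side and outside the margin.
        error += mj * max(0.0, 1.0 - yj * score)
    return error + lam * sum(ai * ai for ai in a)

# Hypothetical 1-D data: two points per class.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [-1, -1, 1, 1]

line_a = ([1.0], 0.0)  # no errors, margin width 2
line_b = ([0.5], 0.0)  # total error 1.0, margin width 4

for lam in (0.01, 10.0):
    obj_a = soft_margin_objective(*line_a, X, y, lam)
    obj_b = soft_margin_objective(*line_b, X, y, lam)
    winner = "A" if obj_a < obj_b else "B"
    print(f"lambda={lam}: A={obj_a:.4f}, B={obj_b:.4f}, lower is {winner}")
```

With the small lambda, line A (fewer mistakes) has the lower objective; with the large lambda, line B (wider margin) does. Passing `m=[1, 5, 5, 1]` would make B's two margin violations count five times as much.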

Minimize error and margin equation
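The equation on this card did not survive the export. For reference, the standard soft-margin SVM objective that the neighboring cards describe is written below, assuming n data points, d attributes x_{ij}, class labels y_j in {-1, +1}, coefficients a_0 through a_d, and per-error multipliers m_j (this notation is an assumption, not copied from the original card):

```latex
\min_{a_0,\dots,a_d} \; \sum_{j=1}^{n} m_j \max\Big\{0,\; 1 - \Big(\sum_{i=1}^{d} a_i x_{ij} + a_0\Big) y_j\Big\} \;+\; \lambda \sum_{i=1}^{d} a_i^2
```

The first sum is the error term (with every m_j = 1 it is unweighted) and the second is the margin term, since the margin width is 2 / sqrt(sum of a_i^2).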

What happens to our SVM if we have data that varies widely in range?

Our SVM model may be thrown off if we have data that varies widely in range. Remember that the SVM’s goal is to maximize the distance between the separating plane and the support vectors. If one feature’s range is much larger than another’s (e.g., X1 ranges from 0.3 to 0.6 and X2 from 1000 to 2000), the large-range feature will dominate the model and throw off our results.
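A minimal sketch of the problem using the card's example ranges (X1 in 0.3 to 0.6, X2 in 1000 to 2000); the two points and the helper names are hypothetical. Straight-line distance is used because margins and nearest-neighbor searches are both measured in distance, so it shows directly which feature dominates.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def min_max(value, lo, hi):
    """Rescale value from the range [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)

# Hypothetical points: X1 changes across its whole range, X2 barely moves.
p = (0.3, 1000.0)
q = (0.6, 1010.0)

raw = euclidean(p, q)  # about 10.0045, almost entirely X2's difference of 10

# Rescale each feature using the card's example ranges.
ranges = [(0.3, 0.6), (1000.0, 2000.0)]
p_scaled = [min_max(v, lo, hi) for v, (lo, hi) in zip(p, ranges)]
q_scaled = [min_max(v, lo, hi) for v, (lo, hi) in zip(q, ranges)]
scaled = euclidean(p_scaled, q_scaled)  # about 1.00005, now X1's difference counts

print(raw, scaled)
```

Before scaling, X2 contributes 100 of the 100.09 squared distance even though it moved only 1% of its range; after scaling, the feature that moved across its whole range dominates instead.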

What is the most common scaling?

between 0 and 1
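Scaling to between 0 and 1 is usually done with min-max scaling, x_scaled = (x - min) / (max - min). A minimal sketch; the function name and the sample values are hypothetical.

```python
def scale_to_unit_interval(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a feature whose values are all equal")
    return [(v - lo) / (hi - lo) for v in values]

print(scale_to_unit_interval([1000, 1250, 1500, 2000]))  # [0.0, 0.25, 0.5, 1.0]
```

In practice the min and max are taken from the training data and reused for new points, which may then land slightly outside 0 to 1.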

How do you standardize data (scale it to a normal distribution)?

You scale each feature to a mean of 0 and a standard deviation of 1.
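A minimal sketch of standardization (z-scores): subtract the feature's mean, then divide by its standard deviation. It uses Python's standard `statistics` module; the function name and sample values are hypothetical, and the population standard deviation is an assumed choice (`statistics.stdev` would give the sample version).

```python
import statistics

def standardize(values):
    """Z-score each value: subtract the mean, then divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD; statistics.stdev gives the sample SD
    return [(v - mean) / sd for v in values]

z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Standardizing only shifts and rescales the values; it does not change the shape of their distribution, so skewed data stays skewed.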

You use scaling (or normalizing) when you’re working with data of what kind?

Data in a bounded range, for example:

Batting average
SAT scores

What kinds of models do you use standardization with?

PCA
Clustering

How does KNN classify data?

Rather than using a line to separate data into classes, the KNN algorithm classifies data by looking at a data point’s “nearest neighbors.”
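A minimal k-nearest-neighbors sketch, assuming straight-line (Euclidean) distance and a simple majority vote among the k = 3 closest training points; the function name, training points, and labels are hypothetical. Three labels are used to show that the vote works the same way with more than two classes.

```python
import math
from collections import Counter

def knn_classify(train, new_point, k=3):
    """Label new_point by majority vote among its k nearest training points.

    train is a list of (features, label) pairs; distance is straight-line (Euclidean).
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], new_point))
    nearest_labels = [label for _, label in by_distance[:k]]
    # most_common breaks ties in favor of the label encountered first,
    # i.e. the tied label whose point is nearest.
    return Counter(nearest_labels).most_common(1)[0][0]

train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 5.0), "B"), ((7.0, 7.0), "B"), ((6.5, 6.0), "B"),
    ((1.0, 8.0), "C"), ((2.0, 9.0), "C"), ((1.5, 7.5), "C"),
]

print(knn_classify(train, (1.8, 1.5)))  # A
print(knn_classify(train, (6.2, 6.1)))  # B
print(knn_classify(train, (1.2, 8.5)))  # C
```

`math.dist` requires Python 3.8 or later. The choice of k and the tie-breaking rule are tunable design decisions, not fixed parts of the method.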

True or false: KNN can only classify data into a single class.

False. It can be used to classify data into multiple classes.

Three important things to note re: KNN

1) There’s more than one way to measure distance (straight-line distance is the most common, but there are others as well)
2) Some attributes might be more important than others in classification
3) Unimportant attributes can be removed
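All three points can be seen in one hypothetical distance function, sketched here as a weighted Minkowski distance: `power=2` gives straight-line distance, `power=1` gives Manhattan (city-block) distance, each attribute gets a weight, and a weight of 0 removes the attribute. The function name, points, and weights are made up for illustration.

```python
def weighted_distance(p, q, weights, power=2):
    """Weighted Minkowski distance (hypothetical helper).

    power=2 is straight-line (Euclidean) distance; power=1 is Manhattan distance.
    A larger weight makes an attribute matter more; a weight of 0 drops it entirely.
    """
    total = sum(w * abs(pi - qi) ** power for pi, qi, w in zip(p, q, weights))
    return total ** (1 / power)

p, q = (0.0, 0.0, 0.0), (3.0, 4.0, 100.0)

print(weighted_distance(p, q, [1, 1, 0]))           # 5.0 (third attribute removed)
print(weighted_distance(p, q, [1, 1, 0], power=1))  # 7.0 (Manhattan)
print(weighted_distance(p, q, [4, 1, 0]))           # sqrt(4*9 + 16) = sqrt(52), about 7.21
```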

Two ways to adjust data

1) Scale down to the same interval (between 0 and 1)
2) Standardization

Which method to use? Scaling vs. standardization

Scaling: data in a bounded range. Examples: neural networks, optimization models that need bounded data
Standardization. Examples: principal component analysis (PCA), clustering
