Week 1 Continued Flashcards

1
Q

Cleaning data means removing __________, dealing with _______ data, and resolving _________ data

A

duplicates, missing, incomplete

2
Q

Name the 5 ways to handle missing data

A
  1. Delete the row
  2. Replace it with the mean, median, or mode
  3. Create a new category for missing values (e.g. ‘Unknown’)
  4. Predict the missing values
  5. Use an algorithm to produce an estimated result (kNN or random forest work)
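
The first three options can be sketched in plain Python. This is only an illustration: the `rows` records and their `age` and `city` fields are made up, and `None` is assumed to mark a missing value.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 30, "city": "Austin"},
    {"age": None, "city": "Boston"},
    {"age": 50, "city": None},
]

# Option 1: delete any row that has a missing value.
complete_rows = [r for r in rows if None not in r.values()]

# Option 2: replace a missing number with the mean of the known values.
known_ages = [r["age"] for r in rows if r["age"] is not None]
age_mean = mean(known_ages)

# Option 3: give missing categories their own label.
filled_rows = [
    {
        "age": r["age"] if r["age"] is not None else age_mean,
        "city": r["city"] if r["city"] is not None else "Unknown",
    }
    for r in rows
]
```

Deleting rows is the simplest choice but throws away the values that were present; filling keeps every row at the cost of introducing estimated values.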
3
Q

Name the 3 categories of data

A

Structured, Unstructured, Semi-structured

4
Q

Explain structured data

A

Every element shares the same fields. Ex: database tables, objects

5
Q

Explain unstructured data

A

No common structure. News articles, websites, videos, audio, photographs.

They all carry information, but there is no agreed-upon format or set of rules

6
Q

Explain semi-structured

A

Some structure, but not one that every element shares.

Ex: XML, JSON
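
A small sketch of what "some structure" can look like in JSON. The two records below are invented for illustration: each one labels its own fields, but the records do not share the same set of fields.

```python
import json

# Hypothetical JSON document: both records are self-describing,
# but they do not share the same set of fields.
raw = (
    '[{"name": "Ada", "email": "ada@example.com"},'
    ' {"name": "Bob", "phones": ["555-0100"]}]'
)

records = json.loads(raw)

# Field names present in each record.
field_sets = [sorted(r.keys()) for r in records]

# Fields that appear in both records.
shared_fields = set(records[0]) & set(records[1])
```

In a structured table both rows would have to carry the same columns; here only `name` is common to both.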

7
Q

Categorical and numerical are ____ ____s

A

data types

8
Q

Nominal and ordinal are ___________ data types

A

Categorical

9
Q

Discrete and continuous are _________ data types

A

Numerical

10
Q

Example of nominal.

A

Male or female

11
Q

Example of ordinal.

A

Strongly agree, agree, neutral, disagree, strongly disagree

12
Q

T/F - Ordinal answers must have a gradual order

A

True

13
Q

Explain discrete

A

Values are distinct and separate; they are counted, not measured

Ex. # of students in class, # of tickets sold

14
Q

Explain continuous

A

Measured, cannot be counted

Ex. Height, salary

15
Q

Explain Sample vs Population

A

Sample contains a subset of the population

Population always contains all members of a given group

16
Q

How do you get the variance?

A

Standard deviation^2
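
A quick check of that relationship using Python's standard `statistics` module. The `values` list is made up, and it is assumed to be a full population, so the population versions `pstdev` and `pvariance` are used.

```python
from statistics import pstdev, pvariance

# Hypothetical data, treated as a full population.
values = [2, 4, 4, 4, 5, 5, 7, 9]

std_dev = pstdev(values)     # population standard deviation
variance = pvariance(values)  # population variance

# Squaring the standard deviation gives the variance.
squared_std_dev = std_dev ** 2
```

For a sample rather than a population, `stdev` and `variance` from the same module divide by n - 1 instead of n, and the same squaring relationship holds.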

17
Q

Stopped at distributions

A
18
Q

Name this: When every outcome has an equal probability (e.g. heads or tails)

A

Uniform

19
Q

Name this: Values cluster around a midpoint and become less frequent the farther they are from the mean (e.g. grades)

A

Gaussian

20
Q

What is it called if μ = 0 and σ^2 = 1?

A

Standard Normal Distribution
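
One way to see μ = 0 and σ^2 = 1 in practice is to standardize a data set with z = (x - μ) / σ. The `scores` list below is invented for illustration. Note that rescaling only fixes the mean and variance; the result follows a standard normal distribution only if the original data were Gaussian to begin with.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical grades, treated as a full population.
scores = [60, 70, 80, 90, 100]

mu = mean(scores)
sigma = pstdev(scores)

# z-score: how many standard deviations each value sits from the mean.
z_scores = [(x - mu) / sigma for x in scores]

# After standardizing, the mean is 0 and the variance is 1
# (up to floating-point rounding).
z_mean = mean(z_scores)
z_variance = pvariance(z_scores)
```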

21
Q

Example of rank statistics

A

95th percentile. It tells you where a value ranks relative to the rest of the data.
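
A minimal sketch of a percentile rank. The helper `percentile_rank` is hypothetical and uses one common convention (the percentage of values less than or equal to x); other definitions of a percentile exist and can give slightly different numbers.

```python
def percentile_rank(values, x):
    """Percentage of values that are less than or equal to x."""
    at_or_below = sum(1 for v in values if v <= x)
    return 100 * at_or_below / len(values)


# Hypothetical scores 1 through 100.
scores = list(range(1, 101))

rank_of_95 = percentile_rank(scores, 95)
rank_of_50 = percentile_rank(scores, 50)
```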

22
Q

Covariance computes the strength and direction of ___ ____ of _______. Do they get larger together? Do they have an inverse relationship? No relationship?

A

two sets of values.

23
Q

Name this: A number between -1 and 1 that tells you the strength and direction of a relationship

A

Correlation
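
Covariance and correlation can be sketched by hand as follows. The functions and the `hours`, `score_up`, and `score_down` lists are made up for illustration, and the population form (dividing by n) is assumed. Covariance depends on the units of the data, while correlation rescales it to the range -1 to 1.

```python
from math import sqrt


def covariance(xs, ys):
    """Population covariance: mean of the products of paired deviations."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical paired data.
hours = [1, 2, 3, 4, 5]
score_up = [2, 4, 6, 8, 10]    # grows with hours
score_down = [10, 8, 6, 4, 2]  # shrinks as hours grow

cov_up = covariance(hours, score_up)
corr_up = correlation(hours, score_up)
corr_down = correlation(hours, score_down)
```

A positive result means the two sets of values get larger together, a negative result means an inverse relationship, and a value near 0 means no linear relationship.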