1 to 200 Math Flashcards

1
Q

Histogram Buckets (Bins)

A

The size of the bins is very important. More bins are generally better than fewer: with too few bins the picture is too general and hides detail.

2
Q

Line Plot

A

Also makes it easy to stack (overlay) several features on the same plot

3
Q

Inclusive interval

A

An interval that includes both of its endpoints, e.g. [1, 5] contains 1 and 5

4
Q

Distribution Plot

A

Allows us to visualize how the data is dispersed across a variable's values. The most common method is the histogram.

5
Q

Histogram is

A

The most common distribution plot

6
Q

X axis

A

Horizontal Axis

7
Q

Y axis

A

Vertical Axis

8
Q

When do we use a scatter plot?

A

To show the relationship between 2 features

9
Q

When is a line plot appropriate?

A

When we know for sure there is a continuous relationship between the data points, so the values in between are meaningful (the relationship does not have to be a straight line)

10
Q

2 types of distribution plots

A

Box-and-whisker plot and KDE (kernel density estimation) plot

11
Q

Categorical Plots

A

Show a metric per category. There are many variations; the most common is the simple bar plot.

12
Q

Always keep in mind the information I want to share or the story I am trying to tell

A

How does the plot help convey that information to someone else?

13
Q

Bin Sizes

A

You can make them smaller or larger

14
Q

Histogram is

A

A distribution plot

15
Q

In a histogram, which axis is continuous?

A

The x axis

16
Q

y hat = mx + b

A

Linear equation

17
Q

m =

A

How steep the line is (the slope)

18
Q

x =

A

How far along the horizontal axis the point is (the input value)

19
Q

b =

A

The value of y when x = 0 (the y-intercept)

20
Q

Formula for m

A

Rise over run: m = (y2 - y1) / (x2 - x1)

21
Q

y =

A

How far up or down the vertical axis the point on the line is (the output value)

22
Q

Ogive

A

A cumulative (accumulating) line plot

23
Q

Line plots are great when

A

The data points are related in sequence, even if nothing is measured in between, like the weather on successive days

24
Q

y hat means

A

In the slope-intercept equation of a straight line, y hat represents the predicted value
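
The slope and prediction cards above can be tried out with a small Python sketch. The function names and the two example points are hypothetical, chosen only for illustration.

```python
def slope(x1: float, y1: float, x2: float, y2: float) -> float:
    """Rise over run between the points (x1, y1) and (x2, y2)."""
    if x2 == x1:
        raise ValueError("run is zero: a vertical line has no slope")
    return (y2 - y1) / (x2 - x1)


def predict(m: float, b: float, x: float) -> float:
    """y hat = m*x + b: the predicted value for input x."""
    return m * x + b


# Hypothetical points on a line: (0, 1) and (2, 5).
m = slope(0, 1, 2, 5)       # rise of 4 over a run of 2
b = 1                       # value of y when x = 0
y_hat = predict(m, b, 3)    # predicted value at x = 3
print(m, b, y_hat)
```

Here b is read straight off the first point because its x value is 0; with other points, b would have to be solved from y = mx + b.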

25
X Axis needs to be
Continuous Data
26
Why do we use line plots?
We use them for changes over time
27
Data is
Collected, observable information about something
28
Discrete Data
Can only take on certain values, with no in-between numbers, e.g. Ford, Chevy, Cadillac
29
Continuous Data
Data that can have in-between values, e.g. someone is 175.5 cm tall
30
Nominal Data
Classified without a natural order or rank, e.g. cats, dogs, fish
31
Ordinal Data
Can be sorted; it has an order, like 1, 2, 3 or hot, mild, cold. The order has to make logical sense.
32
Structured Data
Highly specific and stored in a predefined format, e.g. an Excel spreadsheet. If you send the data to someone else, they will be able to work with it.
33
Unstructured Data
Does not follow a predefined format, e.g. audio or text files. Working with it often involves deep learning (e.g. DALL-E 2).
34
Population
the entire data set
35
Sample
A subset of the population, ideally chosen at random
36
Mean
Mean is the most common measure of central tendency
37
Mean formula
Sum of all data points / number of data points
38
Average is the
Arithmetic mean
39
mu (μ) is
The population mean
40
x bar
The sample mean
41
Weighted mean
A weighted mean is a kind of average. Instead of each data point contributing equally to the final mean, some data points contribute more “weight” than others. If all the weights are equal, then the weighted mean equals the arithmetic mean (the regular “average” you’re used to). Weighted means are very common in statistics, especially when studying populations.
42
Weighted mean example
Weights 20 and 7 on the values 8.4 and 6.1: (20*8.4 + 7*6.1) / (20 + 7), which is about 7.80
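
A minimal Python sketch of the weighted mean, reading the card's example as weights 20 and 7 applied to the values 8.4 and 6.1 (that reading is an assumption, and the function name is hypothetical).

```python
def weighted_mean(values, weights):
    """Sum of value*weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Assumed reading of the card: values 8.4 and 6.1 with weights 20 and 7.
result = weighted_mean([8.4, 6.1], [20, 7])
print(round(result, 2))  # about 7.8

# With equal weights the weighted mean is just the arithmetic mean.
print(weighted_mean([1, 2, 3], [1, 1, 1]))
```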
43
Truncated Mean
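We use this to handle outliers: drop the outlier and the same number of values from the other end of the data set, then average what is left. Example: from 9, 50, 52, 78 we take off 9 and 78 and divide the sum of the remaining values by how many remain: (50 + 52) / 2 = 51. We must note what percentage of the data set was taken off.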
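
The truncated mean can be sketched in Python as follows. The parameter k (how many values to drop from each end) and the function name are illustrative assumptions, not part of the card.

```python
def truncated_mean(data, k=1):
    """Drop the k smallest and k largest values, then average the rest."""
    if k < 0 or 2 * k >= len(data):
        raise ValueError("k must leave at least one value")
    kept = sorted(data)[k:len(data) - k]
    return sum(kept) / len(kept)


data = [9, 50, 52, 78]
print(truncated_mean(data, k=1))   # (50 + 52) / 2
print(100 * 2 * 1 / len(data))     # percent of the values that were trimmed
```

Reporting the trimmed percentage alongside the result keeps the card's reminder that the reader must be told how much data was removed.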
44
Mode
The value that occurs most often
45
Median (odd number of values)
It is the number in the middle of the sorted data
46
Median (even number of values)
Add the two central numbers of the sorted data and divide by 2 (take their arithmetic mean); that will be the median
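
Both median cases (odd and even number of values) fit in one short Python function. This is a sketch; Python's standard library already offers statistics.median for real use.

```python
def median(data):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd: the number in the middle
    return (ordered[mid - 1] + ordered[mid]) / 2   # even: mean of the two central numbers


print(median([3, 1, 2]))      # odd number of values
print(median([4, 1, 3, 2]))   # even number of values
```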
47
Discrete data can use
mean, median, mode
48
Nominal Data
mean: maybe (only if the labels are numbers, and then rarely meaningful); median: no; use only the mode
49
Ordinal Data
mean maybe, median, mode
50
Numeric
mean, median, mode
51
Non Numeric
no mean; median only if the data can be ordered; mode
52
Continuous
mean, median, mode
53
Why non-numeric data has no mean
We have to add the values and divide; if a value is a letter, we cannot find the sum
54
Continuous
Height of people
55
Discrete Data
Number of Children in a family
56
Non Numeric
Cats, dogs, birds, fish: they can't be added up
57
Nominal Data
Has no specific order, cannot be sorted
58
Ordinal Data
Data that can be sorted, e.g. 1, 2, 3 or hot, mild, cold
59
To calculate a mean
We need numeric data
60
These categories can overlap 1
nominal, numeric, non numeric
61
These categories can overlap 2
ordinal, numeric, non numeric
62
Working with Household data
We would use the median, because extreme values distort the mean
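
A quick Python illustration of why extreme values favor the median. The income figures are made up for the example; statistics.mean and statistics.median come from the standard library.

```python
import statistics

# Hypothetical household incomes with one extreme value.
incomes = [30_000, 35_000, 40_000, 45_000, 1_000_000]

mean_income = statistics.mean(incomes)      # pulled far upward by the extreme value
median_income = statistics.median(incomes)  # stays with the typical households

print(mean_income, median_income)
```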
63
Step one to figure out central tendency
Is it even possible to use that measure of central tendency with this data?
64
Step two to figure out central tendency
If we can measure it, decide which measure makes the most sense
65
Measurements of dispersion
Are measurements of spread
66
Measurement of dispersion
It measures how the data is spread around the mean
67
Mean is
The number that is as close as possible to all of the data points (the balancing point)
68
Effects of measurements of spread
We get 2 things: the variance and the standard deviation. They are closely related.
69
Variance number meaning
The smaller the value we find, the less the spread
70
Reason for squaring
If we get a negative deviation, squaring makes it positive; squaring also emphasises the larger deviations
71
Standard deviation is
The square root of the variance
72
Variance is
Not usually reported on its own; we use the standard deviation instead, because it is in the same units as the data
73
Variance formula uses
n - 1 in the denominator (for a sample), to correct the bias introduced by using the sample mean
74
Then for standard deviation
you take the square root
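
The variance and standard deviation cards can be sketched in Python like this. The data values are hypothetical, and the n - 1 denominator follows the sample formula described above.

```python
import math


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values for the sample variance")
    mean = sum(data) / n
    # Squaring makes every deviation positive and emphasizes the large ones.
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def sample_std(data):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(data))


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values, mean 5
print(sample_variance(data))       # squared deviations sum to 32, so this is 32 / 7
print(sample_std(data))
```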
75
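Quartiles are
Values that split the sorted data set into four equal parts
76
When talking about quartiles we are talking about
The first, second, and third quartiles (Q1, Q2, Q3) of the data
77
The 1st quartile
Will be the median of the first (lower) half of the data
78
1st quartile is
The value that marks off the bottom or lower 25% of the data
79
3rd quartile is
The value that marks off the lower 75% of the data (the upper 25% lies above it)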
80
Second Quartile
Is the median or 50th percentile
81
First quartile data will be
Below the 25th percentile
82
Third quartile data will be
Below the 75th percentile
83
Define the range of a data set
range = max value - min value
84
Q1 and Q3
Will give us an idea of how closely the data is clustered around the center
85
How do we calculate the first and third quartile
First see whether the data set has an even or odd number of values
86
Calculate quartile (odd)
Use the take-away method, as for the median
87
Calculate quartile (even)
Use the take-away method, then add the two middle numbers and divide by 2
88
Quartile Spread
The difference between the first and third quartile is the measure of its spread
89
Interquartile range
IQR = Q3 - Q1, e.g. 71 - 68 = 3, so IQR = 3
178
Interquartile range (even)
IQR = Q3 - Q1, e.g. 71 - 68 = 3, so IQR = 3
179
Interquartile range (odd)
We would exclude the median and look at each half. If a half then has an even number of values, add its two middle numbers and divide by 2 (their mean) to get that quartile.
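
One way to sketch the quartile cards in Python is the "median of each half" method, excluding the overall median when the count is odd. Other conventions exist (spreadsheet functions, for instance, may interpolate and give slightly different values), and the data below is hypothetical, picked so that Q1 = 68 and Q3 = 71 as on the cards.

```python
def median_sorted(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median of the lower and upper halves."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    # With an odd count, the median itself is left out of both halves.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median_sorted(lower), median_sorted(ordered), median_sorted(upper)


heights = [66, 68, 68, 69, 70, 70, 71, 71, 73]   # hypothetical, 9 values
q1, q2, q3 = quartiles(heights)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```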
180
Quartile function in excel
=QUARTILE.INC(array, quart), where quart is 1, 2, or 3 for Q1, Q2, or Q3 (QUARTILE.EXC is the exclusive variant)
181
Quartiles are
Commonly used to look for outliers in the data set
182
It is common to use this formula for quartile outliers
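Calculate the median of each side to get Q1 and Q3, then flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR. With IQR = 3, that is below Q1 - 4.5 or above Q3 + 4.5.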
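
A short Python sketch of the outlier check, assuming the card refers to the usual 1.5 x IQR rule (with IQR = 3 the margin is 4.5, which matches the 4.5 on the card). The function names and the sample data are hypothetical; Q1 = 68 and Q3 = 71 are taken from the interquartile range card.

```python
def outlier_fences(q1, q3, k=1.5):
    """Lower and upper cut-offs: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(data, q1, q3, k=1.5):
    """Values that fall outside the fences."""
    low, high = outlier_fences(q1, q3, k)
    return [x for x in data if x < low or x > high]


low, high = outlier_fences(68, 71)   # IQR = 3, so the margin is 4.5
print(low, high)
print(find_outliers([60, 66, 68, 70, 71, 73, 80], 68, 71))
```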
183
Line Plots
Great when a continuous relationship exists
184
Line Plots Use
We use line plots for changes over time or a connection between data points
185
If there are no in-between points, then it is
Discrete points
186
Ogive
Adds value after value (a running total)
187
Ogive
Does not make sense for temperature data
188
Ogive: you can
Put several on the same plot
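
The "add value after value" idea behind an ogive is a running total, which can be sketched with itertools.accumulate from the Python standard library. The sales numbers are made up for illustration; plotting the resulting points as a line would give the ogive.

```python
from itertools import accumulate


def ogive_points(counts):
    """Running total of the counts: each point adds the next value."""
    return list(accumulate(counts))


daily_sales = [3, 5, 2, 4]          # hypothetical counts per day
print(ogive_points(daily_sales))    # [3, 8, 10, 14]
```

Totals like sales accumulate naturally; a running total of temperatures has no meaning, which is why the ogive does not suit temperature data.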
189
Distribution Plots are
Histograms
190
Histograms
The x axis needs to be continuous
191
Bar Charts are not
Continuous data
192
Bar Charts have
Spaces between the bars
193
Bucket
Represents storage for the numbers in a range, from one x value to the next
194
Bin or class
Bins have no gaps between them, so it looks like a continuous set of data
195
Each bin represents the
Number of occurrences that fall in that bin, class, or interval
196
Height of a bar in the chart
Is the number of occurrences
197
Histogram
Helps organize and clean up the data
198
The bin or class width is always
The same for every bin. The size of the bin is very important.
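
Equal-width binning can be sketched in plain Python as below. The function name, its parameters, and the height values are hypothetical; the sketch only counts occurrences per bin and does not draw the histogram.

```python
def histogram_counts(data, start, width, n_bins):
    """Count how many values fall in each of n_bins equal-width bins.

    Bin i covers start + i*width up to (but not including) start + (i+1)*width;
    the last bin also includes its right edge. Values outside the range are skipped.
    """
    counts = [0] * n_bins
    end = start + width * n_bins
    for x in data:
        if x < start or x > end:
            continue
        i = min(int((x - start) // width), n_bins - 1)
        counts[i] += 1
    return counts


heights = [150, 152, 158, 160, 161, 165, 170, 171, 179, 180]   # hypothetical, in cm
print(histogram_counts(heights, start=150, width=10, n_bins=3))   # [3, 3, 4]
print(histogram_counts(heights, start=150, width=15, n_bins=2))   # [5, 5]
```

The second call uses fewer, wider bins and flattens the picture, which illustrates why the bin size matters.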
199
Natural Numbers
1, 2, 3, 4, 5, ...
200
Whole numbers
The natural numbers plus zero: 0, 1, 2, 3, ...