Data Management All Units Flashcards

(132 cards)

1
Q

100 baby weights studied; 1 baby was 7lbs!

A

variable; baby weights

data; 7lbs

2
Q

I am 24 years old, Canadian, and size petite.

A

quantitative (numerical); 24 years old
qualitative (non-numerical); Canadian
categorical; petite

3
Q

I have 1 dog, he is 16kg

A

discrete; 1 dog

continuous; 16 kg

4
Q

I conducted a research paper, Judy read it and published it on her blog. I published it at the university

A

primary data; I conducted a research paper
secondary data; Judy read it
secondary source; published on her blog
primary source; published it at the university

5
Q

I want to know how many citizens have allergies. Now I want to know how many since a factory was made.

A

1 variable; allergies

> 1 variable; allergies since a factory was made

6
Q

A swim race to determine the best is performed three times during the day. There are three timers.

A

inherent variability; three times during the day

measurement variability; three timers

7
Q

A survey asked university students how they felt about tuition increase, for a paper regarding the general public.

A

sample; university students
population; general public
non-representative sample; university students to represent general public

8
Q

Picking sixty fish from five spots at the lake, not putting them back in to determine weights.

A

replication; picking 60
randomization; 5 spots
control; not putting them back

9
Q

Names of family members are placed in a box and mixed; each name is put back after it is picked

A

simple random sample

10
Q

Names of family on a list; total population divided by the sample size = 'k' value; every kth member is picked

A

systematic random sample

11
Q

Names of family members are divided into groups based on similarities, then placed in boxes, mixed, replaced if chosen.

A

stratified random sample

12
Q

Suburbs within a city, placed in a box and mixed, replaced if picked, all picked are surveyed.

A

clusters; suburbs within a city

cluster random sample

13
Q

suburbs within a city, placed in a box, and mixed, replaced if picked, all picked are placed in new box and mixed, replaced if picked, final picked are chosen.

A

multi-stage random sample

14
Q

stand at convenience and ask first 40 people

A

convenience sample (non-random)

15
Q

survey posted on door of convenience store

A

voluntary-response sample (non-random)

16
Q

in own words

A

open question

17
Q

choose from alternatives

A

closed question

18
Q

circle 1

A

information question

19
Q

rate according to scale

A

rating question

20
Q

rank alternatives

A

ranking question

21
Q

choose any number of alternatives

A

checklist question

22
Q

sample does not represent population

A

sampling bias

23
Q

not all questions are answered

A

non-response bias

24
Q

disproportionately polled

A

household bias

25
misleading question
response bias
26
Neighbours are asked the number of plants they own, the responses are sorted into a list, the list is divided into 10 groups, and the groups are graphed (number in group = y, *frequency*; group numbers = x, *interval*)
frequency table; sorted into a list
intervals; 10 groups
histogram; graphed
27
qualitative graph
bar chart
28
quantitative graph
histogram
29
midpoints of the tops of histogram or bar chart bars, connected by line segments only
frequency polygon
30
frequency of interval / total number of data points
relative frequency (x 100% for percent)
31
90 degrees / 360 degrees = 0.25 = 25% of the circle graph
pie graph
32
1, *2, 2*, 3, 4, 5, 8
* mode = 2
33
1, 2, 2, *3*, 4, 5, 8
* median = 3
34
(1 + 2 + 2 + 3 + 4 + 5 + 8) / 7 = 25/7 = 3.57* = x̄
* mean = 3.57
35
mean, median, mode
central tendency
36
1, 2, 2, 3, 4, 5, 8 -> 8-1 = 7*
range
37
1, 2, 2, 3, 4, 5, 8
|1-----|2*--|3*-----------|5*--|8
```
2* = Q1 (25% of the data below, 75% above)
3* = Q2 (median)
5* = Q3 (75% of the data below, 25% above)
```
38
1, 2, 2, 3, 4, 5, 8 | 20th percentile position = 7 x 0.2 = 1.4, rounded up to 2; the number in 2nd place = 20th percentile
2 = 20th percentile
39
σ^2
variance
40
mean = (1 + 2 + 2 + 3 + 4 + 5 + 8) / 7 = 25/7 = 3.57
σ^2 = [(1-3.57)^2 + (2-3.57)^2 + (2-3.57)^2 + (3-3.57)^2 + (4-3.57)^2 + (5-3.57)^2 + (8-3.57)^2] / 7 = 33.71 / 7 = 4.82*
*4.82 = variance
41
mean = (1 + 2 + 2 + 3 + 4 + 5 + 8) / 7 = 25/7 = 3.57
σ^2 = [(1-3.57)^2 + (2-3.57)^2 + (2-3.57)^2 + (3-3.57)^2 + (4-3.57)^2 + (5-3.57)^2 + (8-3.57)^2] / 7 = 4.82
σ = square root (4.82) = 2.19*
*2.19 = standard deviation
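
The mean, median, mode, variance and standard deviation cards above can be checked with a short sketch. It assumes the population formulas (divide by n, as on the cards) and uses only Python's standard `statistics` module; the variable names are illustrative.

```python
import math
import statistics

data = [1, 2, 2, 3, 4, 5, 8]

mode = statistics.mode(data)            # most frequent value
median = statistics.median(data)        # middle value of the sorted list
mean = statistics.fmean(data)           # sum of the values divided by n
variance = statistics.pvariance(data)   # population variance (divides by n)
std_dev = math.sqrt(variance)           # standard deviation = root of variance

print(mode, median, round(mean, 2), round(variance, 2), round(std_dev, 2))
```

Using `statistics.variance` instead would divide by n - 1 (sample variance), which gives a slightly larger value.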
42
Age   | Red | Yellow | Blue | Total
------|-----|--------|------|------
4-7   | 14  | 2      | 4    | 20
8-12  | 10  | 4      | 6    | 20
13-18 | 5   | 10     | 5    | 20
Total | 29  | 16     | 15   | 60
contingency table (*typically categorical data)
43
A line that passes as close as possible to the data points on a graph
line of best fit (*aka regression line)
44
```
y = dependent / response variable
x = independent / explanatory variable
```
scatter plot
45
dots tend to increase left -> right and upwards looking like an arrow
*correlation coefficient* r = 1 positive slope
46
dots tend to increase left -> right and upwards looking like a stretched oval
*correlation coefficient* r = 0.8 (r > 0 [max = 1], increase in one increases the other)
47
dots tend to decrease left -> right and downwards looking like an arrow
*correlation coefficient* r = -1 negative slope
48
dots tend to decrease left -> right and downwards looking like a sphere
*correlation coefficient* r = -0.4 (r < 0 [min = -1], increase in one decreases the other)
49
dots look like a w
*correlation coefficient* r = 0 no linear correlation, change in one, does not change the other
50
dots look like a circle
*correlation coefficient* r = 0 no linear correlation, change in one, does not change the other
51
dots look like a horizontal line
*correlation coefficient* r = 0 no linear correlation, change in one, does not change the other
52
dots look like a circle outline
*correlation coefficient* r = 0 no linear correlation, change in one, does not change the other
53
x  | y  | xy  | x^2 | y^2
---|----|-----|-----|----
0  | 3  | 0   | 0   | 9
3  | 5  | 15  | 9   | 25
6  | 10 | 60  | 36  | 100
8  | 11 | 88  | 64  | 121
11 | 17 | 187 | 121 | 289
12 | 19 | 228 | 144 | 361
Σ: 40 | 65 | 578 | 374 | 905
y = mx + b
m = [n(Σxy) - (Σx)(Σy)] / [n(Σx^2) - (Σx)^2] = [6(578) - (40)(65)] / [6(374) - (40)^2] = 868 / 644 = 1.35
b = (Σy)/n - m(Σx/n) = 65/6 - (868/644)(40/6) = 1.85
y = 1.35x + 1.85
regression line (line of best fit) equation
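
The slope and intercept formulas on this card can be sketched in a few lines of Python. This is a minimal illustration using the six (x, y) pairs from the card; the variable names are made up for the example.

```python
xs = [0, 3, 6, 8, 11, 12]
ys = [3, 5, 10, 11, 17, 19]
n = len(xs)

sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

# m = [n(Σxy) - (Σx)(Σy)] / [n(Σx^2) - (Σx)^2]
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
# b = (Σy)/n - m(Σx/n)
b = sum_y / n - m * (sum_x / n)

print(f"y = {m:.2f}x + {b:.2f}")
```

Keeping m unrounded until the end avoids a small rounding drift in b.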
54
coefficient of determination
proportion of the variation in y that is explained by x | r^2 = [n(Σxy)-(Σx)(Σy)]^2 / {[n(Σx^2)-(Σx)^2][n(Σy^2)-(Σy)^2]}
55
correlation coefficient
r = root(r^2) | aka the square root of the coefficient of determination (r takes the sign of the slope)
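
As a rough check of the two cards above, the sketch below computes r and r^2 for the six (x, y) pairs from the regression card. The data reuse and the variable names are assumptions made for illustration.

```python
import math

xs = [0, 3, 6, 8, 11, 12]
ys = [3, 5, 10, 11, 17, 19]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

r = numerator / denominator   # correlation coefficient, between -1 and 1
r_squared = r ** 2            # coefficient of determination

print(round(r, 3), round(r_squared, 3))
```

Computing r directly keeps its sign; taking the root of r^2 alone would lose whether the slope is positive or negative.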
56
residual point
point whose actual value is not the same as the predicted value; substitute the point's x-value into the regression equation to get the expected (predicted) y-value, then subtract it from the actual y-value: ay - py = res
57
outlier point
point that is a large residual point, away from line of best fit
58
causation
1 variable changes the other
59
percentage change
(new value - old value)/old value x 100%
60
polling bias
leading questions, sample problems
61
small sample bias
extreme data values in sample
62
hidden sample patterns
patterns (seasonal purchases, bus use during school season, etc.)
63
scale bias
when the x or y axis is stretched to make the graph's increase/decrease appear larger or smaller than it is
64
bias in starting points of the axis
when the y axis starts at a value other than 0 to illustrate better growth or decline
65
data mining
using statistical analysis on large data sets to uncover hidden patterns
66
combinatorics
how many ways can I get to work (the study of how many ways through combination and permutations)
67
tree diagram
              home
            /      \
     main (1)      side (2)
      /    \        /    \
 bus (1) bike (2)  bus   bike
2 x 2 = 4 possible ways (*multiplicative counting principle)
68
permutation
pick 3 of 10 girls for a relay team | a in 1st position ≠ a in 2nd position (order matters)
69
combination
pick 3 of 10 girls for a track team | a in 1st position = a in 2nd position (order does not matter)
70
ordered pair
[a, b] ≠ [b,a]
71
ordered triple
[a,b,c] ≠ [c, a, b]
72
ordered N-tuple
[a, b,c,d, e] ≠ [a, c, b, d, e]
73
factorial notation
6! = 6 x 5 x 4 x 3 x 2 x 1 = 720
74
factorial notation for permutations
```
10_P_3 = P(10, 3) = 10!/(10-3)! = 10!/7! = 10 x 9 x 8 = 720
nPr = P(n,r) = n!/(n-r)!
P(n,0) = 1 always, regardless of n
```
75
factorial notation for combinations
```
10_C_3 = C(10, 3) = 10! / [(10-3)! x 3!] = 10! / (7! x 3!) = 120
nCr = C(n,r) = n! / [(n-r)! x r!], also read as "n choose r"
```
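
The two factorial-notation cards above can be compared side by side in a short sketch. It assumes Python 3.8 or later, where `math.perm` and `math.comb` are available; the variable names are illustrative.

```python
import math

# Permutations: order matters, nPr = n! / (n - r)!
relay_orders = math.perm(10, 3)
relay_by_formula = math.factorial(10) // math.factorial(10 - 3)

# Combinations: order does not matter, nCr = n! / ((n - r)! * r!)
track_teams = math.comb(10, 3)
track_by_formula = math.factorial(10) // (math.factorial(10 - 3) * math.factorial(3))

print(relay_orders, track_teams)  # 720 120
```

Each combination of 3 girls corresponds to 3! = 6 permutations, which is why 720 / 6 = 120.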
76
permutation with repetition
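rearrange "success" (positions 1234567, letters labelled 1233411: s appears 3 times, c twice, u once, e once)
7! / (3! x 2! x 1! x 1!) = 5040 / 12 = 420
Dividing by 3! alone (7!/3! = 840) only accounts for the repeated s; dividing by 2! as well for the repeated c gives 420.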
77
mutually exclusive
choose 2 of 4 boys or 2 of 5 girls
C(4,2) = 4!/[(4-2)! x 2!] = 6
C(5,2) = 5!/[(5-2)! x 2!] = 10
6 + 10 = 16 ways (*additive counting principle)
78
direct approach
in a group of 4 boys and 5 girls, choose a group of 4 with at least 2 girls
2 girls, 2 boys: C(5,2) x C(4,2) = P(5,2)/2! x P(4,2)/2! = 10 x 6 = 60
3 girls, 1 boy: C(5,3) x C(4,1) = P(5,3)/3! x P(4,1)/1! = 10 x 4 = 40
4 girls, 0 boys: C(5,4) x C(4,0) = P(5,4)/4! x P(4,0)/0! = 5 x 1 = 5
60 + 40 + 5 = 105 ways
79
indirect approach
all groups of 4, minus the groups with 0 girls or 1 girl
C(9,4) - [C(5,0) x C(4,4)] - [C(5,1) x C(4,3)]
= P(9,4)/4! - [P(5,0)/0! x P(4,4)/4!] - [P(5,1)/1! x P(4,3)/3!]
= 126 - (1 x 1) - (5 x 4) = 105 ways
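
The direct and indirect approaches should agree, and a brute-force count gives a third check. The sketch below assumes the same 5 girls and 4 boys; labelling people by index is just an illustration device.

```python
from itertools import combinations
from math import comb

GIRLS, BOYS, GROUP = 5, 4, 4

# Direct: add up the cases with 2, 3 and 4 girls.
direct = sum(comb(GIRLS, g) * comb(BOYS, GROUP - g) for g in range(2, GROUP + 1))

# Indirect: all groups minus the cases with 0 girls or 1 girl.
indirect = comb(GIRLS + BOYS, GROUP) - sum(
    comb(GIRLS, g) * comb(BOYS, GROUP - g) for g in range(0, 2)
)

# Brute force: people 0-4 are girls, 5-8 are boys.
brute_force = sum(
    1
    for group in combinations(range(GIRLS + BOYS), GROUP)
    if sum(1 for person in group if person < GIRLS) >= 2
)

print(direct, indirect, brute_force)  # 105 105 105
```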
80
combination with all possible sizes
```
How many communities can 9 people form?
with 1 = C(9,1) = 9
with 2 = C(9,2) = 36
with 3 = C(9,3) = 84
with 4 = C(9,4) = 126
with 5 = C(9,5) = 126
with 6 = C(9,6) = 84
with 7 = C(9,7) = 36
with 8 = C(9,8) = 9
```
9 + 36 + 84 + 126 + 126 + 84 + 36 + 9 + 1 (with 9 people) = 511
511 + 1 (no communities) = 512
2^9 = 512
81
venn diagram
A and B = A∩B (intersection)
A or B = A∪B (union)
S (the box around the Venn diagram) = universal set
82
inclusion-exclusion principle
```
n(A∪B) = n(A) + n(B) - n(A∩B)
n(A∪B∪C) = n(A) + n(B) + n(C) - n(A∩B) - n(A∩C) - n(B∩C) + n(A∩B∩C)
```
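
A small sketch with Python sets can confirm both forms of the principle. The three sets below are hypothetical, invented only to give the formulas something to count.

```python
# Hypothetical music tastes, invented for illustration.
A = {"jazz", "rock", "blues"}
B = {"rock", "heavy metal"}
C = {"blues", "rock", "folk"}

# Two sets: n(A∪B) = n(A) + n(B) - n(A∩B)
two_set_formula = len(A) + len(B) - len(A & B)

# Three sets: add singles, subtract pairs, add back the triple overlap.
three_set_formula = (
    len(A) + len(B) + len(C)
    - len(A & B) - len(A & C) - len(B & C)
    + len(A & B & C)
)

print(two_set_formula, len(A | B))
print(three_set_formula, len(A | B | C))
```

The subtraction removes elements that were counted twice; in the three-set case the triple overlap is removed once too often, so it is added back.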
83
pascal's triangle
row 0   *1
row 1   *1 1
row 2   *1 2 1
row 3   *1 3 3 1
row 4   *1 4 6 4 1
row 5   *1 5 10 10 5 1
row 6   *1 6 15 20 15 6 1
*AKA position '0'
84
pascal's identity
t_(n,r) + t_(n,r+1) = t_(n+1,r+1)
85
pascal's triangle in relation to combinations
```
t_n,r = C(n,r), read as "n choose r"
t_6,3 = C(6,3) = 6! / [(6-3)! x 3!] = 20
```
86
sum of row
row 0   1                  sum 2^0 = 1
row 1   1 1                sum 2^1 = 2
row 2   1 2 1              sum 2^2 = 4
row 3   1 3 3 1            sum 2^3 = 8
row 4   1 4 6 4 1          sum 2^4 = 16
row 5   1 5 10 10 5 1      sum 2^5 = 32
sum of row n = 2^n
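
The Pascal's triangle cards above (the identity, the link to C(n,r), and the row sums) can be checked together. This is a minimal sketch; `pascal_rows` is a helper name made up for the example.

```python
from math import comb

def pascal_rows(count):
    """Build the first `count` rows of Pascal's triangle, starting at row 0."""
    rows = [[1]]
    for _ in range(count - 1):
        prev = rows[-1]
        # Pascal's identity: each inner term is the sum of the two terms above it.
        inner = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + inner + [1])
    return rows

rows = pascal_rows(7)  # rows 0 to 6
for n, row in enumerate(rows):
    assert row == [comb(n, r) for r in range(n + 1)]  # t_n,r = C(n,r)
    assert sum(row) == 2 ** n                         # sum of row n = 2^n
    print(n, row, sum(row))
```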
87
pascal's triangle with routes
```
|1___5___15___35___70___B126|
|1___4___10___20___35_____56|
|1___3___6____10___15_____21|
|1___2___3____4____5_______6|
|A___1___1____1____1_______1|
```
t_n,r = C(n,r), read as "n choose r"
t_9,5 or t_9,4
C(9,5) = 9!/[(9-5)! x 5!] = 126 or C(9,4) = 9!/[(9-4)! x 4!] = 126
88
pascal's triangle with restrictions
To spell 'EUCLID':
E              row 1 (= position '0')
U U            row 2
C C C          row 3
L L L L        row 4
I I I I I      row 5
D D D D D D    row 6
2^n -> 2^5 = 32 possible ways
89
trial
I tossed a coin to decide on supper (tossing of the coin is the trial)
90
possible outcome
heads = fish, tails = tofu | *aka element: an item contained in a set or sample space
91
sample space; compound experiment; discrete sample space
*I watched to see how many girls came with a girl or a guy
92
continuous sample space
*I wanted to see whether female puppies weighed more than male puppies (event), i.e. (F > M) or (M > F)*
93
theoretical probability
p(A) = n(A) / s(A), where n(A) = number of successful outcomes and s(A) = total number of possible outcomes
94
grid
``` |1 ___2___3___4___5| |1| |2|______10___20_____56| |3|___3___6____10___15______21| |4|___2___3____4____5_______6| |5___1___2____3____4_______5| ```
95
odds
```
There's a 70% chance of rain: odds are 7:3 it will rain, and 3:7 it won't.
p(A) = it will
p'(A) = it won't
1 = p(A) + p'(A)
p'(A) = 1 - 70% = 30%
```
96
cardinality
the number of possible outcomes in a probability experiment
97
mutually exclusive event
I need to decide on going to an event at the library, or staying home to do something else. *see disjoint sets
98
disjoint sets
p(A or B) = p(A) + p(B) | p(A∪B) = p(A) + p(B)
99
non-mutually exclusive event
I want to pick a playlist for a road trip; I like jazz and rock, my Dad likes heavy metal and rock (rock is in both sets).
p(A or B) = p(A) + p(B) - p(A and B)
p(A∪B) = p(A) + p(B) - p(A∩B)
100
independent events
I want to know the probability of my dog having to pee in 4 hours, and a salad being available at work.
p(A∩B) = p(A) x p(B)
p(A∩B∩C) = p(A) x p(B) x p(C)
I want to know the probability of my dog not having to pee in 4 hours and a salad being available at work.
p(A') = 1 - p(A)
p(A'∩B) = p(A') x p(B)
I want to know the probability of my dog not having to pee in 4 hours and a salad not being available at work.
p(A'∩B') = p(A') x p(B')
101
dependent event
I went to a cafe but the hostess was rude, so I don't think I'll go back.
102
conditional probability of B occurring, given A has occurred
P(B|A)
P(A∩B) = p(A) x p(B|A)
I want to pick my name 2x in a row from a draw of 15, without putting it back (my name's only in twice).
p(A) = 2/15
p(B|A) = 1/14
P(A∩B) = 2/15 x 1/14 = 2/210 = 1/105
If the first name isn't my name...
p(A') = 1 - 2/15 = 13/15
p(A'∩B) = p(A') x p(B|A') = 13/15 x 1/7* = 13/105
* since my name wasn't drawn, of the 14 names left, 2 are mine: 2/14 = 1/7
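
The draw-of-15 example can be checked with exact fractions. This sketch uses Python's `fractions.Fraction` so that nothing is rounded; the variable names are illustrative.

```python
from fractions import Fraction

TOTAL_NAMES = 15
MY_ENTRIES = 2

# First draw is my name, then my name again (no replacement).
p_a = Fraction(MY_ENTRIES, TOTAL_NAMES)                # 2/15
p_b_given_a = Fraction(MY_ENTRIES - 1, TOTAL_NAMES - 1)  # 1/14
p_both_mine = p_a * p_b_given_a

# First draw is not my name, second draw is.
p_not_a = 1 - p_a                                      # 13/15
p_b_given_not_a = Fraction(MY_ENTRIES, TOTAL_NAMES - 1)  # 2/14 = 1/7
p_second_only = p_not_a * p_b_given_not_a

print(p_both_mine, p_second_only)  # 1/105 13/105
```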
103
probability distribution diagram
graph that shows probability on the y axis and outcomes on the x axis | e.g. heads 50% or tails 50% for a coin toss (*discrete random variables)
104
probability distribution diagram
graph that shows probability on the y axis and outcomes on the x axis | e.g. shoe size (*continuous random variables)
105
probability of possible outcome
```
X | p(x)
1 | 1/6    u.p.d.*
2 | 1/6
3 | 1/6
4 | 1/6
5 | 1/6
6 | 1/6
X = random variable
p(x) = probability of possible outcome
*u.p.d. = uniform probability distribution, p(x) = 1/n
```
106
weighted mean
5 students received 0, 3 received 50, 1 received 75 and 1 received 100
x̄_w = (5x0 + 3x50 + 1x75 + 1x100)/10 = 32.5*
*aka expected value
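
The weighted mean on this card can be written as a sum of score x count divided by the total count. The sketch below stores the card's numbers in a dictionary; that layout is just one convenient choice.

```python
# score -> number of students who received it (values from the card)
score_counts = {0: 5, 50: 3, 75: 1, 100: 1}

total_students = sum(score_counts.values())
weighted_mean = sum(score * count for score, count in score_counts.items()) / total_students

print(weighted_mean)  # 32.5
```

Dividing each count by the total first turns the counts into probabilities, which is why the same number is also called the expected value.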
107
binomial probability distribution
I will go to knitting club, or I won't.
- The sum of all probabilities in an experiment = 1
- I want to know the probability of a weighted coin (heads = 2/3) tossing 2 heads in a row and then a tail in 3 trials: 2/3 x 2/3 x 1/3 = 4/27
vs.
- I want to know the probability of exactly one tail being tossed, in any position, in 3 tosses: 4/27 x 3 = 4/9 -> 1/3 x 2/3 x 2/3 or 2/3 x 1/3 x 2/3 or 2/3 x 2/3 x 1/3
108
direct method
```
p(x) = C(n,x) (p^x)(q^(n-x))
x = number of successes
n = number of independent trials
q = probability of failure on each trial
p = probability of success on each trial
```
p(1) = C(3,1)(1/3)^1(2/3)^2 = 3!/[(3-1)! x 1!] x (1/3)^1 x (2/3)^2 = 4/9 = 44.4%
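
The direct method translates almost symbol for symbol into code. In this sketch, `binomial_pmf` is a made-up helper name, and `Fraction` is used so the 4/9 result stays exact.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, x, p):
    """p(x) = C(n, x) * p^x * q^(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

p_tail = Fraction(1, 3)  # weighted coin: tails 1/3, heads 2/3

exactly_one_tail = binomial_pmf(3, 1, p_tail)
all_outcomes = sum(binomial_pmf(3, x, p_tail) for x in range(4))

print(exactly_one_tail, float(exactly_one_tail))  # 4/9, about 0.444
print(all_outcomes)                               # 1
```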
109
indirect method
I order 25 cookies, with a 1% estimate of a broken cookie. What is the probability of at least 5 being broken?
p = 0.01, q = 1 - 0.01 = 0.99
p(x>/=5) = 1 - p(x<5) = 1 - p(x=0) - p(x=1) - p(x=2) - p(x=3) - p(x=4)
p(x>/=5) = 1 - C(25,0)(0.01)^0(0.99)^25 - C(25,1)(0.01)^1(0.99)^24 - C(25,2)(0.01)^2(0.99)^23 - C(25,3)(0.01)^3(0.99)^22 - C(25,4)(0.01)^4(0.99)^21
p(x>/=5) = 1 - 0.777821359 - 0.196419535 - 0.023808429 - 0.00184375 - 0.00010243 = 0.000004497 = 0.0004%
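
The indirect method (one minus the complement) is easy to get wrong by a decimal place when done by hand, so a quick numeric check can help. The sketch assumes the card's values n = 25 and p = 0.01; `binomial_pmf` is a helper name invented for the example.

```python
from math import comb

def binomial_pmf(n, x, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

N_COOKIES = 25
P_BROKEN = 0.01

# Indirect: subtract the probabilities of 0 to 4 broken cookies from 1.
p_fewer_than_5 = sum(binomial_pmf(N_COOKIES, x, P_BROKEN) for x in range(5))
p_at_least_5 = 1 - p_fewer_than_5

# Direct cross-check: add the probabilities of 5 to 25 broken cookies.
p_direct = sum(binomial_pmf(N_COOKIES, x, P_BROKEN) for x in range(5, N_COOKIES + 1))

print(f"{p_at_least_5:.9f}  ({p_at_least_5:.4%})")
```

The indirect route needs 5 terms instead of 21, which is the reason to prefer it here.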
110
expected value
E(X) = n x p (expected value = number of trials x probability of success), or E(X) = n x a/N for a hypergeometric experiment
111
hypergeometric probability
```
p(x) = [C(a,x) C(N-a, n-x)] / C(N,n)
a = number of possible successes
x = actual number of successes in experiment
N = population in experiment
n = number of objects being sampled
```
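
A short sketch of the hypergeometric formula follows. The scenario (a committee of 4 drawn from 5 girls and 4 boys) is an assumption borrowed from the earlier counting cards, and `hypergeometric_pmf` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf(x, a, N, n):
    """p(x) = C(a, x) * C(N - a, n - x) / C(N, n)."""
    return Fraction(comb(a, x) * comb(N - a, n - x), comb(N, n))

N = 9   # population: 5 girls + 4 boys
a = 5   # possible successes: girls
n = 4   # objects sampled: committee size

p_two_girls = hypergeometric_pmf(2, a, N, n)
expected_girls = sum(x * hypergeometric_pmf(x, a, N, n) for x in range(n + 1))

print(p_two_girls)     # 10/21
print(expected_girls)  # 20/9, which equals n x a/N
```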
112
continuous random variable
I want to know the temperature
113
uniform distribution
p(x = 25) = 1/infinity = 0 (because values such as 25.001 are also possible) | rectangle graph
114
unimodal distribution
normal distribution aka bell curve | mean = mode = median
115
bimodal distribution
Like a unimodal distribution on both sides (2 humps)
116
unimodal-negatively skewed
mode > median > mean
117
unimodal-positively skewed
mode < median < mean
118
measures of central tendency
mean, median and mode
119
measures of spread
range, percentile, standard deviation
120
mesokurtic distribution
```
z = (x - μ)/σ
z = number of standard deviations the variable is away from the mean
x = variable
μ = mean
σ = standard deviation
```
121
platykurtic distribution
scores are spread out, top looks flat
122
leptokurtic distribution
data is close to mode, top is pulled like a stretched pile
123
area under normal bell curve
area = 1, sum of all possibilities
124
z score equaling an exact number
probability = 0% (because a continuous variable has infinitely many possible values, e.g. 0.33...)
125
+ z score
greater than mean μ p(z>0.67) = p(zz>0) = p(0>z>0.4) p(z
126
- z score
less than mean μ p(z>-0.67) = p(z<0.67) with normal bell curve, p(-0.4y) = 0.5 - y (equivalent from table)
127
continuity correction
modification of discrete data in order to use it with the z-score table: go 1/2 below and above the given value (to find 36 -> 35.5 and 36.5)
z = (36.5 - 35)/4 = 0.38
z = (35.5 - 35)/4 = 0.12
p(35.5 < x < 36.5) = p(0.12 < z < 0.38)
0.1480 - 0.0478 (table numbers) = 0.1002 = 10.02%
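
The same continuity correction can be sketched without a z-score table, using `statistics.NormalDist` from the Python standard library (3.8+). The mean of 35 and standard deviation of 4 come from the card; the rest is illustrative.

```python
from statistics import NormalDist

MEAN = 35
STD_DEV = 4
TARGET = 36  # discrete value we want the probability of

distribution = NormalDist(mu=MEAN, sigma=STD_DEV)

# Continuity correction: treat "exactly 36" as the interval 35.5 to 36.5.
lower, upper = TARGET - 0.5, TARGET + 0.5
probability = distribution.cdf(upper) - distribution.cdf(lower)

z_lower = (lower - MEAN) / STD_DEV
z_upper = (upper - MEAN) / STD_DEV

print(z_lower, z_upper, round(probability, 4))
```

The result differs slightly from the table-based figure because the z-scores here are not rounded to two decimal places before the lookup.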
128
binomial distribution probability
p(x) = C(n,x) (p^x)(q^(n-x))
129
expected value
E(x) = np aka mean of data aka μ = np
130
standard deviation equation
```
σ = root(npq)
n = number of trials
p = probability of success
q = probability of failure
```
np and nq >/= 5 for normal approximation of binomial distribution
131
time series graph
variable (i.e. population) y-axis | timeline x axis
132
percentile
convert the value to a z-score; the area (probability) below that z-score is the percentile