Stats Flashcards

(91 cards)

1
Q

Most common observational study?

A

Surveys

2
Q

What are surveys? (Observational study)

A

Questionnaires presented to individuals, selected from a POPULATION OF INTEREST

3
Q

What is the role of surveys (what they can and can’t do)?

A
  • Can only report relationships between variables
  • Cannot claim CAUSE and EFFECT

4
Q

What is an experiment?

A

A systematic procedure carried out under controlled conditions

5
Q

What is the role of experiments (3)?

A
  • To discover an unknown effect
  • To illustrate a known effect
  • To test OR establish a hypothesis
6
Q

What should experiments be designed to do?

A

Minimise BIASES that might occur

7
Q

When analysing a process, experiments are used to evaluate…

A
  • Which PROCESS INPUTS have a significant impact on the PROCESS OUTPUTS
8
Q

What is the name of the methodology behind the several different ways of collecting experimental process input/output information?

A

Design of Experiments (DOE)

9
Q

Purpose of experimentation… (6)

A
  • Comparing alternatives
  • Identifying the significant inputs (factors) which affect the output (response),
    i.e. separating the vital few from the trivial many
  • Achieving an OPTIMAL PROCESS OUTPUT (response)
  • Reducing variability
  • Minimizing, maximizing, or targeting an output
  • Achieving product & process robustness
10
Q

To minimize bias, you need to…

A

Select your sample of individuals randomly!

11
Q

What are the three types of data? (3)

A
  • Categorical data
  • Numerical data
  • Ordinal data
12
Q

What is Categorical data?

A

Records qualities or characteristics about the individual, such as eye color or opinions (agree/disagree)
(NB: any numbers used as category labels do not have “real numerical meaning”)

13
Q

What is Numerical data?

A

Records measurements or counts regarding each individual

14
Q

What is Ordinal data?

A

Sits in between categorical and numerical data: the data appear in categories, but the categories have a meaningful order (e.g. rankings from 1st to 5th, best to worst)

15
Q

If the data set contains an even number of values… (median)

A

The median is the average of the two values that are in the middle

16
Q

Standard Deviation? (definition)

A

Quantifies the typical distance from any value in the data set to the centre

17
Q

Standard Deviation (equation)

A

sigma = sqrt( Σ(xi - x~)^2 / (n - 1) ), where x~ is the mean of the data and n the number of values
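
The formula can be sketched in a few lines of Python. This is only an illustration, assuming the n - 1 (sample) form of the formula; the function name sample_std and the data values are made up.

```python
import math

def sample_std(values):
    """Standard deviation using the n - 1 divisor: sqrt(sum((xi - mean)^2) / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    # Sum of the squared distances from each value to the mean
    squared_distances = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_distances / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set
print(sample_std(data))
```

The result is never negative, is zero only when all values are equal, and has the same units as the data.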

18
Q

Properties of standard deviation

A
  • Is never negative
  • Smallest possible value is zero
  • Affected by OUTLIERS
  • Has the same UNITS as the original data
19
Q

A random variable is…

A

a variable whose possible values are numerical outcomes of a RANDOM PHENOMENON

20
Q

Types of random variables:

A
  • Continuous
  • Discrete

21
Q

A probability distribution is…

A

a list of possible values of a random variable, together with their probabilities

22
Q

A binomial distribution is…

A

a frequency distribution of the possible number of successful outcomes in a given number of trials, each of which has only two outcomes (SUCCESS/FAILURE) and the same probability of success

23
Q

Characteristics of a Binomial Distribution (4)

A
  • Must be a fixed number of trials (n)
  • Only two outcomes: SUCCESS/FAILURE
  • The probability of success (p) must remain the same for each trial
  • The outcomes of each trial must be INDEPENDENT of each other
24
Q

If a random variable X has a binomial distribution, PROBABILITIES for X can be calculated using the following formula:

A

P(X = x) = (n choose x) * p^x * (1-p)^(n-x)
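
A short Python sketch of this formula, for illustration only. The function name binomial_pmf and the coin-flip numbers are assumptions, not part of the original card.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical example: probability of exactly 3 heads in 10 fair coin flips
print(binomial_pmf(3, 10, 0.5))

# The probabilities over all possible values x = 0..n should add up to 1
print(sum(binomial_pmf(x, 10, 0.5) for x in range(11)))
```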

25
Binomial Distribution parameters:
```
n     = no. trials
x     = no. successes
n - x = no. fails
p     = success probability (any trial)
1 - p = failure probability
```
26
A binomial random variable X can take values between...
0 and n (the least/most no. successes in n trials)
27
For a binomial random variable the mean is:
µ = n*p
28
The variance of a random variable is...
The weighted average of the squared distances from the mean
29
The variance of a binomial random variable is... (formula)
sigma^2 = n*p*(1-p)
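The mean and variance formulas for a binomial random variable can be checked against the weighted-average definitions. A minimal sketch, with n and p chosen arbitrarily for illustration:

```python
from math import comb

n, p = 12, 0.3  # hypothetical example values

# Probability of each possible number of successes x = 0..n
probs = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

# Mean: weighted average of the possible values
mean = sum(x * px for x, px in enumerate(probs))

# Variance: weighted average of the squared distances from the mean
variance = sum((x - mean) ** 2 * px for x, px in enumerate(probs))

print(mean, n * p)                # both should agree up to rounding
print(variance, n * p * (1 - p))  # both should agree up to rounding
```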
30
Discrete random variable:
A variable which can only take a countable number of values
31
Continuous random variable:
A random variable that takes on values within AN INTERVAL (or has so many possible values that they might as well be considered continuous)
32
The most widely adopted distribution for continuous random variables:
The normal distribution
33
The Normal Distribution: Definition
Random Variable X follows a normal distribution if its values fall into a bell-shaped continuous curve that is symmetric
34
The Normal Distribution: Fundamental characteristics (3)
- The area under the curve is EQUAL TO UNITY
- It has symmetry about the centre (i.e., it has 50% of values less than the mean and 50% greater than the mean)
- Each normal distribution is described via the mean, µ, and the standard deviation, σ
35
Saddle Points (more formally, inflection points):
Where the bell-shaped curve changes from concave down to concave up.
36
Distance between the mean and the saddle points
1 σ
37
For any normal distribution, almost all its values lie within __ standard deviations of the mean
3
38
The Standard Normal Distribution, AKA:
The Z-Distribution
39
The Standard Normal Distribution has mean equal to:
0
40
The Standard Normal Distribution has S.D. equal to:
Unity
41
The normal random variable of a standard normal distribution is called a...
Standard score / z-value
42
A value on the Z-distribution represents...
the number of standard deviations the data is above or below the mean
43
68% of Standard normal distribution values are:
within 1 σ of the mean
44
95% of Standard normal distribution values are:
within 2 σs of the mean
45
99.7% of Standard normal distribution values are:
within 3 σs of the mean
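The 68 / 95 / 99.7 figures above are rounded. A quick way to see where they come from is to compute the area under the standard normal curve directly; this sketch uses Python's statistics.NormalDist, and the helper name share_within is made up.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # the standard normal (Z-) distribution

def share_within(k):
    """Fraction of values lying within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sd of the mean: {share_within(k):.4f}")
```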
46
To change a value of X into a value of Z, you can use this formula:
z = (X - µ)/σ
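A small Python sketch of the conversion, followed by a Z-table style lookup. The mean of 100 and standard deviation of 15 are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical example: X is normal with mean 100 and standard deviation 15
z = z_score(130, mu=100, sigma=15)
print(z)

# P(X < 130) is the same as P(Z < z) on the standard normal distribution
p_below = NormalDist().cdf(z)
print(p_below)
```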
48
When a sample of data is taken from a given population of data...
the statistical results/characteristics vary from sample to sample
49
To build the sampling distribution of the sample mean (3):
1) Take a sample of values from random variable X (population)
2) Calculate the mean of the sample
3) Repeat steps 1) and 2) over and over again
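The three steps can be simulated in Python. This is only a sketch under stated assumptions: the population is taken to be exponential with mean 1 (deliberately not normal), the sample size is 30, and the steps are repeated 5000 times. None of these values come from the original card.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def sample_mean(n):
    # Step 1: take a sample of n values from the population
    # (assumed here: exponential with mean 1 and standard deviation 1)
    sample = [rng.expovariate(1.0) for _ in range(n)]
    # Step 2: calculate the mean of the sample
    return statistics.fmean(sample)

# Step 3: repeat steps 1 and 2 over and over again
n = 30
sample_means = [sample_mean(n) for _ in range(5000)]

print(statistics.fmean(sample_means))  # centre of the sampling distribution
print(statistics.stdev(sample_means))  # variability of the sample means
```

The centre should land close to the population mean, and the variability should be noticeably smaller than the population's own standard deviation.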
50
All the sample means result in a new population which is denoted using random variable
X~
51
The sampling distribution of the sample means gives all the possible values of the sample mean and quantifies...
how often they occur
52
A sampling distribution has its own...
shape, centre, and variability.
53
The mean of SAMPLING DISTRIBUTION X~ is denoted as:
µx~
54
The variability characterising a population of values (X) is quantified in terms of
Standard deviations
55
The variability in the sample mean X~ is measured in terms of standard errors
σx~ = σx/sqrt(n)
56
If the distribution of X is normal, then also the distribution of X~ is...
normal
57
If the distribution of X is unknown or not normal, according to the Central Limit Theorem (CLT), the distribution of X~ can be...
approximated with a normal distribution
58
The sampling distribution of X~ can be approximated by a normal distribution if: (2)
- The population has mean µ and standard deviation σ
- A sufficient number of LARGE/RANDOM samples are taken
59
Further, the larger the sample size, n, the closer the distribution of the sample means will be to a...
normal distribution
60
Probability for X~ (formula)
Z = (X~ - µx~)/(σx/sqrt(n))
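A sketch of how this formula turns a question about a sample mean into a standard normal lookup. It assumes µx~ equals the population mean µ; the function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_below(x_bar, mu, sigma, n):
    """P(sample mean < x_bar) for samples of size n, via the Z formula."""
    standard_error = sigma / sqrt(n)   # σx / sqrt(n)
    z = (x_bar - mu) / standard_error  # convert the sample mean to a z-value
    return NormalDist().cdf(z)

# Hypothetical example: population mean 50, standard deviation 10, samples of size 25
print(prob_sample_mean_below(52, mu=50, sigma=10, n=25))
```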
61
Confidence Interval:
A range of values defined so that there is a specified probability that the value of a parameter lies within it. The sample statistic ± the margin of error gives a range of likely values for the parameter under investigation.
62
The goal when making an estimate using a confidence interval is to
minimise the margin of error.
63
The size of the margin of error is affected by:
1) Confidence level 2) Sample size 3) Variability in the population
64
Confidence Level:
The probability that the value of a parameter falls within a specified range of values. In other words, the confidence level of a confidence interval corresponds to the percentage of the time the interval would contain the parameter if numerous random samples were taken.
65
For a given confidence level, the number of standard errors to be added and subtracted (±) is given by...
z*, which is determined from the standard normal (Z-) distribution
66
The confidence interval for a population mean is:
x~ ± z*(σx/sqrt(n))
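A minimal Python sketch of this interval. It assumes σx is known and that z* is the two-sided critical value from the standard normal distribution; the function name and the example numbers are made up.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (low, high) for: x_bar ± z* (sigma / sqrt(n))."""
    # z* leaves (1 - confidence) / 2 in each tail of the standard normal
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin_of_error = z_star * sigma / sqrt(n)
    return x_bar - margin_of_error, x_bar + margin_of_error

# Hypothetical example: sample mean 50, population sd 10, sample size 100
low, high = mean_confidence_interval(50, sigma=10, n=100)
print(low, high)

# A higher confidence level gives a wider interval for the same data
print(mean_confidence_interval(50, sigma=10, n=100, confidence=0.99))
```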
67
This means that as n increases, both the standard error and the margin of error decrease, resulting in a
narrower confidence interval
68
as the confidence level increases,
the margin of error increases
69
When estimating a population mean, the sample size needed to achieve the desired margin of error can be estimated a priori via the following formula:
n = ((z* × σx)/MOE)^2, rounded up to the next integer
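The sample-size formula can be sketched as follows. The function name and example values are hypothetical, and σx is assumed to be known (or roughly estimated beforehand).

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, moe, confidence=0.95):
    """Smallest n whose margin of error is at most moe: ceil((z* sigma / moe)^2)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z_star * sigma / moe) ** 2)

# Hypothetical example: population sd 10, desired margin of error 2 (95% confidence)
print(required_sample_size(sigma=10, moe=2))

# Halving the margin of error roughly quadruples the required sample size
print(required_sample_size(sigma=10, moe=1))
```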
70
If σx is unknown,
a pilot test can be run in order to make a rough estimate
71
The margin of error achieved by a given sample size can be estimated (very roughly!) via the following formula:
MOE ≈ 1/sqrt(n), i.e. n ≈ 1/MOE^2
72
Variability (also called spread or dispersion) refers to how
spread out a set of data is. Variability is measured in terms of standard errors/deviations
73
To compare two different populations, it is common practice to calculate the confidence interval for the difference of two population means as:
(x~ - y~) ± z*sqrt(σ1^2/n1 + σ2^2/n2)
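A Python sketch of the interval for the difference of two means. It assumes both population standard deviations are known and that the variances (σ squared) are what get divided by the sample sizes under the square root; the function name and the numbers are made up.

```python
from math import sqrt
from statistics import NormalDist

def diff_means_ci(x_bar, y_bar, sigma1, n1, sigma2, n2, confidence=0.95):
    """Return (low, high) for the difference between two population means."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of the difference: variances divided by sample sizes, then summed
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = x_bar - y_bar
    return diff - z_star * standard_error, diff + z_star * standard_error

# Hypothetical example: two samples with means 10 and 8
low, high = diff_means_ci(10, 8, sigma1=3, n1=36, sigma2=4, n2=64)
print(low, high)
```

If the interval does not contain 0, the data suggest the two population means differ at the chosen confidence level.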
74
A hypothesis test is
a procedure that uses data from a sample to confirm or deny a claim about a population
75
Every hypothesis test is based on two hypotheses, i.e.:
- the null hypothesis (denoted H0)
- the research (or alternative) hypothesis (denoted Ha)
76
Ha can be formed in three different ways: the population parameter is _____ the claimed value (3)
- Not equal to
- Larger than
- Smaller than
77
The null hypothesis is set up so that H0 is
assumed true unless the data and statistics demonstrate otherwise
78
a statistically significant result is when:
H0 is rejected in favour of Ha
79
As soon as the z-value of interest is known, proceed as follows:
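⊗ if Ha is the less-than alternative, then: p-value = P(Z < z-value)
⊗ if Ha is the greater-than alternative, then: p-value = 1 - P(Z < z-value)
⊗ if Ha is the not-equal-to alternative, then: p-value = 2*P(Z > |z-value|)
(P(Z < z-value) is the area read from the Z-table for the z-value of interest)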
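The three cases can be sketched in Python. This reads "z-value" in the rules as the Z-table area to the left of the test statistic, P(Z < z), and takes the two-sided case as twice the smaller tail; both readings are assumptions about the card's intent, and the function name and labels are made up.

```python
from statistics import NormalDist

def p_value(z, alternative):
    """p-value for a z test statistic; alternative is 'less', 'greater' or 'not equal'."""
    left_area = NormalDist().cdf(z)  # P(Z < z), as read from a Z-table
    if alternative == "less":
        return left_area
    if alternative == "greater":
        return 1 - left_area
    if alternative == "not equal":
        # Double the smaller of the two tail areas
        return 2 * min(left_area, 1 - left_area)
    raise ValueError("alternative must be 'less', 'greater' or 'not equal'")

print(p_value(-1.96, "less"))
print(p_value(1.96, "greater"))
print(p_value(1.96, "not equal"))
```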
80
bivariate data set
each observation is described using two variables, x and y
81
After organising your bivariate data set, you can...
⊗ look for patterns
⊗ find a possible correlation
⊗ predict a value for y for a given value for x
⊗ summarise the dataset with scatterplots
82
given a bivariate data set, it is important to quantify
STRENGTH & DIRECTION of linear relationship
83
n in the correlation coefficient equation is...
the number of pairs of data
84
we have a strong linear relationship when
|r| ≥ 0.6
85
the correlation coefficient is dimensionless, so that changing the units of X and Y
does not affect r
86
the correlation coefficient does not change if variables X and Y are
switched in the data set
87
The square of the Pearson product-moment correlation coefficient, R^2, ranges between
0 and 1, from no to perfect linear correlation (r itself ranges between -1 and +1)
88
Function y=f(x) can be determined using a regression line provided that: (2)
- the data in the scatterplot follow (roughly) a linear distribution
- we have a strong linear relationship between x and y, i.e. |r| ≥ 0.6
89
To determine m and b, you can use the following relationships
m = r(σy/σx), b = y~ - m*x~ (x~ and y~ being the means of x and y)
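A sketch of fitting the regression line y = mx + b in Python from r and the two standard deviations. The intercept formula b = mean(y) - m * mean(x) is the standard least-squares result and is stated here as an addition to the card; the function name and the data are hypothetical.

```python
import statistics

def fit_line(xs, ys):
    """Return (m, b, r) for the regression line y = m*x + b."""
    n = len(xs)  # number of pairs of data
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    # Pearson correlation coefficient r
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    m = r * s_y / s_x      # slope
    b = y_bar - m * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return m, b, r

xs = [1, 2, 3, 4, 5]             # hypothetical data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
m, b, r = fit_line(xs, ys)
print(m, b, r)
```

Swapping xs and ys changes m and b but leaves r unchanged.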
90
A log-log regression line is expressed mathematically as:
y = a*x^k
91
log-log line
Y = mX + b (X = log x, Y = log y), so that for y = a*x^k: m = k and b = log a
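A sketch of a log-log fit in Python: take logs of both variables, fit a straight line, then convert back to a power law. Base-10 logs, the function name, and the data are assumptions made for illustration.

```python
import math
import statistics

def fit_power_law(xs, ys):
    """Fit y = a * x^k by a straight-line fit on (log x, log y). Returns (a, k)."""
    X = [math.log10(x) for x in xs]
    Y = [math.log10(y) for y in ys]
    X_bar, Y_bar = statistics.fmean(X), statistics.fmean(Y)
    # Least-squares slope and intercept of Y = m*X + b
    m = sum((u - X_bar) * (v - Y_bar) for u, v in zip(X, Y)) / sum((u - X_bar) ** 2 for u in X)
    b = Y_bar - m * X_bar
    # Back-transform: k is the slope, a is 10 to the power of the intercept
    return 10**b, m

xs = [1, 2, 4, 8]                # hypothetical data lying exactly on y = 3 * x^2
ys = [3 * x**2 for x in xs]
a, k = fit_power_law(xs, ys)
print(a, k)
```

Both x and y must be strictly positive for the logs to exist.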