STAT Notes Flashcards

Question

Define a bin

Answer 1

An area in which data is collected

Answer 2

Central values

Answer 3

Proportion of times a particular outcome will occur from a large sample of trials or the likelihood of a particular outcome of an event

Answer 4

Impossible

Answer 5

The actions of one have no impact on the results of the next trial

Answer 6

Graphical distribution of theoretical relative probabilities y=probability, x=potential outcomes

Answer 7

Equivalent to the relative probability

Answer 8

Table Probability tree

Answer 9

Multiply probabilities together

Answer 10

Theoretical probability of each outcome

Answer 11

Observed frequency of each outcome

Answer 12

Frequency distribution approaches probability distribution

Answer 13

Can be used when there are two groups (such as A and B or pass and fail) NOTE: we can create these groups if we define some outcomes as "success" and the others as "failure" and classify other outcomes beneath these banners

Answer 14

- Predict the probability of success in a single trial
- Predict the proportion of successes in n trials

Answer 15

- 2 outcomes (P(success)=p and P(failure)=q) and p+q=1
- Each trial is independent with equal p
- Fixed number of trials
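As an illustration of these conditions, the following is a minimal sketch of the binomial probability of exactly k successes in n trials. The function name `binomial_pmf` and the coin-toss numbers are assumptions chosen for the example, not part of the original notes.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    q = 1.0 - p  # P(failure), because p + q = 1
    return comb(n, k) * p**k * q**(n - k)

# Hypothetical example: 10 tosses of a fair coin (fixed n, equal p each trial)
n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
prob_exactly_5 = pmf[5]   # probability of exactly 5 heads
total = sum(pmf)          # probabilities over all outcomes sum to 1
```

Listing the probability of every possible k gives the theoretical probability distribution for the experiment.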

Answer 16

Begins to resemble continuous data

Answer 17

Probability distribution

Answer 18

Area under the graph up until that point
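Assuming this card refers to the cumulative probability of a continuous distribution, the area up to a point can be sketched with the standard library's `NormalDist`. The standard normal distribution (mean 0, standard deviation 1) is an assumption chosen for illustration.

```python
from statistics import NormalDist

# Assumed example distribution: standard normal
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Area under the curve from minus infinity up to the given point
area_up_to_mean = std_normal.cdf(0.0)    # half of the area lies below the mean
area_up_to_196 = std_normal.cdf(1.96)    # roughly 97.5% of the area
```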

Answer 19

- Understand the certainty of a hypothesis test
- Don't base scientific decisions on hypothesis tests alone
- Consider the wider picture and plausibility of results

Answer 20

H0: null hypothesis (no change)
HA: alternative hypothesis (covers all other possibilities)
These hypotheses must be mutually exclusive

Answer 21

Areas above the critical value (above the alpha)

Answer 22

Area at the end of the distribution

Answer 23

Two-tailed test

Answer 24

The p value assumes the null hypothesis is true and gives the probability of getting a result that extreme or more assuming this
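A minimal sketch of this definition for a one-tailed binomial test: the p value is the probability, computed under H0, of a result at least as extreme as the one observed. The function name and the example (9 heads in 10 tosses, H0: the coin is fair) are assumptions for illustration.

```python
from math import comb

def binomial_upper_tail_p(k_observed: int, n: int, p_null: float) -> float:
    """P(X >= k_observed) for X ~ Binomial(n, p_null), i.e. assuming H0 is true."""
    return sum(
        comb(n, k) * p_null**k * (1 - p_null)**(n - k)
        for k in range(k_observed, n + 1)
    )

# Hypothetical data: 9 heads observed in 10 tosses, H0: P(heads) = 0.5
p_value = binomial_upper_tail_p(9, 10, 0.5)
reject_h0 = p_value < 0.05  # compare against an assumed alpha of 0.05
```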

Answer 25

One that shows all possible HA and H0 outcomes

Answer 26

False positive (Type I error). We do not know what is true

Answer 27

There is a true negative: H0 is true

Answer 28

False negative (Type II error). HA was true

Answer 29

True positive. H0 is untrue; this does not confirm HA

Answer 30

- True negative (H0 is true and we fail to reject H0)
- Type I error (H0 is true and we reject H0)

Answer 31

5% Type I error (95% true negative)

Answer 32

How powerful a test is at detecting true positives when there really is a difference to detect

Answer 33

When we are outside the critical value (in the direction of H0). This is type II error and is shown where the HA graph overlaps with H0

Answer 34

The area of overlap between the H0 and HA graphs (where HA is true)

Answer 35

Power=1-beta
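The relationship can be sketched for a one-sided z-test on a sample mean, assuming a known standard deviation. The function name, the means, and the sample sizes below are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_z_test_power(mu0: float, mu1: float, sigma: float,
                           n: int, alpha: float = 0.05) -> float:
    """Power of an upper-tailed z-test when the true mean is mu1 (> mu0)."""
    se = sigma / sqrt(n)  # standard error of the sample mean
    # Critical value: the point on the H0 curve with area alpha above it
    critical_value = NormalDist(mu0, se).inv_cdf(1 - alpha)
    # beta: area of the HA curve below the critical value (type II error)
    beta = NormalDist(mu1, se).cdf(critical_value)
    return 1 - beta

# Hypothetical setting: H0 mean 0, HA mean 0.5, sigma 1
power_n25 = one_sided_z_test_power(0.0, 0.5, 1.0, n=25)
power_n100 = one_sided_z_test_power(0.0, 0.5, 1.0, n=100)
```

With these assumed numbers, the larger sample gives narrower curves, less overlap between H0 and HA, and therefore higher power.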

Answer 36

2.1% of the time

Answer 37

Smaller It is more difficult to identify a true error

Answer 38

There will be a lower rate of false negatives (type II error)

Answer 39

Increase effect size: Separate the curves to be skinnier Increase distance between peaks

Answer 40

Power increases (less type II error)

Answer 41

Increased trials (decreases curve dispersion)

Answer 42

There must be two hypotheses:
- H0: null hypothesis (no change/effect)
- HA: alternative hypothesis (mutually exclusive and covers all other options; different for one- and two-tailed tests)

Answer 43

It is only the probability of a false positive if the null hypothesis is true. We cannot know whether the null hypothesis is true; we can only speculate based on evidence

Answer 44

Proportion of true positives for a particular HA

Answer 45

Comparing and testing several conditions or treatments

Answer 46

When comparing two samples with each other (i.e.: control and drug)

Answer 47

When comparing a sample to a mean

Answer 48

When samples are closely related to one another (such as before and after a treatment)

Answer 49

- Outcome variable is a continuous dependent variable and experimental variable is a bivariate independent variable
- Normal distribution
- Equal variance

Answer 50

Contains two groups

Answer 51

A normal quantile-quantile plot compares quantiles of your data to theoretical quantiles for a normal distribution (if these match closely the data is normally distributed)

Answer 52

There is an increase in the probability of false positives (FWER (family-wise error rate))

Answer 53

Family-wise error rate is the probability of getting at least one false positive across a set of tests when the null hypothesis is true in each

Answer 54

(1-alpha)^n in n tests

Answer 55

1-(1-alpha)^n
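A short sketch of this formula, assuming the n tests are independent and all use the same alpha. The function name and the choice of 10 tests at alpha = 0.05 are illustrative assumptions.

```python
def family_wise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive in n independent tests."""
    prob_no_false_positive = (1 - alpha) ** n_tests
    return 1 - prob_no_false_positive

# Hypothetical example: 10 independent tests, each at alpha = 0.05
fwer_1 = family_wise_error_rate(0.05, 1)    # a single test: just alpha
fwer_10 = family_wise_error_rate(0.05, 10)  # about 0.40
```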

Answer 56

Compares several samples with each other and compares variance within samples with that between samples

Answer 57

Analysis of variance (ANOVA)

Answer 58

Compare means with one another to find statistical difference

Answer 59

Mean of sample means (Add all means and divide by number of groups)

Answer 60

Observational Experimental

Answer 61

Makes observations without intervention

Answer 62

A study where an intervention is made to test a hypothesis

Answer 63

Any relevant condition, characteristic, number or quantity that can be measured, assessed or counted

Answer 64

Explanatory variable

Answer 65

Response variable

Answer 66

One that could impact the measurement from your dependent variable in addition to your independent variable

Answer 67

The difference between the result for a whole population and the result from our sample or experiment.

Answer 68

Sampling error Bias

Answer 69

The possibility that the sample is not a perfect representation of the population

Answer 70

Normal (allowing for statistical testing)

Answer 71

Replication Balance Blocking

Answer 72

The more data we collect the more insignificant errors become

Answer 73

Technical Biological

Answer 74

These are additional measurements or analyses taken from the same sample. They help account for variability introduced by the measurement process itself.

Answer 75

These involve separate samples that are independently manipulated or tested under identical conditions

Answer 76

Grouping experimental units with similar properties

Answer 77

This is the process of comparing groups of similar sizes

Answer 78

Error caused by a systematic difference in the estimation of the sample and the whole population

Answer 79

Any (Design, data collection, analysis, publication etc...)

Answer 80

Simultaneous control groups Blinding Randomisation

Answer 81

A group of subjects that are not exposed to the experimental treatment but are treated the same in all other ways

Answer 82

Untreated control Vehicle control

Answer 83

Subject in its native state with no treatment

Answer 84

Subject undergoes treatment with everything but the exact thing being tested (e.g.: the drug)

Answer 85

Testing against a pre-existing drug as opposed to a vehicle control

Answer 86

A control which defines what a positive result looks like

Answer 87

Result which defines what a negative result looks like

Answer 88

The process of obscuring who has which treatment to limit the placebo effect

Answer 89

Assigning individuals to groups at random so as not to introduce further sampling bias

Answer 90

Correlation Regression

Answer 91

Its strength and direction

Answer 92

Correlation coefficient

Answer 93

Very weak correlation or negligible between the two variables

Answer 94

Weak or low correlation between the two variables

Answer 95

Moderate correlation between the two variables

Answer 96

Strong, high and marked correlation between the two variables

Answer 97

Very strong and very high correlation between the two variables

Answer 98

How much of the variation in one variable can be explained by the other
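As a rough sketch, the correlation coefficient r and its square can be computed from sums of squared deviations. The data values and the function name `pearson_r` are made up for illustration.

```python
from math import sqrt

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)   # strength and direction of the linear association
r_squared = r ** 2    # share of variation in one variable explained by the other
```

For this made-up data, r is about 0.77 and r squared is 0.6, so 60% of the variation in y is explained by x.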

Answer 99

1. Looking for an association between variables where neither is experimentally manipulated 2. Experimentally manipulating one variable and looking to see whether the other variable changes too

Answer 100

Regression

Answer 101

A higher correlation coefficient

Answer 102

There is little variability about the line of best fit

Answer 103

When there is a linear correlation

Answer 104

Assessment of how well a linear regression line fits data

Answer 105

Using the r^2 value Looking at the residuals

Answer 106

As a straight line through the data points

Answer 107

The value (y) at which a data point with a given x is expected to lie on the regression line

Answer 108

The distance between a given point and its fitted value
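Fitted values and residuals can be sketched with a least-squares line fitted by hand. The data below are hypothetical, and the variable names are chosen only for this example.

```python
# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Least-squares slope and intercept of the regression of y on x
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
         / sum((a - mean_x) ** 2 for a in x))
intercept = mean_y - slope * mean_x

# Fitted value: where each point is expected to lie on the line
fitted = [intercept + slope * a for a in x]
# Residual: observed y minus fitted y for each point
residuals = [b - f for b, f in zip(y, fitted)]
```

Plotting `residuals` against `fitted` gives the residual plot; a visible pattern would suggest a straight line is not appropriate.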

Answer 109

Plot a residual plot - residual against fitted value - and observe if there are any patterns

Answer 110

A linear equation may not be appropriate for the data presented

Answer 111

Points are evenly scattered on either side of the line

Answer 112

Yes using the linear regression

Answer 113

No, we need to create a regression in the other direction to describe b in terms of a

Answer 114

Refers to a number of activities, often related to the misinterpretation of statistics, that occur in published scientific work

Answer 115

The practice of cherry picking refers broadly to only presenting one side of the story. Specifically in relation to statistics, this translates as choosing not to report parts of your analysis which do not agree with the story you are trying to tell. This is often used to "tidy up" or create a "convincing" story

Answer 116

Ultimately manipulating your data or analysis to result in a significant p value

Answer 117

- checking the statistical significance before deciding whether to collect more data
- stopping data collection as soon as results reflect those desired
- excluding data after checking impact on significance
- adjusting models on the basis of whether or not a significant result is obtained, without proper justification
- rounding a p-value to the threshold
- hidden multiple testing and therefore no p-value adjustments

Answer 118

Hypothesising after the results are known (HARKing) is presenting results that have been discovered as if they were expected or as if they were the main study aim (overstating prior knowledge of the study). Presenting ad hoc or unexpected results in this way is misleading

Answer 119

An unplanned or supplementary analysis conducted to explore specific aspects of data that weren't the primary focus of the study. This is done on an as-needed basis to investigate particular comparisons or relationships not initially accounted for in the main analysis.

Answer 120

No, they are questionable but not misconduct

Answer 121

Fabrication and falsification

Answer 122

Making up data or results

Answer 123

The manipulation of research materials, data or results

Answer 124

- Data needs to be normally distributed
- Data should be from independent observations, which means that there is no relationship between the observations in each group or between the groups themselves
- Equal variances between groups (homogeneity of variances, homoscedasticity)

Answer 125

The fundamental assumption that the variance of the errors (or residuals) should be constant across all levels of the independent variable(s) (Violated homoscedasticity is known as heteroscedasticity)

Answer 126

Refers to the similarity or uniformity of certain characteristics within a group or between groups.

Answer 127

K-1 Where K is the number of groups being compared

Answer 128

N-K Where K is the number of groups being compared and N is the total number of observations/data points collected.

Answer 129

Quantifies variability between the groups of interest and within groups of interest in separate rows

Answer 130

The sum of the squared differences between each datapoint and the overall mean, also called SST, for sum of squares (total).

Answer 131

The sum of squares within the groups is defined as the sum of the squared differences between each datapoint and the mean of the group it belongs to. This shows the variation within each single group.

Answer 132

The sum of squares between the groups is defined as the sum, over every datapoint, of the squared difference between the mean of its group and the overall mean. This shows the variation between the groups.
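The three sums of squares can be sketched as follows. The three groups of three observations are hypothetical, and the variable names (`sst`, `ssw`, `ssb`) are chosen for this example.

```python
# Hypothetical data: three groups with three observations each
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

all_points = [x for g in groups for x in g]
overall_mean = sum(all_points) / len(all_points)
group_means = [sum(g) / len(g) for g in groups]

# Total: each datapoint against the overall mean
sst = sum((x - overall_mean) ** 2 for x in all_points)
# Within: each datapoint against the mean of its own group
ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
# Between: each group mean against the overall mean, counted once per datapoint
ssb = sum(len(g) * (m - overall_mean) ** 2 for g, m in zip(groups, group_means))
```

For this data the total splits exactly into its two parts: SST = SSB + SSW (60 = 54 + 6).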

Answer 133

Q3+1.5 IQR

Answer 134

Q1-1.5 IQR
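Assuming these two cards give the upper and lower limits beyond which a point is treated as an outlier on a boxplot, the limits can be sketched as below. The data are hypothetical, and quartiles are taken with `statistics.quantiles` using its default method; other software may use a slightly different quartile convention.

```python
from statistics import quantiles

# Hypothetical data with one unusually large value
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]

q1, q2, q3 = quantiles(data, n=4)  # quartiles (default 'exclusive' method)
iqr = q3 - q1                      # interquartile range

upper_limit = q3 + 1.5 * iqr
lower_limit = q1 - 1.5 * iqr

# Points outside the limits are flagged as potential outliers
outliers = [x for x in data if x < lower_limit or x > upper_limit]
```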

Answer 135

The binomial distribution is discrete, dealing with the number of successes in a fixed number of trials.

Answer 136

The normal distribution is continuous and is often associated with the distribution of measurements in a population.

Answer 137

The binomial distribution is characterized by the number of trials (n) and the probability of success (p).

Answer 138

The normal distribution is characterized by the mean (μ) and standard deviation (σ).

Answer 139

Use 1-(alpha/2) at each end

Answer 140

A boxplot is a qualitative analysis whilst an ANOVA is quantitative

Answer 141

ANOVA output. This is a variance estimate and is used to calculate the F-statistic in the next column. Calculated by dividing the Sum of Squares by the DF on the same row

Answer 142

This is defined as the ratio between the Mean Squares between and within. Calculated by Mean squares of row 1/mean squares of row 2.
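A minimal sketch of how the F statistic follows from the sums of squares and degrees of freedom in a one-way ANOVA table. The function name and the input numbers (3 groups, 9 observations, SS between = 54, SS within = 6) are hypothetical.

```python
def anova_f_statistic(ss_between: float, ss_within: float,
                      n_groups: int, n_total: int) -> float:
    """F = mean squares between / mean squares within (one-way ANOVA)."""
    df_between = n_groups - 1        # K - 1
    df_within = n_total - n_groups   # N - K
    ms_between = ss_between / df_between  # row 1: sum of squares / DF
    ms_within = ss_within / df_within     # row 2: sum of squares / DF
    return ms_between / ms_within

# Hypothetical table values
f_stat = anova_f_statistic(ss_between=54.0, ss_within=6.0, n_groups=3, n_total=9)
```

Here the mean squares are 27 and 1, so F is 27 on (2, 6) degrees of freedom.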

Answer 143

If it is below a threshold value, the NULL hypothesis can be rejected

Answer 144

More likely to be a statistically relevant difference between groups.

Answer 145

F(dfbetween, dfwithin) = F Statistic, p =

Answer 146

Post-hoc tests such as the Tukey Honest Significant Difference (HSD) test


(184 cards)