Part 1 Flashcards by Joey Baxter

Observation -> … -> … -> …

question, hypothesis, prediction

Observation: Gammarus occurs almost entirely under stones (rather than open streams)

Question: … … Gammarus spend most of its time under stones?

why does

Hypothesis - an … proposed to account for observed facts - there is often more than one hypothesis generated
e.g.

Gammarus occurs under stones because:

they need to shelter from the current
their food gets trapped and accumulates under stones
they are subject to predation by visually hunting fish and need to remain out of sight

explanation

Predictions - what you would … … … if the hypothesis was true - should be testable and ideally unique to the hypothesis they are based on

e.g. shelter hypothesis - a greater proportion of Gammarus should be found in the open in streams with slow flow (or slower flowing areas of a stream)

predation hypothesis - Gammarus should aggregate under stones more in streams where fish are present than where they are not

expect to see

Hypotheses are … or not …, but rarely …

rejected, rejected, proved

Just because one hypothesis is supported doesn't mean there isn't another underlying explanation, since we can't think of all possible hypotheses. With the right evidence, however, we can be sure that a hypothesis cannot be true.

Cycle of proposing hypotheses and then seeking evidence potentially capable of falsifying them is the scientific process often termed …

falsificationism

A variable is…

any characteristic that can be measured or experimentally controlled on different items or objects

numeric or non-numeric (e.g. colour)

A set of related variables is known as a … …

data set

Numeric variables can be categorised as belonging to … or … scales

interval, ratio

Categorical variables can be characterised as … or …

nominal, ordinal

Nominal variables…

arise when observations are recorded as categories that have no natural ordering relative to one another, e.g. marital status, sex, colour morph

Ordinal variables…

occur when observations can be assigned some meaningful order, but where the exact ‘distance’ between items is not fixed, or even known, e.g. degree of aggressiveness sorted into the categories: initiates attack (3), aggressive display (2), ignores (1), retreats (0).

Rank orderings are also a type of ordinal data (e.g. place in a race - 1st 2nd 3rd etc.)

Can say something about the relationship between categories: larger score = more aggressive response, higher place number = slower runner. But cannot say an aggressiveness score of 2 is twice as aggressive as a score of 1

Interval scale variables take values on a … numerical scale, but where the scale starts at an … point. e.g. … on a … scale but not on a … scale

consistent, arbitrary, temperature, Celsius, Kelvin

can say difference between 60 and 70 degrees C is the same as that between -20 and -10, but cannot say 60 degrees C is double the temperature of 30 degrees C

Ratio scale variables have a true … and a known consistent mathematical relationship between any points on the measurement scale, e.g. … scale for temperature

zero, kelvin

on Kelvin scale 60K is double the temperature of 30K
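
The interval vs ratio distinction can be checked with a little arithmetic. This is a minimal sketch, assuming the standard offset of 273.15 between Celsius and Kelvin; the helper name `celsius_to_kelvin` is only for illustration.

```python
# Assumption: Kelvin = Celsius + 273.15 (standard conversion).
def celsius_to_kelvin(celsius):
    return celsius + 273.15

# Differences are meaningful on an interval scale such as Celsius.
diff_hot = 70 - 60        # 10 degrees
diff_cold = -10 - (-20)   # 10 degrees

# Ratios are only meaningful on a ratio scale, so convert to Kelvin first.
naive_ratio = 60 / 30                                       # 2.0, but misleading
true_ratio = celsius_to_kelvin(60) / celsius_to_kelvin(30)  # roughly 1.1

print(diff_hot == diff_cold, naive_ratio, round(true_ratio, 3))
```

So 60 degrees C is only about 10% "hotter" than 30 degrees C once both are expressed on a scale with a true zero.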

Can meaningfully … or … with interval scales, but cannot meaningfully …, as you can with ratio scales

add, subtract, multiply

In general … variables are the best suited to statistical analysis

ratio

Accuracy is…

how close a measurement is to the true value

Precision is…

how repeatable a measure is, irrespective of whether it is close to the true value

The number of … … we use suggests something about the precision of the result. A value of 12.4 actually measured with the same precision as 12.735 should properly be written …

significant figures, 12.400

Usually the worst form of error is …, a … lack of accuracy

bias, systematic (the data are not just inaccurate but all tend to deviate from the true measurements in the same direction)
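
A small simulation may help separate accuracy, precision and bias. This is only a sketch: the true value, the +1.5 offset and the amounts of scatter are made-up numbers, not from the course.

```python
import random
import statistics

rng = random.Random(7)   # fixed seed so the sketch is repeatable
true_value = 20.0        # hypothetical true measurement

# Imprecise but unbiased: wide scatter centred on the true value.
noisy = [rng.gauss(true_value, 2.0) for _ in range(1000)]

# Precise but biased: tight scatter around a value that is 1.5 too high.
biased = [rng.gauss(true_value + 1.5, 0.2) for _ in range(1000)]

noisy_error = statistics.fmean(noisy) - true_value
biased_error = statistics.fmean(biased) - true_value

print(round(noisy_error, 2), round(biased_error, 2))
```

Averaging many noisy readings gets close to the true value, whereas averaging the biased readings stays about 1.5 off however many are taken, which is why bias is usually the worse problem.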

E.g.s of bias:

…-… sampling
… of biological material
… by the process of investigation (e.g. adrenaline increased by process of sampling adrenaline in blood)
… bias

non-random (selective sampling techniques), conditioning, interference, investigator

What does a population mean in statistics?

Any group of items that share certain attributes or properties

The goal of statistics is to learn something about … by … data collected from them

populations, analysing

Statistical populations are defined by the …

investigator

What is a population parameter?

A numeric quantity that describes a particular aspect of the variables in the population (describes a feature of the distribution of variables in the population) - e.g. population mean, variance, correlation

The sample chosen must be as ... as possible of the whole population

representative

A point estimate is useless on its own, as estimates are always derived from a ... ... of the wider population. They must be accompanied by a value of ....

limited sample, uncertainty

The chance variation that arises in different estimates using different random samples is known as ... ...

sampling error (or sampling variation)

The sampling distribution is the distribution we expect a particular estimate to follow

yes

Sample size is often denoted as "..."

n

Sampling error is ... as sample size is ...

reduced, increased

The standard error of an estimate is the ... ... of its ... ...

standard deviation, sampling distribution
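
The link between sample size, sampling error and standard error can be seen by simulation. A minimal sketch, assuming a made-up population (10,000 values, mean 50, standard deviation 10); the function name `standard_error_by_simulation` is hypothetical.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population, invented purely for illustration.
population = [rng.gauss(50, 10) for _ in range(10_000)]

def standard_error_by_simulation(n, repeats=2000):
    """Standard deviation of the sample mean over many random samples of size n."""
    sample_means = [statistics.fmean(rng.sample(population, n))
                    for _ in range(repeats)]
    return statistics.stdev(sample_means)

se_small = standard_error_by_simulation(10)    # roughly 10 / sqrt(10)
se_large = standard_error_by_simulation(100)   # roughly 10 / sqrt(100)

print(round(se_small, 2), round(se_large, 2))
```

The list of sample means is an approximation of the sampling distribution, and its standard deviation is the standard error. It shrinks as n grows.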

R doesn't like ...

percentages (use decimals e.g. 0.4 to represent 40%)

... statistics works by asking "what would have happened if we were to repeat an experiment or collection exercise many times, assuming that the ... remains the same each time"

Frequentist, population - then working out how likely a particular result is based on the distribution of the data

The two most important ideas in frequentist statistics are ...-... and ... ...

p-values, statistical significance

Sampling with replacement: each artificial sample is called a ... ...

bootstrapped sample
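
A bootstrapped sample can be drawn with the standard library. This is a sketch under assumptions: the eight data values are invented, and `bootstrapped_sample` is a hypothetical helper name.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Made-up sample of eight measurements.
sample = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9]

def bootstrapped_sample(data):
    """Resample the data with replacement, keeping the original sample size."""
    return rng.choices(data, k=len(data))

# The spread of the bootstrapped means approximates the standard error of the mean.
boot_means = [statistics.fmean(bootstrapped_sample(sample)) for _ in range(5000)]
boot_se = statistics.stdev(boot_means)

print(round(boot_se, 3))
```

Because sampling is with replacement, some original values appear more than once in a bootstrapped sample and others not at all.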

If a probability (p) value is less than the chosen ... ... we say the result is statistically significant

significance level

The process of assigning random labels is called ...

permutation

The p-value is the ... of obtaining a test statistic equal to or 'more extreme' than the ... value, assuming the ... hypothesis is true

probability, estimated, null
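
One way to see where a p-value comes from is a permutation test on the difference between two group means. This is a rough sketch with invented data; the two-sided comparison and the 5000 permutations are assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Made-up measurements for two groups.
group_a = [12.1, 13.4, 11.8, 14.2, 13.0, 12.7]
group_b = [10.9, 11.5, 12.0, 10.4, 11.1, 11.8]

# Test statistic estimated from the real data.
observed = statistics.fmean(group_a) - statistics.fmean(group_b)

def permuted_difference(a, b):
    """Shuffle the group labels (null hypothesis: no difference) and recompute."""
    pooled = a + b
    rng.shuffle(pooled)
    return statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):])

n_perm = 5000
null_diffs = [permuted_difference(group_a, group_b) for _ in range(n_perm)]

# Proportion of permuted differences at least as extreme as the observed one.
p_value = sum(abs(d) >= abs(observed) for d in null_diffs) / n_perm

print(round(observed, 3), p_value)
```

The shuffled differences show what sampling variation alone would produce if the null hypothesis were true, so a small p-value means the real difference would be unusual under that assumption.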

All frequentist statistical tests work by specifying a ... ... and then evaluating the observed data to see if they ... from the ... ... in a way that is inconsistent with ... variation

null hypothesis, deviate, null hypothesis, sampling

H0 is the ... hypothesis and H1 is the ... (or ...) hypothesis

null, test, alternative

The alternative hypothesis is essentially a statement of the effect we are ... ... ...

expecting to see (e.g. purple and green plants differ in their mean size)

... the null hypothesis is not ... the alternative hypothesis

rejecting, proving

Large p value means observed result is quite likely if the null hypothesis is ...

true (i.e. due to sampling variation) - cannot reject the null hypothesis (not the same as accepting that the null hypothesis is true)

Do not confuse ... significance with ... significance

statistical, biological - a result may be statistically significant but biologically trivial, e.g. a difference in pH between open water (7.1) and beds of submerged vegetation (6.9) may be statistically significant, but it is a very small effect and almost certainly of no importance to the invertebrates.

The significance of a result depends on a combination of three things: 1. The size of the true effect in the ... 2. The ... of the data 3. The ... size

population, variability, sample

We must always evaluate the ... of an analysis to determine whether or not we trust it

assumptions

In conceptual terms, the statistical models we use describe data in terms of a ... component and a ... component

systematic, random

observed data = systematic component + random component

The normal distribution is completely described by its ... (a measure of "central ...") and its ... ... (a measure of dispersion)

mean, tendency, standard deviation

If a variable is normally distributed, then about ... of its values will fall inside an interval that is ... standard deviations wide

95%, four
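
The "95% within four standard deviations" rule of thumb can be checked against the standard normal distribution. A minimal sketch using `statistics.NormalDist` from the Python standard library:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Proportion of values within 2 standard deviations either side of the mean,
# i.e. inside an interval 4 standard deviations wide.
within_two_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

# Half-width (in standard deviations) that captures exactly 95% of values.
exact_half_width = standard_normal.inv_cdf(0.975)

print(round(within_two_sd, 4), round(exact_half_width, 2))
```

The exact half-width is about 1.96 standard deviations, so "four standard deviations wide" is a slightly generous approximation.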

The variable name on the left of the ~ must be the variable whose...

mean we want to compare. The variable on the right must be the indicator variable that says which group each observation belongs to.

Correlations are statistical measures that quantify an ... between two ... variables

association, numeric

(compare: a two-sample t-test looks at a numeric variable between the levels of a categorical variable)

A correlation quantifies, via a ... ..., the degree to which an association tends to a certain pattern

correlation coefficient

If there is no relationship between the variables, the correlation coefficient will be .... The closer to ... the value, the weaker the relationship. A perfect correlation will be either ... or ..., depending on the direction.

zero, zero, +1, -1
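
The three landmark values of a correlation coefficient can be reproduced by hand. This sketch assumes Pearson's coefficient (the cards do not say which coefficient is meant), and the data are made up.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient from the sums of squares and products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])  # perfect positive association
r_negative = pearson_r(x, [10, 8, 6, 4, 2])  # perfect negative association
r_none = pearson_r(x, [2, 1, 3, 1, 2])       # no linear association

print(r_positive, r_negative, r_none)
```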

A regression (not a correlation) allows us to make...

predictions about the value of one variable from the value of a second variable - as a line is fitted through the data

A simple linear regression allows us to predict how one variable (... ...) responds to another (... ...), using a straight-line relationship

response variable, predictor variable

How do we find line of best fit?

Line with the lowest residual sum of squares

Residuals are the vertical distances of the points from the line of best fit
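
The least-squares idea can be sketched in a few lines. The data are invented, and `fit_line` and `residual_ss` are hypothetical helper names, not functions from any package.

```python
def fit_line(x, y):
    """Intercept and slope of the least-squares straight line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residual_ss(x, y, intercept, slope):
    """Sum of squared vertical distances between the points and the line."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

x = [1, 2, 3, 4, 5]              # predictor (x axis), made-up values
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # response (y axis), made-up values

intercept, slope = fit_line(x, y)
best_rss = residual_ss(x, y, intercept, slope)

# Any other line, e.g. one with a slightly steeper slope, fits worse.
steeper_rss = residual_ss(x, y, intercept, slope + 0.1)

print(round(intercept, 2), round(slope, 2), best_rss < steeper_rss)
```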

Response variable on ... axis, predictor variable on ... axis

y, x

Regression model: ... variable on the left of the ~, ... variable on the right

response, predictor

Larger F values indicate a stronger relationship between...

x and y

ANOVA:
- Measure total variation using the sum of squares of deviations from the ... ..., ... variation (within-group variation = sum of squares of deviations from individual group means), and between-group variation (sum of squares of deviations of ... from the ... ...)
- Convert to measures of variability that don't scale with sample size and number of groups (using ... ... ...) - each of the 3 sums of squares has a different d.f. value: total d.f., treatment d.f., error d.f.
- Then calculate mean square = sum of squares / degrees of freedom

grand mean, residual, means, grand mean, degrees of freedom
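
The ANOVA bookkeeping above can be followed through on a tiny example. This is a sketch with made-up data for three hypothetical treatment groups; it stops at the F ratio and does not compute a p-value.

```python
# Made-up measurements for three treatment groups.
groups = {
    "control": [4.0, 5.0, 6.0],
    "low": [6.0, 7.0, 8.0],
    "high": [8.0, 9.0, 10.0],
}

all_values = [v for g in groups.values() for v in g]
grand_mean = sum(all_values) / len(all_values)

# Total variation: deviations of every observation from the grand mean.
total_ss = sum((v - grand_mean) ** 2 for v in all_values)

# Residual (error) variation: deviations from each observation's own group mean.
error_ss = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)

# Between-group (treatment) variation: deviations of group means from the
# grand mean, counted once per observation in the group.
treatment_ss = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                   for g in groups.values())

treatment_df = len(groups) - 1
error_df = len(all_values) - len(groups)

treatment_ms = treatment_ss / treatment_df
error_ms = error_ss / error_df
f_ratio = treatment_ms / error_ms

print(total_ss, treatment_ss, error_ss, f_ratio)
```

Here the treatment and error sums of squares add up to the total, and the F ratio is just the treatment mean square divided by the error mean square.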

Squaring negative deviations leads to...

a positive number

The important message is that ANOVA works by making just one comparison: between the ... variation and the ... variation

treatment, error

One-way anova does not require ... ...

equal replication - it will work even where sample sizes differ between treatments

An experimental factor is a controlled variable whose levels are...

set by the experimenter

Anova p-value of lower than 0.05 suggests that...

at least one of the treatments is having an effect - global test of significance as it doesn't tell us anything about which means are different

Find standard error stuff in...

one-way anova section

Left skew - ... data; right skew - ... data

square, log (i.e. square-transform left-skewed data, log-transform right-skewed data)

Independence: value of measurement from one object is not...

affected by the values of other objects

Pseudoreplication is an ... increase in the ... ... (and hence d.f.) caused by using ...-... data

artificial, sample size, non-independent

To carry out a t-test on paired data we have to: 1. Find the mean ... of all the pairs 2. evaluate whether this is significantly different from .... This is actually an application of the ...-... ...-...

difference, zero, one-sample t-test
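
The two steps above can be sketched as a one-sample t statistic on the paired differences. The before/after values are invented, and the sketch stops at the t statistic and degrees of freedom, which would then be compared with a t distribution.

```python
import math
import statistics

# Made-up paired measurements on the same five individuals.
before = [12.0, 15.0, 11.0, 14.0, 13.0]
after = [14.0, 16.0, 14.0, 15.0, 16.0]

# Step 1: reduce each pair to a single difference and find the mean difference.
differences = [a - b for a, b in zip(after, before)]
mean_difference = statistics.fmean(differences)

# Step 2: one-sample t statistic testing whether the mean difference is zero.
standard_error = statistics.stdev(differences) / math.sqrt(len(differences))
t_statistic = (mean_difference - 0) / standard_error
degrees_of_freedom = len(differences) - 1

print(mean_difference, round(t_statistic, 3), degrees_of_freedom)
```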

In paired t-tests there is no need for the original data to be drawn from a ... .... It is the differences between pairs that need to be

normal distribution

What does RCBD stand for?

Randomised Complete Block Design - each block sees each treatment exactly once

... what you can; ... what you cannot

block, randomise

The only thing that distinguishes ANOVA and regressions is the...

type of predictor variable they accommodate (categorical vs numerical)

ANCOVA: residuals generated for:
1. Separate means vs grand mean
2. Common slope vs separate means
3. Separate slopes vs common slope (interaction)

yes

The word "treatment" should be used for ... rather than ... studies

experimental, observational

chi-squared must be carried out on the actual ... not ... or ..., or the ... of data

counts, percentages, proportions, means
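
Why counts matter can be shown with the chi-squared statistic itself. This sketch assumes a goodness-of-fit test (the card does not say which chi-squared test is meant) and uses invented counts; `chi_squared` is a hypothetical helper.

```python
def chi_squared(observed_counts, expected_proportions):
    """Goodness-of-fit statistic: sum of (observed - expected)^2 / expected."""
    total = sum(observed_counts)
    expected_counts = [p * total for p in expected_proportions]
    return sum((o - e) ** 2 / e
               for o, e in zip(observed_counts, expected_counts))

expected_proportions = [0.25, 0.5, 0.25]

# Two made-up studies with identical percentages (30%, 50%, 20%)
# but very different numbers of observations.
chi_small = chi_squared([30, 50, 20], expected_proportions)     # 100 observations
chi_large = chi_squared([300, 500, 200], expected_proportions)  # 1000 observations

print(chi_small, chi_large)
```

The percentages are the same in both cases, yet the statistic is ten times larger for the bigger study, so feeding in percentages or proportions would throw away the information about sample size.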

A non-parametric test is just a catch-all term that applies to any test which doesn't assume the data are...

drawn from a specific distribution

Chi-squared tests are ...-...

non-parametric - as they make weak assumptions about the frequency data

non-parametric test calculations are done using the ... ... of the data

rank order

Paired t-test: Distribution of ... does not need to be normal! Only distribution of ... does!

samples, differences

If the differences are not normally distributed, a Wilcoxon signed-rank test can be used

Mann-Whitney U null hypothesis: ... are the same

medians (looking for differing central tendency) - significant p-value means medians are likely to be different
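
The rank-based calculation behind the Mann-Whitney U statistic can be sketched as follows. The data are made up, `average_ranks` is a hypothetical helper, and the sketch stops at the U statistic rather than a p-value.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

# Made-up measurements for two independent groups.
group_a = [3.1, 4.5, 2.8, 5.0]
group_b = [6.2, 5.9, 4.8, 7.1]

# Rank the pooled data, then sum the ranks belonging to group A.
ranks = average_ranks(group_a + group_b)
rank_sum_a = sum(ranks[:len(group_a)])

n_a, n_b = len(group_a), len(group_b)
u_a = rank_sum_a - n_a * (n_a + 1) / 2
u_b = n_a * n_b - u_a
u_statistic = min(u_a, u_b)

print(rank_sum_a, u_a, u_b, u_statistic)
```

Only the rank order of the values enters the calculation, which is why the test does not need the raw data to follow a particular distribution.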
