Sub-topic 1: Variables, distributions and summary statistics Flashcards

Week 2 (58 cards)

1
Q

Biologists make observations of (collect data on) selected
variables on a sample from the population to estimate
the value of one or more parameters of that
population.

A

Yes.

2
Q
Variable: any observable feature of the
natural world (e.g. number of limpets in
a quadrat, sex of a frog, moisture
content of a leaf). These are all variables
as they have the potential to vary
A

Yes.

3
Q

Population: the target group of interest in
the study. Can be finite (e.g. number of fish
in a pond) or infinite (e.g. number of fish in
the ocean)

A

Yes.

4
Q
Sample: we cannot practically count
every unit in a population, therefore we
sample a subset of a population and
attempt to draw inferences about the
entire population from this sample
A

Yes.

5
Q

Parameters: a parameter is some
characteristic of the distribution of the
variables in a population (e.g. the
average or variance of weights of fish in
a pond)

A

Yes.

6
Q

VARIABLE A variable is any observable feature of the natural world

A

Yes.

7
Q

DATUM A datum, or observation, is any one record of the state of a variable.

A

Yes.

8
Q

DATASET Any collection of observations made on a variable is a data set

A

Yes.

9
Q

POPULATION The set of all possible observations on a variable is the population

A

Yes.

10
Q

FINITE
POPULATION
Populations can be either finite or infinite. Finite populations have a finite, countable
number of elements and can, in theory at least, be completely sampled.

A

Yes.

11
Q

INFINITE
POPULATION
Infinite populations have an infinite number of elements and can never be completely
sampled.

A

Yes.

12
Q

SAMPLE Large and infinite populations cannot be observed in their entirety, so we take
(nearly always randomly) only a sample, i.e. a subset of observations from a population.

A

Yes.

13
Q

PARAMETER A parameter is some characteristic of the distribution of the values of a variable in a
population.

A

Yes.

14
Q

STATISTIC The term “statistic” is used in two ways: to refer to the entire body of procedures for
dealing with data; or to refer to estimates of population parameters based on samples.

A

Yes.

15
Q

NOMINAL/
CLASSIFICATION
Features which can be classified into named groups, lacking
order

A

Yes.

16
Q

ORDINAL/
RANKING
Features which can be ranked in order

A

Yes.

17
Q

NUMERICAL/
QUANTITATIVE
Features which can be enumerated or quantified (counted or
measured)
e.g. weight, number, temperature, counts of animals
Can be subdivided into
• Interval v Ratio: arbitrary zero and unit
(e.g. temperature [Celsius]) v true zero (e.g. weight)
• Discrete v Continuous: values which are whole
numbers (counts) v values which can be fractions
(weight)

A

Yes.

18
Q
Measures of location
Mode
most common value
Median
middle value
Mean
average value
A

Yes.
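
The three measures of location on this card can be computed with Python's standard `statistics` module. This is a minimal sketch; the limpet counts are made-up illustrative data, not values from the unit.

```python
import statistics

# Hypothetical counts of limpets in nine quadrats (illustrative data only)
limpets = [3, 5, 5, 6, 7, 8, 5, 12, 4]

mode_value = statistics.mode(limpets)      # most common value
median_value = statistics.median(limpets)  # middle value of the sorted data
mean_value = statistics.mean(limpets)      # arithmetic average

print("mode:", mode_value)
print("median:", median_value)
print("mean:", round(mean_value, 2))
```

The single large count (12) pulls the mean above the median and mode, which is one reason all three are worth reporting for skewed data.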

19
Q
Symbols
Sample mean
x̄
Population mean
μ
A

Yes.

20
Q
Measures of shape
Variance
spread of distribution
Skewness
skew of peak of distribution to one
side of the mean
Kurtosis
“peakedness” or “flatness” of
distribution
A

Yes.

21
Q
Symbols
Sample
variance
s²
Population
variance
σ²
Sample standard
deviation
s
Population standard
deviation
σ
A

Yes.

22
Q

VARIANCE (s²): measures the dispersion of data around their mean value

A

Yes.
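
As a rough illustration of how the sample variance (s²) and sample standard deviation (s) are obtained, the sketch below computes them by hand and with the standard library. The fish weights are hypothetical, and the n − 1 divisor is the usual sample (not population) formula.

```python
import math
import statistics

# Hypothetical fish weights in grams (illustrative data only)
weights = [12.0, 15.0, 11.0, 14.0, 13.0]

n = len(weights)
mean = sum(weights) / n

# Sample variance s^2: sum of squared deviations from the mean, divided by n - 1
s2 = sum((x - mean) ** 2 for x in weights) / (n - 1)

# Sample standard deviation s: square root of the variance, in the data's units
s = math.sqrt(s2)

print("s^2 =", s2)
print("s   =", round(s, 3))

# The standard library gives the same sample estimates
print(statistics.variance(weights), statistics.stdev(weights))
```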

23
Q

NORMAL DISTRIBUTION: a symmetric distribution, often called a
bell curve, which describes many variables of the natural world,
e.g. height, weight, test scores in a very large class

A

Yes.

24
Q

STANDARD DEVIATION (s = √s²): a measure of dispersion in the
data, in the same units as the mean

25
SKEWNESS: measures the extent to which the distribution is “pushed” to either side of the mean
Yes.
26
KURTOSIS: measures the “peakedness” of the distribution
Yes.
27
Kurtosis prefixes: • meso – middle, intermediate, halfway (mesokurtic) • lepto – small, fine, thin, delicate (leptokurtic) • platy – broad, flat (platykurtic)
Yes.
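
Skewness and kurtosis can both be expressed through central moments of the data. The sketch below uses the simple moment-based (population-style) formulas, which is an assumption; statistical packages often apply small-sample corrections, so their output may differ slightly. Both data sets are invented for illustration.

```python
def central_moment(data, k):
    """k-th central moment: mean of (x - mean)^k."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n

def skewness(data):
    """Moment-based skewness: 0 for symmetric data, > 0 for a long right tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]               # hypothetical, evenly spread
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]  # hypothetical, one large value

print(skewness(symmetric), skewness(right_skewed))
print(excess_kurtosis(symmetric))
```

With this convention, excess kurtosis near 0 is mesokurtic, above 0 is leptokurtic (peaked) and below 0 is platykurtic (flat).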
28
```
Accuracy
• How close the estimate is to the true value
• A biased method gives estimates which differ consistently from the true value
• Cannot be determined from the data
```
Yes.
29
```
Precision
• How close repeated estimates are
• Precision can be determined from the data (standard error, confidence interval)
```
Yes.
30
```
Accuracy – how true (unbiased) is the result?
• Requires attention to the methods of sample selection and measurement
• Ensure no bias in instruments or techniques
• Calibrate, ground-truth or similar
• Randomly select samples
```
Yes.
31
```
Precision – how variable is the result?
• Requires attention to sampling design and effort
• Vary the number of samples taken
• Vary the size of the sampling unit
• Vary the arrangement of the sampling units
```
Yes.
32
```
Aim
• To estimate the lead content of oysters with a SE (standard error) of 8 ppm
• Prior sampling indicates that s² (variance) is usually about 900 ppm²
Method
• Calculate SE for varying n and use the number which gives the desired SE
• This is about 14
```
Yes.
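
The calculation on this card can be sketched as follows, assuming the usual formula SE = √(s²/n). The function name `standard_error` is just an illustrative helper; the variance (900) and target SE (8 ppm) come from the card.

```python
import math

s2 = 900.0       # prior estimate of the variance, from the card
target_se = 8.0  # desired standard error in ppm, from the card

def standard_error(variance, n):
    """Standard error of the mean for a sample of size n."""
    return math.sqrt(variance / n)

# Tabulate SE for a range of sample sizes
for n in range(10, 17):
    print(n, round(standard_error(s2, n), 2))

# Solving SE = sqrt(s2 / n) for n gives n = s2 / SE^2
n_exact = s2 / target_se ** 2
n_needed = math.ceil(n_exact)  # smallest whole n with SE at or below the target
print(n_exact, n_needed)
```

Solving directly gives n ≈ 14.06, which matches the card's "about 14". With exactly 14 samples the SE is marginally above 8 ppm, so 15 would be needed to guarantee an SE at or below the target.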
33
Recap: Take home messages
• Variables are any feature of the world – divided into 3 main groups:
• NOMINAL/CLASSIFICATION variables can be classified into named groups e.g. sex, colour, habitat type etc.
• ORDINAL/RANKING variables can be ranked in order e.g. social position, size-class, education level etc.
• NUMERICAL/QUANTITATIVE variables can be enumerated or quantified (counted or measured), e.g. height, weight etc.
Yes.
34
• Variable distribution can be graphically represented in frequency histograms showing the ‘shape’ and ‘spread’ of the data
Yes.
35
Summary statistics can be broadly divided into measures of location (e.g. mean, median and mode) and measures of shape (e.g. variance and standard deviation)
Yes.
36
Shape of distribution is vital in choosing appropriate statistical tests
Yes.
37
To be reliable, observations and estimates should be accurate (not showing bias) and precise (not too variable)
Yes.
39
```
Scheme | Advantages | Disadvantages | Uses
Simple random (SR) | Usually simple to use | Provides limited information, probably not efficient or precise | Pilot studies, simple studies
Stratified | Usually provides more precise results than other methods; provides more information than SR | More complex to run and analyse; may take more time to sample | Situations where an area, or population, can be divided into homogeneous strata; testing hypotheses
Cluster | When the situation is suitable, this scheme is likely to be more efficient; provides more information than SR | More complex to run and analyse; may be less precise than stratified sampling | Situations where items of interest are naturally grouped in clusters; can be used to test some types of hypotheses
Systematic | Usually simple to use | Unless done carefully, may provide biased estimates | Drawing maps and similar situations
```
Yes.
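
To make the schemes concrete, here is a minimal sketch of how simple random, stratified and systematic samples might be drawn from the same sampling frame. The frame (60 quadrats in three shore zones), the sample size of 12 and the fixed seed are all assumptions for illustration; cluster sampling is omitted to keep the example short.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: 60 quadrats, 20 in each of three shore zones
zones = ("high", "mid", "low")
frame = [(zone, i) for zone in zones for i in range(20)]

# Simple random: every unit in the frame has the same chance of selection
simple = rng.sample(frame, 12)

# Stratified random: sample independently within each homogeneous stratum
stratified = []
for zone in zones:
    stratum = [unit for unit in frame if unit[0] == zone]
    stratified.extend(rng.sample(stratum, 4))

# Systematic: pick a random start, then take every k-th unit
k = len(frame) // 12
start = rng.randrange(k)
systematic = frame[start::k]

print(len(simple), len(stratified), len(systematic))
```

Only the stratified sample is guaranteed to contain four quadrats from every zone; a simple random sample can, by chance, under-represent a zone.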
40
There is no one-size-fits-all sampling approach; the choice depends on, e.g.:
• Cost/benefit → pilot studies are useful in this regard
• Accuracy and precision trade-offs
• The study focus and ramifications
Three main sampling schemes (and a special fourth case):
• Simple random
• Stratified random
• Cluster
• (Systematic)
Models may be developed from observations and tested by:
• Sampling (mensurative) experiments [more general results]; or
• Manipulative experiments, generally trickier but can provide much more explicit tests of mechanisms
Yes.
41
Use a balanced design when possible • Balanced designs have equal numbers of replicates in all treatments • Analysis is usually easier • More readily meet the assumptions necessary for some tests • May conflict with the requirements of a stratified sampling scheme
Yes.
42
Use a multifactor design when possible
• These designs are usually more efficient (more powerful for same effort)
• These designs are usually more informative
• Ensure that all combinations of treatments are included
• This is referred to as an orthogonal design
• Non-orthogonal designs can be difficult to analyse
Yes.
43
REPLICATION → DO IT! • Studies must be replicated in order to draw correct inferences • Avoiding pseudoreplication is imperative and requires close attention
Yes.
44
CONFOUNDING FACTORS • To be avoided at all costs • Makes it very difficult, if not impossible, to test hypotheses
Yes.
45
RANDOM & INDEPENDENT • Random samples alleviate bias and maintain independence • Non-independence can lead to incorrect conclusions • Non-independent sampling may be acceptable if it is inherently part of the study and appropriately accounted for
Yes.
46
BALANCED & COMPLETE (ORTHOGONAL) • Usually provide more/better information • Easier to analyse in many instances • Unbalanced can be accommodated, incomplete not so much
Yes.
47
Test Statistics • Theoretical distributions based on sample data for varying sample sizes (degrees of freedom)
Yes.
49
Multiple pairwise tests (3+ means)
• DON’T DO IT
• Greatly inflates the Type I error rate (≈ 5% per comparison)
ANOVA
• Use when comparing 3+ means
• Controls α at 0.05 (5%) for the entire procedure
Assumptions
• Independent, normal, equal variance, additive
• Check assumptions graphically and using Cochran’s test
*generally robust to violations of normality and equal variance assumptions
Yes.
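
A short calculation shows why multiple pairwise tests inflate the Type I error rate. The sketch assumes the pairwise tests are independent, which is a simplification (tests sharing a mean are not), so the figures are approximate; `familywise_error` is an illustrative helper name.

```python
from math import comb

def familywise_error(n_means, alpha=0.05):
    """Number of pairwise tests among n_means, and the chance of at least
    one Type I error if each test is run at alpha (independence assumed)."""
    n_tests = comb(n_means, 2)
    return n_tests, 1 - (1 - alpha) ** n_tests

for n_means in (3, 4, 5):
    n_tests, error_rate = familywise_error(n_means)
    print(n_means, "means:", n_tests, "tests, error rate about", round(error_rate, 3))
```

With three means the overall error rate is already about 14%, and with five means (ten comparisons) it is about 40%, compared with the 5% that a single ANOVA maintains.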
50
Post-hoc multiple comparisons • ANOVA identifies a significant result but doesn’t tell you WHERE • Use post-hoc tests, e.g. Tukey’s HSD, to determine which means differ
Yes.
51
```
Sub-topic 5: Two-factor ANOVA
Design
• Two rivers were sampled: one with pollutant released in upper reaches; the other was the closest similar unpolluted control
• Samples were taken in the upper reaches of each river, at the mouth, and about half-way between
• 3 water samples were collected at each combination of river and section
• The number of plankton was counted and pollutants measured
Factors
River (Pollution): Polluted, Control
Section (Area): Upper, Middle, Lower
Replicates
Water samples: 3
Variables
Number of plankton
Pollutants
```
Yes.
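
The layout of this design can be enumerated to confirm it is crossed (orthogonal) and balanced. The factor levels and replicate count are taken from the card; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

rivers = ["Polluted", "Control"]
sections = ["Upper", "Middle", "Lower"]
n_replicates = 3

# Crossed (orthogonal) design: every river level occurs with every section level
cells = list(product(rivers, sections))

# One entry per water sample: (river, section, replicate number)
samples = [(river, section, rep)
           for river, section in cells
           for rep in range(1, n_replicates + 1)]

# Balanced design: the same number of replicates in every cell
replicates_per_cell = Counter((river, section) for river, section, _ in samples)

print(len(cells), "cells,", len(samples), "water samples")
print(set(replicates_per_cell.values()))
```

This gives 2 × 3 = 6 treatment combinations and 18 water samples in total.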
52
```
Arrangement and number of factors:
• Stratified (crossed/orthogonal) – all levels of one factor are present with all levels of the other factor(s)
• Cluster (nested/hierarchical) – some levels of one factor are present only at some levels of the other factor(s)
Selection and number of replicates:
• Random – the replicates in each subgroup are randomly and independently selected
• Repeated measures – the replicates in some subgroups are the same as replicates in other subgroups
Selection and number of levels:
• Fixed – specific levels are chosen from the range available (or all available levels are used)
• Random – the levels in the study are randomly selected from those available and not all available are used
These affect:
- the complexity of the design
- the appropriate model for the analysis
- the power of tests for different effects
```
Yes.
53
```
Correlated variables vary together
Parametric (Pearson’s)
• For numerical or quantitative variables
• r measures the closeness of the relationship
• Correlation analyses linear relationships
• Non-linear relations need other methods
Non-parametric (Spearman’s rank)
• Non-parametric correlation is used when one or both variables are ordinal
• May be useful when the assumptions of parametric analyses do not hold
```
Yes.
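
The difference between the two coefficients can be seen in a small sketch. Pearson's r is computed from the raw values, and Spearman's coefficient is taken here as Pearson's r on the ranks (ties get their average rank). The data are invented: y is the square of x, so the relationship is perfectly monotonic but not linear.

```python
import math

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # hypothetical: monotonic but curved relationship

print(round(pearson_r(x, y), 3), spearman_rho(x, y))
```

Spearman's coefficient reaches 1 because the ranks agree exactly, while Pearson's r falls slightly short of 1 because the points do not lie on a straight line.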
54
```
Assumptions of parametric correlation
Normality of observations
• The observations on each of the variables are assumed to be normally distributed
Linearity of relationship
• The relationship is assumed to be linear (a straight line)
Checking assumptions
Normality of observations
• With many observations (>40) can plot frequency distribution
• With fewer observations can do normal probability plots (not discussed in this unit)
Linearity of relationship
• Graph the data!
```
Yes.
55
A valid test for a correlation requires the following: • Quantitative observations: if one or both variables are ranking, or ordinal, variables, use the non-parametric correlation coefficient (Sub-Topic 2)
Yes.
56
Independent observations: the selection of one point (e.g. animal) must not influence whether or not any other point is selected. Analyses of nonindependent observations may be unreliable
Yes.
57
Observations bivariate normal: this means that the two variables must be normally distributed. If one or both variables are not normally distributed it may be possible to transform them so that they are (Sub-Topic 4)
Yes.
58
Linear relationship: the correlation coefficient measures only the degree of linear relationship. If the relationship is not linear it may be possible to transform one or both variables so that it is (Sub-topic 4)
Yes.