Statistics & Research Design Flashcards

(66 cards)

1
Q

Univariate Analysis

A

Analysis of a single response variable.

(Ieno & Zuur, 2015)

2
Q

Multivariate Analysis

A

Analysis of multiple response variables.

(Ieno & Zuur, 2015)

3
Q

Outline the scientific method.

A
  1. Make an observation.
  2. Ask a question.
  3. Form a hypothesis, or testable explanation.
  4. Make a prediction based on the hypothesis.
  5. Test the prediction.
  6. Iterate: use the results to make new hypotheses or predictions.

https://www.khanacademy.org/science/biology/intro-to-biology/science-of-biology/a/the-science-of-biology

4
Q

Hypothesis

A

An explanation of something that was observed.

A clear statement that articulates a plausible explanation and that can be tested with evidence that would either refute or support it.

Needs to be testable.

(BIOL 1105 Notes)

5
Q

Prediction

A

More specific than a hypothesis - it is the outcome that you expect to observe if your hypothesis is true.

(BIOL 1105 Notes)

6
Q

Why is it common practice to come up with multiple alternative hypotheses?

A

It reduces the chance that the researcher becomes attached to a single hypothesis, and it reduces confirmation bias.

It forces researchers to think of possible causes for patterns in nature beforehand rather than after the fact, which makes findings more reliable.

(Betts et al, 2021)

7
Q

Why is a hypothesis important?

A
  1. It reduces bias.
  2. It makes findings more reliable.
  3. It increases reproducibility.

(Betts et al, 2021)

8
Q

Why do we do statistics?

A

Statistics allows us to
- make educated decisions,
- infer information about a population from a sample rather than having to study the whole population, and
- make predictions.

9
Q

When are hypotheses not useful?

A
  1. When the goal is prediction rather than understanding.
  2. When the goal is description rather than understanding.
  3. When the objective is a practical planning outcome such as reserve design.

(Betts et al, 2021)

10
Q

What is inductive research?

A

Observing first, then coming up with explanations later.

(Betts et al, 2021)

11
Q

What are the limitations of the Shannon-Wiener Biodiversity Index?

A

It won’t show ecological (compositional) differences between habitats.

i.e., my two wetlands may have the same biodiversity value even though they are made up of different species (see the sketch below).
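
A minimal sketch of this limitation, assuming the natural-log form of the index, H' = -sum(p_i * ln(p_i)); the wetlands and species counts are hypothetical:

```python
import math

def shannon_wiener(counts):
    # H' = -sum(p_i * ln(p_i)), where p_i is each species' proportion of the total.
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)

# Two hypothetical wetlands with completely different species but the same
# abundance structure: the index cannot tell them apart.
wetland_a = {"mallard": 10, "wood duck": 10, "green frog": 5}
wetland_b = {"red-winged blackbird": 10, "muskrat": 10, "painted turtle": 5}

print(shannon_wiener(wetland_a))  # ~1.05
print(shannon_wiener(wetland_b))  # identical value, different community
```
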

12
Q

Pseudoreplication

A

Occurs when subjects are not independent of each other but you treat them as if they were (e.g., sampling the same individual more than once).

(BIOL 1105 Notes)

13
Q

How does pseudoreplication apply to telemetry?

A

The lack of independence between successive observations in telemetry data or in the derived behaviour or fates of tagged fish can give rise to pseudoreplication if treated as independent observations in analyses. Failing to account for pseudoreplication can lead to incorrect conclusions in hypothesis testing frameworks as well as misinformed interpretations of the data.

(Brownscombe et al, 2019)

14
Q

Population (in statistics)

A

The group of ALL things we are interested in (e.g., all house cats).

(BIOL 1105 Notes)

15
Q

Sample

A

Subset of the population that we measure.

(BIOL 1105 Notes)

16
Q

What are the properties of a good random sample?

A

Every unit in the population has to have an equal chance of being included in the sample.

Units in the sample should also be independent of one another - an observation of one individual should not provide any useful information about another individual in the sample.

(BIOL 1105 Notes)

17
Q

What are the two major types of data?

A

Quantitative
and
Qualitative

(BIOL 1105 Notes)

18
Q

What is quantitative data? How is it broken down?

A

Numerical data.

It can be discrete or continuous.

(BIOL 1105 Notes)

19
Q

Discrete Data

A

Numerical data that includes integer values only (e.g., # of matings, # of species).

(BIOL 1105 Notes)

20
Q

Continuous Data

A

Numerical data that can take any real value, including decimals (e.g., length, mass).

(BIOL 1105 Notes)

21
Q

What is qualitative data? How is it broken down?

A

Categorical data, i.e., data that is subdivided into categories.

It can be nominal or ordinal.

(BIOL 1105 Notes)

22
Q

Nominal Data

A

Categorical data that has no inherent order (e.g., sex, hair color).

(BIOL 1105 Notes)

23
Q

Ordinal Data

A

Categorical data that has a natural order (e.g., rank, life history stage).

(BIOL 1105 Notes)

24
Q

Replication

A

Repeating a measurement.

The number of “subjects”, “objects”, or “individuals” sampled; how many times the procedure was repeated.

Each of the repetitions is called a replicate.

(BIOL 1105 Notes)

25
Biological Definition of Replicates
An exact copy of a sample that is being analyzed, such as a cell, organism or molecule, on which exactly the same procedure is done. This is often done in order to check for experimental or procedural error. In the absence of error, replicates should yield the same result. However, replicates are not independent tests of the hypothesis because they are still the same sample, and so do not test for variation between samples. (Wikipedia)
26
Statistics
A method for describing and measuring aspects of nature from samples. We need it whenever the features we are trying to study are noisy/variable/unpredictable. Statistical methods allow us to quantify uncertainty (error) in our estimates. (BIOL 1105 Notes)
27
Descriptive Statistics
Numbers that capture important features of a sample (prior to testing); they summarize details of the sample (e.g., sample size, average tail length). (BIOL 1105 Notes)
28
Inferential Statistics
Numbers that capture important features of the population after conducting hypothesis testing. Used to determine how well our observed data fits with a particular hypothesis/null hypothesis. (BIOL 1105 Notes)
29
Response Variable
Aka dependent or outcome variable. The outcome we are interested in - effect. (BIOL 1105 Notes)
30
Predictor Variable
Aka independent or explanatory variable. The thing(s) that we hypothesize is affecting the outcome - cause. (BIOL 1105 Notes)
31
Confounding Variable
An 'extra' variable that you did not account for and that influences the variable you are investigating. (BIOL 1105 Notes)
32
Hypothesis Testing
Compares a dataset to the expectation derived from a specific null hypothesis. If the data are too unusual under the assumption that the null hypothesis is true, then we reject the null hypothesis. (BIOL 1105 Notes)
33
Null Hypothesis
A statement about a population parameter that negates our research hypothesis, i.e., that there is no effect or relationship. (BIOL 1105 Notes)
34
P-Value
The probability of getting a result at least as extreme as the result we actually got, assuming the null hypothesis is true. If p < 0.05, there is less than a 5% chance that we would have obtained a result at least this extreme if the null hypothesis were true, so we reject the null hypothesis. (BIOL 1105 Notes)
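
An illustrative sketch (not from the course notes): computing a p-value for a two-sample t-test on simulated data, assuming numpy and scipy are available; the groups and values are hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical tail lengths (cm) for two groups drawn from the SAME distribution,
# so the null hypothesis of "no difference in means" is actually true here.
group_a = rng.normal(loc=10.0, scale=2.0, size=30)
group_b = rng.normal(loc=10.0, scale=2.0, size=30)

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# If p < 0.05 we would reject the null at the 5% level; a larger p-value just
# means the observed difference is not unusual under the null hypothesis.
```
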
35
Confidence Intervals
Provide a measure of uncertainty on an estimate by indicating the plausible range in which we can expect the true value of the parameter to lie. e.g., if we repeated the sampling procedure many times, 95% of the resulting 95% CIs would capture the true value. (BIOL 1105 Notes)
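
An illustrative sketch (not from the course notes) of the repeated-sampling interpretation, assuming numpy is available; the population values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean, sigma, n, n_repeats = 50.0, 8.0, 40, 1000
captured = 0

for _ in range(n_repeats):
    sample = rng.normal(loc=true_mean, scale=sigma, size=n)
    se = sample.std(ddof=1) / np.sqrt(n)
    lo, hi = sample.mean() - 1.96 * se, sample.mean() + 1.96 * se
    captured += (lo <= true_mean <= hi)

print(f"{captured / n_repeats:.1%} of the 95% CIs captured the true mean")
# Expect a value close to 95%.
```
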
36
Type I Errors
Aka false positives. Occur when we reject the null when it's actually true. For example, in telemetry this would be detecting the presence of an animal when it was actually absent. (Adams et al, 2012; BIOL 1105 Notes)
37
Type II Errors
Aka false negatives. Occur when we do not reject the null when it's actually false. For example, in telemetry this would be not detecting the presence of an animal when it was actually there. (Adams et al, 2012; BIOL 1105 Notes)
38
Power
The probability that a study will correctly reject a false null hypothesis. (BIOL 1105 Notes)
39
Are Type I or Type II errors worse for conservation? What about for my research?
Statistically we normally want fewer Type II errors, because a study with a low Type II error rate has high power; if power is too low, there is little chance of finding a significant difference even when a real difference exists. But the question really needs to be looked at in a practical sense. In conservation, a Type I error (false positive) might mean concluding that an action is needed to protect a species when it would not actually help, so money is spent for nothing; a Type II error (false negative) might mean failing to take an action that would have helped the species. Spending money for nothing is the lesser harm, so for conservation - and for my research - I still want to limit Type II errors. (BIOL 1105 Notes; Brown et al, 2012)
40
What does statistical power depend on?
Alpha level. Sample size. The magnitude of the effect/difference we are studying. The variability (spread) in the data. The test we are using. (BIOL 1105 Notes)
41
What's the best way to increase statistical power?
Use a larger sample size. In an observational study, where we can't control the other factors, this is also the only way - which is what applies to my research. (BIOL 1105 Notes)
42
What does a high statistical power indicate?
A really high power means that we'd virtually always correctly reject a false null hypothesis, i.e., it means we can more easily detect what we're looking for. (BIOL 1105 Notes)
43
Power Analysis
In a power analysis, the objective is to estimate the sample size needed to detect an effect (i.e. departure from the null hypothesis) with a reasonable level of power while allowing for a margin of error. (Brown et al, 2012)
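
An illustrative sketch (not from Brown et al, 2012) of such a calculation, assuming statsmodels is installed; the effect size and targets below are hypothetical:

```python
from statsmodels.stats.power import TTestIndPower  # assumes statsmodels is installed

analysis = TTestIndPower()

# Sample size per group needed to detect a medium effect (Cohen's d = 0.5)
# in a two-sample t-test with alpha = 0.05 and 80% power.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"n per group needed: {n_per_group:.0f}")

# Power actually achieved if only 30 individuals per group are sampled
# (noticeably below the conventional 0.8 target for this effect size).
print(f"power at n = 30: {analysis.power(effect_size=0.5, nobs1=30, alpha=0.05):.2f}")
```
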
44
Accuracy
How close a measurement is to the true value. (Zar, 2010)
45
Precision
How close repeated measurements are to each other. (Zar, 2010)
46
Effect Size
A value measuring the strength of the relationship between two variables in a population, or a sample-based estimate of that quantity. Examples of effect sizes include the correlation between two variables, the regression coefficient in a regression, the mean difference, or the risk of a particular event (such as a heart attack) happening. (Wikipedia)
47
Margin of Error
Expressed as +/- percentage points, the margin of error tells you to what degree your research results may differ from the real-world value, i.e., how much higher or lower than the stated percentage the true value may plausibly be. A smaller margin of error is better because it suggests the results are more precise. https://www.qualtrics.com/experience-management/research/margin-of-error/
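
A small arithmetic sketch using the usual normal-approximation formula for a proportion; the survey numbers are hypothetical:

```python
import math

# Margin of error for a sample proportion at 95% confidence (z ~ 1.96).
p_hat = 0.55   # observed proportion, e.g. 55% of respondents agreed
n = 400        # number of respondents

margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
print(f"+/- {margin * 100:.1f} percentage points")  # about +/- 4.9 points
```
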
48
Collinearity and Multicollinearity - How do you address them?
Collinearity is when two predictor variables are correlated; when this happens, these variables cannot independently predict the response variable. Multicollinearity is when more than two predictors are correlated. To address it, check for correlated predictors during analysis (e.g., with pairwise correlations or variance inflation factors; see the sketch below) and, if necessary, keep only one of the correlated variables when performing hypothesis tests. https://www.britannica.com/topic/collinearity-statistics
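
A minimal sketch of checking for collinearity, assuming pandas and statsmodels are available; the predictor names and data are hypothetical:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)

# Hypothetical predictors: water and air temperature are strongly correlated,
# so they carry largely the same information about the response.
water_temp = rng.normal(15, 3, 100)
air_temp = water_temp + rng.normal(0, 1, 100)
depth = rng.normal(5, 2, 100)

X = pd.DataFrame({"water_temp": water_temp, "air_temp": air_temp, "depth": depth})
print(X.corr().round(2))  # pairwise correlations; |r| close to 1 flags collinearity

# Variance inflation factors (computed with an intercept column included);
# values well above ~5-10 suggest keeping only one of the correlated predictors.
X_const = X.assign(const=1.0)
vif = {col: variance_inflation_factor(X_const.values, i)
       for i, col in enumerate(X_const.columns) if col != "const"}
print(vif)
```
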
49
Interactions (in Statistics)
The effect of one causal variable on an outcome depends on the state of a second causal variable (that is, when effects of the two causes are not additive). (Wikipedia)
50
Random Effects
Factors that vary randomly across individuals or groups and affect the response variable (e.g., receiver location, individual differences, sampling year). The most familiar types of random effect are the blocks in experiments or observational studies that are replicated across sites or times. Random effects also encompass variation among individuals (when multiple responses are measured per individual, such as survival of multiple offspring or sex ratios of multiple broods), genotypes, species and regions or time periods. (Bolker et al, 2009; Whoriskey et al, 2019)
51
Why did I decide to use generalized linear mixed models (GLMMs)?
They take random effects into account, which helps prevent pseudoreplication. For example, telemetry data are usually collected on a random subset of individuals from a population; to conduct population‐level inference, individual ID, space, time, and receiver location can be included in the model. (Whoriskey et al, 2019)
52
Generalized Linear Mixed Models (GLMMs)
Models that combine the properties of two statistical frameworks that are widely used in EE, linear mixed models (which incorporate random effects) and generalized linear models (which handle nonnormal data by using link functions and exponential family [e.g. normal, Poisson or binomial] distributions). GLMMs are the best tool for analyzing nonnormal data that involve random effects. (Bolker et al, 2009)
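
GLMMs themselves are most often fitted in R (e.g., lme4 or glmmTMB). As a rough Python illustration of the random-effect idea only, the sketch below fits a linear mixed model (the Gaussian special case) with statsmodels, using hypothetical telemetry-style data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf  # assumes statsmodels is installed

rng = np.random.default_rng(7)

# Hypothetical data: repeated depth observations per tagged fish, so
# observations within a fish are not independent.
n_fish, n_obs = 20, 15
fish_id = np.repeat(np.arange(n_fish), n_obs)
fish_effect = rng.normal(0, 1.0, n_fish)[fish_id]          # random intercept per fish
temp = rng.normal(15, 3, n_fish * n_obs)
depth_use = 2.0 + 0.3 * temp + fish_effect + rng.normal(0, 1.0, n_fish * n_obs)

df = pd.DataFrame({"fish_id": fish_id, "temp": temp, "depth_use": depth_use})

# A random intercept for each fish accounts for the repeated measures and
# guards against pseudoreplication.
result = smf.mixedlm("depth_use ~ temp", df, groups=df["fish_id"]).fit()
print(result.summary())
```
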
53
Cronbach’s alpha
Cronbach’s alpha is a number between 0 and 1 that measures the internal consistency reliability of Likert scales: 0 indicates low internal consistency reliability and 1 indicates high internal consistency reliability. Internal consistency reliability is how well a group of questions measures the same construct. In general, a good Cronbach’s alpha is between 0.75 and 0.90. Cronbach’s alpha is affected by the number of questions, with more questions producing a higher value; therefore, if the alpha is low, it may just be a matter of needing to add more questions rather than poor reliability. If Cronbach’s alpha is above 0.90, the survey likely has redundant questions that can be removed. (Tavakol, 2011)
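
A minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the response matrix is hypothetical:

```python
import numpy as np

def cronbach_alpha(items):
    # items: rows = respondents, columns = Likert items coded as numbers.
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses to four 5-point Likert items intended to measure
# the same construct.
responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(responses), 2))
```
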
54
Ordinal Logistic Regression
Ordinal logistic regression is a statistical analysis method that can be used to model the relationship between an ordinal response variable and one or more explanatory variables (which can be discrete, continuous, or ordinal). This will be used for the social study because Likert responses are considered ordinal (they are categorical and have a natural order) and because I'm looking at the impacts of various responses on pro-environmental behaviour. https://cscu.cornell.edu/wp-content/uploads/91_ordlogistic.pdf
55
What are the assumptions of ordinal logistic regression?
  1. The dependent variable is measured on an ordinal level.
  2. One or more of the independent variables are continuous, categorical or ordinal.
  3. No multicollinearity - i.e., no two or more independent variables are highly correlated with each other.
  4. Proportional odds - i.e., each independent variable has an identical effect at each cumulative split of the ordinal dependent variable.
https://www.st-andrews.ac.uk/media/ceed/students/mathssupport/ordinal%20logistic%20regression.pdf
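
A minimal sketch of fitting such a model, assuming statsmodels (>= 0.13) provides OrderedModel; the survey variables and simulated data are hypothetical:

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel  # statsmodels >= 0.13

rng = np.random.default_rng(3)
n = 200

# Hypothetical survey data: an environmental-concern score predicting an
# ordered 1-5 Likert response about pro-environmental behaviour.
concern = rng.normal(0, 1, n)
latent = 1.2 * concern + rng.logistic(0, 1, n)
behaviour = pd.cut(latent, bins=[-np.inf, -2, -0.5, 0.5, 2, np.inf], labels=False) + 1

df = pd.DataFrame({"behaviour": behaviour, "concern": concern})

model = OrderedModel(df["behaviour"], df[["concern"]], distr="logit")
result = model.fit(method="bfgs")
print(result.summary())  # coefficient for 'concern' plus the threshold cut points
```
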
56
Thematic Analysis
One of the most common forms of analysis within qualitative research. It emphasizes identifying, analysing and interpreting patterns of meaning (or "themes") within qualitative data. (Wikipedia)
57
Home Range Analysis and Kernel Density
Home range analysis looks at the area an animal uses for the majority of its activities. Kernel density estimation is one method to evaluate home range: it estimates the probability of finding the animal at any given location. (Calenge, 2023)
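
This is not the workflow from the cited source; as a generic sketch of the kernel-density idea, assuming scipy is available and using simulated relocations:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(5)

# Hypothetical relocation points (x, y in metres) for one tagged animal.
x = rng.normal(0, 50, 300)
y = rng.normal(0, 30, 300)

kde = gaussian_kde(np.vstack([x, y]))   # 2D kernel density over the relocations

# Relative probability density of finding the animal at two candidate locations.
print(kde([[0], [0]]))      # near the centre of activity -> higher density
print(kde([[150], [100]]))  # far from most relocations -> much lower density
```
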
58
What are the benefits and disadvantages of using Likert scales?
Benefits:
- Easy to implement
- More standardized
- Easier to quantify
- Makes questions easier to answer for the respondent

Disadvantages:
- If their real choice isn't listed, respondents are forced to choose another
- Subject to bias
59
How do you code survey responses?
Each response on a Likert scale is assigned a particular number in a defined way (e.g., 5 = strongly agree = more positive, 0 = strongly disagree = more negative), and these numbers are then used to calculate overall scores. Open-ended questions are assigned theme codes and then summarized.
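
A minimal sketch of this coding step with pandas; the question names and the particular 1-5 coding are hypothetical choices:

```python
import pandas as pd

# One possible coding scheme; the exact numbers are arbitrary as long as they
# are defined consistently and higher values mean a more positive response.
likert_codes = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly agree": 5,
}

responses = pd.DataFrame({
    "q1": ["Agree", "Strongly agree", "Neutral"],
    "q2": ["Disagree", "Agree", "Strongly agree"],
})

coded = responses.replace(likert_codes)   # text -> numeric codes
coded["total_score"] = coded.sum(axis=1)  # overall score per respondent
print(coded)
```
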
60
Why should the first step of data analysis be to identify outliers and decide what to do with them?
Because outliers can heavily influence the results. (Ieno & Zuur, 2015)
61
What is the difference between absolute and relative abundance?
"Absolute abundance refers to the total number of organisms in a system (i.e., density or population estimates)." "Relative abundance provides and index (e.g., CPUE) of absolute abundance." (Quist et al, 2009 NA Sampling methods book)
62
Relative Species Composition
"The proportional (percentage) numerical or gravimetric abundance of a species within a collection of species." (Quist et al, 2009 NA Sampling methods book)
63
CPUE - How does it relate to density?
Catch per unit effort: "the number of fish sampled per unit of effort". CPUE is "assumed to be directly proportional to density". (Quist et al, 2009 NA Sampling methods book)
64
What are the assumptions with CPUE?
"assumes that changes in CPUE reflect a proportional change in abundance, which is often not the case" Different sampling gears cannot estimate the same CPUE because the catchability of fish wth each is different. (Quist et al, 2009 NA Sampling methods book)
65
How is CPUE calculated? Which should biologists use? Why?
Either:
  1. Divide the total number of fish by the total amount of effort, or
  2. "Calculate as outlined in 1 for each sampling unit (e.g., net set, electrofishing transect), then average."
Biologists should use the second method, especially if effort wasn't the same across samples, because it provides a more accurate mean with variances, which are needed for analysis (see the sketch below). (Quist et al, 2009 NA Sampling methods book)
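
A small arithmetic sketch of the two methods with hypothetical catches and effort, showing that they differ when effort varies across sampling units:

```python
# Hypothetical catches and effort (e.g., net-nights) for three sampling units.
catches = [10, 60, 3]
effort = [1.0, 10.0, 1.0]

# Method 1: pooled ratio - total fish divided by total effort.
cpue_pooled = sum(catches) / sum(effort)

# Method 2: CPUE for each sampling unit, then averaged (recommended, because
# it yields a mean with an associated variance for later analyses).
per_unit = [c / e for c, e in zip(catches, effort)]
cpue_mean = sum(per_unit) / len(per_unit)

print(f"pooled CPUE: {cpue_pooled:.2f} fish per unit effort")          # ~6.08
print(f"mean of per-unit CPUE: {cpue_mean:.2f} fish per unit effort")  # ~6.33
```
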
66