2 - Me, Myself and I Flashcards
(32 cards)
Why is biology considered a quantitative subject?
Research relies heavily on accurate and precise measurements. Variables are often manipulated, and the use of controls allows us to observe cause and effect relationships. We can quantify diversity via experiments, and using this data we can observe trends and create graphs. We can also create hypotheses, make predictions and establish causes.
Why are statistical analyses important?
Mathematical models and statistical analyses are important as they help us to understand genetic data and complex mechanisms.
What is the mean, median and range?
Mean - the average of a dataset.
Median - the middle value of a dataset.
Range - the difference between the maximum and minimum values of a dataset (spread of variable values).
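As a quick worked example of the three summaries, here is a minimal Python sketch. The measurements are invented for illustration and are not from the course material.

```python
import statistics

# Hypothetical measurements (e.g. leaf lengths in mm), invented for illustration.
values = [12, 15, 11, 18, 14]

mean_value = statistics.mean(values)      # sum of the values / number of values
median_value = statistics.median(values)  # middle value once the data are sorted
value_range = max(values) - min(values)   # spread: maximum minus minimum

print(mean_value, median_value, value_range)  # 14 14 7
```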
What is the difference between a sample and population?
A sample is only a small subset of the total population, and so when looking at results of a sample we must take this into consideration - the data collected will/may differ from the wider population.
The population is all members of a defined group.
What is a sampling error and its causes?
A sampling error is the random variation introduced into a dataset as a function of only sampling a subset of the total population - there is a difference in the value(s) from the sample compared to the true population value(s).
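Sampling error can be seen in a small simulation. This is only a sketch: the "population" is generated from made-up parameters (mean 50, standard deviation 10), and the sample size of 20 is an arbitrary choice.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 body masses (g); the parameters are invented.
population = [random.gauss(50, 10) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw several small random samples and record how far each sample mean
# falls from the true population mean. That difference is the sampling error.
sampling_errors = []
for _ in range(5):
    sample = random.sample(population, 20)
    sampling_errors.append(statistics.mean(sample) - population_mean)

print([round(error, 2) for error in sampling_errors])
```

Each sample gives a slightly different error even though every sample comes from the same population.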
Ways to represent categorical data (1).
The chi-square test assumes that the variables are categorical (can be divided into groups), that observations are independent, and that the expected count in each class is at least 5. It can be used where the observations are assigned to mutually exclusive classes - the observed counts are compared to those expected under the null.
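A minimal sketch of the chi-square calculation, assuming a null of equal frequencies across three classes. The counts are invented, and the critical value is the standard table value for p = 0.05 with 2 degrees of freedom.

```python
# Hypothetical counts of three flower colours; the null expects equal frequencies.
observed = [30, 20, 10]
expected = [sum(observed) / len(observed)] * len(observed)  # 20 in each class

# Chi-square statistic: sum of (observed - expected)^2 / expected over all classes.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 1

# Critical value for p = 0.05 with 2 degrees of freedom (standard chi-square table).
CRITICAL_VALUE_DF2 = 5.991
reject_null = chi_square > CRITICAL_VALUE_DF2

print(chi_square, degrees_of_freedom, reject_null)  # 10.0 2 True
```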
Ways to represent continuous data (1).
Boxplots are effective in presenting continuous data (measurements that can take any value within a range) - they show the median, range, IQR and dots for outliers.
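The numbers behind a boxplot can be computed directly. This sketch uses invented data and assumes the common 1.5 x IQR convention for flagging outliers, which plotting tools often use by default but which is not stated in these notes.

```python
import statistics

# Hypothetical continuous measurements, already sorted; 30 is an obvious outlier.
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]

# Quartiles: q1 and q3 bound the box, the median is the line inside it.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: points beyond 1.5 x IQR from the box are drawn as dots.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, median, q3, iqr, outliers)  # 4.0 6.0 8.0 4.0 [30]
```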
What is the null hypothesis?
The default expectation that categorical outcomes are all equally likely and so there is no relationship or association.
What is the alternative hypothesis?
The expectation that categorical outcomes are not all equally likely, and so there is a relationship between two measured phenomena, or association.
What are the degrees of freedom?
This refers to the number of values in a calculation that are free to vary - minimum is 1.
What does p<0.05 mean?
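The result is statistically significant: if the null were true, there would be less than a 5% probability of observing a deviation at least this large, so we can reject the null and accept the alternative.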
How does statistical significance relate to the p-value?
p < 0.05 is taken as strong evidence against the null hypothesis. The smaller the p-value, the more inclined we are to reject the null and favour the alternative, as there is less than a 5% chance of seeing a trend/deviation this large if the null were true.
How can p-values be used as evidence?
If the p-value is well below 0.05, we reject the null, as data like ours would be unlikely if the null were true. A p-value too close to the threshold may incline you to repeat the experiment with a larger sample size, which then allows us to see whether the statistical analysis supports the null or an alternative. The p-value also indicates whether the deviation from the null is likely due to sampling error.
What is a type I and II error?
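Type I: false positive - rejecting a null that is actually true (a risk when e.g. p = 0.049).
Type II: false negative - failing to reject a null that is actually false (a risk when e.g. p = 0.051).
When the p-value is this close to the threshold we should not reject/accept any hypotheses, and instead should collect more data.
These errors may arise due to sample size (data collection) and experimental design.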
What is effect size?
The effect size is the magnitude of the effect - how strongly the phenomenon affects the whole population, and not just the sample. A small effect size indicates minimal/negligible effects; a large effect size indicates substantial effects.
Why is biological context important?
Context surrounding data is key as we need to know the level at which the effect is observed - such as the population - and the data must be analysed with sample and effect size taken into consideration. We cannot determine the cause and effect relationship solely from the statistics.
What is a t-test and when would you use it?
It determines whether the mean of one group is statistically different from the mean of another (only suitable for comparing two groups). It can be used for data displayed as a boxplot.
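A minimal sketch of how a t statistic compares two group means. It assumes Welch's form of the test (unequal variances allowed), which is one common variant and not necessarily the one used in the course. The data are invented.

```python
import math
import statistics

# Hypothetical measurements from two groups (e.g. control vs treatment).
group_a = [5.1, 4.9, 5.6, 5.8, 5.0]
group_b = [6.2, 6.8, 6.1, 7.0, 6.4]

mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)

# Welch's t: difference in means divided by the standard error of that difference.
standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
t_statistic = (mean_a - mean_b) / standard_error

print(round(t_statistic, 2))
```

The further t is from 0, the less compatible the data are with equal group means. Turning t into a p-value requires the t-distribution, which a statistics package would normally supply.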
What is a regression model and its uses?
A regression model describes the relationship between a response variable and an explanatory variable - cause and effect (with context). This can be used for data displayed as a scatter plot.
What is the line of best fit in relation to regression models?
The line of best fit represents the relationship between the two variables (dependent and independent) in the regression model. Observed points rarely lie exactly on the line; the residuals are the differences between observed and predicted values. The line is calculated so that the sum of the squared residuals is minimised (hence best fit).
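The line of best fit can be sketched with ordinary least squares, the standard way of minimising squared residuals. The data points are invented for illustration.

```python
# Hypothetical explanatory (x) and response (y) values.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Least-squares slope and intercept for the line y = intercept + slope * x.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Residuals: observed value minus the value predicted by the line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(round(slope, 2), round(intercept, 2))  # 1.99 0.05
```

For a least-squares line with an intercept, the residuals always sum to (approximately) zero, so points above the line balance those below it.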
What is the workflow for data analysis?
(1) Plot data, (2) initial visual analysis, (3) statistical test, (4) interpret test output, (5) interpret output in biological context.
What is the R^2 value?
The proportion of the variance of our response variable explained by the explanatory variable - will be between 0 and 1. The closer to 1 the stronger the relationship/association - 0 means none.
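R^2 can be computed as 1 minus the ratio of unexplained variation to total variation. A minimal sketch for a simple linear regression, using invented data:

```python
# Hypothetical explanatory (x) and response (y) values.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Fit the least-squares line first.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x

# Total variation in y, and the variation left unexplained by the line.
ss_total = sum((yi - mean_y) ** 2 for yi in y)
ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = 1 - ss_residual / ss_total
print(round(r_squared, 3))
```

Here the points lie almost exactly on a straight line, so R^2 comes out very close to 1.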
What is a biphasic relationship?
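A relationship in which two distinct phases can be identified, e.g. the response changes in one way over part of the explanatory variable's range and in a different way over the rest.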
What is the 95% confidence interval?
A range where it is likely that the true population average is located - given we never know this for sure.
How does sample size relate to error margin?
The smaller the sample, the greater margin for error, and so there is a larger 95% confidence interval.
The larger and more unbiased the sample, the better it represents the population distribution, and so the interval decreases in size as we know the sample average will be closer to that of the true population.
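The link between sample size and interval width can be sketched as follows. This assumes the normal approximation (mean ± 1.96 x standard error) and an invented population. For very small samples a t-based multiplier would give a somewhat wider interval.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable


def ci95_half_width(sample):
    """Half-width of an approximate 95% CI for the mean: 1.96 x standard error."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return 1.96 * standard_error


# Hypothetical population; the mean of 100 and SD of 15 are invented.
population = [random.gauss(100, 15) for _ in range(50_000)]

small_sample = random.sample(population, 10)
large_sample = random.sample(population, 1000)

small_width = ci95_half_width(small_sample)
large_width = ci95_half_width(large_sample)
print(round(small_width, 2), round(large_width, 2))
```

The standard error shrinks with the square root of the sample size, so the interval from the large sample is much narrower.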