Me Myself and I Flashcards by Ruaridh Duncan

Why is biology considered a quantitative science?

Biology involves collecting and analyzing quantitative data to test hypotheses and make predictions. Biological research relies on quantitative measurements of different parameters.

Why is understanding data so important?

There is a lot of variation within biology. By understanding data, we can make predictions, and by manipulating variables and analyzing the resulting data we can establish cause-effect relationships.

Why aren't Excel files used to collect data?

Excel files can't be easily opened and used in other software; CSV files are easier to work with.
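
One way to see why CSV is easier to work with: it is plain text, so most languages can read it without extra software. The sketch below uses Python's standard `csv` module; the column names and values are invented for illustration.

```python
import csv
import io

# Hypothetical measurements stored as CSV text
# (in practice this would be a file on disk).
csv_text = """id,sex,height_cm
1,F,162.5
2,M,178.0
3,F,170.2
"""

# DictReader maps each row to the column names in the header line.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Values arrive as strings, so convert before doing arithmetic.
heights = [float(row["height_cm"]) for row in rows]
mean_height = sum(heights) / len(heights)
print(len(rows), round(mean_height, 1))
```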

How can graphs and statistics help you understand data?

Graphs allow you to visualize data; statistics allow you to summarize it.

What is the difference between samples and populations?

A sample is usually only a subset of the total population. A population is all the members of a defined group.

How can we make sure our sample is representative of the population?

Sampling bias is the result of poor experimental design and biases the results. To avoid it, the sample should be as representative of the entire population as possible, for example by selecting individuals at random.

What is sampling error?

The random variation introduced into a data set as a result of only sampling a subset of the population. The results from your sample may not be applicable to the entire population.
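
Sampling error can be illustrated with a small simulation. This is a sketch under assumptions: the "population" is 10,000 simulated heights, and the sample sizes (10 and 500) are arbitrary choices. It draws many random samples and measures how much their means scatter.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 heights (cm) from a normal distribution.
population = [random.gauss(170, 10) for _ in range(10_000)]
population_mean = statistics.fmean(population)

def sample_means(n, repeats=200):
    """Mean of each of `repeats` random samples of size n."""
    return [statistics.fmean(random.sample(population, n)) for _ in range(repeats)]

# The scatter of sample means around the population mean is sampling error.
spread_small = statistics.stdev(sample_means(10))
spread_large = statistics.stdev(sample_means(500))
print(round(population_mean, 1), round(spread_small, 2), round(spread_large, 2))
```

The means of the larger samples scatter far less, which is why bigger samples give results closer to the population.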

What type of data collection may create bias?

Self-reported data can create bias due to inaccurate reports.

Why is statistical testing important?

It's not enough to look at the data and make assumptions; it is important to statistically test the reliability and significance of our findings.

What is a statistical hypothesis?

A statistical hypothesis is a statement or assumption about the characteristics of a population, or the relationship between variables, that is subject to statistical testing.

What is the null hypothesis?

The default expectation that categorical outcomes are equally likely, that there is no relationship between two measured phenomena, or that there is no association between groups.

What is an alternative hypothesis?

The expectation that categorical outcomes are not equally likely, that there is a relationship between two measured phenomena, or that there is an association between groups.

When should you use a chi-squared test?

When comparing two categorical variables, e.g. diabetes prevalence amongst males and females.
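
A worked sketch of the chi-squared calculation for the diabetes example. The counts are invented, no continuity correction is applied, and the p-value shortcut using `erfc` is only valid for a 2x2 table (one degree of freedom).

```python
from math import erfc, sqrt

# Hypothetical counts: rows = males, females; columns = diabetic, not diabetic.
observed = [[30, 70],
            [15, 85]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Sum (observed - expected)^2 / expected over every cell, where the expected
# count assumes no association between sex and diabetes (the null hypothesis).
chi2 = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / total
        chi2 += (count - expected) ** 2 / expected

# With 1 degree of freedom, chi2 is the square of a standard normal value,
# so the upper-tail probability can be written with erfc.
p_value = erfc(sqrt(chi2 / 2))
print(round(chi2, 2), round(p_value, 3))
```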

When should you use a t-test?

When comparing one categorical variable with one continuous variable. It compares the means of the two groups, e.g. blood pressure amongst males and females.
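
A minimal sketch of the t statistic for the blood pressure example, with invented readings. It uses Welch's form of the test (which does not assume equal variances), a choice made here rather than one stated in the cards. Turning the statistic into a p-value needs the t distribution, which the Python standard library does not provide, so the sketch stops at the statistic.

```python
import statistics
from math import sqrt

# Hypothetical systolic blood pressure readings (mmHg).
males = [128, 135, 122, 140, 131, 127]
females = [118, 124, 121, 115, 126, 119]

mean_diff = statistics.fmean(males) - statistics.fmean(females)

# Standard error of the difference between the two group means.
std_error = sqrt(statistics.variance(males) / len(males)
                 + statistics.variance(females) / len(females))

# t is the difference in means measured in standard-error units.
t_stat = mean_diff / std_error
print(round(mean_diff, 1), round(t_stat, 2))
```

The further `t_stat` is from zero, the less plausible it is that the two groups share the same mean.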

When should you use a general linear model?

When trying to establish a relationship between two continuous variables e.g. height and finger length.

What is statistical significance?

Statistical significance is the claim that the observed results would be very unlikely under the null hypothesis, which suggests there is a relationship or association between variables.

What is the p-value?

The p-value is a measure of statistical significance. It is the probability of obtaining results at least as extreme as those observed if the null hypothesis were true.
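
A small worked example of a p-value, assuming a null hypothesis that two categorical outcomes are equally likely. The data (9 of 10 observations in one category) are hypothetical, and the p-value is computed exactly from the binomial distribution.

```python
from math import comb

n, observed = 10, 9   # hypothetical: 9 of 10 observations fall in one category
null_prob = 0.5       # null hypothesis: both outcomes are equally likely

def binom_pmf(k):
    """Probability of exactly k 'successes' in n trials under the null."""
    return comb(n, k) * null_prob**k * (1 - null_prob)**(n - k)

# Two-sided p-value: total probability of every outcome that is
# no more likely than the one actually observed (here 0, 1, 9 and 10).
p_value = sum(binom_pmf(k) for k in range(n + 1)
              if binom_pmf(k) <= binom_pmf(observed))
print(round(p_value, 4))  # 0.0215
```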

What is the p-value threshold?

The conventional p-value threshold is 0.05. If a p-value is lower than this, results are generally considered statistically significant.

Why does a very low p-value indicate more significance?

The lower the p-value, the less likely the results would occur under the null hypothesis. A p-value of 0.05 means there is a 5% chance of seeing results at least this extreme by chance alone if the null hypothesis is true.

Why may a p-value very close to the threshold not be reliable?

A p-value very close to the threshold may simply reflect sampling error. Bigger, more representative samples help reduce it.

What is a type one error?

A false positive: the results provide evidence against the null hypothesis when it is actually true.

What is a type two error?

A false negative: the results fail to provide evidence against the null hypothesis when it is actually false.

What is effect size?

Effect size is the magnitude of the effect seen in the results. The p-value may be very low while the effect is negligible, e.g. a drug trial that shows only a very small change in the measured variable is not useful. For continuous variables, effect size is reflected in the strength of the association and the gradient of the line.
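
One common standardised effect size for comparing two groups is Cohen's d: the difference in means divided by the pooled standard deviation. It is not named in the cards and is used here only as an illustration, with invented trial data.

```python
import statistics
from math import sqrt

def cohens_d(group_a, group_b):
    """Difference in means expressed in pooled-standard-deviation units."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / pooled_sd

# Hypothetical blood pressure readings (mmHg) from a small drug trial.
treated = [120.0, 118.5, 121.0, 119.5, 120.5]
control = [121.0, 119.5, 122.0, 120.5, 121.5]

d = cohens_d(treated, control)
print(round(d, 2))
```

Unlike a p-value, `d` does not shrink or grow with sample size alone, so it is better suited to judging whether an effect is large enough to matter.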

Why is biological context important for interpreting data?

Biological context helps interpret data correctly and can explain why a result has occurred, e.g. spikes in names such as Harper when the Beckhams' daughter was born, or drops in names such as Alexa.

What data does a boxplot show?

A boxplot is an efficient way to present continuous data against categorical data. It shows the median, max/min, interquartile range and outliers.
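
The numbers behind a boxplot can be computed directly. In this sketch the data are invented, and the rule that flags points more than 1.5 times the interquartile range beyond the quartiles is a common plotting convention assumed here, not something the cards specify.

```python
import statistics

# Hypothetical measurements for one category (sorted for readability).
data = [4.1, 4.5, 4.7, 5.0, 5.2, 5.3, 5.6, 5.9, 6.1, 9.8]

# quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # interquartile range: the height of the box

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(median, round(iqr, 2), outliers)
```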

What does a statistically significant result indicate?

A statistically significant result indicates likelihood, not certainty. We can use it to make predictions, but never with 100% confidence.

What is the difference between correlation and causation?

Correlation is when two variables are related to each other; causation is when a change in one variable causes a change in another.

What can correlation indicate?

1. Variable x causes variable y
2. Variable y causes variable x
3. A third independent variable is causing both

What does a t-test tell you?

Whether the means of two groups are statistically different.

What does a regression model produce?

A regression model produces a p-value testing whether the relationship is statistically significant, the y-intercept and gradient of the line of best fit, and an R² value measuring the strength of the correlation.
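
A sketch of how the gradient, y-intercept and R² come out of a simple least-squares fit, using invented height and finger length data. The p-value is left out because it needs the t distribution, which the Python standard library does not include.

```python
import statistics

# Hypothetical data: height (cm) and finger length (cm).
heights = [150, 160, 170, 180, 190]
fingers = [6.0, 6.6, 7.1, 7.4, 8.0]

mean_x = statistics.fmean(heights)
mean_y = statistics.fmean(fingers)

# Sums of squares and cross-products around the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, fingers))
s_xx = sum((x - mean_x) ** 2 for x in heights)
s_yy = sum((y - mean_y) ** 2 for y in fingers)

gradient = s_xy / s_xx                      # slope of the line of best fit
intercept = mean_y - gradient * mean_x      # where the line crosses x = 0
r_squared = s_xy ** 2 / (s_xx * s_yy)       # share of variation explained

print(round(gradient, 3), round(intercept, 2), round(r_squared, 3))
```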

What is a line of best fit?

The straight line that best represents the relationship between the dependent and independent variable. It is the line that makes the sum of the squared residuals as small as possible.

What are residuals?

Residuals are the differences between the observed values and those predicted by the regression line.

How can we use the line of best fit?

The line of best fit can be used to predict the value of one variable from a given value of the other. The straight line equation (y = gradient × x + intercept) can be used to do this.
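
A short sketch of prediction and residuals using the straight line equation. The data are invented, and `fit_line` is a hypothetical helper written for this example, not a library function.

```python
import statistics

def fit_line(xs, ys):
    """Least-squares gradient and intercept for y = gradient * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    gradient = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
                / sum((x - mean_x) ** 2 for x in xs))
    return gradient, mean_y - gradient * mean_x

# Hypothetical data: height (cm) and finger length (cm).
heights = [150, 160, 170, 180, 190]
fingers = [6.0, 6.6, 7.1, 7.4, 8.0]
gradient, intercept = fit_line(heights, fingers)

# Residual = observed value minus the value the line predicts.
residuals = [y - (gradient * x + intercept) for x, y in zip(heights, fingers)]

# Predict finger length for a height that was not measured.
predicted = gradient * 175 + intercept
print(round(predicted, 2), [round(r, 2) for r in residuals])
```

Predictions are most trustworthy inside the range of the measured data (here 150 to 190 cm).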

What is the workflow for data analysis?

1. Plot data
2. Initial visual analysis
3. Statistical tests
4. Interpret test output
5. Interpret results in biological context

Why is a 95% confidence interval used?

When using a sample, the results are unlikely to match the population exactly. A 95% confidence interval is a range calculated from the sample that would contain the true population value (e.g. the mean) in 95% of repeated samples.
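
A minimal sketch of a 95% confidence interval for a mean. It assumes the normal approximation (a multiplier of about 1.96); for small samples like this invented one, a slightly larger multiplier from the t distribution would normally be used.

```python
import statistics
from math import sqrt

# Hypothetical sample of heights (cm).
sample = [168.2, 171.5, 165.0, 174.3, 169.8, 172.1,
          166.7, 170.4, 173.0, 167.9, 169.1, 171.0]

mean = statistics.fmean(sample)

# Standard error: how much the sample mean is expected to vary between samples.
std_error = statistics.stdev(sample) / sqrt(len(sample))

# Multiplier that leaves 2.5% in each tail of a normal distribution (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)

ci_low, ci_high = mean - z * std_error, mean + z * std_error
print(round(ci_low, 1), round(mean, 1), round(ci_high, 1))
```

Because the standard error divides by the square root of the sample size, quadrupling the sample roughly halves the width of the interval.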

Why is using large samples important?

The larger the sample, the more likely it is to be representative of the population. Larger samples give narrower 95% confidence intervals.

What do overlapping confidence intervals mean?

When the intervals overlap, the difference between the groups is probably not statistically significant.

How does football relate to height?

Football is a male-dominated sport, so football players are usually taller than non-football players. The comparison is more likely a measure of gender against height. This is an example of a third independent variable.

When is a Fisher's exact test used?

In place of a chi-squared test when you have a small sample size.
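
A sketch of how Fisher's exact test works on a 2x2 table. With the row and column totals held fixed, it counts every possible table and adds up the probability of those no more likely than the observed one. The function name and the counts are invented for illustration.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def ways(k):
        # Number of ways to get k in the top-left cell with the totals fixed.
        return comb(row1, k) * comb(row2, col1 - k)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed_ways = ways(a)
    # Integer counts are compared, so there is no floating-point tie problem.
    extreme = sum(ways(k) for k in range(lowest, highest + 1)
                  if ways(k) <= observed_ways)
    return extreme / comb(n, col1)

# Hypothetical small study: 8 of 10 treated recover versus 1 of 6 untreated.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p_value, 3))  # 0.035
```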

What is a multi-variate model used for?

A multi-variate model can be used to test multiple independent variables against a dependent variable. It can tell you how much each variable contributes to your results, which allows you to control for confounding variables.

What is the difference between cross sectional data and longitudinal data?

Cross-sectional data is taken at a single time point by collecting data from different individuals and generations (cohorts). Longitudinal data follows the same individuals over a long period of time.

What is multiple testing?

Testing many variables at once and hoping one is significant. This often finds real co-variables but also chance associations, so the research question should be decided before the experiment takes place.
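
A rough calculation of why multiple testing produces chance associations. It assumes the tests are independent and every null hypothesis is true. The Bonferroni correction shown at the end is one standard remedy; it is added here for illustration and is not mentioned in the cards.

```python
alpha = 0.05     # significance threshold used for each individual test
num_tests = 20   # hypothetical number of variables tested

# Chance that a single test avoids a false positive is (1 - alpha), so the
# chance that at least one of the tests gives a false positive is:
chance_of_false_positive = 1 - (1 - alpha) ** num_tests

# Bonferroni correction: divide the threshold by the number of tests.
bonferroni_threshold = alpha / num_tests

print(round(chance_of_false_positive, 2), bonferroni_threshold)
```

With 20 tests, a "significant" result somewhere is more likely than not even when nothing real is going on.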

What is cherry picking?

Cherry picking is only presenting positive results and ignoring other findings. This is a misrepresentation of the data. All positive and negative data should be published.

Me Myself and I Flashcards

(43 cards)