final Flashcards

(78 cards)

1
Q

statistics

A

a field of mathematics that develops and studies methods to collect, analyze, interpret, and present empirical evidence

2
Q

empirical vs anecdotal evidence

A

empirical - information obtained by observing or measuring patterns, often through experimentation
anecdotal - evidence collected in a casual or informal manner that relies heavily on personal testimony or conclusions (not statistical data collection)

3
Q

data

A

a collection of numerical facts or information from which conclusions can be drawn

4
Q

raw data

A

unformatted data (numerical measurements, instrument readings, text) that has not been processed or analyzed

5
Q

replicates

A

parallel measurements of a phenomenon to estimate variability in your sample (the number of replicates = n)

6
Q

sampling effort

A

the amount of data (number of replicates) collected; the question is how much data do we need?

7
Q

precision vs accuracy

A

precision - how fine the divisions on a scale of measurement are
accuracy - how close to the truth our measurement is
(accuracy is the priority)

8
Q

descriptive statistics

A

quantitative description of observations sampled from a population (mathematically summarizing patterns, data centers, and variability without making conclusions about overall meaning of data)

9
Q

data distribution (histogram)

A

sampled populations arranged by rank order and graphically presented

10
Q

normal distribution

A

an arrangement of data in which most values cluster in the middle of the range and the rest taper off symmetrically toward either extreme

11
Q

central tendency

A

numeric value describing a central position in a dataset
mean, median, mode are all valid measures

12
Q

skew

A

positive vs negative
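positive - /\_ (peak on the left, long tail toward higher values)
negative - _/\ (peak on the right, long tail toward lower values)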

13
Q

central limit theorem

A

if a population with finite variance is sampled sufficiently, the mean of the sample means will be approximately equal to the mean of the population, AND the distribution of the sample means will approach a normal distribution
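
The idea on this card can be checked with a small simulation. The sketch below is only an illustration: the exponential population, the sample size of 50, and the helper name `sample_means` are all assumptions made for the example, not part of the course material.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Draw repeated random samples (with replacement) and return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [
        statistics.mean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical, strongly right-skewed population (exponential, true mean 1).
population_rng = random.Random(1)
population = [population_rng.expovariate(1.0) for _ in range(10_000)]

sample_size = 50
means = sample_means(population, sample_size=sample_size, n_samples=2_000)

pop_mean = statistics.mean(population)
mean_of_means = statistics.mean(means)
sd_of_means = statistics.stdev(means)
# Theory: the spread of the sample means is about sigma / sqrt(n).
expected_sd = statistics.pstdev(population) / sample_size ** 0.5

print(f"population mean:      {pop_mean:.3f}")
print(f"mean of sample means: {mean_of_means:.3f}")
print(f"sd of sample means:   {sd_of_means:.3f}")
print(f"sigma / sqrt(n):      {expected_sd:.3f}")
```

Even though the population is skewed, a histogram of `means` should look roughly symmetric and bell-shaped, and its centre should sit close to the population mean.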

14
Q

steps of scientific method

A

planning - what are you going to do? learn the system, develop ideas about how the system works (maybe do a pilot study), decide hypothesis, figure out what data you will need
recording - collect and properly record data; recording can take many forms and must be done extremely carefully
analyzing - interrogate data to test hypothesis, analysis cannot be successful if data gathering was not designed with analysis in mind, should allow you to reject or fail to reject the null hypothesis
reporting - disseminating methods and media will depend on the type of work and audience, statistical results must be reported using proper conventions, graphs must be properly labelled

15
Q

continuous data

A

data that can take any value (usually measured)

16
Q

discrete data

A

numerical data that can take a limited number of values (often counted)

17
Q

ordinal data

A

data in categories that can be placed in order but the magnitude of difference between categories is not fixed

18
Q

categorical data

A

data in categories that can’t be usefully ordered

19
Q

null and alternate hypothesis

A

what do we test when we use them?
we test the null hypothesis and decide whether the observed data are statistically probable if it is true

20
Q

random sampling

A

the best choice: sampling units are chosen at random, so every unit has an equal chance of being selected

21
Q

systematic sampling

A

transects (sampling on a created line)

22
Q

mixed sampling

A

stratified random sampling

23
Q

haphazard sampling

A

when you are unable to randomly sample because of practicality

24
Q

mean, median, mode

A

mean - average
median - the middle value (less affected by skew than the mean)
mode - most frequent

25
quartiles
rank data from smallest to largest: the smallest value is the first number and the largest is the fifth; the median is the third; the middle of the first and third is the second; the middle of the third and fifth is the fourth
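
A minimal sketch of the five numbers described on this card, using Python's standard `statistics` module. The function name `five_number_summary` and the data are hypothetical, and the `"inclusive"` quartile convention is an assumption: other conventions give slightly different values for the second and fourth numbers.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)  # rank from smallest to largest
    # "inclusive" is one of several quartile conventions; it is assumed here
    # only for illustration.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

summary = five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6])
print(summary)  # (1, 3.0, 5.0, 7.0, 9)
```

These five values are what a box-and-whisker plot draws.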
26
why divide by n-1 when calculating variance
penalty for having a small number of replicates
27
shapiro-wilk test for normality
takes a data distribution and determines whether it is significantly different from normal; p-value less than .05 = not normal, reject Ho
28
SEM (standard error of the mean)
$SEM = \frac{s_x}{\sqrt{n}}$; an estimate of how close the sample mean is to the true population mean (the standard deviation of resampled means)
29
descriptive projects
30
difference projects
is A different from B? bar charts and box-and-whisker plots; there is a categorical variable and we want to know if the response variable differs between categories
31
correlation/regression projects
links between variables; usually both the independent and the dependent variables are quantitative
32
association projects
similar to correlations but with categorical data
33
how to calculate mean
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
34
how to calculate median
the middle value of the rank-ordered observations
35
how to calculate mode
the value that occurs most often
36
how to calculate range
rank order the observations; range = highest - lowest
37
how to calculate variance
$s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$ OR $s_x^2 = \frac{SS}{n-1}$
38
how to calculate standard deviation
$s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$ OR $s_x = \sqrt{\frac{SS}{n-1}}$
39
how to calculate standard error
$SE = \frac{s_x}{\sqrt{n}}$
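
The calculations on the last few cards (mean, median, mode, range, variance, standard deviation, standard error) can be sketched in a few lines of Python. The function name `describe` and the data values are made up for illustration.

```python
import math
import statistics

def describe(values):
    """Compute the descriptive statistics from the cards for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)  # sum of squares, SS
    variance = ss / (n - 1)                    # divide by n - 1, not n
    sd = math.sqrt(variance)
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "ss": ss,
        "variance": variance,
        "sd": sd,
        "sem": sd / math.sqrt(n),              # standard error of the mean
    }

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical replicate measurements
stats = describe(data)
for name, value in stats.items():
    print(f"{name}: {value}")
```

For this data the mean is 5, SS is 32, and the variance is 32/7, which matches what `statistics.variance(data)` returns.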
40
outcomes of hypothesis testing
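we test the null hypothesis; the p-value is the probability of obtaining data at least as extreme as those gathered if the null hypothesis is true (it is not the probability that the null hypothesis is correct)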
41
what project uses histograms
descriptive projects
42
what projects use box plots
descriptive, difference (side by side)
43
what projects use scatterplots
correlation and regression
44
what projects use line plots
correlation and regression
45
what projects use pie charts
association
46
what probability is used as a threshold for hypothesis tests
.05
47
set up for a t-test
create hypotheses and collect data; the data must be normally distributed and each point must be independent
48
what happens to t when the mean difference, standard deviation, and n change
t increases when the mean difference increases; t decreases when the standard deviation increases; t increases when n increases
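
A small sketch of why t behaves this way. It assumes the unequal-variance (Welch) form of the two-sample t statistic, where t is the difference in means divided by its standard error; the course may use a pooled-variance form instead, and all the numbers are invented.

```python
import math

def two_sample_t(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """t = difference in means / standard error of that difference (Welch form)."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error

# Hypothetical summary statistics; change one thing at a time.
baseline = two_sample_t(12.0, 10.0, 2.0, 2.0, 10, 10)
bigger_difference = two_sample_t(14.0, 10.0, 2.0, 2.0, 10, 10)
bigger_sd = two_sample_t(12.0, 10.0, 4.0, 4.0, 10, 10)
bigger_n = two_sample_t(12.0, 10.0, 2.0, 2.0, 40, 40)

print(f"baseline:               t = {baseline:.2f}")
print(f"larger mean difference: t = {bigger_difference:.2f}")
print(f"larger sd:              t = {bigger_sd:.2f}")
print(f"larger n:               t = {bigger_n:.2f}")
```

The standard error in the denominator grows with the standard deviation and shrinks with n, which is why those two quantities pull t in opposite directions.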
49
what test is needed to decide if data is appropriate for a t-test
test whether the data are normal (boxplot and Shapiro-Wilk test); a p-value greater than .05 = the data are treated as normal and a t-test can be done
50
one tailed vs two tailed t-test
one tailed - more power to detect a directional effect (greater than or less than); two tailed - tests whether the means differ in either direction, i.e. whether the difference between means is greater than expected by chance
51
paired t-test
repeated observations collected for a single variable with 2 levels (differences between sample point 1 and sample point 2 are compared for the same sample unit)
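
As a rough sketch, a paired t statistic works on the per-unit differences rather than on the two sets of measurements separately. The function name `paired_t` and the before/after values are hypothetical; turning t into a p-value would additionally need the t distribution, which is not in the standard library.

```python
import math
import statistics

def paired_t(first, second):
    """Return (t, degrees of freedom) for paired measurements on the same units."""
    differences = [b - a for a, b in zip(first, second)]
    n = len(differences)
    mean_diff = statistics.mean(differences)
    sem_diff = statistics.stdev(differences) / math.sqrt(n)  # standard error
    return mean_diff / sem_diff, n - 1

# Hypothetical measurements on the same five sample units at two time points.
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 14]

t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```

Here the differences are 2, 3, 1, 3, 1, so the mean difference is 2 and its standard deviation is 1.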
52
how do non-parametric tests work
use the ranks of the data: rank values from smallest to largest and compare the ranks; Mann-Whitney (two sample) and Wilcoxon (paired) tests
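
To make the ranking step concrete, here is a minimal sketch that ranks pooled data (tied values share the average rank) and computes the Mann-Whitney U statistic from the rank sum. The helper names and data are assumptions for illustration, and no p-value is computed.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) to largest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the run
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """Return the smaller of the two U statistics for two independent samples."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)

print(average_ranks([10, 20, 20, 30]))      # [1.0, 2.5, 2.5, 4.0]
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # 0.0
```

A U of 0 means the two groups do not overlap at all in rank order; larger values of the smaller U indicate more overlap.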
53
when do we have independent replicates
when the replicates are not connected to each other
54
simple pseudoreplication
only a single replicate per treatment and subsamples are collected from each area
55
sacrificial pseudoreplication
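experimental units are replicated, but the data from the replicates are pooled (or subsamples are treated as independent replicates) before analysis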
56
temporal pseudoreplication
only a single replicate per treatment and subsamples are collected from it repeatedly over time
57
phylogenetic pseudoreplication
closely related individuals are the units being sampled (seeds, tadpoles, insect larvae)
58
technical pseudoreplication
different observers or instruments are used for different parts of the experiment
59
true positive
Ho is false and we reject it
60
true negative
Ho is true and we fail to reject it
61
false positive
Ho is true and we reject it (type 1 error)
62
false negative
Ho is false and we fail to reject it (type 2 error)
63
what is linear regression used for
to look for a relationship between a quantitative independent variable and a continuous dependent variable
64
linear regression assumptions
(1) data are independent and randomly selected; (2) data can be reasonably described by a linear relationship; (3) residuals are normally distributed; (4) residuals have constant variance regardless of x-value; (5) no extreme outliers (assumptions 3 and 4 are less important)
65
equation for best-fit line (linear regression)
$y = mx + b$; y - the dependent variable; m - the slope of the line; x - the independent variable; b - the point where the line crosses the y-axis (the intercept)
66
what is the null hypothesis for linear regression
m = 0 (no relationship between x and y); p-value greater than .05 = no evidence of a relationship, fail to reject; p-value less than or equal to .05 = reject
67
p-value meaning for linear regression
tells us if there is a significant slope
68
r^2 for linear regression
how much of the variation in our dependent variable is explained by the regression; $r^2$ = explained variation / total variation; values range between 0 (none of the variation is explained by the regression) and 1 (all of the variation is explained by the regression)
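
The slope, intercept, and r^2 from the regression cards can be sketched with ordinary least squares written out by hand. The function name `fit_line` and the data points are hypothetical; testing whether the slope is significantly different from 0 would need a p-value from a statistics package, which this sketch does not attempt.

```python
def fit_line(x, y):
    """Least-squares fit of y = m*x + b; returns (m, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx                      # slope
    b = mean_y - m * mean_x            # intercept: where the line crosses the y-axis
    predicted = [m * xi + b for xi in x]
    total = sum((yi - mean_y) ** 2 for yi in y)                     # total variation
    residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # unexplained
    r_squared = 1 - residual / total   # same as explained / total
    return m, b, r_squared

# Hypothetical measurements of an independent (x) and a dependent (y) variable.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

m, b, r_squared = fit_line(x, y)
print(f"slope m = {m:.3f}, intercept b = {b:.3f}, r^2 = {r_squared:.4f}")
```

An r^2 close to 1 here reflects that the made-up points lie almost exactly on a straight line.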
69
non-parametric linear regression
used when data do not meet the assumptions; Spearman's rank correlation does not give a slope or intercept; it tells us whether the null hypothesis should be rejected; cannot assume the relationship is linear
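
A minimal sketch of Spearman's rank correlation coefficient, using the shortcut formula based on squared rank differences. It assumes there are no tied values (the shortcut is not exact with ties), and the function name and data are invented for the example.

```python
def spearman_rho(x, y):
    """Spearman's rank correlation for data with no tied values."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for rank, index in enumerate(order, start=1):
            result[index] = rank  # rank 1 = smallest value
        return result

    rank_x, rank_y = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# A curved but always-increasing relationship still gives rho = 1,
# because only the rank order matters.
print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]))  # 1.0
print(spearman_rho([1, 2, 3, 4], [4, 3, 2, 1]))   # -1.0
```

This is why the card notes that no slope or intercept comes out of the test: the result describes how consistently the ranks move together, not the shape of a line.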
70
anova
tests for a difference between 3 or more levels of a categorical variable; looks at the variance in the dependent variable responses for each group
71
assumptions for anova
data are independent and randomly selected; residuals are normally distributed around group means; within-group residual variances are equal; no extreme outliers
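
As an illustration of how ANOVA compares variance between groups with variance within groups, the sketch below computes the F statistic for a one-way design by hand. The function name and the three groups are hypothetical, and the p-value (which needs the F distribution) is left out.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variation of each observation around its own group mean (residuals).
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, group_means)
        for value in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical responses for three levels of a categorical variable.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f, df_between, df_within = one_way_anova_f(groups)
print(f"F = {f:.2f} on {df_between} and {df_within} degrees of freedom")
```

A large F means the group means differ by more than the within-group scatter would suggest.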
72
hypotheses for anova
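null - all group means are equal ($\mu_1 = \mu_2 = \mu_3 = \dots = \mu_t$); alternative - at least one group mean differs from another; if the result is significant, use Tukey's HSD test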
73
Tukey's HSD
used after getting a significant result from an ANOVA; looks for pairwise differences while controlling the type 1 error rate; gives p-values for all pairwise differences
74
non-parametric anova
if assumptions are not met, use the Kruskal-Wallis test, which is based on ranks; p-value greater than .05 = fail to reject the null hypothesis; p-value less than or equal to .05 = reject the null hypothesis and conclude that at least one group has a different mean rank from at least one other group
75
chi-squared
used to compare two categorical datasets; compares the observed data to what would be expected if the values of each variable did not depend on the values of the other
76
chi-squared hypothesis
null - no association between the variables; alternative - an association between the two variables
77
chi-squared p-value interpretations
p-value greater than .05 = fail to reject the null; p-value less than or equal to .05 = reject the null and conclude there is an association between the variables; we can then look at the observed data to see where the largest differences between observed and expected are
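
The observed-versus-expected comparison on the chi-squared cards can be sketched as follows. The function name and the 2x2 table of counts are made up; the p-value lookup is omitted, although for 1 degree of freedom the .05 critical value is about 3.84.

```python
def chi_squared(observed):
    """Return (statistic, degrees of freedom, expected counts) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected count if the two variables were independent of each other.
    expected = [
        [row_total * col_total / grand_total for col_total in col_totals]
        for row_total in row_totals
    ]
    statistic = sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(observed, expected)
        for o, e in zip(observed_row, expected_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected

# Hypothetical counts: rows are one categorical variable, columns the other.
observed = [[20, 30],
            [30, 20]]
statistic, df, expected = chi_squared(observed)
print(f"chi-squared = {statistic}, df = {df}")
print("expected counts:", expected)
```

Comparing each observed cell with its expected count shows where the largest departures from independence are.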
78