Midterm 1 Flashcards

(100 cards)

1
Q

Statistics

A

science of collecting, organizing, and analyzing data

2
Q

What do biostatisticians look to achieve?

A

attempt to gain insight and draw conclusions using data

3
Q

Can stats lie?

A

No, but they can be wrong

4
Q

What are some ways to chart categorical data?

A

bar graphs and pie charts

5
Q

What are some ways to chart quantitative data?

A

histograms and scatterplots

6
Q

What are some methods to organize and summarize raw data?

A

Graphically, numerically, and through exploratory data analysis

7
Q

Variable?

A

Characteristic of an individual

8
Q

Classifying variables?

A

Questions to ask when designing or reviewing an experiment

9
Q

Categorical variable?

A

Places an individual into a category; arithmetic operations cannot be applied to these data

10
Q

Quantitative variable?

A

Takes numerical values on which arithmetic operations can be performed

11
Q

What does a pie chart represent?

A

How one categorical variable breaks down into components

12
Q

What does a bar graph represent?

A

Each category is represented by a bar whose height shows its count or percent

13
Q

What does a histogram represent?

A

Summary graph of the distribution of a single quantitative variable

14
Q

What does a dot plot represent?

A

Raw data. Used to describe patterns in variability

15
Q

What does a time plot represent?

A

Time goes on the horizontal axis and the measured variable on the vertical axis. The line connecting the points shows how the variable changes over time.

16
Q

What does the vertical axis represent in a histogram?

A

Frequency or relative frequency

17
Q

What is an extreme point known as?

A

Outlier

18
Q

What is the mean?

A

The arithmetic average: the sum of the observations divided by n.
A measure of location (central tendency), i.e., a measure of center.

19
Q

What is the median?

A

midpoint of the distribution such that half of the
numbers are smaller and the other half are larger

20
Q

What is the median if n is even?

A

The mean of the two center numbers in the sorted list

21
Q

What is the mode?

A

the most common or frequent value - a list can have more than
one mode

22
Q

Is the median resistant to outliers?

A

yes

23
Q

Is the mean resistant to outliers?

A

No
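
A minimal sketch of what "resistant" means for the two cards above, using a small made-up list (the numbers are assumptions chosen for illustration): adding one extreme value moves the mean a lot and the median only slightly.

```python
from statistics import mean, median

data = [2, 3, 4, 5, 6]          # hypothetical observations
with_outlier = data + [100]     # same list plus one extreme value

# Mean and median before adding the outlier
print(mean(data), median(data))

# Mean and median after adding the outlier
print(mean(with_outlier), median(with_outlier))
```

Here the mean jumps from 4 to 20, while the median only moves from 4 to 4.5.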

24
Q

Quartiles?

A

The first quartile (Q1) is the median of the observations below the overall median;
the third quartile (Q3) is the median of the observations above it.

25
What is the five number summary?
Lowest number, Q1, median, Q3, and largest number
26
What is a graph with the five number summary?
Box plot
27
What is interquartile range?
Distance between the first and third quartiles: IQR = Q3 - Q1
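A short sketch of the five-number summary and IQR from the cards above. It assumes the common textbook convention that Q1 and Q3 are the medians of the lower and upper halves (software often uses slightly different rules), and `five_number_summary` is a hypothetical helper name.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]             # observations below the median position
    upper = xs[half + n % 2:]     # observations above the median position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Hypothetical data set
low, q1, med, q3, high = five_number_summary([1, 3, 4, 6, 7, 9, 12])
iqr = q3 - q1
print(low, q1, med, q3, high, iqr)
```

For this list the summary is 1, 3, 6, 9, 12 and the IQR is 6.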
28
What is the standard deviation?
Measures variation around the mean
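As a rough illustration of "variation around the mean", the sketch below computes the sample standard deviation by hand (dividing by n - 1, an assumption about which version the course uses) and compares it with Python's `statistics.stdev`. The data are made up.

```python
from math import sqrt
from statistics import mean, stdev

data = [1, 3, 4, 6, 7, 9, 12]   # hypothetical observations

x_bar = mean(data)
# Sum of squared deviations from the mean
squared_deviations = sum((x - x_bar) ** 2 for x in data)
variance = squared_deviations / (len(data) - 1)
s = sqrt(variance)

print(x_bar, variance, s)
print(stdev(data))  # library value, for comparison
```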
29
How do you organize a statistical problem?
State, plan, solve, conclude
30
What is a density curve?
A smooth curve drawn through the tops of a histogram's bars
31
Is a density curve generalizable?
Yes. It describes the overall pattern and ignores outliers.
32
What do bars of histograms represent?
Area
33
What is the area under a density curve always equal to?
1
34
Median of the density curve?
the point where half the observations lie above and half below – point where there are equal areas left and right of median line
35
Mean of the density curve?
the balance point of the curve if it were made out of a solid material
36
What Greek letters represent mean and standard deviation?
mu, μ (mean) and sigma, σ (standard deviation)
37
What are Normal distributions (curves)?
Bell-shaped curves
38
Why are Normal distributions important?
1) Good descriptions for some distributions of real data. 2) Good approximations to many chance outcomes. 3) Many statistical inference procedures are based on the Normal distribution.
39
What is the distance of 1 standard deviation on a bell curve?
The distance from the mean to the point at which the curvature changes (the inflection point)
40
What is the 68-95-99.7 rule?
About 68% of all observations are within 1 standard deviation (σ) of the mean (μ). About 95% of all observations are within 2 σ of the mean μ. Almost all (99.7%) observations are within 3 σ of the mean.
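The rule above can be checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative only.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of observations within k standard deviations of the mean
within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, proportion in within.items():
    print(f"within {k} sd of the mean: {proportion:.4f}")
```

The three proportions come out near 0.68, 0.95, and 0.997, which is where the rule gets its name.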
41
What is the shorthand for distribution curves?
N(mean, standard deviation), i.e., N(μ, σ). Standardizing gives the standard Normal distribution, N(0, 1).
42
What does the z score represent?
How many standard deviations the observation falls from the mean, and in which direction: z = (x - μ)/σ
43
How are x and z related?
When x is larger than the mean, z is positive. When x is smaller than the mean, z is negative.
44
What does the Standard normal table show?
Area under standard normal curve to the LEFT of the z value
45
What is cumulative proportion?
proportion of observations that lie at or below x
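A small sketch tying the last few cards together: standardize a value, then look up the area to its left, which is what the standard Normal table reports. The N(100, 15) distribution and x = 130 are assumptions chosen for illustration.

```python
from statistics import NormalDist

mu, sigma = 100, 15    # hypothetical Normal distribution N(100, 15)
x = 130                # hypothetical observation

# z-score: how many standard deviations x lies from the mean
z = (x - mu) / sigma

# Cumulative proportion: area to the LEFT of z under the standard Normal curve
cumulative = NormalDist().cdf(z)

# Same area, computed directly on the original scale
cumulative_direct = NormalDist(mu, sigma).cdf(x)

print(z, round(cumulative, 4), round(cumulative_direct, 4))
```

Here z is 2, and about 97.7% of observations lie at or below x.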
46
What is a Normal quantile plot?
z-scores on the x axis and observed values on the y axis. Use technology to obtain one. Points lying close to a straight line suggest the data are approximately Normal.
47
Response variable?
dependent variable (y axis)
48
Explanatory Variable?
independent variable (x axis)
49
Bivariate Data
Data on two variables measured on the same individuals, used to study the relationship between them
50
What is a common way to visualize the relationship between 2 variables?
Scatter plot (2 dimensions)
51
Where does each variable go on a scatterplot's axes?
Response variable on the vertical axis; explanatory variable on the horizontal axis
52
What three factors are there to look for in a scatter plot?
Form, direction, and strength (and outliers)
53
What measurement is important for strength and direction on a scatterplot?
Correlation coefficient
54
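What can the apparent strength of a relationship in a scatterplot depend on?
The scale of the axes, which is why a numerical measure (the correlation coefficient) is also used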
55
What does r represent in a scatterplot?
The sign (+/-) gives the direction; the closer r is to 1 or -1, the stronger the linear relationship
56
Facts about r (correlation).
1) Correlation does not distinguish between explanatory and response variables. 2) Both variables need to be quantitative. 3) r has no unit of measurement, so for any given data set, when the units of measure change, r does not. 4) Positive r indicates positive association between the variables; negative r indicates negative association. 5) r is always a number between -1 and +1. Values near 0 indicate a weak linear relationship; 1 or -1 indicates a perfect linear relationship. 6) r is not resistant: it is strongly affected by outliers, so use it with caution when outliers are present. 7) r only measures the strength of linear relationships, not curved relationships.
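Facts 1, 3, and 5 above can be checked on a toy example. This is a sketch with made-up data; `correlation` is a hypothetical helper that averages the products of standardized x and y values (dividing by n - 1).

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: average product of standardized values (n - 1 divisor)."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

xs = [1, 2, 3, 4, 5]    # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]    # hypothetical response values

r = correlation(xs, ys)
r_swapped = correlation(ys, xs)                       # fact 1: roles do not matter
r_rescaled = correlation([2.54 * x for x in xs], ys)  # fact 3: units do not matter

print(round(r, 4), round(r_swapped, 4), round(r_rescaled, 4))
```

All three values agree (about 0.77) and lie between -1 and +1.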
57
What is the straight line fitted to a scatterplot known as?
Regression line
58
What does a regression line explain?
How the response variable y changes as the explanatory variable x changes
59
What method is used to have the best-fit regression line?
Least-squares method
60
What is the least squares regression line?
The line that makes the sum of the squared vertical distances of the data points from the line as small as possible
61
What does a slope in a regression line represent?
Rate of change: the predicted change in y for a one-unit increase in x
62
What does an intercept in a regression line represent?
The predicted value of y when x = 0
63
Should you use a regression line with an extreme outlier?
No
64
What is the coefficient of determination?
The correlation coefficient squared, r^2
65
What does r^2 represent?
the fraction of variance in y that can be explained by the regression model
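The regression cards above can be sketched on a toy data set. The numbers are made up, and the formulas used are the standard least-squares ones: slope = Sxy / Sxx and intercept = (mean of y) - slope * (mean of x).

```python
from statistics import mean

xs = [1, 2, 3, 4, 5]    # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]    # hypothetical response values

mx, my = mean(xs), mean(ys)
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

slope = sxy / sxx                     # predicted change in y per unit of x
intercept = my - slope * mx           # predicted y when x = 0
r_squared = sxy ** 2 / (sxx * syy)    # fraction of variation in y explained

print(slope, intercept, r_squared)
```

For these points the fitted line is roughly y-hat = 2.2 + 0.6x, and about 60% of the variation in y is explained by the line.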
66
What are residuals?
The vertical distances of the data points from the regression line: residual = observed y - predicted y
67
What are the vertical distances from the data points to the regression line called?
Residuals
68
What does the +/- with residuals indicate?
A residual is positive if the point lies above the regression line and negative if the point lies below it.
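A minimal sketch of residuals and their signs. The five points are made up, and the line y-hat = 2.2 + 0.6x is taken as given (it is the least-squares fit for these particular points).

```python
xs = [1, 2, 3, 4, 5]            # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]            # hypothetical response values
intercept, slope = 2.2, 0.6     # least-squares line for these five points

# residual = observed y - predicted y
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Positive residual: point above the line; negative: point below the line
signs = ["above" if e > 0 else "below" if e < 0 else "on" for e in residuals]

for x, e, side in zip(xs, residuals, signs):
    print(f"x={x}: residual={e:+.1f} ({side} the line)")
```

For a least-squares line the residuals sum to zero, apart from floating-point rounding.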
69
What is a residual plot?
A scatterplot of the residuals against the explanatory variable. The regression line becomes the horizontal line at zero, which makes the residuals easy to compare.
70
What is an influential individual?
An observation that, if removed, would markedly change the regression line
71
Extrapolation?
Using the regression line to predict outside the range of x values in the data. Such predictions are unreliable. Do not do this!
72
Lurking variable?
a variable that has an important effect on the relationship but is not among the variables studied.
73
Observational study?
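Observing individuals and events as they naturally occur, without imposing a treatment. Lurking variables can be confounded with the explanatory variable.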
74
Experiment?
Deliberately imposing a treatment (manipulating variables) and observing the response. Can give evidence of a cause-and-effect relationship.
75
What is a sample?
The part of the population we actually examine and for which we do have data
76
What is probability sampling?
individuals or units are randomly selected; the sampling process is unbiased
77
What is convenience sampling?
Individuals or units are chosen because they are easy to reach; selection is not random, so the sampling process is prone to bias
78
What is simple random sampling?
Every individual, and every possible sample of the chosen size, has an equal chance of being selected
79
What is a probability sample?
a sample chosen by chance
80
Stratified random sampling?
The population is divided into groups of similar individuals called strata, and a separate random sample is drawn from each stratum.
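The two sampling designs above can be sketched with Python's `random` module. The population, stratum names, and sample sizes are all hypothetical, and the seed is fixed only so the example is repeatable.

```python
import random

rng = random.Random(0)  # fixed seed for a repeatable illustration

# Hypothetical population of 20 labeled individuals
population = [f"id{i:02d}" for i in range(20)]

# Simple random sample: every set of 5 individuals is equally likely
srs = rng.sample(population, 5)

# Stratified random sample: split into strata, then sample within each one
strata = {"under_40": population[:12], "40_plus": population[12:]}
stratified = {name: rng.sample(members, 2) for name, members in strata.items()}

print(srs)
print(stratified)
```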
81
What is inference?
using the sample to infer something about the population
82
What are cohort studies?
Enlist individuals who share a common demographic and keep track of them over a long period of time (“prospective”). Individuals who later develop a condition are compared to those who don’t develop the condition.
83
What are case-control studies?
start with 2 random samples of individuals with different outcomes, and look for exposure factors in the subjects’ past (“retrospective”)
84
What is an experimental unit?
The individuals on which the experiment is performed
85
What is a factor?
Explanatory variable (independent variable)
86
What is a treatment?
specific experimental condition
87
What is a confounding factor?
an explanatory (independent) variable that affects or distorts the relationship between another explanatory variable and its response (dependent) variable, since it is related to both
88
What is a control group?
A treatment to which the other treatments are compared to eliminate the effects of lurking variables on the experimental outcome.
89
What is a placebo?
A fake (inactive) treatment given to the control group. Helps make it possible to run the experiment double-blind.
90
What do randomized comparative experiments use?
Comparison and randomization
91
Why are Randomized comparative experiments considered the best designed experiments?
give good evidence that the treatments actually cause the differences observed in the response.
92
What are the factors that create an ideal experiment?
Control, randomization, and adequate sample size
93
What is something a well-designed experiment can result in that other types of studies cannot?
A causation statement: the observed association can be interpreted as causation (A causes B).
94
What is realism?
Purpose of experiments. Discovering how the universe and world around us works.
95
What is a block design?
Subjects are divided into blocks (groups sharing a given characteristic) before the randomization, in order to account for possible differences between the blocks. Treatments are then randomly assigned within each block, which lets us choose how many individuals of each block will receive each treatment.
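One way to picture a block design is to shuffle subjects separately inside each block and then deal out the treatments. The sketch below assumes two blocks and two treatments; `randomize_within_blocks` and all labels are hypothetical.

```python
import random

def randomize_within_blocks(blocks, treatments, seed=0):
    """Randomly assign treatments separately inside each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, subjects in blocks.items():
        shuffled = subjects[:]      # copy so the input list is not modified
        rng.shuffle(shuffled)       # random order within this block only
        for i, subject in enumerate(shuffled):
            # Deal treatments out in turn so each block is balanced
            assignment[subject] = (block_name, treatments[i % len(treatments)])
    return assignment

blocks = {"female": ["f1", "f2", "f3", "f4"], "male": ["m1", "m2", "m3", "m4"]}
treatments = ["drug", "placebo"]
assignment = randomize_within_blocks(blocks, treatments)

for subject, (block, treatment) in sorted(assignment.items()):
    print(subject, block, treatment)
```

Because the shuffle happens inside each block, every block ends up with the same number of subjects on each treatment.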
96
What is a main outcome measure?
Most important result from experiment
97
What is a matched pairs design?
Combines randomization and matching
98
What is the placebo effect?
People respond to a dummy treatment because they believe it is helping them, even though it has no active ingredient
99
What is a double-blind experiment?
Neither the patients nor experimenters know who is getting a placebo and who is getting the real thing
100
What does bimodal mean?
Two peaks (two modes)