Flashcards in Biostats Deck (50):

1

## In biostatistics, name the steps in study design. There are 5 steps.

###
1) Design of studies --> sample size, selection of study participants, role of randomization

2) Data collection variability --> important patterns in data are obscured by variability.

3) Inference --> draw conclusions from limited data

4) Summarize --> what summary measures will best convey the results

5) Interpretation --> what do the results mean in terms of practice, the program and the population

2

## What are the 4 types of data in biostatistics?

###
1) Binary (Dichotomous) data: yes/no answers

2) Categorical Data: either nominal (no ordering) or ordinal (ordering)

3) Continuous Data: blood pressure, weight, etc.

4) Time to event data: time in remission

3

## There are different statistical methods for different types of data. What two methods are used for binary data?

###
Fisher's Exact Test

Chi-Square Test

4

## What methods are used for continuous data?

###
Two-sample t-test

Wilcoxon rank-sum (nonparametric) test

5

## How would you calculate the mean of a sample (sample average)?

### Add up data and then divide by the sample size
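As a minimal sketch of that calculation in Python (the function name `sample_mean` and the blood pressure readings are made up for illustration):

```python
def sample_mean(values):
    """Return the sample average: the sum of the data divided by the sample size."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)


# Hypothetical systolic blood pressure readings (mmHg), invented for this example.
readings = [118, 124, 131, 120, 127]
print(sample_mean(readings))  # 124.0
```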

6

## What is the difference between population and sample in regards to data?

###
Population --> the entire group about which you want information (e.g., all women between ages 30 and 40)

Sample --> a part of the population from which we actually collect information; used to draw conclusions about the whole population.

7

## How are the population mean and the sample mean distinguished in statistical symbols?

###
Population Mean --> μ (mu)

Sample Mean --> X̄ (x-bar)

8

## The median number is the middle number. What happens when the sample size is an even number?

### Average the two middle numbers
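A small sketch of the rule, assuming a plain list of numbers (the helper name `sample_median` is hypothetical):

```python
def sample_median(values):
    """Return the middle value, averaging the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    mid = n // 2
    if n % 2 == 1:
        # Odd sample size: a single middle number exists.
        return ordered[mid]
    # Even sample size: average the two middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sample_median([5, 1, 3]))     # 3
print(sample_median([3, 1, 4, 2]))  # 2.5
```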

9

## What are ways in which the spread of a distribution can be described?

###
Min and Max

Range --> max - min

Sample standard deviation (SD)
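These measures can be computed with the Python standard library. The sketch below uses an invented data set and a hypothetical helper `describe_spread`; `statistics.stdev` divides by n - 1, which is the usual sample SD.

```python
import statistics


def describe_spread(values):
    """Return min, max, range (max - min) and sample SD for a list of numbers."""
    lowest, highest = min(values), max(values)
    return {
        "min": lowest,
        "max": highest,
        "range": highest - lowest,
        # statistics.stdev uses the n - 1 denominator (sample SD).
        "sd": statistics.stdev(values),
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
summary = describe_spread(data)
print(summary["range"])         # 7
print(round(summary["sd"], 3))  # 2.138
```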

10

## Why would a researcher feel it appropriate to make a histogram?

### A histogram displays the distribution of a set of data by charting the number of observations whose values fall within predefined numerical ranges

11

## How would one go about making a histogram?

###
Divide the data into equal intervals

Count the number of observations in each class

Draw the histogram

Label scales

12

## Generally, how many intervals should you have in a histogram?

###
depends on the sample size, n

usually the guideline is the square root of n
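The square-root guideline and the equal-interval counting step can be sketched as follows. This is an illustration only: the function name `histogram_counts` and the ages are invented, and the last interval is treated as closed so that the maximum value is counted.

```python
import math


def histogram_counts(values, num_bins=None):
    """Split the data range into equal intervals and count observations in each."""
    n = len(values)
    if num_bins is None:
        # Guideline from the card: about sqrt(n) intervals.
        num_bins = max(1, round(math.sqrt(n)))
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    if width == 0:
        width = 1.0  # all values identical; any positive width works
    counts = [0] * num_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return lo, width, counts


ages = [21, 23, 25, 27, 30, 31, 33, 35, 36, 38, 40, 41, 44, 47, 52, 61]
start, width, counts = histogram_counts(ages)
print(start, width, counts)  # 21 10.0 [5, 6, 3, 2]
```

With n = 16 the guideline gives 4 intervals, each 10 years wide.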

13

## What are other types of histograms?

###
frequency histogram

relative frequency histogram

relative frequency polygon

(note see lecture page 9 for images)

14

## There are several shapes of distribution when plotting data. Explain what right skewed, left skewed, and symmetrical mean.

###
Symmetrical --> right and left sides are mirror images (mean = median = mode)

Left Skewed (negatively skewed) --> long left tail; mean < median

Right Skewed (positively skewed) --> long right tail; mean > median (ex: hospital stays)

15

## Describe in general terms what probability density refers to.

### A smooth, idealized curve that shows the shape of the distribution in the population

16

## What are some features of a normal (Gaussian) distribution?

###
symmetric

bell shaped

mean = median = mode

(mean is the center) (SD is the spread)

17

## What does the 68-95-99.7 rule mean?

###
In any normal distribution, approximately:

68% of the observations fall within one standard deviation of the mean

95% of the observations fall within two standard deviations of the mean

99.7% of the observations fall within three standard deviations of the mean
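The rule can be checked numerically against the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Fraction of a normal population lying within k SDs of the mean."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {100 * share_within(k):.1f}%")
```

The printed values come out at roughly 68.3%, 95.4% and 99.7%, which is where the rounded figures in the rule come from.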

18

## What is a Z score?

###
Tells how many standard deviations an observation is from the population mean

Z = (observation - population mean) / SD

19

## What are the standard Z scores?

###
Z= 1 --> observation lies one SD above the mean

Z=2 --> observation lies two SD above the mean

Z = -1 --> observation lies one SD below the mean

Z= -2 --> observation lies two SD below the mean

20

##
If female heights have mean = 65 inches and s = 2.5 inches,

what is the Z score for 72.5 inches and for 60 inches?

###
For 72.5 inches:

Z = (72.5 - 65) / 2.5 = +3.0 (3 SD above the mean)

For 60 inches:

Z = (60 - 65) / 2.5 = -2.0 (2 SD below the mean)
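The same arithmetic as a short Python sketch (the function name `z_score` is an assumption for illustration; the numbers are the ones from this card):

```python
def z_score(observation, mean, sd):
    """Number of standard deviations the observation lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    # Subtract first, then divide: (observation - mean) / sd.
    return (observation - mean) / sd


print(z_score(72.5, 65, 2.5))  # 3.0
print(z_score(60, 65, 2.5))    # -2.0
```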

21

##
Example:

Suppose the population is normally distributed: if you have a standard score of Z=2, what percent of the population would have scores greater than you?

###
2.5% (95% fall within 2 SD, so 5% fall outside in total; the question asks only for those greater than you, so 5% / 2 = 2.5%)

(refer to the 68-95-99.7 rule)

22

##
Example:

If you have a standard score of Z=2, what % of the population would have scores less than you?

###
97.5% (5% of the population lies more than 2 SD from the mean, 2.5% in each tail, so 100 - 2.5 = 97.5)

again refer to the 68-95-99.7 rule

23

##
Example:

If you have a standard score of Z=3, what % of the population would have scores greater than you?

###
0.15% (99.7% fall within 3 SD, so 0.3% fall outside in total; the question asks only for those greater than you, so 0.3% / 2 = 0.15%)

again refer to the 68-95-99.7 rule

24

##
Example:

If you have a standard score of Z=-1.5, what % of the population would have scores less than you?

###
This requires a table: 6.68%

However, since 2.5% lie below Z = -2 and 16% lie below Z = -1, the answer has to be higher than 2.5% but less than 16%
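In place of a printed table, the one-sided percentages in these examples can be looked up with `statistics.NormalDist` (Python 3.8+). This is a sketch with made-up helper names `percent_below` and `percent_above`; it also shows that the rule-of-thumb answers (2.5%, 97.5%, 0.15%) are rounded versions of the exact areas.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1


def percent_below(z):
    """Percent of a normal population with a standard score below z."""
    return 100 * STANDARD_NORMAL.cdf(z)


def percent_above(z):
    """Percent of a normal population with a standard score above z."""
    return 100 - percent_below(z)


print(round(percent_above(2), 2))     # about 2.28 (rule of thumb: 2.5)
print(round(percent_below(2), 2))     # about 97.72 (rule of thumb: 97.5)
print(round(percent_above(3), 2))     # about 0.13 (rule of thumb: 0.15)
print(round(percent_below(-1.5), 2))  # about 6.68 (the table value)
```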

25

##
Example:

Suppose we call "unusual" observations those that are either at least 2 SD above the mean or at least 2 SD below the mean. What % are unusual? (In other words, what % of the observations will have a standard score either Z > +2.0 or Z < -2.0, i.e. |Z| > 2?)

### 5% fall outside 2 SD (again, this is known from the 95% part of the rule)

26

## What % of the observations would have |Z| > 1.0 (i.e., lie more than 1 SD away from the mean)?

###
32%

(again 100-68 = 32)

27

## What % of the observations would have |Z| > 3.0?

###
0.3%

(again 100-99.7 = 0.3)

28

## What % of the observations would have |Z| > 1.15?

###
|Z| > 1.0 would be 32% and |Z| > 2.0 would be 5%

so the answer would be between 5% and 32%
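The two-sided percentages can be computed directly. The sketch below assumes the questions mean "more than z SD away from the mean in either direction"; the helper name `percent_beyond` is invented.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1


def percent_beyond(z):
    """Percent of observations with |Z| > z, both tails combined."""
    one_tail = 1 - STANDARD_NORMAL.cdf(abs(z))
    return 100 * 2 * one_tail


for z in (1.0, 1.15, 2.0, 3.0):
    print(f"|Z| > {z}: {percent_beyond(z):.1f}%")
```

For 1.15 this gives about 25%, which sits between the rule-of-thumb bounds of 5% and 32%.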

29

## What is the difference between a parameter and a statistic?

###
Parameter --> number that describes the population; this is a fixed number (population mean; population proportion)

Statistic --> number that describes a sample of data; can be calculated (sample mean; sample proportion)

30

## What are errors from biased sampling?

###
study systematically favors certain outcomes

voluntary response

non response

convenience sampling

Solution: random sampling

31

## What are errors from random sampling?

###
caused by chance occurrence

get a bad sample because of bad luck

can be controlled by taking a larger sample

32

## When a selection procedure is biased, does taking a larger sample help?

###
no

this just repeats the mistake on a larger scale

33

## When a sample is randomly selected from the population, it is called what?

### random sample

34

## What is an advantage of a random sample?

###
helps control systematic bias

(however there is still some sampling variability or error)

35

## If we repeatedly choose samples from the same population, a statistic will take different values in different samples, what is this called?

### Sampling Variability

36

## The spread of a sampling distribution depends on the sample size. Is it better to have a bigger or smaller sample size?

###
larger unbiased samples are better

larger samples also give more tightly clustered histograms of the statistic, so more values are close to the mean

37

## If the researcher were to increase the sample size by a factor of 4, what would happen to the spread?

### The spread is cut in half each time the sample size is quadrupled
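A quick simulation can illustrate the halving. Everything in this sketch is an assumption for demonstration: a normal population with mean 100 and SD 15, sample sizes 25 and 100, 2000 repetitions, and a fixed seed.

```python
import random
import statistics


def spread_of_sample_means(sample_size, repetitions, rng):
    """SD of the sample mean across many random samples of the same size."""
    means = []
    for _ in range(repetitions):
        sample = [rng.gauss(100, 15) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


rng = random.Random(0)  # fixed seed so the run is repeatable
spread_small = spread_of_sample_means(25, 2000, rng)   # n = 25
spread_large = spread_of_sample_means(100, 2000, rng)  # n = 100, four times larger
print(spread_small / spread_large)  # close to 2: the spread is roughly halved
```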

38

## Describe the sampling distribution

###
what the distribution of the statistic would look like if we chose a large number of samples from the same population

it describes the distribution of all sample means, from all possible random samples of the same size taken from a population.

39

## What is the central limit theorem?

###
It gives this mathematical result: the sampling distribution of a statistic (such as the sample mean) is often approximately normally distributed

For the theorem to apply, the sample size (n) needs to be large (the guideline used here is n > 60)
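The effect can be seen in a simulation. The sketch below is illustrative only: it assumes a strongly right-skewed (exponential) population, samples of size 60, 3000 repetitions, and a fixed seed, then checks how many sample means fall within 2 SD of their own center, which would be about 95% for a normal distribution.

```python
import random
import statistics


def sample_means(sample_size, repetitions, rng):
    """Means of repeated random samples drawn from a skewed population."""
    means = []
    for _ in range(repetitions):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


def share_within_two_sd(values):
    """Fraction of values within 2 SD of their own mean."""
    center = statistics.fmean(values)
    spread = statistics.stdev(values)
    inside = sum(1 for v in values if abs(v - center) <= 2 * spread)
    return inside / len(values)


rng = random.Random(1)  # fixed seed so the run is repeatable
means = sample_means(60, 3000, rng)
share = share_within_two_sd(means)
print(round(share, 3))  # near 0.95, even though the population is skewed
```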

40

## What is a standard error (SE)?

### Measures the precision of a sample statistic, such as the sample mean or proportion, that is calculated from a number (n) of different observations.

41

## As the sample size gets bigger what happens to the standard error?

### It gets smaller, so the sample mean becomes more precise.

42

## Standard Error of the Mean (SEM) is again a measure of the precision of the sample mean. What is the formula to calculate SEM?

###
SEM = s / sqrt(n)

example: blood pressure on random sample of 100 students

Sample Size: n = 100

Sample Mean: X̄ = 123.4

Sample SD: s = 14.0

SEM: 14 / sqrt(100) = 1.4 mmHg

43

## How close to the population mean (μ) is the sample mean (X̄)?

###
The standard error of the sample mean tells us that 95% of the time the population mean will lie within about 2 standard errors of the sample mean.

X̄ ± 2 SEM

123.4 ± 2 x 1.4

123.4 ± 2.8

We are 95% confident that the sample mean is within 2.8 mmHg of the population mean. The 95% error bound is 2.8 mmHg.

44

## From the blood pressure example, what would be the 95% Confidence Interval (CI)?

###
123.4 ± 2.8

We are highly (95%) confident that the population mean falls in the range 120.6 to 126.2 mmHg
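The blood pressure example can be reproduced in a few lines. This is a sketch with hypothetical function names; it uses the multiplier 2 from the cards rather than the more exact 1.96.

```python
import math


def standard_error_of_mean(sd, n):
    """SEM = s / sqrt(n)."""
    return sd / math.sqrt(n)


def approx_95_ci(mean, sd, n):
    """Approximate 95% CI for the mean: sample mean plus or minus 2 SEM."""
    margin = 2 * standard_error_of_mean(sd, n)
    return mean - margin, mean + margin


sem = standard_error_of_mean(14.0, 100)
low, high = approx_95_ci(123.4, 14.0, 100)
print(round(sem, 2))                  # 1.4
print(round(low, 1), round(high, 1))  # 120.6 126.2
```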

45

## Is a 99% or 90% CI wider?

###
99% CI is wider

90% is narrower

46

## The length of a CI decreases (gets narrower) when n and s do what?

###
n increases

s decreases

(also when the level of confidence decreases)
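How the width responds to the confidence level, n, and s can be sketched with a normal-based interval. The function name `ci_width` and the inputs are assumptions for illustration, and the multiplier comes from `NormalDist.inv_cdf` (about 1.96 for 95%) rather than the rounded value 2.

```python
import math
from statistics import NormalDist


def ci_width(sd, n, confidence):
    """Width of a normal-based CI for the mean: 2 * z * s / sqrt(n)."""
    # z is the standard normal multiplier for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * sd / math.sqrt(n)


base = ci_width(14.0, 100, 0.95)
print(ci_width(14.0, 100, 0.99) > base)  # True: 99% CI is wider
print(ci_width(14.0, 100, 0.90) < base)  # True: 90% CI is narrower
print(ci_width(14.0, 400, 0.95) < base)  # True: larger n narrows the CI
print(ci_width(7.0, 100, 0.95) < base)   # True: smaller s narrows the CI
```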

47

## What are the two underlying assumptions for a 95% CI for the population mean?

###
Random Sample of Population

Sample Size n is at least 60, to use ± 2 SEM

48

## How would one calculate a 95% CI for the mean if the sample size is smaller or larger than 60?

###
based on a t-table

df is degrees of freedom: n - 1

according to the df you find the t value

49

##
For example if:

n=5

X̄ = 99 mmHg

s = 15.97, then what is the 95% CI?

###
99 ± 2.776 (from t-table, df = 4) x SEM, where SEM = 15.97 / sqrt(5)

99 ± 2.776 x 7.142

99 ± 19.83

The 95% CI for mean blood pressure is:

(79.17, 118.83)
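The same calculation as a Python sketch. The function name `t_confidence_interval` is hypothetical, and the critical value 2.776 is passed in by hand from the t-table (df = 4) because the standard library has no t distribution.

```python
import math


def t_confidence_interval(mean, sd, n, t_critical):
    """CI for the mean: sample mean plus or minus t * s / sqrt(n)."""
    sem = sd / math.sqrt(n)
    margin = t_critical * sem
    return mean - margin, mean + margin


# t_critical = 2.776 is the t-table value for df = n - 1 = 4 at 95% confidence.
low, high = t_confidence_interval(mean=99, sd=15.97, n=5, t_critical=2.776)
print(round(low, 2), round(high, 2))  # 79.17 118.83
```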

50