Statistics Flashcards

1
Q

What are the two main types of data

A

quantitative

qualitative

2
Q

What is ordinal data

A

the data can be given a meaningful order

3
Q

What is nominal data

A

the categories have no meaningful order, i.e. they are just names, e.g. Atkins diet and paleo diet

4
Q

What is binomial data

A

there are only two options e.g. yes or no

5
Q

What is a random sample

A

one in which each member of the population has an equal, non-zero chance of being included

6
Q

what is a stratified sample

A

one in which each category (stratum) of the population must be represented in proportion to its size, e.g. if we know the library is 50 percent history books, 30 percent science and 20 percent others, then in a sample of 20 we must select 10 history books, 6 science and 4 others.
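
The library example can be checked with a short sketch of proportional allocation. The function name `proportional_allocation` and the dictionary layout are illustrative assumptions, not part of the card.

```python
def proportional_allocation(strata_shares, sample_size):
    """Return how many items to draw from each stratum.

    strata_shares maps a stratum name to its share of the population (0 to 1).
    Rounding is used here, so for awkward shares the counts may not add up
    exactly to sample_size and would need adjusting by hand.
    """
    return {name: round(share * sample_size)
            for name, share in strata_shares.items()}


# Shares taken from the card: 50% history, 30% science, 20% others.
library = {"history": 0.50, "science": 0.30, "others": 0.20}
allocation = proportional_allocation(library, 20)
print(allocation)  # {'history': 10, 'science': 6, 'others': 4}
```

Within each stratum the items would then normally be chosen at random, for example with `random.sample`.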

7
Q

what is a convenience sample

A

one that is not chosen randomly but is all that is available, e.g. all patients at an outpatient dermatology clinic

8
Q

when would you use a bar or pie chart

A

categorical data

9
Q

when would you use histograms, stem and leaf plots and box and whisker plots

A

to visualise continuous data

10
Q

what does a scatter plot show

A

the relationship between two variables and how one changes in relation to the other

11
Q

when would you use the mean and when would you use the median to describe the centrality of data

A

mean - normally distributed data that is not skewed
median - if the data is skewed or has significant outliers
mode - used for qualitative data
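
A small sketch of why the median is preferred when there is an outlier. The numbers are made-up illustrative values (hypothetical lengths of hospital stay in days), not data from the card.

```python
import statistics

# Hypothetical lengths of stay in days (illustrative values only).
lengths_of_stay = [2, 3, 3, 4, 5]
with_outlier = lengths_of_stay + [60]  # one very long stay

print(statistics.mean(lengths_of_stay), statistics.median(lengths_of_stay))
print(statistics.mean(with_outlier), statistics.median(with_outlier))

# The mode works for qualitative data, where a mean or median is undefined.
blood_groups = ["A", "O", "O", "B"]
print(statistics.mode(blood_groups))
```

The single outlier pulls the mean from 3.4 to about 12.8, while the median only moves from 3 to 3.5.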

12
Q

what do you do differently when calculating the sample variance/sd as opposed to the population

A

use n-1 as the denominator instead of n
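
The difference between the two denominators can be seen with Python's standard `statistics` module, where `pvariance` divides by n and `variance` divides by n-1. The data values are arbitrary illustrative numbers.

```python
import statistics

data = [4, 6, 8, 10, 12]  # illustrative values; mean is 8
n = len(data)
mean = sum(data) / n
sum_sq = sum((x - mean) ** 2 for x in data)  # sum of squared deviations = 40

population_variance = sum_sq / n        # divide by n
sample_variance = sum_sq / (n - 1)      # divide by n - 1

# The library functions agree with the hand calculation.
print(population_variance, statistics.pvariance(data))
print(sample_variance, statistics.variance(data))
```

The sample version is always slightly larger, which compensates for the sample mean being closer to the sample values than the true population mean would be.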

13
Q

what does the standard deviation show

A

the spread of the data

14
Q

what does positively skewed mean

A

that more of the values are clustered towards the bottom of the scale, with a long tail towards higher values - such as alcohol intake

15
Q

what is negatively skewed

A

most of the values are clustered at the higher range of the scale - rare in clinical data

16
Q

what is the coefficient of skewness

A

a value which shows how skewed the data is - the closer to 0, the more symmetrical the data
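
The card does not say which coefficient of skewness is meant. As one possible illustration, the sketch below uses the moment coefficient (third central moment divided by the 1.5 power of the second); this choice and the example data are assumptions.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_skewed = [0, 0, 1, 1, 2, 3, 10]          # long tail to the right
left_skewed = [-x for x in right_skewed]       # mirror image

print(skewness(symmetric))     # 0 for symmetric data
print(skewness(right_skewed))  # positive: positively skewed
print(skewness(left_skewed))   # negative: negatively skewed
```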

17
Q

what does a value of 0 for the kurtosis mean

A

indicates that the shape of the data is close to the normal distribution (this assumes kurtosis is reported as excess kurtosis, i.e. relative to the normal distribution)

18
Q

what is inference

A

making predictions about a population based on the data collected from a smaller sample or series of smaller samples

19
Q

what are the characteristics of a normal distribution

A
continuous
symmetrical
bell shaped curve
mean, median and mode are equal
single central peak
values between -infinity and +infinity
20
Q

what is the binomial distribution

A

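for binary data e.g. dead/alive, male/female - it gives the probability of each possible number of events in a fixed number of independent trials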

21
Q

what is the poisson distribution

A

for counts of events which occur at random in time or space e.g. deaths per year.
used for rare events

22
Q

what are the mean and sd of the standard normal distribution

A

mean = 0
sd = 1
we write Z ~ N(0, 1)

23
Q

where would you expect 95 percent of values to lie in normally distributed data

A

mean +/- 1.96 x SD
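
The 1.96 multiplier can be recovered from the standard normal distribution using `statistics.NormalDist` from the Python standard library. The mean of 120 and SD of 10 below are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

# 97.5th percentile of the standard normal: leaves 2.5% in each tail.
z = NormalDist().inv_cdf(0.975)
print(round(z, 2))  # 1.96

# Hypothetical normally distributed measurement (illustrative numbers).
mean, sd = 120, 10
lower, upper = mean - z * sd, mean + z * sd

# Check that the interval really contains about 95% of the distribution.
dist = NormalDist(mean, sd)
coverage = dist.cdf(upper) - dist.cdf(lower)
print(round(lower, 1), round(upper, 1), round(coverage, 3))
```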

24
Q

how can you assess the normality of data

A

Informal review of properties of normal distribution
Inspection of a normal plot
Shapiro-Wilk test

25
Q

Name ways in which you can transform data to make it plausibly normal and when you would use each one

A
Logarithmic - fairly skewed data in which the variances are proportional to the mean
Square root - counts
Reciprocal - highly skewed data
Cube - volumes
Logit - proportions
26
Q

what is the variance of the number of events in a binomial distribution

A

n × p × (1 − p)
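
The formula n × p × (1 − p) can be checked numerically by computing the variance directly from the binomial probabilities. The values n = 20 and p = 0.3 are arbitrary illustrative choices.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k events in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 20, 0.3  # illustrative values

# Mean and variance computed from the definition, summing over all outcomes.
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
variance = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))

print(mean, n * p)                 # both close to 6
print(variance, n * p * (1 - p))   # both close to 4.2
```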

27
Q

when can the binomial distribution be approximated to normal

A

if np > 5 and n(1-p) > 5
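
A rough sketch of the rule of thumb, followed by a comparison of an exact binomial probability with its normal approximation. The helper name `normal_approximation_ok`, the example values and the use of a continuity correction (adding 0.5) are assumptions made for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist


def normal_approximation_ok(n, p):
    """Rule of thumb from the card: both np and n(1-p) must exceed 5."""
    return n * p > 5 and n * (1 - p) > 5


print(normal_approximation_ok(100, 0.5))  # True
print(normal_approximation_ok(20, 0.1))   # False, np is only 2

# Compare exact and approximate P(X <= 45) for n = 100, p = 0.5.
n, p, k = 100, 0.5, 45
exact = sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))
approx = NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(k + 0.5)
print(exact, approx)
```

When the rule is satisfied, as here, the two probabilities agree closely.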

28
Q

what is the standard error

A

the standard deviation of the sample mean, i.e. of the sampling distribution of the mean (estimated as SD / √n)
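
In practice the standard error of the mean is usually estimated as the sample SD divided by the square root of the sample size. A minimal sketch with arbitrary illustrative data:

```python
import math
import statistics

sample = [4, 6, 8, 10, 12]  # illustrative values

sd = statistics.stdev(sample)                 # sample SD (n - 1 denominator)
standard_error = sd / math.sqrt(len(sample))  # SD / sqrt(n)

print(sd, standard_error)
```

Because n is in the denominator, a larger sample gives a smaller standard error even if the SD stays the same.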

29
Q

When can you make inferences about sample means based on the normal distribution

A
  1. sample is selected from a normal population with known SD or the sample size is large
  2. observations in the sample are independent
30
Q

when should the hypothesis be defined

A

before data is collected

31
Q

what is a type 1 error

A

rejecting a true null hypothesis

32
Q

what is a type 2 error

A

failing to reject (accepting) a false null hypothesis

33
Q

What does the level of significance of a test mean

A

the probability of making a type 1 error

34
Q

what is the generally accepted risk of making a type 2 error

A

20 percent (i.e. a power of 80 percent)

35
Q

if your significance level is 5 percent what is your confidence level

A

95 percent

36
Q

When is Student's t distribution used

A

When the population standard deviation is not known - for normally distributed data

37
Q

What is the degrees of freedom in t distribution

A

one less than the sample size (n - 1) for a single sample

38
Q

What is the difference between an independent and a dependent sample

A
independent = different people
dependent = same people

however if samples from two different groups are matched e.g. for age and gender the samples could then be viewed as dependent

39
Q

What are the steps that can be done to compare the means of two samples with incomparable sample variances

A
  1. investigate the relationship between the means and variances
  2. use Welch’s modified t test
  3. do non parametric tests
  4. do not proceed with the test of the means
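
As an illustration of the Welch option in the list above, the sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom by hand. The function name `welch_t` and the two groups of numbers are hypothetical; turning t into a p-value needs the t distribution, which is not in the Python standard library, so that step is left out.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t test."""
    # Each group's variance of the mean: s^2 / n.
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical groups with clearly different spreads.
group_a = [10, 12, 11, 13, 12]
group_b = [8, 15, 20, 5, 12, 18]

t, df = welch_t(group_a, group_b)
print(t, df)
```

The degrees of freedom are generally not a whole number and fall between the smaller group's n - 1 and the pooled value n1 + n2 - 2.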
40
Q

what does it mean if the F statistic is not significant

A

the variances of the two samples are comparable

41
Q

when can you use the normal approximation for a binomial trial

A

if both np and n(1-p) are greater than 5

42
Q

what is regression

A

provides information about the nature of the relationship between variables e.g. linear

43
Q

what is correlation

A

assesses the extent of the association between two variables

44
Q

when is a logistic regression used

A

when the outcome (dependent) variable is categorical, usually binary

45
Q

how do we measure the linear relationship between two variables

A

correlation coefficient

46
Q

what is the most commonly used measure of correlation

A

Pearson’s product moment correlation coefficient (r)

47
Q

What are the main points to remember about r

A

the significance of r depends on sample size
at least one variable should be normally distributed
the sample should be random
the pairs of observations are independent
correlation can be statistically significant but not clinically significant

48
Q

what is r squared

A

measure of the proportion of the variation in the dependent variable which is attributable to its linear relationship with the independent variable
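
A short sketch of how r and r squared relate, computing Pearson's r from its definition. The function name `pearson_r` and the paired values are illustrative assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson's product moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Illustrative paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r ** 2
print(round(r, 3), round(r_squared, 3))  # 0.775 0.6
```

Here r squared is 0.6, so about 60 percent of the variation in y is attributable to its linear relationship with x.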

49
Q

what assumptions are made when using regression methods

A

the correlation between x and y is significant
for each value of the x variable, the values of the y variable have a normal distribution
variances of these normal distributions are equal

50
Q

up to what sample size can the Shapiro-Wilk test provide a test for normality

A

up to 2000

51
Q

what do the results of the Shapiro-Wilk test mean

A

the closer the test statistic (W) is to 1, the more normal the data is

52
Q

what is a cohort study

A

a group of disease-free subjects is followed up over time

53
Q

what is a case control study

A

retrospective study of people with a disease. compares the factors they have been exposed to with those of controls without the disease

54
Q

advantages and disadvantages of a cohort study

A

less likely to be biased
expensive
not suitable for rare diseases

55
Q

what are the advantages/disadvantages of a case control study

A

cheap and easy to do

could be biased

56
Q

what factors influence the sample size needed in a study

A
significance level
power of the test
size of effect to be identified
standard deviation of the measurements - the greater the SD, the greater the sample size needed
study design
practical issues
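
How the first four factors interact can be sketched with a commonly used normal-approximation formula for comparing two means: n per group = 2 × (z for significance + z for power)² × SD² / effect². This formula is not given on the card, and the function name and example numbers are assumptions for illustration only.

```python
import math
from statistics import NormalDist


def sample_size_per_group(sd, effect, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 5%
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / effect ** 2
    return math.ceil(n)


# Hypothetical study: SD of 10 units, want to detect a difference of 5 units.
n_small_sd = sample_size_per_group(sd=10, effect=5)
n_large_sd = sample_size_per_group(sd=20, effect=5)
print(n_small_sd, n_large_sd)
```

Doubling the SD roughly quadruples the sample size, and asking for a smaller effect, a stricter significance level or more power also pushes it up.
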
57
Q

when are parametric tests used

A

for normally distributed data

58
Q

when are non parametric tests used and name examples

A

for skewed, i.e. not normally distributed, data

e.g. chi squared, Wilcoxon, sign test

59
Q

what is another name for non parametric tests

A

distribution free

60
Q

what is the disadvantage of non parametric techniques

A

they are less powerful than parametric techniques; as such, all efforts to transform the data to approximate a normal distribution should be made first

61
Q

what do non parametric techniques use as a representative of centre

A

the median

62
Q

when should the large sample Wilcoxon statistic, which approximates the normal distribution, be used

A

when n > 25