Describing Data Flashcards

1
Q

What does descriptive statistics do?

A

Helps to organise and summarise data in an easily communicable manner.

2
Q

What are measures of central tendency?

A

Mean
Median
Mode

3
Q

Is the mean or median more affected by extreme values?

A

Mean

4
Q

What makes the mean more accurate?

A

A larger number of observations (a larger sample size)

5
Q

What is the unit of the mean the same as?

A

The unit of the original measure

6
Q

What is a geometric mean?

A

The mean obtained when individual observations are log-transformed, averaged, and then back-transformed using the antilog
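A minimal Python sketch of the log, average, antilog recipe, using natural logs (any log base gives the same result once back-transformed). The function name `geometric_mean` and the data are illustrative only.

```python
import math


def geometric_mean(values):
    """Log-transform each value, average the logs, then back-transform with exp (the antilog)."""
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs strictly positive values")
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_log)


data = [1, 10, 100]             # hypothetical, positively skewed values
print(geometric_mean(data))     # approximately 10
print(sum(data) / len(data))    # arithmetic mean: 37.0
```

With skewed positive data like this, the geometric mean sits well below the arithmetic mean.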

7
Q

Advantage of the geometric mean?

A

It will be closer to the median (than the arithmetic mean is) if the log-transformed data have a symmetrical distribution

8
Q

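Difference between the arithmetic mean and the geometric mean?

A

The geometric mean will be smaller (it is never greater than the arithmetic mean, and equals it only when all values are identical)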

9
Q

What is a weighted mean?

A

Each value is multiplied by the weight (constant) attached to it; the products are summed and divided by the sum of the weights
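A short sketch of a weighted mean, assuming the usual convention of dividing the weighted sum by the total weight. The group means and sizes are hypothetical.

```python
def weighted_mean(values, weights):
    """Multiply each value by its weight, sum the products, divide by the total weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical example: three group means weighted by group size
group_means = [10.0, 12.0, 20.0]
group_sizes = [50, 30, 20]
print(weighted_mean(group_means, group_sizes))  # (500 + 360 + 400) / 100 = 12.6
```

The unweighted mean of the three group means would be 14.0; weighting pulls the result towards the largest group.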

10
Q

When is a weighted mean used?

A

When some individual observations are more or less valuable than others

11
Q

Another name for the median?

A

50th percentile

12
Q

What type of data is the median preferable for?

A

Ordinal data, when treated as values (not as counts)

13
Q

What does the 5th percentile mean?

A

The value below which 5% of observations lie

14
Q

What type of data is mode mostly used for?

A

Nominal

15
Q

When can the mode be useful for ordinal data?

A

To find the most common rating obtained

16
Q

In which type of distribution are the mean, mode and median equal?

A

A normal (symmetric, unimodal) distribution

17
Q

Where will the median lie in a skewed distribution?

A

Between the mean and the mode

18
Q

What happens to the mean in a positively skewed distribution?

A

The mean will be higher than the median (it is pulled towards the long right tail)
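A quick illustration with Python's standard `statistics` module, using a small hypothetical data set with one large value in the right tail.

```python
import statistics

# Hypothetical, positively skewed data: most values are small, one is very large
data = [1, 2, 2, 3, 3, 4, 15]

mean = statistics.mean(data)      # 30 / 7, about 4.29
median = statistics.median(data)  # 3
print(mean > median)              # True: the long right tail pulls the mean up
```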

19
Q

Name some measures of variability

A

Range
Variance
SD
SE

20
Q

What is range?

A

Difference between the highest and lowest scores in a distribution

21
Q

What is the interquartile range?

A

Difference between the 75th and 25th percentiles

22
Q

Why does variance give more information than the range?

A

It uses every score in the distribution, not just the two extreme values

23
Q

Formula for variance

A

Sum of the squared differences between each observation and the mean, divided by (number of observations - 1)
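A minimal sketch of the sample variance formula, dividing by n - 1, with the SD as its square root. The data are hypothetical; the cross-check uses `statistics.variance` and `statistics.stdev`, which also use n - 1.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by the degrees of freedom (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical scores, mean = 5
variance = sample_variance(data)    # 32 / 7, about 4.571 (squared units)
sd = math.sqrt(variance)            # about 2.138 (original units)

# Cross-check against the standard library
print(math.isclose(variance, statistics.variance(data)))  # True
print(math.isclose(sd, statistics.stdev(data)))            # True
```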

24
Q

What are the degrees of freedom for a sample variance?

A

n - 1 (the number of observations minus one)

25
Q

When is variance high?

A

When scores are widely scattered

26
Q

How is variance expressed?

A

In squared units of the original measure

27
Q

What is the formula for SD?

A

Square root of variance

28
Q

What is the most commonly used measure of dispersion?

A

SD

29
Q

What is the coefficient of variation a measure of?

A

Relative spread of data

30
Q

How does one calculate the coefficient of variation?

A

SD / mean (multiplied by 100 to give a percentage)
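A small sketch of the coefficient of variation, assuming a positive mean. The heights are hypothetical and are given in two units to show that the result does not depend on the unit of measurement.

```python
import statistics


def coefficient_of_variation(values):
    """SD divided by the mean, expressed as a percentage (assumes a positive mean)."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical heights: the same people measured in cm and in m
heights_cm = [150.0, 160.0, 170.0]
heights_m = [1.5, 1.6, 1.7]
print(coefficient_of_variation(heights_cm))  # about 6.25
print(coefficient_of_variation(heights_m))   # about 6.25 (the units cancel)
```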

31
Q

Unit of the coefficient of variation?

A

It has no unit (the units of SD and mean cancel); it is usually expressed as a percentage

32
Q

Formula for SE?

A

SD / square root of sample size
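A one-line sketch of the SE formula, applied to a hypothetical SD of 10 at increasing sample sizes. Quadrupling the sample size halves the SE.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of the sample size."""
    return sd / math.sqrt(n)


# Hypothetical: the same SD of 10 with increasing sample sizes
for n in (25, 100, 400):
    print(n, standard_error(10, n))
# 25 2.0
# 100 1.0
# 400 0.5
```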

33
Q

What leads to a smaller SE?

A

A larger sample

34
Q

What do authors often (incorrectly) use SE for?

A

To describe the variability of the sample; the SD is the appropriate measure for that

35
Q

What does SE give an estimate of?

A

How the mean of the sample is related to the mean of the population
Precision and uncertainty of how study sample represents population

36
Q

What does SD estimate?

A

Variability in study sample

37
Q

What does SE tell us of the mean?

A

How precise our estimate of the mean is

38
Q

Graphs used for categorical and discrete numerical data

A

Bar chart

Pie chart

39
Q

Graphs for continuous data

A

Histogram
Dot plot
Scatter diagram

40
Q

Difference between a bar chart and a histogram

A

A histogram has no gaps between the bars because the data are continuous; a bar chart has gaps between bars for separate categories

41
Q

How to draw a dot plot

A

A dot is placed for each observation along one axis

42
Q

When does a dot plot become a scattergram?

A

When the dot plot is extended to two axes

43
Q

What measures can be plotted on a scattergram?

A

Two continuous measures

44
Q

What happens in a stem-and-leaf plot?

A

The first few digits of each numerical observation (the stem) are plotted along the vertical axis
The remaining digits (the leaves) are then added to one or both sides to represent the individual values of the observations
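A minimal text-based sketch of a one-sided stem-and-leaf plot, assuming non-negative integers with the tens digit(s) as the stem and the units digit as the leaf. The function name and data are hypothetical.

```python
def stem_and_leaf(values):
    """Build a one-sided stem-and-leaf plot for non-negative integers.

    Stem = tens digit(s), leaf = units digit; leaves are listed in ascending order.
    """
    leaves_by_stem = {}
    for v in sorted(values):
        leaves_by_stem.setdefault(v // 10, []).append(v % 10)
    lines = []
    # Include empty stems so gaps in the data stay visible
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines


for line in stem_and_leaf([12, 15, 21, 24, 24, 38, 41]):
    print(line)
#  1 | 25
#  2 | 144
#  3 | 8
#  4 | 1
```

Each row keeps the individual values recoverable (for example, stem 2 with leaves 1, 4, 4 stands for 21, 24, 24) while its length shows the shape of the distribution.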

45
Q

What is a box-and-whisker plot?

A

A rectangle (the box) is drawn to encompass the 2nd and 3rd quartiles of the observations (from the 25th to the 75th percentile)
The median is the line cutting through the rectangle

46
Q

What do the whiskers in a box-and-whisker plot show?

A

The minimum and maximum values of the observations

47
Q

Why is a normal distribution important?

A

A number of statistical tests assume the data come from a normal distribution
In a normal population, the mean and variance (and SD) are independent of each other
Many natural phenomena are approximately normally distributed
The central limit theorem

48
Q

What is the central limit theorem?

A

States that if we draw equally sized samples from a non-normal distribution, the distribution of the means of these samples will still be approximately normal, as long as the samples are large enough
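A small simulation sketch of the central limit theorem, assuming an exponential population (mean 1, SD 1, strongly right-skewed) and samples of size 30. The sample sizes, seed, and the `skewness` helper are illustrative choices, not part of the card.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 30    # observations per sample
NUM_SAMPLES = 2000  # number of samples (and therefore sample means)


def skewness(values):
    """Third standardised moment (population form): 0 for a symmetric distribution."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


# Means of many samples drawn from an exponential population
sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]
# The same number of single raw observations, for comparison
raw_values = [random.expovariate(1.0) for _ in range(NUM_SAMPLES)]

print(statistics.mean(sample_means))   # close to the population mean of 1
print(statistics.stdev(sample_means))  # close to 1 / sqrt(30), about 0.18
print(skewness(raw_values), skewness(sample_means))  # roughly 2 versus much nearer 0
```

The raw values stay heavily skewed, while the sample means cluster symmetrically around 1 with a spread close to SD / square root of n.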

49
Q

What sample size is usually considered large enough, for practical purposes, for the distribution of sample means to be approximately normal?

A

About 30 (a common rule of thumb)

50
Q

Properties of normal distribution

A

Bell-shaped
Mean, median and mode are the same value
The curve is symmetric about the mean: skew is 0
Excess kurtosis is 0 (kurtosis is 3)
The tails of the curve approach the x axis but never touch it

51
Q

What is kurtosis?

A

The peakedness or flatness of the curve

52
Q

What parameters have to be specified to describe a normal distribution?

A

Mean - where the peak of the density occurs

SD - indicates spread of curve

53
Q

For a given variance, what will a higher mean do to the curve?

A

Shift the curve to the right

54
Q

For a given mean, what will a higher SD do to the curve?

A

Decrease the peakedness of the curve (it becomes flatter and wider)

55
Q

For a given mean, what will a lower SD do to the curve?

A

Increase the peakedness of the curve (it becomes taller and narrower)

56
Q

What is a leptokurtic curve?

A

A curve with a sharper peak than the normal distribution (positive excess kurtosis)

57
Q

What is a standard normal distribution?

A

A normal distribution whose mean is 0 and SD is 1

58
Q

What is the standard normal deviate denoted by?

A

z

59
Q

What is the formula for standard normal deviate?

A

(random value ‘x’ - mean) / SD
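A minimal sketch of the standard normal deviate, plus the proportion of a standard normal distribution lying below it, computed with `math.erf`. The scale with mean 100 and SD 15 is hypothetical.

```python
import math


def z_score(x, mean, sd):
    """Standard normal deviate: how many SDs the value x lies from the mean."""
    return (x - mean) / sd


def standard_normal_cdf(z):
    """Proportion of a standard normal distribution lying below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Hypothetical scale with mean 100 and SD 15
z = z_score(130, 100, 15)
print(z)                                 # 2.0
print(round(standard_normal_cdf(z), 4))  # 0.9772
```

The same function gives about 0.975 at z = 1.96, which is where the 1.96 in a 95% confidence interval comes from.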

60
Q

Where is the mean in a negatively skewed distribution?

A

To the left of (lower than) the median

61
Q

What is the interquartile range?

A

Distance from value at 1st quartile to value at 3rd quartile

62
Q

SE calculation

A

SD/square root of n

63
Q

Calculation of the 95% CI for a population mean

A

Mean ± 1.96 × SE
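A minimal sketch of the mean ± 1.96 × SE interval, assuming the normal-approximation multiplier from the card (for small samples a t-distribution multiplier is usually used instead). The function name and measurements are hypothetical.

```python
import math
import statistics


def ci95_for_mean(values):
    """Approximate 95% CI for a population mean: sample mean +/- 1.96 x SE."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean - 1.96 * se, mean + 1.96 * se


# Hypothetical measurements
lower, upper = ci95_for_mean([12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0])
print(round(lower, 2), round(upper, 2))  # 12.27 14.73
```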

64
Q

What is Gaussian distribution?

A

Normal distribution

65
Q

What do one-tailed tests do?

A

Examine only one direction of the alternative hypothesis

66
Q

What is the usual value of beta?

A

0.2 (giving a power of 1 - β = 0.8)

67
Q

What is an unpaired test?

A

2 groups have different subjects

68
Q

What is a paired test?

A

Same subjects at different points in time

69
Q

Descriptions of categorical data

A

Mode

Frequency

70
Q

Descriptions of non-normal data

A

Median

Inter-quartile range

71
Q

Descriptions of normal data

A

Mean

SD

72
Q

Comparing two unpaired groups of categorical data

A

Chi-squared

Fisher’s exact test

73
Q

Comparing two paired categorical groups

A

McNemar’s test

74
Q

Comparing two unpaired non-normal groups

A

Mann-Whitney U Test

75
Q

Comparing two paired non-normal groups

A

Wilcoxon signed-rank test

76
Q

Comparing paired or unpaired normal data

A

Student’s t test

77
Q

Comparing >2 paired categorical groups

A

Cochran’s Q test

78
Q

Comparing >2 unpaired categorical groups

A

Chi-squared test

79
Q

Comparing >2 unpaired non-normal groups

A

Kruskal-Wallis ANOVA

80
Q

Comparing >2 paired non-normal groups

A

Friedman test

81
Q

Comparing >2 normal data; paired or unpaired

A

ANOVA (repeated-measures ANOVA for paired data)

82
Q

What do statistical tests give us?

A

A p value

83
Q

What types of data are contingency tables used for?

A

Categorical

84
Q

Columns and rows of a contingency table

A

Columns: outcome
Rows: risk factor/variable

85
Q

Impact of a small sample size on the correlation coefficient?

A

The value of r is less reliable (more easily distorted by outlying values)

86
Q

How can one dampen the effect of outlying values in small samples?

A

Using the ranks of the raw data instead of their absolute values
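A sketch of how ranking dampens an outlier: Spearman's rank correlation is Pearson's r computed on the ranks, so an extreme value changes Pearson's r but not Spearman's. The helper names and data are hypothetical; ties get the average of the ranks they occupy.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5, 6]
y_plain = [2, 1, 4, 3, 6, 7]
y_outlier = [2, 1, 4, 3, 6, 60]  # same ordering, but the last value is extreme

print(round(pearson_r(x, y_plain), 2), round(pearson_r(x, y_outlier), 2))        # 0.9 0.7
print(round(spearman_rho(x, y_plain), 2), round(spearman_rho(x, y_outlier), 2))  # 0.89 0.89
```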

87
Q

What is used if both variables are normal?

A

Pearson

88
Q

What is used if 1 variable is normal, the other non-normal?

A

Spearman

89
Q

What is used if 1 variable is normal, the other categorical?

A

Spearman

90
Q

What is used if 1 variable is non-normal, the other normal?

A

Spearman

91
Q

What is used if both variables are non-normal?

A

Kendall

92
Q

What is used if one variable is categorical and the other normal?

A

Spearman

93
Q

What is used if both variables are categorical?

A

Spearman

Kendall

94
Q

What does a regression equation do?

A

Describes the relationship between 2 or more variables with an equation that has predictive value

95
Q

What is needed to construct a regression line?

A

Regression equation

96
Q

What can a regression line represent?

A

Relationship between variables on a scattergraph

97
Q

Where on the scattergraph is the IV?

A

X axis

98
Q

Where on the scattergraph is the DV?

A

Y axis

99
Q

Equation of the regression line of best fit

A

y=a+bx
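A minimal sketch of fitting y = a + bx by ordinary least squares, the usual way the line of best fit is estimated. The function name and data are hypothetical; x is the independent variable and y the dependent variable.

```python
def least_squares_line(x, y):
    """Estimate a (intercept) and b (slope) of y = a + b*x by ordinary least squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope: change in y per one-unit change in x
    a = mean_y - b * mean_x  # the fitted line passes through (mean_x, mean_y)
    return a, b


# Hypothetical data: x is the IV, y is the DV
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
a, b = least_squares_line(x, y)
print(a, b)        # 1.0 2.0
print(a + b * 10)  # predicted y at x = 10: 21.0
```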

100
Q

What is a in y=a+bx?

A

Intercept of the regression line on the y axis

101
Q

What is b in y=a+bx?

A

Regression coefficient (slope of regression line)

102
Q

What does b in y=a+bx describe?

A

The size and direction of the relationship: the change in y for each one-unit change in x

103
Q

What is x in y=a+bx?

A

Value of IV

104
Q

What happens to PPV and NPV as the prevalence of a disorder decreases?

A

PPV will decrease

NPV will increase
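A minimal sketch of how PPV and NPV follow from sensitivity, specificity and prevalence, for a hypothetical test with 90% sensitivity and 90% specificity.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test with 90% sensitivity and 90% specificity
for prevalence in (0.5, 0.1, 0.01):
    ppv, npv = predictive_values(0.9, 0.9, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
# prevalence 0.50: PPV 0.900, NPV 0.900
# prevalence 0.10: PPV 0.500, NPV 0.988
# prevalence 0.01: PPV 0.083, NPV 0.999
```

As the disorder becomes rarer, false positives from the large healthy group outnumber true positives, so PPV falls while NPV rises.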

105
Q

What is serial testing?

A

Two or more tests are used in sequence, stopping as soon as a test returns a negative result
A diagnosis is only confirmed if all tests return a positive result

106
Q

Advantages of serial testing

A

Increases specificity

Useful if treatment is hazardous
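A sketch of why serial testing raises specificity, assuming the two tests' errors are conditionally independent given disease status (an assumption that often does not hold exactly in practice). The test characteristics are hypothetical.

```python
def serial_testing(sens1, spec1, sens2, spec2):
    """Combined sensitivity and specificity when a positive diagnosis needs both tests positive.

    Assumes the two tests' errors are conditionally independent given disease status.
    """
    combined_sensitivity = sens1 * sens2                  # a case must be detected by both tests
    combined_specificity = 1 - (1 - spec1) * (1 - spec2)  # a false positive needs both tests wrong
    return combined_sensitivity, combined_specificity


# Hypothetical tests
sens, spec = serial_testing(0.9, 0.8, 0.8, 0.9)
print(round(sens, 2), round(spec, 2))  # 0.72 0.98
```

The gain in specificity comes at the cost of sensitivity, which is why serial testing suits situations where a false positive (for example, starting a hazardous treatment) is the bigger concern.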

107
Q

What does a larger AUC in an ROC curve correspond to?

A

A better test

108
Q

AUC of 0.5 in ROC curve?

A

Worthless test

109
Q

AUC of 1 in ROC curve?

A

Perfect test
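A minimal sketch of computing the area under an ROC curve with the trapezoidal rule, assuming the curve is given as matching lists of false positive rate (1 - specificity) and sensitivity, sorted by false positive rate. The third curve is hypothetical.

```python
def roc_auc(fpr, tpr):
    """Area under an ROC curve by the trapezoidal rule.

    fpr and tpr are matching lists of (1 - specificity) and sensitivity,
    sorted by increasing fpr and running from (0, 0) to (1, 1).
    """
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        area += width * (tpr[i] + tpr[i - 1]) / 2
    return area


print(roc_auc([0, 1], [0, 1]))                                 # 0.5: chance line, worthless test
print(roc_auc([0, 0, 1], [0, 1, 1]))                           # 1.0: perfect test
print(round(roc_auc([0, 0.1, 0.3, 1], [0, 0.6, 0.85, 1]), 4))  # hypothetical test: 0.8225
```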

110
Q

How is cumulative survival probability calculated?

A

Each time an endpoint occurs, the cumulative survival probability is the survival probability just before that event multiplied by the proportion of the remaining uncensored subjects at risk who survive past it
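A minimal sketch of this product-limit (Kaplan-Meier) calculation. It assumes the common convention that subjects censored at the same time as an endpoint are still counted as at risk at that time. The follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of cumulative survival.

    times:  follow-up time for each subject
    events: 1 if the endpoint occurred at that time, 0 if the subject was censored
    Returns a list of (time, cumulative survival) at each time an endpoint occurs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        endpoints = removed = 0
        # Group all subjects whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            endpoints += data[i][1]
            removed += 1
            i += 1
        if endpoints:
            # Previous survival x proportion of those at risk who survive this time
            survival *= (at_risk - endpoints) / at_risk
            curve.append((t, survival))
        at_risk -= removed  # endpoints and censored subjects both leave the risk set
    return curve


# Hypothetical follow-up data (months) for six subjects
for t, s in kaplan_meier([2, 3, 3, 5, 6, 8], [1, 0, 1, 1, 0, 1]):
    print(t, round(s, 3), "endpoint probability:", round(1 - s, 3))
```

The endpoint probability at each time is simply 1 minus the cumulative survival probability.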

111
Q

Endpoint probability calculation?

A

1 - survival probability

112
Q

What is hazard?

A

The probability that a subject will have the endpoint at a given time, given that they have not had it before that time

113
Q

What does a hazard ratio >1 mean?

A

The factor increases the risk of the outcome

114
Q

What does a hazard ratio <1 mean?

A

The factor decreases the risk of the outcome

115
Q

What does it mean if the chi-squared heterogeneity statistic is bigger than its degrees of freedom?

A

Evidence of heterogeneity
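A sketch of the chi-squared heterogeneity statistic used in meta-analysis (Cochran's Q), assuming fixed-effect inverse-variance weights. The study effects (log odds ratios) and variances are hypothetical.

```python
def cochran_q(effects, variances):
    """Cochran's Q heterogeneity statistic with inverse-variance weights.

    Returns (Q, degrees of freedom); Q is compared with a chi-squared distribution on k - 1 df.
    """
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    return q, len(effects) - 1


# Hypothetical log odds ratios from three studies, each with variance 0.04
q, df = cochran_q([0.1, 0.5, 0.9], [0.04, 0.04, 0.04])
print(round(q, 2), df)  # 8.0 2
print(q > df)           # True: suggests heterogeneity
```

If the studies differed only by chance, Q would be expected to be close to its degrees of freedom; a Q well above that points to heterogeneity.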

116
Q

How does a forest plot show evidence of heterogeneity?

A

The CIs of individual studies do not overlap with those of other studies