Stats Flashcards

(53 cards)

1
Q

Qualitative Variable

A

A non-numerical variable, like hair colour or eye colour

2
Q

Quantitative Variable

A

A numerical variable, like length, time

3
Q

Continuous variable

A

Can take any value within a given range, e.g. height, time, age.

4
Q

Discrete variable

A

Can only take certain values, e.g. shoe size, cost in £ and p, number of coins.

5
Q

What type of data do histograms deal with?

A

Continuous data, so there are no gaps between the bars.

6
Q

Define the mode

A

The value which occurs most often.

7
Q

Define the mean average

A

The sum of all the values divided by the number of values.

8
Q

Define the median

A

The median is the middle value in an ordered list. If the list has an even number of values, it is the mean of the two middle values.
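
A minimal sketch of the three averages defined in the cards above, using Python's standard `statistics` module. The `fillings` data is made up for illustration.

```python
import statistics

# Hypothetical sample: number of fillings recorded for nine patients.
fillings = [0, 1, 1, 2, 2, 2, 3, 4, 12]

mode_value = statistics.mode(fillings)      # value that occurs most often
mean_value = statistics.mean(fillings)      # sum of values / number of values
median_value = statistics.median(fillings)  # middle value of the ordered list

# With an even number of values, the median is the mean of the two middle ones.
even_median = statistics.median([1, 2, 3, 4])

print(mode_value, mean_value, median_value, even_median)
```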

9
Q

When is the mode used?

A

You should use the mode if the data is qualitative (colour etc.) or if quantitative (numbers) with a clearly defined mode (or bi-modal). It is not much use if the distribution is fairly even.

10
Q

When is the mean used?

A

Use the mean for quantitative data (numbers). It uses every piece of data, so it is sensitive to extreme values and is most representative when the data is fairly symmetrical (not skewed).

11
Q

When is the median used?

A

Use the median for quantitative data (numbers) when the data is skewed, i.e. when the mean, median and mode are unlikely to be equal, or when there might be extreme values (outliers).
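
To see why the median is preferred when there are outliers, the sketch below adds one extreme value to a small, made-up set of waiting times and compares how far the mean and the median move. The numbers are illustrative only.

```python
import statistics

# Hypothetical waiting times in minutes.
waits = [10, 12, 13, 15, 16]
waits_with_outlier = waits + [120]  # one extreme value added

mean_before = statistics.mean(waits)
mean_after = statistics.mean(waits_with_outlier)
median_before = statistics.median(waits)
median_after = statistics.median(waits_with_outlier)

# The mean shifts a lot; the median barely moves.
print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```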

12
Q

What is the range?

A

The range is the largest number minus the smallest (including outliers).

13
Q

What’s an outlier?

A

An extreme value that lies far from the rest of the data.

14
Q

Ordinal variables

A

An ordinal variable is a categorical variable for which the possible values are ordered.

15
Q

Nominal variables

A

Nominal variables have two or more categories without having any kind of natural order. They are variables with no numeric value, such as occupation or political party affiliation.

16
Q

Binary variables

A

Binary variables are variables which only take two values.

17
Q

What are the various sources of data?

A
Routinely collected data:
- Mortality and census data
- Hospital activity data
- Primary care data
- Infectious disease notifications
- Regular national surveys (e.g. Health Survey for England)
Research study data
18
Q

Continuous variables

A

Have numerical values

Measurements are on a continuous scale i.e. can take an infinite number of distinct values

19
Q

Discrete variables

A

Also known as count variables

Have numerical values, but they must be integers e.g. Number of fillings

20
Q

What are the appropriate graphical presentations to use for 1 categorical variable?

A

Bar chart
Pie chart
Frequency table
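
As a rough illustration of the frequency table listed above, the sketch below tallies a single categorical variable. The eye-colour values are made up, and `collections.Counter` is just one convenient way to count categories.

```python
from collections import Counter

# Hypothetical observations of one categorical variable (eye colour).
eye_colours = ["brown", "blue", "brown", "green", "blue", "brown", "hazel", "brown"]

frequencies = Counter(eye_colours)
total = len(eye_colours)

# Print each category with its count and its share of the whole group.
for colour, count in frequencies.most_common():
    print(f"{colour:<6} {count:>2} {count / total:6.1%}")
```

The counts are what a bar chart would plot as bar heights; the shares are what a pie chart would plot as sector areas.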

21
Q

What are the appropriate graphical presentations to use for 1 continuous variable?

A

Histogram

Bar chart (only once the values have been grouped into categories)

22
Q

What is the appropriate graphical presentation to use for a categorical outcome and categorical exposure?

A

Contingency table
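
A contingency table cross-tabulates the exposure categories against the outcome categories. The sketch below builds one from made-up (exposure, outcome) records; the category names are hypothetical.

```python
from collections import Counter

# Hypothetical records: (exposure, outcome) for each study participant.
records = [
    ("smoker", "disease"), ("smoker", "disease"), ("smoker", "no disease"),
    ("non-smoker", "disease"), ("non-smoker", "no disease"),
    ("non-smoker", "no disease"), ("non-smoker", "no disease"),
]

cell_counts = Counter(records)
exposures = ["smoker", "non-smoker"]
outcomes = ["disease", "no disease"]

# Rows are exposure groups, columns are outcome groups.
print(f"{'':<12}" + "".join(f"{o:>12}" for o in outcomes))
for e in exposures:
    print(f"{e:<12}" + "".join(f"{cell_counts[(e, o)]:>12}" for o in outcomes))
```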

23
Q

What is the appropriate graphical presentation to use for a numerical outcome and categorical exposure?

A

Box and whisker plot

24
Q

What is the appropriate graphical presentation to use for a numerical outcome and numerical exposure?

A

Scatter plot

25
What are other terms for 'exposure'?
- Explanatory variable
- Independent variable
- X variable
- Risk factor
- Treatment group
26
What are other terms for 'outcome'?
- Response variable
- Dependent variable
- Y variable
- Case/control group
- Disease group
27
Bar charts
The heights of the bars are proportional to the frequencies.
Useful for comparing the frequencies in each category relative to the others.
28
Pie charts
The areas of the sectors are proportional to the frequencies.
Useful for comparing the frequencies in each category with the whole group.
29
Normal distribution
A probability distribution that is symmetric around its mean, so the mean, median and mode are equal (roughly equal in sample data).
30
Describe positive skewness
Mode < median < mean.
Also known as a right-skewed distribution because it has a long right tail.
31
Describe negative skewness
Mode > median > mean.
Also known as a left-skewed distribution because it has a long left tail.
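
The ordering of mode, median and mean under skew can be checked on small made-up samples. In the sketch below both data sets are hypothetical, and the left-skewed one is simply the mirror image of the right-skewed one.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, with a long right tail.
right_skewed = [1, 1, 1, 2, 2, 3, 4, 6, 10, 20]
# Mirror image of the sample above, giving a long left tail.
left_skewed = [21 - x for x in right_skewed]

def averages(data):
    """Return (mode, median, mean) for a list of numbers."""
    return statistics.mode(data), statistics.median(data), statistics.mean(data)

r_mode, r_median, r_mean = averages(right_skewed)
l_mode, l_median, l_mean = averages(left_skewed)

print("right-skewed:", r_mode, r_median, r_mean)  # mode < median < mean
print("left-skewed: ", l_mode, l_median, l_mean)  # mode > median > mean
```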
32
Define the term 'standard deviation'
Measure of spread of observations around the mean
33
Define the term 'interquartile range'
The range from the first quartile (25th percentile) to the third quartile (75th percentile) of a distribution
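
A short sketch comparing the measures of spread from the cards above (range, interquartile range and standard deviation) on made-up data. Note that `statistics.quantiles` uses one of several quartile conventions, so other tools may report slightly different quartiles for the same data.

```python
import statistics

# Hypothetical lengths of hospital stay in days; 40 is an extreme value.
stays = [2, 4, 4, 5, 7, 9, 10, 12, 15, 40]

data_range = max(stays) - min(stays)           # largest minus smallest
q1, q2, q3 = statistics.quantiles(stays, n=4)  # the three quartile cut points
iqr = q3 - q1                                  # spread of the middle 50%
sd = statistics.stdev(stays)                   # sample standard deviation

print(f"range = {data_range}, IQR = {iqr}, SD = {sd:.2f}")
```

The range is driven entirely by the extreme value, while the interquartile range ignores it.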
34
What is a distribution?
* Describes the frequency (or probability) of occurrence for a given value
* Describes the shape of the data
35
What can we do with a distribution?
- Make inferences about a wider population
- Generate confidence intervals (assessing variability of estimates)
- Test hypotheses
- Calculate sample size
36
Define skewness
A measure of the asymmetry of the distribution
37
What is a null hypothesis?
A hypothesis saying that the outcome is not associated with the exposure
38
What is an alternative hypothesis?
A hypothesis saying that the outcome is associated with the exposure
39
Why use statistical tests?
We use statistical tests to help us judge whether our observed effect could be due to chance alone or reflects a real effect.
40
What is the significance level?
* The probability of finding an effect that does NOT actually exist, i.e. of rejecting the null hypothesis when it is true
* Sets the strength of evidence needed to reject the null hypothesis
* Normally set to 5%
41
Define the term 'inferential statistics'
Inferential statistics allows you to make predictions (“inferences”) from data. With inferential statistics, you take data from samples and make generalisations about a population.
42
What is meant by standard error?
* Standard error is an inferential statistic.
* It is an estimate of how variable a statistic would be if we repeated our study numerous times.
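For the common case where the statistic is a sample mean, the standard error is usually estimated as the sample standard deviation divided by the square root of the sample size. The sketch below assumes that case; the function name and the data are hypothetical.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimate the standard error of the sample mean as s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical measurements from one small study.
sample = [4, 8, 6, 5, 7]
se = standard_error_of_mean(sample)

print(f"mean = {statistics.mean(sample)}, standard error = {se:.3f}")
```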
43
What are p-values?
A p-value is the probability of observing an effect size at least as large as the one we observed if the null hypothesis is true, i.e. if the true effect size is zero.
44
What do p-values tell you?
P-values tell us the strength of the evidence against the null hypothesis that there is no association. As the p-value decreases the evidence against the null hypothesis increases.
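One way to make the p-value concrete is an exact two-sided binomial test. The sketch below assumes a made-up study in which 16 of 20 patients prefer treatment A, with a null hypothesis of no preference (probability 0.5). The helper function is hypothetical, and its "at least as far from the expected count" rule is only a natural choice when the null probability is 0.5.

```python
from math import comb

def two_sided_binomial_p(successes, n, p_null=0.5):
    """Probability, under the null, of a count at least as far from the
    expected count as the one observed (exact two-sided test)."""
    expected = n * p_null
    observed_distance = abs(successes - expected)
    return sum(
        comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        for k in range(n + 1)
        if abs(k - expected) >= observed_distance
    )

# Hypothetical study: 16 of 20 patients prefer treatment A over B.
p_value = two_sided_binomial_p(16, 20)
print(f"p = {p_value:.4f}")
```

A smaller p-value means the observed split would be rarer under the null hypothesis, so the evidence against it is stronger.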
45
What do confidence intervals tell you?
The confidence interval shows the range of values in which the true effect size is likely to lie. With a 95% confidence interval, about 95% of the intervals calculated from replicate experiments would contain the true value.
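The "replicate experiments" reading of a confidence interval can be checked by simulation. The sketch below is a simplified illustration: it assumes normally distributed data with a known standard deviation, and all of the parameter values are made up.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Assumed "true" population values for the simulation.
TRUE_MEAN, TRUE_SD, N, REPLICATES = 120.0, 15.0, 30, 2000
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
se = TRUE_SD / N ** 0.5                     # standard error, SD treated as known

covered = 0
for _ in range(REPLICATES):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.fmean(sample)
    # Count the replicate if its 95% interval contains the true mean.
    if mean - z * se <= TRUE_MEAN <= mean + z * se:
        covered += 1

coverage = covered / REPLICATES
print(f"{coverage:.1%} of intervals contained the true mean")
```

The true mean stays fixed throughout; it is the interval that changes from one replicate to the next.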
46
How can the concept of disease be influenced?
* Evidence of symptoms
* Technological and medical development
* Sociocultural environment
47
Define the term 'abnormality'
Different from what is usual or average, especially in a way that is bad
48
List the three types of abnormality
* Abnormal if unusual
* Abnormal if associated with clinical abnormality
* Abnormal if increased risk of future disease
49
Abnormal if unusual
It is common in laboratory testing to define normal as the range that includes 95% of values found in healthy subjects. Abnormal then means the top and bottom 2.5% of those values.
50
What's wrong with defining abnormal as unusual?
By definition, 5% of healthy people will have “abnormal”, i.e. “unusual”, values.
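A small sketch of the 95% reference range idea, assuming (purely for illustration) that a lab value in healthy subjects is normally distributed with mean 5.0 and standard deviation 0.5. It uses `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

# Hypothetical distribution of a lab value in healthy subjects
# (assumed normal; the mean, SD and units are illustrative only).
healthy = NormalDist(mu=5.0, sigma=0.5)

lower = healthy.inv_cdf(0.025)  # cuts off the bottom 2.5%
upper = healthy.inv_cdf(0.975)  # cuts off the top 2.5%

# Share of healthy subjects who fall outside the "normal" range.
fraction_flagged = healthy.cdf(lower) + (1 - healthy.cdf(upper))

print(f"reference range: {lower:.2f} to {upper:.2f}")
print(f"healthy subjects flagged as abnormal: {fraction_flagged:.1%}")
```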
51
Abnormal if associated with clinical abnormality
It is more logical to label values of a test as abnormal if these values are clearly associated with the presence of a disease state.
52
What's wrong with defining abnormal as being associated with clinical abnormality?
There's almost always overlap between values in diseased subjects and those in healthy subjects
53
Abnormal if increased risk of future disease
A biochemical measure in an asymptomatic individual may be causally associated with future disease