Flashcards in Stats Deck (53)

1

## Qualitative Variable

### A non-numerical variable, like hair colour or eye colour

2

## Quantitative Variable

### A numerical variable, like length, time

3

## Continuous variable

### Can take any value within a given range, e.g. height, time, age.

4

## Discrete variable

### Can only take certain values, e.g. shoe size, cost in £ and p, number of coins.

5

## What type of data do histograms deal with?

### Continuous data, so there are no gaps between the bars

6

## Define the mode

### The value which occurs most often.

7

## Define the mean average

### The sum of all the numbers divided by the number of numbers

8

## Define the median

### The median is the middle number in an ordered list (or the mean of the two middle numbers if the list has an even number of values).
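
As a quick illustration of the mode, mean and median defined in cards 6 to 8, here is a minimal Python sketch using the standard `statistics` module. The `scores` list is made-up data, not taken from the deck.

```python
from statistics import mean, median, multimode

scores = [2, 3, 3, 5, 7, 10]  # made-up data, already in order

modes = multimode(scores)     # value(s) that occur most often -> [3]
average = mean(scores)        # sum of the values (30) divided by how many there are (6)
middle = median(scores)       # even count, so the mean of the two middle values (3 and 5)

print("mode(s):", modes)
print("mean:", average)
print("median:", middle)
```

`multimode` is used rather than `mode` so that a bi-modal list would return both modes.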

9

## When is the mode used?

### You should use the mode if the data is qualitative (colour etc.) or if quantitative (numbers) with a clearly defined mode (or bi-modal). It is not much use if the distribution is fairly even.

10

## When is the mean used?

### This is for quantitative data (numbers), and uses all pieces of data. Because every value counts, extreme values pull it towards them, so it should only be used if the data is fairly symmetrical (not skewed).

11

## When is the median used?

### You should use this for quantitative data (numbers), when the data is skewed, i.e. when the median, mean and mode are probably not equal, and when there might be extreme values (outliers).
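
To see why the median is preferred when there are outliers, the sketch below compares the mean and median of a small sample before and after one extreme value is added. The waiting times are hypothetical numbers chosen for illustration.

```python
from statistics import mean, median

waits = [20, 22, 25, 27, 30]          # hypothetical waiting times (minutes)
waits_with_outlier = waits + [200]    # the same data plus one extreme value

mean_before, median_before = mean(waits), median(waits)
mean_after, median_after = mean(waits_with_outlier), median(waits_with_outlier)

# The mean jumps from 24.8 to 54, while the median only moves from 25 to 26.
print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```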

12

## What is the range?

### The range is the largest number minus the smallest (including outliers).

13

## What's an outlier?

### An extreme value that lies far from the rest of the data

14

## Ordinal variables

### An ordinal variable is a categorical variable for which the possible values are ordered.

15

## Nominal variables

### Nominal variables have two or more categories without having any kind of natural order. They are variables with no numeric value, such as occupation or political party affiliation.

16

## Binary variables

### Binary variables are variables which only take two values.

17

## What are the various sources of data?

###
Routinely collected data:

- Mortality and census data

- Hospital activity data

- Primary care data

- Infectious disease notifications

- Regular national surveys (e.g. Health Survey for England)

Research study data

18

## Continuous variables

###
Have numerical values

Measurements are on a continuous scale i.e. can take an infinite number of distinct values

19

## Discrete variables

###
Also known as count variables

Have numerical values, but they must be integers, e.g. number of fillings

20

## What are the appropriate graphical presentations to use for 1 categorical variable?

###
Bar chart

Pie chart

Frequency table

21

## What are the appropriate graphical presentations to use for 1 continuous variable?

###
Histogram

Box and whisker plot

22

## What is the appropriate graphical presentation to use for a categorical outcome and categorical exposure?

### Contingency table

23

## What is the appropriate graphical presentation to use for a numerical outcome and categorical exposure?

### Box and whisker plot

24

## What is the appropriate graphical presentation to use for a numerical outcome and numerical exposure?

### Scatter plot

25

## What are other terms for 'exposure'?

###
Explanatory variable

Independent variable

X variable

Risk factor

Treatment group

26

## What are other terms for 'outcome'?

###
Response variable

Dependent variable

Y variable

Case/control group

Disease group

27

## Bar charts

###
heights of the bars are proportional to the frequencies

useful for comparing the frequencies in each category relative to the others

28

## Pie charts

###
areas of the sectors are proportional to the frequencies

useful for comparing the frequencies in each category with the whole group
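
The frequencies behind a bar chart or pie chart can be tabulated in a few lines. This is only a sketch: the `hair` sample is invented, the bars are drawn as text, and the pie chart is represented by the angle of each sector (frequency as a share of 360 degrees).

```python
from collections import Counter

hair = ["brown"] * 5 + ["blonde"] * 3 + ["black"] * 2  # invented sample
counts = Counter(hair)
total = sum(counts.values())

angles = {}
for colour, n in counts.most_common():
    # Bar length is proportional to the frequency;
    # sector angle is the frequency as a share of the whole group.
    angles[colour] = n * 360 / total
    print(f"{colour:<7} {'#' * n:<5} {n:>2}  {angles[colour]:.0f} degrees")
```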

29

## Normal distribution

###
A symmetric, bell-shaped distribution where the mean, median, and mode are equal (roughly equal in real data)

A probability distribution that describes data that is symmetric around a mean

30

## Describe positive skewness

###
Mode < median < mean

Also known as a right-skewed distribution because it has a long right tail.

31

## Describe negative skewness

###
Mode > median > mean

Also known as a left-skewed distribution because it has a long left tail
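
The ordering of mode, median and mean in cards 30 and 31 can be checked on a small example. The data below are made up, and the ordering is a rule of thumb for typical skewed data rather than a guarantee for every data set.

```python
from statistics import mean, median, mode

right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]  # made-up data, long right tail
left_skewed = [20 - x for x in right_skewed]    # mirror image, long left tail

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    print(name, "mode:", mode(data), "median:", median(data), "mean:", mean(data))

# Positive (right) skew: mode < median < mean
right_ok = mode(right_skewed) < median(right_skewed) < mean(right_skewed)
# Negative (left) skew: mode > median > mean
left_ok = mode(left_skewed) > median(left_skewed) > mean(left_skewed)
print(right_ok, left_ok)
```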

32

## Define the term 'standard deviation'

### Measure of spread of observations around the mean

33

## Define the term 'interquartile range'

### Range from first (25%) to third (75%) quartiles of a distribution
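
The three measures of spread in this deck (range, standard deviation, interquartile range) can be computed side by side. This sketch uses made-up data and Python's `statistics.quantiles` with its default method; other textbooks and packages use slightly different quartile conventions, so the quartiles may differ a little elsewhere.

```python
from statistics import quantiles, stdev

values = [1, 3, 4, 6, 8, 9, 15]        # made-up data

value_range = max(values) - min(values)  # largest minus smallest
q1, q2, q3 = quantiles(values, n=4)      # first quartile, median, third quartile
iqr = q3 - q1                            # spread of the middle 50% of the data
sd = stdev(values)                       # sample standard deviation around the mean

print("range:", value_range)
print("IQR:", iqr)
print(f"standard deviation: {sd:.2f}")
```

Unlike the range, the interquartile range ignores the most extreme values, so a single outlier has little effect on it.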

34

## What is a distribution?

###
• describes the frequency (or probability) of occurrence for a given value

• describes the shape of the data

35

## What can we do with a distribution?

###
- make inferences about a wider population

- generate confidence intervals (assessing variability of estimates)

- test hypotheses

- calculate sample size

36

## Define skewness

### A measure of the asymmetry of the distribution

37

## What is a null hypothesis?

### A hypothesis saying that the outcome is not associated with the exposure

38

## What is an alternative hypothesis?

### A hypothesis saying that the outcome is associated with the exposure

39

## Why use statistical tests?

### We use statistical tests to help us judge whether our observed effect could plausibly be due to chance alone or whether it is likely to be real.

40

## What is the significance level?

###
• The probability that you will find an effect that does NOT actually exist, i.e. reject the null hypothesis when it is true

• Strength of evidence needed to reject the NULL hypothesis

• Normally set to 5%

41

## Define the term 'inferential statistics'

### Inferential statistics allows you to make predictions (“inferences”) from data. With inferential statistics, you take data from samples and make generalisations about a population.

42

## What is meant by standard error?

###
• Standard error is an inferential statistic.

• It is an estimate of how variable a statistic would be if we repeated our study numerous times.
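
The idea of "repeating the study numerous times" can be simulated. The sketch below assumes a hypothetical normally distributed population (mean 50, standard deviation 10) and samples of 25. It compares the standard error of the mean estimated from a single sample (sample SD divided by the square root of n) with the actual spread of the sample means across 2000 simulated studies.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # fixed seed so the sketch is repeatable

POP_MEAN, POP_SD, N = 50, 10, 25  # hypothetical population and sample size


def run_study():
    """Draw one random sample of size N from the assumed population."""
    return [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]


# One study: estimate the standard error of the mean from a single sample.
sample = run_study()
se_estimate = stdev(sample) / sqrt(N)

# Many studies: how much does the sample mean actually vary between them?
means = [mean(run_study()) for _ in range(2000)]
spread_of_means = stdev(means)

print(f"SE estimated from one sample: {se_estimate:.2f}")
print(f"SD of 2000 sample means:      {spread_of_means:.2f}")
print(f"Theoretical value sd/sqrt(n): {POP_SD / sqrt(N):.2f}")
```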

43

## What are p-values?

### A p-value is the probability of observing an effect size at least as large as the one we did, assuming the null hypothesis is true, i.e. the true effect size is zero

44

## What do p-values tell you?

###
P-values tell us the strength of the evidence against the null hypothesis that there is no association.

As the p-value decreases the evidence against the null hypothesis increases.
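
One way to make the p-value concrete is an exact permutation test, sketched below. This is an illustrative technique chosen here, not one the deck itself names, and the two groups are made-up measurements. If the null hypothesis is true, the group labels are arbitrary. So the code tries every way of splitting the eight values into two groups of four and counts how often the difference in means is at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean

exposed = [12, 15, 14, 16]    # made-up outcome values, exposed group
unexposed = [9, 10, 11, 13]   # made-up outcome values, unexposed group
observed = mean(exposed) - mean(unexposed)

pooled = exposed + unexposed
extreme = 0
total = 0
for idx in combinations(range(len(pooled)), len(exposed)):
    group_a = [pooled[i] for i in idx]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
    diff = mean(group_a) - mean(group_b)
    if abs(diff) >= abs(observed):  # two-sided: at least as large in either direction
        extreme += 1
    total += 1

p_value = extreme / total
print(f"observed difference: {observed}")
print(f"{extreme} of {total} relabellings are at least as extreme, p = {p_value:.3f}")
```

With these numbers, 4 of the 70 possible relabellings are at least as extreme as the observed split, giving a p-value just above the conventional 5% significance level.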

45

## What do confidence intervals tell you?

###
The confidence interval shows the range of values in which the true effect size is likely to lie.

A 95% confidence interval tells us that if the experiment were replicated many times, about 95% of the intervals calculated would contain the true value.
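
The "replicate experiments" reading of a 95% confidence interval can be checked by simulation. The sketch assumes a hypothetical normal population (mean 100, SD 15) and uses the common large-sample interval, mean ± 1.96 × standard error. It repeats the experiment 2000 times and counts how often the interval contains the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

random.seed(2)  # fixed seed so the sketch is repeatable

TRUE_MEAN, TRUE_SD, N, REPEATS = 100, 15, 50, 2000  # hypothetical values
z = NormalDist().inv_cdf(0.975)                     # about 1.96

covered = 0
for _ in range(REPEATS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    centre = mean(sample)
    half_width = z * stdev(sample) / sqrt(N)  # 1.96 standard errors
    if centre - half_width <= TRUE_MEAN <= centre + half_width:
        covered += 1

coverage = covered / REPEATS
print(f"{coverage:.1%} of the intervals contain the true mean")
```

The true mean stays fixed throughout. It is the interval that changes from one experiment to the next.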

46

## How can the concept of disease be influenced?

###
• Evidence of symptoms

• Technological and medical development

• Sociocultural environment

47

## Define the term 'abnormality'

### Different from what is usual or average, especially in a way that is bad

48

## List the three types of abnormality

###
• Abnormal if unusual

• Abnormal if associated with clinical abnormality

• Abnormal if increased risk of future disease

49

## Abnormal if unusual

### It is common in laboratory testing to define normal as the range which includes 95% of values found in healthy subjects. This means abnormal is the top and bottom 2.5% of values in healthy subjects

50

## What's wrong with defining abnormal as unusual?

### By definition 5% of healthy people will have “abnormal” i.e. “unusual” values
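
A small numerical sketch of the "abnormal if unusual" definition: assuming, purely for illustration, that a test result in healthy subjects follows a normal distribution with mean 140 and SD 3, the central 95% reference range and the share of healthy subjects falling outside it can be computed directly.

```python
from statistics import NormalDist

# Hypothetical distribution of a lab value in healthy subjects (an assumption).
healthy = NormalDist(mu=140, sigma=3)

lower = healthy.inv_cdf(0.025)  # cuts off the bottom 2.5%
upper = healthy.inv_cdf(0.975)  # cuts off the top 2.5%
outside = healthy.cdf(lower) + (1 - healthy.cdf(upper))

print(f"reference range: {lower:.1f} to {upper:.1f}")
print(f"healthy subjects labelled abnormal: {outside:.1%}")
```

Whatever mean and SD are assumed, the share labelled abnormal comes out at 5%, because the range was defined to exclude exactly that fraction of healthy subjects.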

51

## Abnormal if associated with clinical abnormality

### More logical to label values of a test as abnormal if these values are clearly associated with the presence of a disease state.

52

## What's wrong with defining abnormal as being associated with clinical abnormality?

### There's almost always overlap between values in diseased subjects and those in healthy subjects

53