Lecture 7 Flashcards

1
Q

Inference

A

Inference is the formal name given to learning from data using statistical tools.

It is the act of taking an estimate from the sample and turning it into an impression of the unknown value in the population of interest.

2
Q

Parameter

A

The numerical measure of the quantity of interest in the population.

Parameters are generally unknown, but can be hypothetical.

The true unknown value in the population is known as a parameter.

For example, we may want to find the particular value at which a physiological event can no longer be improved.

3
Q

Risk as a parameter

A

There is some true value of the relative risk of this event, but we do not know it. We therefore run a study and estimate that value, and this estimate becomes our best guess at the unknown value of the parameter.

4
Q

Random variable

A

An unknown quantity that varies in an unpredictable way.

5
Q

Once a random variable is observed…

A

we refer to it as an observed or realised value.

6
Q

Notation of random variables

A

Random variables are represented by upper case Roman letters.

7
Q

Notation of observed or realised values

A

Lower case Roman letters represent the observed or realised value.

8
Q

Random variables are described by

A

probability distributions

9
Q

observed values of random samples are

A

data

10
Q

Statistic

A

A statistic is a numerical summary of data.

11
Q

Estimate

A

An estimate is a special kind of statistic used as an intelligent guess for a parameter.
Often estimates are denoted by adding a circumflex: μ̂ is an estimate of the parameter μ.

On this course we will generally use x̄ to denote the estimate of the parameter μ.
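
As a minimal sketch of this idea, the sample mean x̄ can be computed from observed values and used as the guess for μ. The blood pressure readings below are made-up numbers for illustration only.

```python
# Hypothetical observed values x_1..x_n (made-up systolic blood pressures, mmHg).
sample = [120, 135, 128, 142, 125]

# The sample mean is a statistic: a numerical summary of the data.
# Used as a guess for the unknown population mean μ, it is an estimate (x̄).
x_bar = sum(sample) / len(sample)

print(f"x̄ = {x_bar}")
```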

12
Q

A statistical model

A

A statistical model is a mathematical description of the way the data are generated.
- Expressed in terms of parameters and random variables.

13
Q

The main types of variables are?

A

continuous
discrete
categorical

14
Q

Continuous variables define

A

Continuous - can be expressed on a continuous scale in which every value is possible.

e.g. rainfall

15
Q

Discrete variables define

A

Discrete - can be put in one-to-one correspondence with the counting numbers.

e.g. the number of people in this room with brown eyes. Values are whole numbers; decimals do not make sense here.

16
Q

Categorical variables define

A

Categorical - restricted to one of a set of categories. For example ‘Heads’ or ‘Tails’.

17
Q

Binary categorical variables

A

Categorical variables give rise to categorical data. The simplest kind involves just two categories. For example, a person could be:
Māori/non-Māori, smoker/non-smoker, diabetic/non-diabetic.
Such data are also called binary data, dichotomous data, yes/no data and 0-1 data.
The 0-1 data refers to codes we use for the different outcomes. For example, 1 would typically represent participants with the outcome and 0 would represent participants without the outcome.
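
A small sketch of 0-1 coding, using made-up smoking statuses for eight hypothetical participants. One convenient consequence of this coding is that the sum counts participants with the outcome and the mean gives the proportion with the outcome.

```python
# Hypothetical smoking status for eight participants (made-up data).
status = ["smoker", "non-smoker", "non-smoker", "smoker",
          "non-smoker", "non-smoker", "non-smoker", "smoker"]

# Code participants with the outcome as 1 and those without as 0.
coded = [1 if s == "smoker" else 0 for s in status]

# The sum of 0-1 data counts participants with the outcome;
# the mean of 0-1 data is the proportion with the outcome.
n_with_outcome = sum(coded)
proportion = n_with_outcome / len(coded)

print(coded, n_with_outcome, proportion)
```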

18
Q

Categorical variable - more than two

A

Data are nominal if there is no natural (or relevant) ordering:
Blood group: A/B/AB/O.
Prioritised ethnicity: Māori/Pacific/Asian/NZ European/Other.
Note: Ethnicity per se is not a categorical variable because people can identify with more than one ethnicity.

Data are ordinal if there is a natural ordering:
‘Degree of pain’: minimal/moderate/severe/unbearable.
‘Socio-economic deprivation’:
e.g. NZDep06, measured on a scale from 1 (least deprived) to 10 (most deprived).
However in this case it can be misleading to code the categories as integer values (e.g. 0,1,2,3 for ‘Degree of pain’). Is ‘unbearable’ three times more severe than ‘moderate’?

19
Q

Categorical variable - more than two - nominal

A

Data are nominal if there is no natural (or relevant) ordering:
Blood group: A/B/AB/O.
Prioritised ethnicity: Māori/Pacific/Asian/NZ European/Other.
Note: Ethnicity per se is not a categorical variable because people can identify with more than one ethnicity.

nominal - no natural order in magnitude

20
Q

Categorical variable - more than two - ordinal

A

Data are ordinal if there is a natural ordering:
‘Degree of pain’: minimal/moderate/severe/unbearable.
‘Socio-economic deprivation’:
e.g. NZDep06, measured on a scale from 1 (least deprived) to 10 (most deprived).
However in this case it can be misleading to code the categories as integer values (e.g. 0,1,2,3 for ‘Degree of pain’). Is ‘unbearable’ three times more severe than ‘moderate’? The arithmetic meaning of the codes does not hold, so be aware of this distinction.

ordinal - scale based on the nature of the response
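
One way to respect this distinction in code is to use the integer codes only for ordering, never for arithmetic. The sketch below uses the ‘Degree of pain’ levels from the card; the list of responses is hypothetical.

```python
# Ordered levels from the 'Degree of pain' example.
levels = ["minimal", "moderate", "severe", "unbearable"]

# The codes 0..3 record position in the ordering only, not magnitude.
rank = {level: i for i, level in enumerate(levels)}

# Hypothetical responses from five participants.
responses = ["severe", "minimal", "unbearable", "moderate", "severe"]

# Sorting, maximum and median rely only on the ordering, so they are meaningful.
ordered = sorted(responses, key=rank.get)
worst = max(responses, key=rank.get)
median_response = ordered[len(ordered) // 2]

print(ordered, worst, median_response)
```

Ratios or averages of the codes are deliberately avoided here, since they would assume equal spacing between categories.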

21
Q

Discrete numerical

A

Discrete variables give rise to discrete data

With discrete data, observations take only certain numerical values, typically integers or whole numbers. For example:
number of cases of cancer diagnosed during a day.
number of children in a family (0,1,2,3,4,…).

It is important to note that these are not like categorical data, as the numerical representations are always consistent, e.g. 3 children is three times as many as one.

This type of data can be treated as though it is categorical if we must, but this discards information about the magnitude of the relationships between the numbers.

22
Q

Continuous numerical

A

Continuous variables give rise to continuous data.

Continuous data arise from some form of measurement. For example:
height, age, blood pressure, serum cholesterol, …

In practice many continuous variables only take positive values. Often there is no further restriction on values other than that caused by the accuracy of the equipment for recording values. However while the underlying variable may be truly continuous, the data may be coarsened (e.g. age in years rather than age in days/hours/minutes/seconds…).

23
Q

When looking at plotted data for a continuous variable you should look at …

A

modality (unimodal, bimodal etc) and symmetry (determine whether skew present)

24
Q

determining which side the skew is on

A

If the distribution is asymmetric, the skew is in the direction of the tail.

i.e. a tail on the left is a left skew, which can also be called a negative skew.
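
The sign convention can be checked numerically. The sketch below uses the moment coefficient of skewness, which is one common summary and an assumption here (the cards do not name a formula); both data sets are made up.

```python
def skewness(values):
    """Moment coefficient of skewness: mean cubed deviation divided by sd cubed."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n  # population variance
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Made-up data: most values high, with one low value forming a tail on the left.
left_tailed = [1, 8, 9, 9, 10, 10, 10]
# Mirror image: most values low, with one high value forming a tail on the right.
right_tailed = [1, 1, 1, 2, 2, 3, 10]

print(skewness(left_tailed) < 0)   # tail on the left gives a negative value
print(skewness(right_tailed) > 0)  # tail on the right gives a positive value
```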

25
Q

Ratio

A

Ratio: fraction given by one quantity over another. Both quantities have the same units.
Example: In a class with 10 boys and 20 girls, the ratio of boys to girls is 10/20 (= 1/2) and the ratio of girls to boys is 20/10 (= 2).
The latter is known as the odds of girls vs boys.
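
The class example can be reproduced with exact fractions. This is only an illustrative sketch; the variable names are arbitrary.

```python
from fractions import Fraction

boys, girls = 10, 20

# A ratio divides one quantity by another with the same units (here, people).
ratio_boys_to_girls = Fraction(boys, girls)  # reduces to 1/2
ratio_girls_to_boys = Fraction(girls, boys)  # reduces to 2: the odds of girls vs boys

print(ratio_boys_to_girls, ratio_girls_to_boys)
```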

26
Q

Proportion

A

Proportion: fraction of one quantity when compared to the whole.

27
Q

Percentages

A

Proportions are often expressed in terms of percentages.
To convert proportions to percentages, multiply by 100 and add a % sign. To convert percentages to proportions, divide by 100 and remove % sign.
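
A short sketch of both conversions. The helper names are hypothetical, and the class of 10 boys and 20 girls is reused purely as an illustration.

```python
def proportion_to_percent(p):
    """Multiply a proportion by 100 to express it as a percentage."""
    return p * 100

def percent_to_proportion(pct):
    """Divide a percentage by 100 to express it as a proportion."""
    return pct / 100

# Proportion: one quantity compared to the whole (girls out of all pupils).
boys, girls = 10, 20
proportion_girls = girls / (boys + girls)

print(f"{proportion_to_percent(proportion_girls):.1f}%")
print(percent_to_proportion(40))
```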

28
Q

Rates

A

Rates are like ratios for quantities with different units. Examples: number of new diagnoses of HIV in NZ per year; number of children per family.

Usual practice is to simplify rates to a ‘per unit’ measure.
- 13 deaths over 5 years is 2.6 deaths per year.

Because they are expressed per unit, rates are a good way to compare quantities observed over different periods or group sizes.
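
The ‘per unit’ simplification is a single division. The sketch below uses the 13 deaths over 5 years example from this card; the function name is hypothetical.

```python
def rate_per_unit(count, exposure):
    """Simplify a rate to a 'per unit' measure, e.g. events per year."""
    return count / exposure

deaths, years = 13, 5
deaths_per_year = rate_per_unit(deaths, years)

print(f"{deaths_per_year} deaths per year")
```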

29
Q

scores

A

Obtaining measures of continuous phenomena is not always easy. Where exact measurement is not possible a score may be used.
For example:
Responses to a question about back pain might be on a scale of 1 (no pain) to 5 (unbearable pain). These are scores, not results of a discrete random variable: who is to say that a response of 4 is twice as painful as a response of 2?
Levels of agreement in a survey (e.g. course evaluation) might be labelled ‘a great deal’ / ‘somewhat’ / ‘not much’ / ‘not at all’.

These are treated like ordinal categories rather than continuous data. The responses might be numbered (e.g. 0,1,2,3,4), but care must be taken interpreting these as numerical data.

Scores are typically used when dealing with ordinal categorical values.

They can be somewhat misleading: they are usually recorded as numbers, but as with categorical variables the arithmetic relationships between those numbers do not hold, so the scores may as well be treated as labels.

30
Q

Censored data

A

Censoring in a study is when there is incomplete information about a study participant, observation or value of a measurement.

In other words, we are not sure what the actual measurement should be.

With censored data the underlying variable follows a continuous distribution, but some values are not known exactly.

Censored data are described by two variables, e.g. for right censored data one variable gives the last known value and another indicates whether or not the measurement is censored.

That is, we record the observation of interest plus a yes/no flag for whether that value is censored, i.e. whether the value we obtained has reached some particular limit.
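
A minimal sketch of this two-variable representation for right censored data. The class name, field names and follow-up times are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    value: float    # last known value, e.g. years of follow-up
    censored: bool  # True if the true value is only known to exceed `value`

# Made-up follow-up times; two participants were still event-free at 4.0 years.
observations = [
    Observation(2.5, False),
    Observation(4.0, True),
    Observation(1.2, False),
    Observation(4.0, True),
]

# Exactly observed values versus those known only as lower bounds.
exact = [o.value for o in observations if not o.censored]
n_censored = sum(o.censored for o in observations)

print(exact, n_censored)
```

Keeping the flag alongside the value matters: dropping it would treat the lower bounds as if they were exact measurements.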

31
Q

Types of censored data

A

right censored
left censored
interval censored

32
Q

Right censored

A

Right censored - the true value is known to be larger than a recorded value
for example, we know that someone lived until at least 31 Dec 2017.

33
Q

Left censored

A

Left censored - the true value is known to be smaller than a recorded value
for example, we know that a measurement is less than a known limit of detection.

34
Q

Interval censored

A

Interval censored - the true value is known to lie between two values
for example, we know the date of infection with HPV is after a negative test and before a positive test 2 years later.

35
Q

_______ censored in prospective studies …

A

right censored

e.g. loss to follow up