lecture 7 Flashcards by Katherine Nicholls

Inference

Inference is the formal name given to learning from data using statistical tools.

It is the act of taking an estimate from the sample and turning it into some impression of the unknown value in the population of interest.

Parameter

The numerical measure of the quantity of interest in the population.

Parameters are generally unknown, but can be hypothetical.

The true, unknown value in the population is known as a parameter.

Example of a hypothetical parameter: the particular value at which we can no longer improve for a physiological event.

Risk as a parameter

There is some true value of the relative risk of this event, but we do not know it. We therefore run our study and estimate that value, and this estimate becomes our best guess at the unknown value of the parameter.

Random variable

An unknown quantity that varies in an unpredictable way.

Once a random variable is observed…

we refer to an observed or realised value.

Notation of random variables

Random variables are represented by upper case Roman letters.

Notation of observed or realised values

Lower case Roman letters represent the observed or realised value.

Random variables are described by

probability distributions

observed values of random samples are

data

Statistic

A statistic is a numerical summary of data.

Estimate

An estimate is a special kind of statistic used as an intelligent guess for a parameter.
Often estimates are denoted by adding a circumflex: μ̂ is an estimate of the parameter μ.

On this course we will generally use x̄ to denote the estimate of the parameter μ.
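As a minimal sketch of the idea on this card, the sample mean x̄ can be computed from observed data and used as the estimate of μ. The blood pressure readings below are made up purely for illustration; they are not from the lecture.

```python
from statistics import mean

# Hypothetical sample of systolic blood pressure readings (mmHg),
# invented for illustration only.
sample = [118, 125, 131, 122, 140, 127]

# x_bar is a statistic (a numerical summary of the data) that we use
# as our estimate of the unknown population parameter mu.
x_bar = mean(sample)

print(f"Estimate of mu: x_bar = {x_bar:.1f} mmHg")
```

A different sample would generally give a different x̄, while the parameter μ itself stays fixed.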

A statistical model

A statistical model is a mathematical description of the way the data are generated.
- Expressed in terms of parameters and random variables.

The main types of variables are?

continuous
discrete
categorical

Define continuous variables

Continuous - can be expressed on a continuous scale in which every value is possible.

e.g. rainfall

Define discrete variables

Discrete - can be put in one-to-one correspondence with the counting numbers.

e.g. the number of people in this room with brown eyes. The values are whole numbers; decimals are not used because they would not make sense.

Define categorical variables

Categorical - restricted to one of a set of categories. For example ‘Heads’ or ‘Tails’.

Binary categorical variables

Categorical variables give rise to categorical data. The simplest kind involves just two categories. For example, a person could be:
- Māori/non-Māori
- smoker/non-smoker
- diabetic/non-diabetic
Such data are also called binary data, dichotomous data, yes/no data and 0-1 data.
The 0-1 data refers to codes we use for the different outcomes. For example, 1 would typically represent participants with the outcome and 0 would represent participants without the outcome.
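The 0-1 coding described above can be sketched as follows. The list of smoking statuses is hypothetical and the variable names are illustrative only. One convenience of this coding is that summing the codes counts the participants with the outcome.

```python
# Hypothetical smoking status for eight participants (illustration only).
status = [
    "smoker", "non-smoker", "non-smoker", "smoker",
    "non-smoker", "non-smoker", "smoker", "non-smoker",
]

# Code 1 for participants with the outcome (smoker), 0 for those without.
codes = [1 if s == "smoker" else 0 for s in status]

# Summing 0-1 codes counts the participants with the outcome.
n_smokers = sum(codes)

# Dividing by the number of participants gives the proportion with the outcome.
proportion_smokers = n_smokers / len(codes)

print(codes)
print(n_smokers, proportion_smokers)
```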

Categorical variable - more than two

Data are nominal if there is no natural (or relevant) ordering:
Blood group: A/B/AB/O.
Prioritised ethnicity: Māori/Pacific/Asian/NZ European/Other.
Note: Ethnicity per se is not a categorical variable because people can identify with more than one ethnicity.

Data are ordinal if there is a natural ordering:
‘Degree of pain’: minimal/moderate/severe/unbearable.
‘Socio-economic deprivation’: e.g. NZDep06, measured on a scale from 1 (least deprived) to 10 (most deprived).
However, with ordinal data it can be misleading to code the categories as integer values (e.g. 0,1,2,3 for ‘Degree of pain’). Is ‘unbearable’ three times more severe than ‘moderate’?

Categorical variable - more than two - nominal

Nominal - the categories have no natural order of magnitude.

Categorical variable - more than two - ordinal

Ordinal - the categories form an ordered scale based on the nature of the response.

Discrete numerical

Discrete variables give rise to discrete data

With discrete data, observations take only certain numerical values, typically integers or whole numbers. For example:
- number of cases of cancer diagnosed during a day
- number of children in a family (0,1,2,3,4,…)

It is important to note that these are not like categorical data as the numerical representations are always consistent… e.g. 3 children is three times as many as one.

This type of data can be treated as though it is categorical if we must, but this discards information about the magnitude of the relationships between the numbers.

Continuous numerical

Continuous variables give rise to continuous data.

Continuous data arise from some form of measurement. For example:
height, age, blood pressure, serum cholesterol, …

In practice, many continuous variables only take positive values. Often there is no further restriction on values other than that caused by the accuracy of the equipment for recording values. However, while the underlying variable may be truly continuous, the data may be coarsened (e.g. age in years rather than age in days/hours/minutes/seconds…).

When looking at plotted data for a continuous variable you should look at …

modality (unimodal, bimodal etc) and symmetry (determine whether skew present)

determining which side the skew is on

If the distribution is asymmetric, the skew is in the direction of the tail.

i.e. a tail on the left is a left skew, which can also be called a negative skew.
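The direction of skew is normally judged by eye from a plot, but as a rough numerical check one could compute a moment-based skewness and look at its sign. This is a sketch under assumptions: the function names and both data sets are made up for illustration, and the lecture does not prescribe this calculation.

```python
from statistics import mean

def sample_skewness(values):
    """Moment-based skewness: the average cubed deviation from the mean
    divided by the cubed (population) standard deviation.
    Assumes the values are not all identical."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def skew_direction(values):
    """Negative skewness means the tail is on the left, positive on the right."""
    g = sample_skewness(values)
    if g < 0:
        return "left (negative) skew"
    if g > 0:
        return "right (positive) skew"
    return "no skew detected"

# Hypothetical data sets, invented for illustration.
tail_on_left = [1, 7, 8, 8, 9, 9, 9, 10]
tail_on_right = [1, 1, 1, 2, 2, 3, 3, 10]

print(skew_direction(tail_on_left))
print(skew_direction(tail_on_right))
```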

Ratio

Ratio: fraction given by one quantity over another. Both quantities have the same units. Example: In a class with 10 boys and 20 girls, the ratio of boys to girls is 10/20 (= 1/2) and the ratio of girls to boys is 20/10 (= 2). The latter is known as the odds of girls vs boys.

Proportion

Proportion: fraction of one quantity when compared to the whole.

Percentages

Proportions are often expressed in terms of percentages. To convert proportions to percentages, multiply by 100 and add a % sign. To convert percentages to proportions, divide by 100 and remove the % sign.
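The class of 10 boys and 20 girls from the Ratio card can be used to contrast ratios, proportions and percentages. The sketch below is illustrative only; the helper names `to_percentage` and `to_proportion` are invented here.

```python
boys, girls = 10, 20
total = boys + girls

# Ratio: one quantity over another.
ratio_boys_to_girls = boys / girls   # 10/20
ratio_girls_to_boys = girls / boys   # 20/10, the odds of girls vs boys

# Proportion: one quantity compared to the whole.
proportion_girls = girls / total     # 20/30

def to_percentage(proportion):
    """Multiply by 100 and add a % sign (rounded to one decimal place)."""
    return f"{proportion * 100:.1f}%"

def to_proportion(percentage):
    """Remove the % sign and divide by 100."""
    return float(percentage.rstrip("%")) / 100

print(ratio_boys_to_girls, ratio_girls_to_boys)
print(to_percentage(proportion_girls))
print(to_proportion("25%"))
```

Note that the ratio of girls to boys (2) and the proportion of girls (2/3) answer different questions: the first compares the two groups with each other, the second compares one group with the whole class.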

Rates

Rates are like ratios for quantities with different units. For example:
- Number of new diagnoses of HIV in NZ per year.
- Number of children per family.
Usual practice is to simplify rates to a ‘per unit’ measure, e.g. 13 deaths over 5 years is 2.6 deaths per year.
Rates are a good way to compare things.
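The ‘per unit’ simplification can be sketched as a one-line calculation. The first call reproduces the 13 deaths over 5 years example; the second set of figures is hypothetical and is included only to show why a common ‘per year’ unit makes comparison easier.

```python
def rate_per_unit(events, exposure):
    """Simplify a rate to a 'per unit' measure, e.g. deaths per year."""
    return events / exposure

# Example from the card: 13 deaths over 5 years.
rate_a = rate_per_unit(13, 5)

# Hypothetical second group for comparison: 20 deaths over 10 years.
rate_b = rate_per_unit(20, 10)

print(f"Group A: {rate_a:.1f} deaths per year")
print(f"Group B: {rate_b:.1f} deaths per year")

# Group B has more deaths in total but the lower rate per year.
print("Higher rate:", "A" if rate_a > rate_b else "B")
```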

scores

Obtaining measures of continuous phenomena is not always easy. Where exact measurement is not possible, a score may be used. For example:
- Responses to a question about back pain might be on a scale of 1 (no pain) to 5 (unbearable pain). These are scores, not results of a discrete random variable: who is to say that a response of 4 is twice as painful as a response of 2?
- Levels of agreement in a survey (e.g. course evaluation) might be labelled ‘a great deal’ / ‘somewhat’ / ‘not much’ / ‘not at all’. These are treated like ordinal categories rather than continuous data.
The responses might be numbered (e.g. 0,1,2,3,4), but care must be taken interpreting these as numerical data.
Scores are typically used when dealing with ordinal categorical values and can be somewhat misleading. They are usually recorded as numbers but, as with categorical variables, mathematical consistency in the magnitude of these numbers does not hold, so the scores may as well be used as labels.

Censored data

Censoring in a study is when there is incomplete information about a study participant, observation or value of a measurement, so we are not sure what the actual measurement should be.
With censored data the underlying variable follows a continuous distribution, but some values are not known exactly.
Censored data are recorded using two variables: the observation of interest plus a yes/no flag for whether that particular value is censored, i.e. whether the value we have recorded has reached some particular limit. For example, for right censored data one variable gives the last known value and another indicates whether or not the measurement is censored.
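The two-variable representation of censored data can be sketched as a value paired with a yes/no flag. The class name, field names and follow-up times below are all hypothetical and chosen only for illustration of right censored data.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    value: float    # last known value, e.g. years of follow-up
    censored: bool  # True if the true value is only known to be larger

# Hypothetical follow-up times in years (illustration only).
observations = [
    Observation(4.2, censored=False),  # event observed at 4.2 years
    Observation(6.0, censored=True),   # still event-free when the study ended
    Observation(2.5, censored=False),
    Observation(6.0, censored=True),
]

# Values known exactly versus values known only as a lower bound.
exact_values = [o.value for o in observations if not o.censored]
lower_bounds = [o.value for o in observations if o.censored]
n_censored = sum(o.censored for o in observations)

print(exact_values, lower_bounds, n_censored)
```

Keeping the flag alongside the value matters: treating a censored 6.0 as if it were an exact 6.0 would understate the true value.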

Types of censored data

right censored
left censored
interval censored

Right censored

Right censored - the true value is known to be larger than a recorded value. For example, we know that someone lived until at least 31 Dec 2017.

Left censored

Left censored - the true value is known to be smaller than a recorded value. For example, we know that a measurement is less than a known limit of detection.

Interval censored

Interval censored - the true value is known to lie between two values. For example, we know the date of infection with HPV is after a negative test and before a positive test 2 years later.

_______ censored in prospective studies ...

Right censored, e.g. loss to follow up.

(35 cards)