Statistics Flashcards

1
Q

What is statistical inference used for?

A

Used to determine the probability that an observed association may be due to chance.

2
Q

What is causal inference?

A

A systematic process of determining whether a factor (exposure) that is statistically associated with the outcome (disease) is in fact a causal risk factor.

3
Q

What is a causal risk factor?

A

A factor that directly influences or contributes to the increased likelihood of a specific outcome or event (disease).

4
Q

What is a variable?

A

Any observable event that can vary and can be measured on individuals is called a variable

5
Q

List some examples of variables

A

Height, litter size, blood count, enzyme activity, coat colour, body weight, age, gender, pregnancy status, disease status, etc.

6
Q

What are data?

A

Facts (especially numerical facts) related to specific variables

7
Q

What are the 2 categories of variables?

A
  1. Qualitative
  2. Quantitative
8
Q

What are the types of Qualitative data?

A
  • Dichotomous (binary)
  • Nominal
  • Ordinal
9
Q

What are the types of Quantitative data?

A
  • Discrete (count)
  • Continuous
10
Q

Describe Dichotomous data

A

Data where every observation is in one of two categories (yes/no)

11
Q

What are some examples of dichotomous data?

A
  • Died or survived, fat/thin, male/female, young/old
  • Prevalence: number of occurrences (the "yes" group) divided by the population at risk ("yes" plus "no")
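
The prevalence calculation above can be sketched in a few lines of Python. The disease statuses are made-up illustrative values, not data from the source.

```python
# Hypothetical disease status for 8 animals (True = diseased, the "yes" group).
status = [True, False, False, True, False, False, False, False]

cases = sum(status)            # number of "yes" observations
at_risk = len(status)          # "yes" plus "no"
prevalence = cases / at_risk   # proportion of the population at risk

print(f"{cases}/{at_risk} = {prevalence:.1%}")   # 2/8 = 25.0%
```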
12
Q

How can the distribution of dichotomous data be described?

A

Reporting the numbers and % of subjects in each category

13
Q

Describe Nominal data

A

Three or more categories or classes identified by labels that have no inherent ordering

14
Q

What are some examples of nominal data?

A

Cow breed
- Friesian
- Hereford
- Angus

Foetuses following infection with pestivirus
- Foetal death
- Congenital disorder
- Born alive but persistently infected

15
Q

How can the distribution of nominal data be described?

A
  • Reporting the numbers and % of subjects in each category
  • Bar charts
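
As a rough illustration of reporting the number and percentage of subjects in each category, here is a minimal Python sketch. The herd list is hypothetical; the breed names are taken from the nominal data example.

```python
from collections import Counter

# Hypothetical breed recorded for each cow in a small herd.
breeds = ["Friesian", "Angus", "Friesian", "Hereford", "Friesian", "Angus"]

counts = Counter(breeds)
total = len(breeds)

# Number and percentage of subjects in each category.
table = {breed: (n, 100 * n / total) for breed, n in counts.items()}

for breed, (n, pct) in table.items():
    print(f"{breed:<10}{n:>3}{pct:>7.1f}%")
```

The same counts are what a bar chart would display, one bar per category.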
16
Q

Describe Ordinal data

A
  • Data in three or more categories with the categories having some inherent order
  • The difference between values is not necessarily constant
17
Q

How can the distribution of Ordinal data be described?

A
  • Reporting the numbers and % of subjects in each category
  • Bar charts
18
Q

What are some examples of ordinal data?

A

Severity of colic
- Mild
- Moderate
- Severe

Colour of gums
- Normal
- Pale
- White

  • NRL ladder
  • Age grouping
  • Clinical assessment
19
Q

Describe discrete data

A
  • Counts
  • Can take only whole-number (integer) values
  • Ordered with standard distance between values
  • Measured in units which cannot be subdivided any further
20
Q

Examples of discrete data

A
  • Number of new disease cases
  • Number of teats on a sow
  • Number of animals
  • Heart rate
  • Somatic cell count
  • Bacterial count
  • Strongyle egg counts in faecal sample (eggs/gram)
21
Q

How can the distribution of discrete data be described?

A
  • Categorising
  • Drawing a histogram
22
Q

Describe continuous data

A
  • Can take any value within a defined range (not restricted to certain specified values such as integers)
  • Generated through measurements
  • Difference between consecutive values can be arbitrarily small
23
Q

How can the distribution of continuous data be described?

A
  • Categorising and drawing a histogram
  • Box & Whisker plots
24
Q

Examples of continuous data

A
  • Body weight
  • Blood pressure
  • Age
  • Hormone concentration

Scales
- Interval scale (zero is arbitrary), e.g. temperature in deg C
- Ratio scale (zero is a true zero), e.g. body weight

25
The distinction between discrete and continuous data
- Red blood cell count (cells/ml): a discrete variable, but it has so many possible values that it can be treated as continuous data
- Seed diameter: continuous, but if it is measured to the nearest 0.05 mm it may be treated as discrete
26
What type of data is pulse rate?
Quantitative; strictly a discrete count (beats per minute), but commonly treated as continuous
27
What type of data is eye colour?
Qualitative, nominal
28
What type of data is dairy milk yield per cow?
Quantitative, continuous
29
What type of data is No. of lesions?
Quantitative, discrete (count)
30
What type of data is pregnancy status of each cow?
Qualitative, dichotomous (pregnant / not pregnant)
31
What type of data is number of puppies per litter?
Quantitative, discrete
32
What are descriptive statistics?
Tests conducted to explore patterns in the data and to validate/check the data
33
What are some types of descriptive statistics?
Evaluation of:
- Graphical presentation of the data
- Measures of central tendency (mean, median, mode)
- Measures of spread (range, percentile, variance, standard deviation)
34
What does graphical presentation of data allow for?
- Easier interpretation of the data than from frequency tables
- Exploration of the distribution of data
- Exploration of patterns in the data
- Assistance in the interpretation of statistical inferences
35
What are the common types of graphs?
- Bar charts
- Histograms (frequency distribution)
- Box plots
- Scatter plots
36
When are bar charts used?
- For nominal data with no inherent order
- Also used for ordinal data
37
When are histograms used?
- For continuous data or count data
- Y-axis can be frequencies or percentages
- For continuous data, values need to be allocated to classes
- Measurements are arbitrarily tied to the class interval (but you can change the interval)
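
A minimal Python sketch of allocating continuous values to classes, and of how the chosen class interval changes the resulting frequencies. The helper `bin_counts` and the body weights are hypothetical, introduced only for illustration.

```python
from collections import Counter

def bin_counts(values, start, width):
    """Count how many values fall in each class [lower, lower + width)."""
    counts = Counter()
    for v in values:
        index = int((v - start) // width)   # which class the value falls in
        lower = start + index * width       # lower bound of that class
        counts[lower] += 1
    return dict(sorted(counts.items()))

# Hypothetical body weights in kg.
weights = [41.2, 43.8, 44.1, 47.5, 48.0, 52.3, 53.9, 58.4]

print(bin_counts(weights, start=40, width=5))    # {40: 3, 45: 2, 50: 2, 55: 1}
print(bin_counts(weights, start=40, width=10))   # {40: 5, 50: 3}
```

The same eight measurements give a different picture with 5 kg and 10 kg classes, which is why the class interval is described as arbitrary.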
38
When are Box and Whisker plots used, and what is the box and whiskers?
- For continuous data
- The box displays the 25th and 75th percentiles and the median
- The end points of the whiskers display the minimum and maximum values
39
When are Scatter plots used?
- To show the relationship between two continuous variables
- Regression lines can be displayed to show any linear relationship
40
What are measures of central tendency?
Measures that indicate where the data are located within the range of potential values of a variable
41
What is the mean and how is it calculated?
- The average
- Calculated by summing all individual observations and dividing by the number of observations (n)
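
A short worked example of the mean in Python, using made-up litter sizes:

```python
# Hypothetical litter sizes for five sows.
litter_sizes = [8, 10, 11, 9, 12]

# Sum of the observations divided by the number of observations.
mean = sum(litter_sizes) / len(litter_sizes)

print(mean)   # 10.0
```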
42
What is the median and how is it calculated?
- The point where half the observations fall above and half below
1. Sort the data in order from lowest to highest
2. Calculate (n+1)/2 to give the rank (position) of the median in the sorted data
3. If the number of data points is odd, the median is the middle value
4. If the number of data points is even, the median is the average of the two middle values
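
The median steps above can be sketched in Python as follows. The function name and the sample values are illustrative only.

```python
def median(values):
    ordered = sorted(values)      # sort lowest to highest
    n = len(ordered)
    middle = n // 2               # zero-based index at or just above rank (n+1)/2
    if n % 2 == 1:
        return ordered[middle]    # odd n: the single middle value
    # even n: average of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([7, 3, 9, 5, 1]))   # 5   (rank (5+1)/2 = 3rd value)
print(median([7, 3, 9, 5]))      # 6.0 (rank 2.5: average of 2nd and 3rd values)
```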
43
Why is the median a more "robust" measure than the mean?
- It is not affected by a few extreme values (outliers)
- Hence it is a more accurate indicator of the ‘average’ in skewed distributions
44
What is the mode and how is it calculated?
The value that appears most frequently in a dataset
45
Describe the mean, median and mode in a normal distribution
They are all very similar
46
Describe the mean, median and mode in a skewed distribution
Mode and median may be similar, but mean will be a poor indicator of the central tendency
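
A small illustration of this using Python's standard `statistics` module. The egg counts are invented, with one extreme value to produce a right-skewed data set.

```python
import statistics

# Hypothetical right-skewed data: one extreme value among small ones.
egg_counts = [50, 100, 100, 150, 200, 250, 3000]

mean = statistics.mean(egg_counts)       # pulled upwards by the outlier
median = statistics.median(egg_counts)   # middle value of the sorted data
mode = statistics.mode(egg_counts)       # most frequent value

print(mean, median, mode)
```

Here the mode (100) and median (150) are close together, while the mean (550) is larger than six of the seven observations.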
47
What are measures of spread and the types?
- Used to describe the variability in the data
1. Range
2. Percentile
3. Variance
4. Standard Deviation
48
What is range?
- Difference between the lowest and highest value
- A rough estimate, because it is based on only two observations
- Many authors give minimum and maximum values instead of the range
49
What is percentile?
- A number that indicates the **percentage** of values less than or equal to that number
- The 50th percentile is the median
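
To make the definition concrete, this sketch computes the percentage of values less than or equal to a given number. The helper `percent_at_or_below` and the weights are hypothetical.

```python
def percent_at_or_below(values, x):
    """Percentage of observations less than or equal to x."""
    return 100 * sum(v <= x for v in values) / len(values)

# Hypothetical body weights in kg.
weights = [41, 43, 44, 47, 48, 52, 53, 58, 60, 65]

print(percent_at_or_below(weights, 48))   # 50.0
print(percent_at_or_below(weights, 65))   # 100.0
```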
50
What is the interquartile range (IQR) and how is it calculated?
- It describes the middle 50% of values when ordered from lowest to highest
- IQR is the difference between Q3 and Q1 (IQR = Q3 - Q1)
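
A minimal sketch of the IQR using `statistics.quantiles` from the Python standard library (Python 3.8+). The values are made up. Several conventions exist for computing quartiles, so other software may give slightly different Q1 and Q3 for the same data; this example uses the function's default method.

```python
import statistics

# Hypothetical ordered measurements.
values = [2, 4, 6, 8, 10, 12, 14]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

print(q1, q2, q3)   # 4.0 8.0 12.0
print(iqr)          # 8.0
```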
51
What does Q1 describe?
Q1 is the 25th percentile (lower quartile): the value where 25% of the values are smaller than Q1 and 75% are larger
52
What does Q2 describe?
Q2 is the median or 50th percentile
53
What does Q3 describe?
Q3 is the 75th percentile (upper quartile): the value where 75% of the values are smaller than Q3 and 25% are larger.
54
What does Q4 describe?
Q4 is the 100th percentile, which is the maximum value in the dataset *excluding outliers*.
55
When is the median a more appropriate measure of central tendency than the mean?
When data has outliers or is skewed
56
What does variance describe?
57
What are variance and standard deviation measures of?
- The spread of the data around the mean.
- They summarise how close each observed data value is to the mean value.
58
What happens to the variance and standard deviation in datasets with a small spread?
They become very small because all the results are very close to the mean
59
What is variance (s squared) and how is it calculated?
- Average squared difference from the mean
- The sum of the squared differences of each of the n values from the mean, divided by the degrees of freedom (n-1)
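
A worked example of the sample variance calculation in Python, with invented values:

```python
# Hypothetical observations.
values = [2, 4, 4, 6, 9]

n = len(values)
mean = sum(values) / n                                   # 25 / 5 = 5.0
sum_of_squares = sum((x - mean) ** 2 for x in values)    # 9 + 1 + 1 + 1 + 16 = 28.0
sample_variance = sum_of_squares / (n - 1)               # 28 / 4 = 7.0

print(sample_variance)   # 7.0
```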
60
What is standard deviation and how is it calculated?
- Average distance from the mean
- The square root of the sample variance
- Estimates the average variation of the n values (xi) from the mean (x bar), and hence tells us how much variability can be expected among individuals
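
A brief sketch of the relationship between sample variance and standard deviation, using Python's `statistics` module. The weights are hypothetical.

```python
import math
import statistics

# Hypothetical body weights in kg.
weights = [4, 6, 8, 10, 12]

variance = statistics.variance(weights)   # sample variance, in kg squared
sd = statistics.stdev(weights)            # sample standard deviation, in kg

print(variance)
print(round(sd, 2))   # 3.16
```

The standard deviation equals the square root of the variance (here the square root of 10), and it is expressed in the same units as the measurements.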
61
What is one limitation of percentiles?
They do not directly use all data values.
62
Why is variance squared?
- One way to describe spread using all data values would be to calculate the difference from the mean for each value.
- **However,** if we add these, they sum to zero because the negative differences cancel the positive differences.
- The solution for this is to square each difference to remove the negatives
- If we sum these values, we get the sum of squares
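
The reasoning above can be checked numerically. This sketch uses made-up values to show that the raw differences cancel out while the squared differences do not.

```python
values = [4, 6, 8, 10, 12]                 # hypothetical observations
mean = sum(values) / len(values)           # 8.0

deviations = [x - mean for x in values]    # [-4.0, -2.0, 0.0, 2.0, 4.0]
squared = [d ** 2 for d in deviations]     # [16.0, 4.0, 0.0, 4.0, 16.0]

print(sum(deviations))   # 0.0: negative differences cancel positive ones
print(sum(squared))      # 40.0: the sum of squares
```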
63
What is a limitation of variance?
- Sums of squares are of little use when comparing data sets with different numbers of values.
- Even where two data sets have the same spread, the sums of squares will be larger in the dataset with more values.
64
How do we avoid the limitation of variance not taking into account data sets with different numbers of values?
1. If the data set represents all values from an **entire population**, we divide the sum of squared differences by the number of values and obtain the population variance
2. If the data set consists of a **sample of a population,** we divide the sum of squared differences by (number of values - 1). The resulting measure is called the sample variance.
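
A small sketch of both points, assuming invented values: the sum of squares grows with the number of values even when the spread is the same, while dividing by the number of values (population) or by n-1 (sample) corrects for this. The helper `sum_of_squares` is hypothetical; `pvariance` and `variance` are from Python's `statistics` module.

```python
import statistics

def sum_of_squares(values):
    """Sum of squared differences from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

small = [4, 6, 8, 10, 12]   # hypothetical values
large = small * 2           # same values repeated: same spread, twice as many

print(sum_of_squares(small), sum_of_squares(large))
print(statistics.pvariance(small), statistics.pvariance(large))
print(statistics.variance(small))
```

The sum of squares doubles (40 versus 80), but the population variance is 8 for both data sets. Treating `small` as a sample instead gives a sample variance of 40 / 4 = 10.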
65
What is a second limitation of variance and how does using standard deviation address this?
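- Variance is in squared units, not the original measurement units
- Taking the square root of the variance gives the standard deviation, which is back in the original units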
66
Because standard deviation is easily interpreted in normal distribution, it is often used to determine...
‘Normal’ ranges for laboratory tests