DVT: Data, Variables and Tables Flashcards
(17 cards)
What 2 ways can data variables be classified?
Numerical:
- Quantitative
- Individuals measured or counted
Categorical:
- Qualitative
- Individuals classified into groups
What are examples of numerical variables?
- Weight
- BP
- Prothrombin time
- Age
- No. long distance flights in last month
- No. cigarettes per day
What are examples of categorical variables?
- Smoker/non-smoker
- On anticoagulation medicine?
- History of cancer
- Alive after 6 months?
- Blood group type
- Causes of death
- Pain assessment
- Stage of cancer
How are numerical values measured?
On interval scales (the interval or distance between points on the scale has a precise numerical meaning)
What is binary data?
Subtype of categorical data
Can only take 2 values (often yes/no)
Also known as dichotomous
What is nominal data?
Subtype of categorical data
More than 2 categories, but no natural order (A, B, AB, O)
What is ordinal data?
More than 2 categories, with a natural order e.g. Stage I, II, III, IV
How can data be summarised?
- Numerical - Measures of central tendency (mean, median), measures of spread (standard deviation, range)
- Categorical - Frequencies, proportions, percentages. Use tables and charts to do this
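A minimal sketch of the summaries listed above, using Python's standard library. The blood pressure readings and blood groups are invented for illustration only, and the variable names are assumptions rather than part of the original cards.

```python
import statistics
from collections import Counter

# Hypothetical systolic blood pressure readings (mmHg), invented for illustration
bp = [118, 122, 125, 130, 135, 140, 150]

# Numerical data: central tendency and spread
mean_bp = statistics.mean(bp)      # average of all values
median_bp = statistics.median(bp)  # middle value of the sorted data
sd_bp = statistics.stdev(bp)       # sample standard deviation (divides by n - 1)
range_bp = max(bp) - min(bp)       # largest value minus smallest value

# Hypothetical blood groups (categorical, nominal), invented for illustration
groups = ["A", "O", "O", "B", "A", "O", "AB", "O"]

# Categorical data: frequencies and percentages
counts = Counter(groups)
percentages = {g: 100 * c / len(groups) for g, c in counts.items()}

print(f"mean={mean_bp:.1f} median={median_bp} sd={sd_bp:.1f} range={range_bp}")
for group, count in counts.most_common():
    print(f"{group}: n={count} ({percentages[group]:.1f}%)")
```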
Why should data be summarised?
- Data monitoring - Ensure what’s being collected is valid, so that errors can be spotted and corrected
- Data checking/cleaning - Ensure collected data are correct and identify any outliers
- Summary of results - Basic description, potential precursor to more complex analysis
How can central tendency be measured?
Mean - Average of all values; a good measure of centre for a symmetrical distribution. Used much more in practice, but overly influenced by extreme values
Median - Middle value, with 50% of data points lying below it; better for skewed distributions because it is only slightly affected by extreme values
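To illustrate how an extreme value affects the two measures, here is a small sketch. The lengths of stay are hypothetical numbers chosen for the example.

```python
import statistics

# Hypothetical lengths of hospital stay (days), invented for illustration
stay = [2, 3, 3, 4, 5]
stay_with_outlier = stay + [60]  # one patient with a very long stay

mean_before = statistics.mean(stay)                  # 3.4
median_before = statistics.median(stay)              # 3
mean_after = statistics.mean(stay_with_outlier)      # about 12.8
median_after = statistics.median(stay_with_outlier)  # 3.5

print(f"without outlier: mean={mean_before:.1f}, median={median_before}")
print(f"with outlier:    mean={mean_after:.1f}, median={median_after}")
```

A single extreme value moves the mean by more than 9 days, while the median shifts by only half a day.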
Describe symmetrical bell shape
Mean = Median
Describe negatively skewed bell shape
Mean < median, long tail to left
Describe positively skewed bell curve
Mean > Median, long tail to right
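The three shapes above can be told apart roughly by comparing the mean with the median. The sketch below uses a hypothetical helper, `skew_direction`, and invented data; it is a rule of thumb under those assumptions, not a formal test of skewness.

```python
import statistics

def skew_direction(values):
    """Hypothetical helper: crude rule of thumb comparing mean with median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "positive"     # long tail to the right pulls the mean up
    if mean < median:
        return "negative"     # long tail to the left pulls the mean down
    return "symmetrical"      # mean equals median, consistent with symmetry

# Invented data sets for illustration
symmetrical = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10, 20]
left_tailed = [21 - x for x in right_tailed]  # mirror image of right_tailed

for name, data in [("symmetrical", symmetrical),
                   ("right_tailed", right_tailed),
                   ("left_tailed", left_tailed)]:
    print(name, "->", skew_direction(data))
```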
Can range be a measure of spread?
Yes, but it is dependent on outliers (i.e. extreme values)
Range doesn’t indicate whether these values are distinct from the main body of data (the larger the sample, the wider the range tends to be)
- Quantiles (e.g. quartiles, percentiles) are useful if data are not normal (symmetrical)
- They split the data so there are equal frequencies in each group
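The contrast between the range and a split into equal-frequency groups can be sketched with quartiles. Reading the last two points of this card as describing quantiles is an assumption, and the data are invented. `statistics.quantiles` is used with its default method.

```python
import statistics

# Invented data for illustration; the second set swaps the largest value for an outlier
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
data_with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]

def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)

# Quartiles: three cut points that split the data into four equal-frequency groups
quartiles = statistics.quantiles(data, n=4)
quartiles_outlier = statistics.quantiles(data_with_outlier, n=4)

print("range:", value_range(data), "vs", value_range(data_with_outlier))
print("quartiles:", quartiles, "vs", quartiles_outlier)
print("interquartile range:", quartiles[2] - quartiles[0])
```

In this example the outlier stretches the range from 10 to 99, while the quartiles are unchanged.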
Define reference range and how it can be estimated?
A set of values within which a specific test result is considered to be within the normal or healthy range for a particular population
It can be estimated by recruiting a large sample of individuals from the defined population and collecting their results for the specific test or measurement.
The collected data is analyzed to calculate:
Mean: The average value of the test or measurement across the sample.
Standard Deviation (SD): A measure of how much the individual results deviate from the mean.
The reference range is usually defined as the mean plus or minus a certain number of standard deviations. Commonly, the 95% reference range is calculated as mean ± 2 SD. This means that approximately 95% of individuals in the defined population would be expected to have values within this range.
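The calculation described above can be sketched as follows. The function name `reference_range` and the haemoglobin values are hypothetical; a real reference range would need a much larger sample, and mean ± 2 SD is only appropriate when the values are roughly normally distributed.

```python
import statistics

def reference_range(values, n_sd=2):
    """Hypothetical helper: return (lower, upper) as mean +/- n_sd sample SDs."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return mean - n_sd * sd, mean + n_sd * sd

# Invented haemoglobin results (g/dL); far too few for a real reference range
haemoglobin = [13.1, 13.8, 14.2, 14.5, 14.9, 15.0, 15.3, 15.6, 16.0, 16.4]

lower, upper = reference_range(haemoglobin)
print(f"approximate 95% reference range: {lower:.1f} to {upper:.1f} g/dL")
```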
How can numerical data be further classified?
As continuous or discrete:
Continuous - Can take any value within a given range, including decimals and fractions; measured rather than counted
Discrete - Can only take certain, separate values within a given range, typically whole numbers; usually counted rather than measured
How do we calculate confidence intervals?
Mean ± 2 × standard error (an approximate 95% confidence interval for the mean), where standard error = SD / √n
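A minimal sketch of this calculation, taking the standard error as the sample SD divided by the square root of the sample size. The function name `confidence_interval` and the data are hypothetical, and the multiplier of 2 is an approximation that works best for larger samples.

```python
import math
import statistics

def confidence_interval(values, multiplier=2):
    """Hypothetical helper: approximate 95% CI for the mean, mean +/- 2 SE."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)            # sample standard deviation
    se = sd / math.sqrt(len(values))         # standard error of the mean
    return mean - multiplier * se, mean + multiplier * se

# Invented measurements for illustration
sample = [10, 12, 14, 16, 18]

ci_lower, ci_upper = confidence_interval(sample)
print(f"mean={statistics.mean(sample)}, approx. 95% CI: {ci_lower:.2f} to {ci_upper:.2f}")
```

Unlike the reference range, which uses the SD and describes the spread of individuals, this interval uses the standard error and describes the uncertainty in the estimated mean.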