Module 1 Flashcards
(55 cards)
Biostatistics
A branch of statistics that applies statistical theories and methodologies to the collection, review, and analysis of data arising from biological, agricultural, medical, and public-health-related contexts
Population
In statistics, a population refers to the entire group of items, individuals, or elements that share a common characteristic and are of interest for analysis or study. It encompasses every possible member of the group under consideration
Example: public health researchers conducted a comprehensive survey to assess flu vaccination rates among the population of all elderly individuals living in nursing homes
Sample
A sample is a smaller subset of the population selected to study or analyze the characteristics of the entire population. Samples are used to make inferences and draw conclusions about the larger population without having to examine every single individual within it. The goal is to ensure that the sample is representative of the population to obtain accurate and meaningful results.
Example: public health officials collected a random sample of 500 households in an urban area to study the prevalence of air pollution-related respiratory illnesses among residents
Variable
Any characteristic, number, or quantity that can be measured or counted. Its value varies from one observation to another, hence the name. Variables can be classified into different types, such as categorical (nominal or ordinal) and numerical (discrete or continuous)
Example: in a public health study examining the relationship between dietary habits and the onset of diabetes, the amount of daily sugar intake could be considered a variable.
What are the three main components of statistics as a discipline?
Design - how we collect data
Description - describing data in a sample
Inference - using data from a sample to make generalizations about a population
When is it appropriate to use absolute frequency vs relative frequency?
Absolute frequency - when you want to convey the actual number of cases or occurrences. Relative frequency - when you want to provide a sense of proportion or rate (the count divided by the total number of observations)
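A minimal sketch of the difference, assuming a small hypothetical set of survey responses (the data and variable names are illustrative only): the absolute frequency is the raw count per category, and the relative frequency divides each count by the total.

```python
from collections import Counter

# Hypothetical survey responses: vaccination status of 10 residents
responses = ["yes", "no", "yes", "yes", "no", "yes", "yes", "no", "yes", "yes"]

# Absolute frequency: the raw count of each category
absolute = Counter(responses)

# Relative frequency: each count divided by the total number of observations
total = sum(absolute.values())
relative = {category: count / total for category, count in absolute.items()}

print(absolute)   # Counter({'yes': 7, 'no': 3})
print(relative)   # {'yes': 0.7, 'no': 0.3}
```

Relative frequencies always sum to 1, which is what makes them comparable across groups of different sizes.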
What are the advantages and disadvantages of transforming a continuous variable into an ordinal variable?
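Advantages
Simplicity - easier for non-experts to understand
Applicability - interventions can be tailored to, for example, high-, medium-, and low-risk groups
Disadvantages
Loss of information - different values that fall into the same category are treated as identical
Arbitrary boundaries - cut-points are often chosen arbitrarily, and values just either side of a cut-point end up in different categories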
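Both disadvantages can be seen in a small sketch. The cut-points of 120 and 140 and the function name risk_category are hypothetical, chosen only for illustration (not clinical guidance): 120.0 and 139.0 collapse into the same category (loss of information), while 119.9 and 120.0 land in different ones (arbitrary boundary).

```python
# Hypothetical cut-points for illustration only (not clinical guidance)
def risk_category(value: float) -> str:
    """Map a continuous measurement to an ordinal low/medium/high category."""
    if value < 120:
        return "low"
    elif value < 140:
        return "medium"
    return "high"

readings = [119.9, 120.0, 121.0, 139.0, 160.0]
categories = [risk_category(r) for r in readings]
print(categories)  # ['low', 'medium', 'medium', 'medium', 'high']
```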
What are the types of data?
Nominal - qualitative and used to name or label variables without assigning any quantitative value or order. Categories are mutually exclusive and cannot be ranked or measured. Examples: Gender, nationality, types of transportation
Ordinal - Ordinal data is categorical data where variables have a natural, ordered sequence, but the intervals between categories are not necessarily equal or known. You can rank the values, but you cannot quantify the difference between them. Examples: Survey ratings (e.g., poor, fair, good, excellent), letter grades, socioeconomic status
Discrete - Discrete data is quantitative and consists of countable, indivisible values. Each data point is a distinct, separate value, often representing “the number of” something. Discrete data cannot take on every possible value within a range, only specific, separate values. Examples: Number of students in a class, number of cars in a parking lot.
Continuous - Continuous data is quantitative and can take on any value within a given range, including fractions and decimals. It represents measurements and can be infinitely subdivided. Continuous data is often obtained through precise measurement tools. Examples: Height, weight, temperature, time spent on a website
What are the primary measures of location?
Percentile - the p-th percentile is the value below which p% of observations lie
Median - the 50th percentile value (the middle value)
Mean - the average value (the sum of the observations divided by their number)
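The three measures can be computed with Python's standard statistics module. The ages below are hypothetical. Note that software packages differ in how they define percentiles; statistics.quantiles uses the "exclusive" method by default, so other tools may return slightly different quartiles for the same data.

```python
import statistics

# Hypothetical ages of 9 study participants (already sorted for readability)
ages = [23, 25, 31, 35, 38, 42, 47, 51, 59]

mean_age = statistics.mean(ages)      # sum / count = 351 / 9
median_age = statistics.median(ages)  # middle (5th) value

# Quartiles: the 25th, 50th, and 75th percentiles
q1, q2, q3 = statistics.quantiles(ages, n=4)

print(mean_age, median_age)  # 39 38
print(q1, q2, q3)            # 28.0 38.0 49.0
```

The 50th percentile (q2) coincides with the median, as the card states.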
How are the median and mean related when the distribution is unimodal?
Symmetric: Median = Mean
Left Skewed: Mean < Median
Right Skewed: Median < Mean
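A quick check of the right-skewed and left-skewed cases, using small hypothetical datasets: a single large value in the right tail pulls the mean above the median, and mirroring the data reverses the order.

```python
import statistics

# Hypothetical right-skewed data: most values are small, one is very large
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
# Mirror image of the same data, which is left-skewed
left_skewed = [-x for x in right_skewed]
# A symmetric dataset for comparison
symmetric = [1, 2, 3, 4, 5]

right = (statistics.mean(right_skewed), statistics.median(right_skewed))
left = (statistics.mean(left_skewed), statistics.median(left_skewed))
sym = (statistics.mean(symmetric), statistics.median(symmetric))

print(right)  # (4.75, 3.0): mean > median
print(left)   # (-4.75, -3.0): mean < median
print(sym)    # (3, 3): mean = median
```

The median is resistant to the extreme value of 20, while the mean is not; this is why the mean is pulled toward the long tail.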
What is dispersion?
How far typical observations are spread out from the center
Range - distance between the smallest and largest values
Interquartile range (IQR) - distance between the 25th and 75th percentiles
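A minimal sketch of both measures on hypothetical length-of-stay data. The exact quartiles depend on the percentile definition; here the "inclusive" method of statistics.quantiles is chosen as an assumption, and other definitions can give a slightly different IQR.

```python
import statistics

# Hypothetical length-of-stay data (days) for 8 patients
stays = [1, 2, 2, 3, 4, 5, 7, 12]

# Range: largest value minus smallest value
data_range = max(stays) - min(stays)

# IQR: 75th percentile minus 25th percentile
q1, _, q3 = statistics.quantiles(stays, n=4, method="inclusive")
iqr = q3 - q1

print(data_range)    # 11
print(q1, q3, iqr)   # 2.0 5.5 3.5
```

The single long stay of 12 days inflates the range but barely affects the IQR, which only looks at the middle half of the data.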
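What is variance?
The average squared distance of the observations from the mean (for a sample, the sum of squared deviations is usually divided by n - 1 rather than n)
What is standard deviation?
The square root of the variance, which puts the measure of spread back into the same units as the original data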
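The calculation can be written out step by step and cross-checked against the standard library, where statistics.pvariance divides by n and statistics.variance divides by n - 1. The data values are hypothetical.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n  # 40 / 8 = 5.0

# Squared distance of each observation from the mean
squared_deviations = [(x - mean) ** 2 for x in data]  # sums to 32.0

# Population variance divides by n; sample variance divides by n - 1
population_variance = sum(squared_deviations) / n        # 4.0
sample_variance = sum(squared_deviations) / (n - 1)      # 32 / 7, about 4.571

# Standard deviation: square root of the variance, in the data's original units
population_sd = math.sqrt(population_variance)           # 2.0
sample_sd = math.sqrt(sample_variance)

# Cross-check against the standard library
assert math.isclose(population_variance, statistics.pvariance(data))
assert math.isclose(sample_variance, statistics.variance(data))
```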
What is anecdotal evidence?
Refers to unusual observations that are easily recalled because of their striking characteristics. While it cannot be used as a basis for a conclusion, it can inspire the design of a more systematic study
Simple Random Sample
Each member of the population has the same chance of being sampled, and each case is sampled independently of the other cases
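A minimal sketch of drawing a simple random sample with Python's random module, assuming a hypothetical sampling frame of 1,000 resident IDs. random.sample draws without replacement and makes every subset of the requested size equally likely; the fixed seed only makes the draw reproducible.

```python
import random

# Hypothetical sampling frame: IDs for 1,000 residents
population_ids = list(range(1, 1001))

rng = random.Random(42)  # fixed seed so the draw can be reproduced

# Draw 50 distinct IDs; every subset of size 50 is equally likely
srs = rng.sample(population_ids, k=50)
print(sorted(srs)[:5])
```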
What can happen with non-response?
Non-response bias: if the people who do not respond differ systematically from those who do, the results can be skewed and lead to incorrect conclusions about the population
What is stratified sampling?
The population is divided into strata before cases are selected within each stratum. The strata are chosen such that similar cases are grouped together.
This is especially useful when cases within each stratum are similar with respect to the outcome of interest, but cases in different strata differ
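A sketch of stratified sampling, assuming a hypothetical population split into two age-group strata and equal allocation of 10 cases per stratum (proportional allocation is a common alternative). The helper name stratified_sample is illustrative, not from any library.

```python
import random

# Hypothetical population: (person_id, age_group) pairs; age_group is the stratum
population = [(i, "under 65" if i % 3 else "65+") for i in range(1, 301)]

def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Group units by stratum, then draw a simple random sample within each stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

rng = random.Random(1)
sample = stratified_sample(population, lambda u: u[1], n_per_stratum=10, rng=rng)
```

Because every stratum is sampled, the "65+" group is guaranteed representation even though it is the smaller stratum.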
What is a cluster sample?
The population is divided into clusters, then a fixed number of clusters is randomly sampled and all observations from those clusters are included
Useful when there is high case-to-case variability within clusters, but the clusters themselves are similar to one another
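A sketch of cluster sampling, assuming a hypothetical population of 20 clinics (the clusters) with 15 patients each: a fixed number of clinics is chosen at random and every patient in a chosen clinic is kept.

```python
import random

# Hypothetical population: 20 clinics (clusters), each with 15 patients
clusters = {
    f"clinic_{c}": [f"clinic_{c}_patient_{p}" for p in range(15)]
    for c in range(20)
}

rng = random.Random(7)

# Randomly choose a fixed number of clusters...
chosen = rng.sample(sorted(clusters), k=4)

# ...then include every observation in each chosen cluster
cluster_sample = [patient for c in chosen for patient in clusters[c]]
```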
What is multistage sampling?
Similar to cluster sampling, but instead of keeping all observations in each sampled cluster, a random sample is collected within each one
Useful when there is high case-to-case variability within clusters, but the clusters themselves are similar to one another
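The same hypothetical clinic population can illustrate multistage sampling: stage 1 chooses clinics at random, and stage 2 takes a simple random sample of 5 patients within each chosen clinic instead of keeping them all.

```python
import random

# Hypothetical population: 20 clinics (clusters), each with 15 patients
clusters = {
    f"clinic_{c}": [f"clinic_{c}_patient_{p}" for p in range(15)]
    for c in range(20)
}

rng = random.Random(7)

# Stage 1: randomly choose clusters
chosen = rng.sample(sorted(clusters), k=4)

# Stage 2: simple random sample of 5 patients within each chosen cluster
multistage_sample = [p for c in chosen for p in rng.sample(clusters[c], k=5)]
```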
What are the three principles on which experimental design is based?
Control - control for extraneous variables and choose a sample that is representative of the population of interest
Randomization - ensures balance and protects against bias. Allows differences in outcomes to be reasonably attributed to a treatment rather than inherent variability between patients
Replication - results from a large sample are more likely to be reliable than those from a small one
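A minimal sketch of the randomization principle, assuming 20 hypothetical participants assigned completely at random to two equal groups. Shuffling before splitting means no characteristic of the participants can influence which group they join.

```python
import random

# Hypothetical participant IDs
participants = [f"P{i:03d}" for i in range(1, 21)]

rng = random.Random(2024)  # fixed seed so the allocation can be reproduced
shuffled = participants[:]  # copy so the original list is left untouched
rng.shuffle(shuffled)

# First half of the shuffled list goes to treatment, second half to control
half = len(shuffled) // 2
treatment = shuffled[:half]
control = shuffled[half:]
```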
What is a confounding variable?
A variable associated with both the explanatory and response variables
What is the difference between a population and a sample?
A population is the collection of individuals about whom you want to make inferences. A sample is a subset of the population on whom data is collected.
What is Simpson’s Paradox?
An extreme example of confounding in which associations observed within several groups disappear or change direction when the groups are combined
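A small numeric illustration, using illustrative (successes, total) counts for two hypothetical treatments, A and B, in mild and severe patient groups. A has the higher success rate within each group, yet B looks better once the groups are combined, because B was given mostly to mild cases, which succeed more often. Here severity is the confounding variable.

```python
# Illustrative (successes, total) counts for two treatments in two patient groups
counts = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}

def rate(successes, total):
    """Success rate as a proportion."""
    return successes / total

# Within each group, compare the two treatments
for group in ("mild", "severe"):
    a = rate(*counts["A"][group])
    b = rate(*counts["B"][group])
    print(f"{group}: A={a:.3f}  B={b:.3f}")

def overall(treatment):
    """Success rate after pooling both groups together."""
    successes = sum(s for s, _ in counts[treatment].values())
    total = sum(t for _, t in counts[treatment].values())
    return successes / total

print(f"combined: A={overall('A'):.3f}  B={overall('B'):.3f}")
```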
What are the three principal points where data analysis could go wrong?
Collection - when gathering data
Processing - when analyzing the data and its implications
Presentation - when sharing your findings with others