C1 Intro to Probability and Data with R M1-3 Data Flashcards
Which type of variable is hdi (Human Development Index, combining life expectancy, educational attainment, and income), with levels very high, high, medium, and low human development?
Ordinal Categorical Variable
There is an inherent ordering to the levels of this categorical variable (from very high to low), and hence this is an ordinal categorical variable.
What are the two main types of numerical variables?
Continuous and Discrete
Continuous variables can take any value within a range, while discrete variables can only take specific values.
Define continuous variables.
Can take any value within a range (e.g., height)
Continuous variables allow for an infinite number of possible values.
Define discrete variables.
Can only take specific values (e.g., number of cars owned)
Discrete variables are countable and often represented as whole numbers.
What are the two categories of categorical variables?
Ordinal and Nominal
Categorical variables represent characteristics or qualities.
Define ordinal variables.
Have a meaningful order (e.g., satisfaction levels)
The order matters in ordinal variables, unlike in nominal variables.
Define nominal variables.
No inherent order (e.g., morning person vs. afternoon person)
Nominal variables categorize data without a ranking system.
What do researchers do in observational studies?
Collect data without interfering with how it arises.
What can researchers establish in observational studies?
An association (correlation) between variables.
In general, observational studies can provide evidence of a naturally occurring association between variables, but they cannot by themselves show a causal connection.
What are the two types of observational studies?
- Retrospective studies (using past data)
- Prospective studies (collecting data throughout the study)
What is the main feature of experiments in research?
Researchers randomly assign subjects to treatments.
What do experiments allow researchers to establish?
Causal connections.
Why is random assignment important in experiments?
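It balances confounding variables, on average, across the treatment groups, so that differences in the response can be attributed to the treatment.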
What are confounding variables?
Extraneous factors that may influence both the explanatory and response variables.
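A minimal sketch of why random assignment helps with confounders, assuming a hypothetical pool of 1000 subjects whose age could act as a confounder. The subject data, group sizes, and variable names are illustrative assumptions, not part of the course material. After shuffling and splitting, the two groups should have similar average ages.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical subjects: age is a potential confounder.
ages = [random.randint(18, 80) for _ in range(1000)]

# Random assignment: shuffle the subjects, then split them in half.
shuffled = ages[:]
random.shuffle(shuffled)
treatment, control = shuffled[:500], shuffled[500:]

# With random assignment the confounder is balanced on average,
# so the group means should be close to each other.
mean_treatment = statistics.mean(treatment)
mean_control = statistics.mean(control)
print(round(mean_treatment, 1), round(mean_control, 1))
```

Because chance alone decides who lands in which group, any remaining difference in age between the groups is random rather than systematic.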
What is Convenience Sample Bias?
When only easily accessible individuals are included.
This type of bias can lead to non-representative samples because it does not account for the broader population.
What causes Non-response Bias?
Only a non-random fraction of the sampled individuals respond, so the respondents no longer represent the population.
It can skew the results if the non-respondents differ significantly from respondents.
What is Voluntary Response Bias?
Arises when only those with strong opinions choose to respond.
This bias often leads to overrepresentation of extreme views in survey results.
What is Simple Random Sampling?
Each case has an equal chance of selection.
This method ensures that every individual in the population has the same probability of being chosen.
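A minimal sketch of simple random sampling, assuming a hypothetical population of 100 numbered cases. Python's `random.sample` draws without replacement, so no case is selected twice.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: case IDs 1 through 100.
population = list(range(1, 101))

# Every case has the same chance of ending up in the sample.
sample = random.sample(population, k=10)
print(sample)
```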
Define Stratified Sampling.
Population is divided into strata, and samples are taken from each.
This technique is useful for ensuring representation from different segments of the population.
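A minimal sketch of stratified sampling, assuming a hypothetical population of 300 people split into two age strata. The stratum labels and the per-stratum sample size of 5 are illustrative assumptions.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: (person id, stratum label) pairs.
population = [(i, "under_30" if i % 3 == 0 else "30_plus") for i in range(300)]

# Step 1: group the population into strata.
strata = {}
for person_id, group in population:
    strata.setdefault(group, []).append(person_id)

# Step 2: take a simple random sample from every stratum.
stratified_sample = {
    group: random.sample(members, k=5) for group, members in strata.items()
}
print(stratified_sample)
```

Every stratum contributes cases, which is what guarantees representation from each segment.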
What characterizes Cluster Sampling?
Population is divided into clusters, a random sample of clusters is selected, and every case within the selected clusters is included.
This method is often used when populations are large and geographically dispersed.
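A minimal sketch of cluster sampling, assuming a hypothetical population of 10 cities with 20 residents each. The city and resident names are made up for illustration. A few clusters are chosen at random, and everyone in the chosen clusters is included.

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical clusters: 10 cities, each with 20 residents.
clusters = {
    f"city_{c}": [f"city_{c}_resident_{r}" for r in range(20)]
    for c in range(10)
}

# Step 1: randomly select whole clusters.
chosen_clusters = random.sample(sorted(clusters), k=3)

# Step 2: include every case from the selected clusters.
cluster_sample = [
    resident for name in chosen_clusters for resident in clusters[name]
]
print(chosen_clusters, len(cluster_sample))
```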
Explain Multistage Sampling.
Combines cluster sampling with additional sampling within selected clusters.
This approach allows for a more refined sampling process, potentially increasing efficiency.
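A minimal sketch of multistage sampling, assuming the same kind of hypothetical population of 10 cities with 20 residents each. The difference from cluster sampling is the second stage: only a random subset of each chosen cluster is kept.

```python
import random

random.seed(11)  # fixed seed so the sketch is reproducible

# Hypothetical clusters: 10 cities, each with 20 residents.
clusters = {
    f"city_{c}": [f"city_{c}_resident_{r}" for r in range(20)]
    for c in range(10)
}

# Stage 1: randomly select clusters.
chosen_clusters = random.sample(sorted(clusters), k=3)

# Stage 2: randomly sample cases within each selected cluster.
multistage_sample = [
    resident
    for name in chosen_clusters
    for resident in random.sample(clusters[name], k=5)
]
print(multistage_sample)
```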
What is a strategy to minimize sampling bias in studies?
Use Random Sampling
Ensures that every individual in the population has an equal chance of being selected.
What is Stratified Sampling?
Dividing the population into homogeneous subgroups and randomly sampling from each stratum
Ensures representation across key characteristics like age or gender.
How does increasing sample size help in studies?
It reduces sampling variability, which makes estimates more precise and results more reliable.
A larger sample does not correct a biased sampling method; generalizable findings still require random sampling.
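A minimal simulation of the reliability point, assuming a hypothetical population of 10,000 values drawn from a normal distribution. The helper name `spread_of_sample_means` and all the numbers are illustrative assumptions. It repeatedly draws random samples of a given size and measures how much the sample mean varies from one sample to the next.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 measurements.
population = [random.gauss(50, 10) for _ in range(10_000)]

def spread_of_sample_means(n, repeats=200):
    """Standard deviation of the sample mean across repeated random samples of size n."""
    means = [statistics.mean(random.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(10)
spread_large = spread_of_sample_means(1000)
print(spread_small, spread_large)
```

The sample means from the larger samples cluster more tightly together, which is what greater reliability means here.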