ML Soc-Tech Flashcards
(83 cards)
What is the primary focus of statistical models?
Inference, such as determining whether a relationship exists or not
This involves hypothesis testing.
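A minimal sketch of what "determining whether a relationship exists" can look like in practice, using an exact permutation test on the difference in group means. The two groups and the function name `exact_permutation_p_value` are made up for illustration; a permutation test is only one of many hypothesis tests.

```python
from itertools import combinations
from statistics import fmean

def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(fmean(group_a) - fmean(group_b))
    n_a = len(group_a)
    extreme = 0
    total = 0
    # Enumerate every way of relabeling n_a of the pooled values as "group A".
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count relabelings at least as extreme as what was observed.
        if abs(fmean(a) - fmean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical data: does group membership relate to the measured value?
p_value = exact_permutation_p_value([1, 2, 3, 4], [6, 7, 8, 9])
print(f"p = {p_value:.4f}")
```

A small p-value means that a difference this large would be rare if group labels were unrelated to the values.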
What do theories describe in the context of statistical models?
First principles of how and why X causally relates to Y
What are models in the context of theories?
Instantiations of theories that provide a local mathematical description or understanding of a phenomenon
What is the significance of parameters in models?
Parameters are interpretable and models are tractable
Why is the number of parameters limited in statistical models?
To avoid overfitting and enhance model interpretability
What do the Law of Large Numbers and Central Limit Theorem indicate?
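Law of Large Numbers: as sample size increases, the sample mean converges to the population mean
Central Limit Theorem: the sampling distribution of the standardized sample mean approaches a normal distribution (assuming finite variance)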
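Both theorems can be checked by simulation. The sketch below assumes draws from Uniform(0, 1), whose population mean is 0.5 and variance is 1/12; the seed, sample sizes, and helper names are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability

def sample_mean(n):
    """Mean of n draws from Uniform(0, 1); the population mean is 0.5."""
    return statistics.fmean([rng.random() for _ in range(n)])

def spread_of_means(n, trials=2000):
    """Standard deviation of many independent sample means of size n."""
    return statistics.pstdev([sample_mean(n) for _ in range(trials)])

# Law of Large Numbers: sample means cluster more tightly around 0.5 as n grows.
spread_small = spread_of_means(5)
spread_large = spread_of_means(500)

# Central Limit Theorem: about 68% of sample means should fall within one
# theoretical standard deviation of 0.5, as they would for a normal distribution.
n = 30
theoretical_sd = (1 / 12 / n) ** 0.5
means = [sample_mean(n) for _ in range(2000)]
within_one_sd = sum(abs(m - 0.5) <= theoretical_sd for m in means) / len(means)

print(spread_small, spread_large, within_one_sd)
```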
What is the main focus of machine learning (ML)?
Prediction, specifically input-output relationships
What is prioritized in ML over interpretability or inference?
Prediction accuracy on unseen data
What is a tractable system, model, or problem?
One that can be solved or analyzed using existing mathematical or computational methods within a reasonable time and resource limit
What does learning in machine learning involve?
Approximating a function g that maps input x to output y by analyzing patterns within data
What is the goal of learning in ML?
To fit the training data and generalize to unseen examples
What is the difference between learning and estimation?
Learning prioritizes developing a predictive model, while estimation focuses on determining specific parameters within a model
What type of data does supervised learning work with?
Labeled data (x, y)
What is the purpose of classification models in supervised learning?
To categorize data into predefined classes
What is the output of regression models in supervised learning?
A continuous value or quantity
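To contrast the two supervised tasks, here is a small sketch with made-up data: a nearest-centroid classifier that returns a predefined class, and a least-squares line that returns a continuous value. Both model choices are assumptions picked for brevity, not the only options.

```python
from statistics import fmean

# Classification: labeled pairs (x, class); the output is one of the classes.
labeled = [(1.0, "low"), (1.5, "low"), (2.0, "low"),
           (8.0, "high"), (9.0, "high"), (10.0, "high")]

def fit_centroids(data):
    """Compute the mean x value of each class."""
    by_class = {}
    for x, label in data:
        by_class.setdefault(label, []).append(x)
    return {label: fmean(xs) for label, xs in by_class.items()}

def classify(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Regression: labeled pairs (x, y); the output is a continuous quantity.
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

centroids = fit_centroids(labeled)
slope, intercept = fit_line([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])

print(classify(centroids, 3.0))   # a class label
print(slope * 5 + intercept)      # a real number
```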
What is generalization in machine learning?
A measure of how effectively a model captures underlying patterns in data rather than memorizing specific details
What causes poor generalization in models?
Overfitting or underfitting
What is overfitting?
When a model is too complex and captures noise in the training data instead of general patterns
What is underfitting?
When a model is too simplistic and fails to capture significant trends in the data
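Overfitting and underfitting can be seen side by side in a toy k-nearest-neighbor regressor, where k controls model complexity. This is a contrived sketch: the data follow y = x with a fixed alternating +1/-1 perturbation standing in for noise, so the run is deterministic. With k = 1 the model memorizes the perturbed training points; with k = 20 it predicts the global mean everywhere.

```python
from statistics import fmean

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return fmean([y for _, y in nearest])

def mse(train, data, k):
    """Mean squared error of a k-NN model fit on `train`, scored on `data`."""
    return fmean([(knn_predict(train, x, k) - y) ** 2 for x, y in data])

# Training targets: the true trend y = x plus an alternating +1 / -1 perturbation.
train = [(float(i), i + (1.0 if i % 2 == 0 else -1.0)) for i in range(20)]
# Unseen points: halfway between training inputs, with unperturbed targets.
test = [(i + 0.5, i + 0.5) for i in range(19)]

for k in (1, 2, 20):
    print(f"k={k:2d}  train MSE={mse(train, train, k):.3f}  "
          f"test MSE={mse(train, test, k):.3f}")
```

In this setup k = 1 reaches zero training error but does worse on unseen points than k = 2, which averages the perturbation away, while k = 20 has high error on both sets.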
What does the bias-variance trade-off describe?
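The balance between error from overly simple assumptions that miss real patterns (bias) and error from sensitivity to the particular training sample (variance); reducing one typically increases the other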
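For squared-error loss the trade-off has a standard decomposition. The notation is an assumption layered on the cards: the data are taken to follow y = g(x) + ε with zero-mean noise of variance σ², ĝ denotes the learned model, and the expectation is over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{g}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{g}(x)] - g(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{g}(x) - \mathbb{E}[\hat{g}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Simpler models tend to raise the first term and lower the second; more flexible models tend to do the opposite.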
What does high bias indicate in a model?
The model is too simple and fails to learn relationships in the training data effectively
What does high variance indicate in a model?
The model is overly complex and captures noise in the training data
What are the steps in ML model development?
- Study phenomenon & clean data
- Discover data
- Explore associations
- Train ML model
- Evaluate model
- Analyze errors
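The last three steps (train, evaluate, analyze errors) can be sketched on a tiny hand-made dataset. Everything here is hypothetical: the data, the held-out split, and the one-feature threshold classifier were chosen only to keep the example short.

```python
from statistics import fmean

# Hypothetical labeled data (x, y) with y in {0, 1}, already split.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
test = [(2.5, 0), (4.5, 0), (5.5, 0), (8.5, 1)]

def train_threshold(data):
    """Train: place the decision threshold midway between the class means."""
    mean0 = fmean([x for x, y in data if y == 0])
    mean1 = fmean([x for x, y in data if y == 1])
    return (mean0 + mean1) / 2

def predict(threshold, x):
    return 1 if x > threshold else 0

threshold = train_threshold(train)

# Evaluate: accuracy on data the model did not see during training.
predictions = [(x, y, predict(threshold, x)) for x, y in test]
accuracy = fmean([1.0 if y == y_hat else 0.0 for _, y, y_hat in predictions])

# Analyze errors: list the misclassified examples as (x, true, predicted).
errors = [(x, y, y_hat) for x, y, y_hat in predictions if y != y_hat]

print(threshold, accuracy, errors)
```

Inspecting `errors` shows where the model fails (here, a class-0 point just past the threshold), which can guide another pass through the earlier steps.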
What are the assumptions on data in linear regression?
- Linearity
- Constant variance (homoskedasticity)
- Errors are independent and identically distributed
- No correlation between errors (no autocorrelation)
- No perfect collinearity in features
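Two of the assumptions above can be checked with rough diagnostics. The sketch below is illustrative only: the data are made up, the helper names are hypothetical, and comparing residual spread across the two halves of x is a crude stand-in for a formal homoskedasticity test.

```python
from math import sqrt
from statistics import fmean

def pearson(u, v):
    """Pearson correlation; |r| = 1 signals perfect collinearity of two features."""
    u_bar, v_bar = fmean(u), fmean(v)
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / sqrt(suu * svv)

def ols_residuals(xs, ys):
    """Residuals of a one-feature least-squares fit."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

def residual_spread_ratio(xs, ys):
    """Mean squared residual for the upper half of x divided by the lower half."""
    res = [r for _, r in sorted(zip(xs, ols_residuals(xs, ys)))]
    half = len(res) // 2
    return fmean([r * r for r in res[half:]]) / fmean([r * r for r in res[:half]])

# Collinearity: feature b is exactly 2 * a, feature c is not a multiple of a.
a, b, c = [1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [1, 0, 2, 1, 3]
r_ab, r_ac = pearson(a, b), pearson(a, c)

# Constant variance: the perturbation grows with x, so the ratio is far above 1.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0]
ys = [2 * x + e for x, e in zip(xs, noise)]
ratio = residual_spread_ratio(xs, ys)

print(r_ab, r_ac, ratio)
```

A ratio near 1 is consistent with constant variance; a ratio far from 1 suggests the spread of the errors changes with x.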