Machine Learning Flashcards
What is machine learning?
- technology that allows computers to learn from data without being explicitly programmed
- subset of artificial intelligence (AI) that uses algorithms to identify patterns in data and applies that knowledge to make predictions or decisions
What’s the difference between supervised machine learning and unsupervised machine learning?
- supervised learning uses labeled data to train models for prediction or classification (e.g. Mom telling you this is a teddy bear)
- unsupervised learning uses unlabeled data to discover patterns and structures (you have to figure out what it is by grouping things with common characteristics)
What is the difference between labeled data and unlabeled data?
- labeled data: cat picture with the title cat
- unlabeled data: cat picture
In supervised machine learning, what’s the difference between regression problems and classification problems?
- regression problem: predicts a continuous number (e.g. what will the temperature be tomorrow)
- classification problem: predicts a category (e.g. will it be hot or cold tomorrow)
What is the difference between dimension reduction and clustering in unsupervised machine learning?
- dimension reduction: reducing the number of features (independent variables) so you don’t overfit, keeping the most important variables to explain the outcome (similar to parsimony)
- clustering: grouping observations based on common characteristics
What is the difference between deep learning and reinforcement learning?
- deep learning uses neural networks with many layers to learn patterns from lots of examples (like teaching the robot to learn on its own by looking at lots of examples); the examples are often labeled, but they don’t have to be
- reinforcement learning learns by interacting with an environment, receiving rewards or penalties for its actions, recording them, and then adjusting its next interaction
What is generalization in machine learning algorithms?
- how well a model that has learned from training data applies that knowledge to new data
- a model that doesn’t explain the training data well is underfit
- a model that explains the training data too well (it learns the noise too) is overfit
The dataset for a machine learning model is usually divided into 3 samples. What are the 3 samples and what is each used for?
- training sample: used to find the relationships (fit the model); in-sample data
- validation sample: used to validate and fine-tune the model; in-sample data
- test sample: used to test the model on new data; a small portion of the total data set, out-of-sample data
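The three-way split can be sketched in a few lines of Python. This is only an illustration: the function name three_way_split and the 60/20/20 proportions are assumptions, not something the flashcards specify.

```python
import random

def three_way_split(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle rows and cut them into training, validation, and test samples."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever is left is held out
    return train, validation, test

# 100 made-up observations, identified only by their row number
train, validation, test = three_way_split(range(100))
print(len(train), len(validation), len(test))
```

The test sample is never touched while fitting or tuning, which is what makes it out-of-sample.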
What is the difference between bias errors, variance errors, and base errors?
- bias error: oversimplifying a model, causing underfitting (often due to not enough independent variables that explain the dependent variable)
- variance error: making the model too complex, causing overfitting (often due to too many independent variables that explain the training data too well)
- base error: the error left over from randomness in the data. A good-fit (robust) model balances bias error and variance error but can’t remove base error.
What are two methods for addressing overfitting models? CC
- complexity reduction: limiting the number of features (independent variables) and penalizing algorithms that are too complex (achieved by including only parameters that reduce out-of-sample errors)
- cross validation: divides data into a training sample, a validation sample, and a test sample, then checks how well the model generalizes to the samples it has not seen
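One common way to rotate which rows are held out is k-fold cross validation. The flashcards don’t name k-fold, so treat this as an assumed variant; k_fold_indices is a hypothetical helper name.

```python
def k_fold_indices(n_rows, k):
    """Yield (training_indices, validation_indices) for each of k folds."""
    indices = list(range(n_rows))
    fold_size, remainder = divmod(n_rows, k)
    start = 0
    for fold in range(k):
        # the first `remainder` folds get one extra row so every row is used once
        stop = start + fold_size + (1 if fold < remainder else 0)
        validation = indices[start:stop]
        training = indices[:start] + indices[stop:]
        yield training, validation
        start = stop

for training, validation in k_fold_indices(10, 5):
    print("validate on", validation, "train on", training)
```

Each row is used for validation exactly once, so the average validation error is a rough estimate of out-of-sample error.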
What is penalized regression in supervised machine learning, and what is noise?
- penalized regression: technique used in machine learning to prevent models from becoming too complex and overfitting the training data. Overfitting happens when a model learns not just the underlying patterns but also the noise in the data, which makes it perform poorly on new, unseen data.
- noise: random, irrelevant, or erroneous information that doesn’t represent the true underlying patterns you’re trying to learn (e.g. measurement errors when collecting data, outliers, etc.)
How does penalized regression keep a model from becoming too complex or assigning excessively large coefficients to some features?
- Penalized regression addresses this by adding a penalty term to the loss function, discouraging the model from relying too heavily on any single feature or using too many features unnecessarily.
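Written as a formula, the idea looks roughly like this. It assumes a squared-error loss and an absolute-value (lasso-style) penalty; the symbols y_i (actual value), \hat{y}_i (predicted value), b_k (coefficient of feature k), n, K, and \lambda (penalty strength) are notation introduced here for illustration.

```latex
\text{penalized loss}
  = \underbrace{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}_{\text{fit to training data}}
  + \underbrace{\lambda \sum_{k=1}^{K} \left| b_k \right|}_{\text{penalty for large or many coefficients}}
```

With \lambda = 0 this is ordinary least squares; a larger \lambda makes every nonzero coefficient more costly.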
How does the lasso work, and what happens to coefficients as the lasso penalty increases?
- as the lasso penalty increases the coefficients decrease, shrinking the effect of each independent variable
- the model estimates a coefficient for each independent variable; as the lasso penalty increases each coefficient shrinks, eventually reducing the less important coefficients all the way down to 0
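The shrink-to-zero behavior can be sketched with soft-thresholding. This is a simplification: it gives the lasso solution only in the special case where the features are standardized and uncorrelated with each other, and the coefficient values below are made up.

```python
def soft_threshold(coefficient, penalty):
    """Move a coefficient toward zero by `penalty`; return 0 if it would cross zero."""
    if coefficient > penalty:
        return coefficient - penalty
    if coefficient < -penalty:
        return coefficient + penalty
    return 0.0

# hypothetical unpenalized coefficients for three features
unpenalized = {"income": 2.5, "age": -0.8, "shoe_size": 0.1}

for penalty in (0.0, 0.5, 1.0):
    shrunk = {name: soft_threshold(b, penalty) for name, b in unpenalized.items()}
    print(penalty, shrunk)
```

As the penalty grows, the weakest feature drops to 0 first, then the next weakest, which is how the lasso ends up selecting features.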
What is support vector machine in supervised machine learning?
- a model that classifies data (e.g. an image) into one of 2 classes
- comparing apples to oranges: you graph apples on one side and oranges on the other side, then draw a line down the middle called the separating hyperplane. The apples and oranges closest to that line are the support vectors; a parallel line through each side’s support vectors marks the edge of the margin. When the machine is given a new fruit, it checks which side of the hyperplane the fruit falls on.
What happens if a data point falls within the margin (the space between the hyperplane and the support vectors)?
- the point is in a more uncertain area, so the model is less confident about its classification.
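A minimal sketch of how a trained linear SVM would place a new fruit. The weights and bias are hypothetical (no training happens here), and the two features stand for any two measurements of the fruit. It uses the usual scaling in which the margin edges sit at a score of +1 and -1.

```python
def decision_value(point, weights, bias):
    """Signed score w.x + b; its sign says which side of the hyperplane the point is on."""
    return sum(w * x for w, x in zip(weights, point)) + bias

def classify(point, weights, bias):
    score = decision_value(point, weights, bias)
    label = "orange" if score >= 0 else "apple"
    inside_margin = abs(score) < 1  # between the hyperplane and a margin edge
    return label, inside_margin

weights, bias = (1.0, 1.0), -10.0  # hypothetical, already-trained values

print(classify((8.0, 6.0), weights, bias))  # far on the orange side
print(classify((5.2, 5.1), weights, bias))  # orange side, but inside the margin
print(classify((2.0, 3.0), weights, bias))  # far on the apple side
```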
What is k nearest neighbor in supervised machine learning?
- the machine classifies a new data point based on its nearest neighbors: it follows the majority of what the neighboring data points are doing.
- e.g. you see a bunch of kids playing soccer and basketball on the playground. Pick a number (k): let’s say k = 3. You check what the 3 closest kids are playing. Count the games: if 2 kids are playing soccer and 1 kid is playing basketball, you choose soccer because more kids near you are playing it.
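The playground example can be sketched as code. The positions of the kids are made up, and straight-line (Euclidean) distance is an assumption; other distance measures are possible.

```python
from collections import Counter
import math

def knn_predict(new_point, labeled_points, k=3):
    """Return the majority label among the k labeled points closest to new_point."""
    by_distance = sorted(labeled_points, key=lambda item: math.dist(new_point, item[0]))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# (position on the playground, game being played) -- hypothetical data
playground = [
    ((1.0, 1.0), "soccer"),
    ((2.0, 1.5), "soccer"),
    ((1.5, 3.0), "basketball"),
    ((8.0, 8.0), "basketball"),
    ((9.0, 7.5), "basketball"),
]

# the 3 closest kids are 2 soccer players and 1 basketball player
print(knn_predict((1.5, 1.5), playground, k=3))  # soccer
```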
What is a classification and regression tree (CART) in supervised machine learning?
- CART is a decision tree that helps you figure things out by asking yes/no questions until you reach an answer.
e.g.
1. Is it sunny?
• Yes → Go to the next question
• No → Stay inside
2. Is it hot outside?
• Yes → Play outside with water balloons
• No → Play outside with a ball
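The same tree written as code. Here the questions are hand-written to mirror the example; a real CART algorithm would learn which questions to ask from the training data.

```python
def choose_activity(is_sunny, is_hot):
    """Walk the two-question tree from the example and return the answer at the leaf."""
    if not is_sunny:          # question 1: Is it sunny?
        return "Stay inside"
    if is_hot:                # question 2: Is it hot outside?
        return "Play outside with water balloons"
    return "Play outside with a ball"

print(choose_activity(is_sunny=True, is_hot=False))  # Play outside with a ball
```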
What is ensemble learning in supervised machine learning?
- ensemble learning: like having a team of experts instead of relying on just one person’s opinion. The idea is to combine multiple models (called weak learners) into a stronger, more accurate model. Working together, these models usually make better predictions than any single model could on its own.
What are 3 types of ensemble learning techniques for supervised machine learning? VBR
- voting classifiers: following the majority (e.g. 4 models say default and 3 models say no default, so you go with default)
- bagging (bootstrap aggregating): training copies of a model independently on different random subsets of the data (drawn with replacement); each model says yes bankruptcy or no bankruptcy, and you go with the majority
- random forest: an algorithm that combines multiple decision trees, each trained on a random subset of the data and features, to make predictions
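Two building blocks of these techniques can be sketched without any real models: counting votes, and drawing the random subsets that bagging trains on. The function names are hypothetical.

```python
from collections import Counter
import random

def majority_vote(predictions):
    """Return the label predicted by the most models."""
    return Counter(predictions).most_common(1)[0][0]

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, as bagging does for each model."""
    return [rng.choice(rows) for _ in rows]

# the voting example: 4 models say default, 3 say no default
votes = ["default"] * 4 + ["no default"] * 3
print(majority_vote(votes))  # default

# a bootstrap sample usually repeats some rows and leaves others out
rng = random.Random(0)
rows = list(range(10))
print(bootstrap_sample(rows, rng))
```

A random forest adds one more source of randomness on top of this: each tree also sees only a random subset of the features.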
What is principal component analysis in unsupervised machine learning, and what can’t principal component analysis be used for?
- takes many correlated features and combines them into just a few composite variables (principal components). The composite variable with the most explanatory power (the line of best fit through the data) is drawn first; the next most explanatory composite variable is then drawn on the same graph as the observations. Composite variables are uncorrelated with each other, so the 2nd composite variable is perpendicular to the first rather than parallel to it
- (can’t be used for regression problems on its own: PCA is unsupervised, so it has no dependent variable to predict)
What are eigenvectors and eigenvalues?
- eigenvectors (direction): tell you the best way to combine the independent variables into a composite variable (e.g. combine height and weight because they change together)
- eigenvalue (importance): tells you how much of the total variation in the data its composite variable explains. The higher the eigenvalue, the more explanatory that composite variable is
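For two features, the eigenvalues and eigenvectors of the covariance matrix have a closed form, so they can be computed with the standard library alone. The covariance numbers below are made up, and eigen_2x2_symmetric is a hypothetical helper name.

```python
import math

def eigen_2x2_symmetric(a, b, d):
    """Eigenvalues (largest first) and unit eigenvectors of [[a, b], [b, d]]."""
    mean = (a + d) / 2
    spread = math.hypot((a - d) / 2, b)
    values = (mean + spread, mean - spread)
    if b == 0:
        # already uncorrelated: the directions are the original axes
        vectors = [(1.0, 0.0), (0.0, 1.0)] if a >= d else [(0.0, 1.0), (1.0, 0.0)]
    else:
        vectors = []
        for value in values:
            x, y = b, value - a          # solves (a - value) * x + b * y = 0
            norm = math.hypot(x, y)
            vectors.append((x / norm, y / norm))
    return values, vectors

# hypothetical covariance matrix of height and weight: [[4, 2], [2, 3]]
values, vectors = eigen_2x2_symmetric(4.0, 2.0, 3.0)
for value, vector in zip(values, vectors):
    print("eigenvalue", value, "direction", vector, "share", value / sum(values))
```

The first direction weights both features with the same sign (they move together), and its eigenvalue is the larger share of the total variation.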
What is projection error in principal component analysis?
- perpendicular distance between an observation and the most explanatory composite variable (aka line of best fit)
What is a scree plot?
- a graph of the share of total variation explained by each composite variable, ordered from most explanatory to least explanatory. Most people keep enough composite variables to explain around 80-90% of the total variation
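The numbers behind a scree plot can be sketched from a list of eigenvalues. The eigenvalues here are made up, and the 85% target is just one point inside the 80-90% range.

```python
def explained_variance(eigenvalues):
    """Return (share, cumulative share) per composite variable, largest first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    rows, running = [], 0.0
    for value in ordered:
        share = value / total
        running += share
        rows.append((share, running))
    return rows

def components_needed(eigenvalues, target=0.85):
    """Smallest number of composite variables whose cumulative share reaches target."""
    for count, (_, cumulative) in enumerate(explained_variance(eigenvalues), start=1):
        if cumulative >= target:
            return count
    return len(eigenvalues)

eigenvalues = [5.0, 2.5, 1.5, 0.7, 0.3]  # hypothetical
for share, cumulative in explained_variance(eigenvalues):
    print(f"{share:.0%} of variation, {cumulative:.0%} cumulative")
print(components_needed(eigenvalues))  # 3
```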
What is clustering for unsupervised machine learning?
- process of organizing observations into groups that share common features.
- PCA is different because it focuses on reducing features into composite variables, rather than grouping observations
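One simple way to form such groups is k-means, sketched here for a single feature. The flashcards don’t name a specific clustering algorithm, so k-means is an assumption, and the values and starting centers are made up.

```python
def k_means_1d(values, centers, rounds=10):
    """Assign each value to its nearest center, then move each center to its group's mean."""
    centers = list(centers)
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            groups[nearest].append(value)
        # an empty group keeps its old center
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

values = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, groups = k_means_1d(values, centers=(0.0, 5.0))
print(centers)  # [1.5, 10.0]
print(groups)   # [[1.0, 1.5, 2.0], [9.0, 10.0, 11.0]]
```

No labels are used anywhere: the groups come only from how close the observations are to each other.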