Machine Learning Flashcards by Shannon Smith

What is machine learning?

technology that allows computers to learn from data without being explicitly programmed

subset of artificial intelligence (AI) that enables computers to learn from data without being explicitly programmed, using algorithms to identify patterns and then using that knowledge to make predictions or decisions

What’s the difference between supervised machine learning and unsupervised machine learning?

-supervised learning uses labeled data to train models for prediction or classification (eg. Mom telling you "this is a teddy bear")
-unsupervised learning uses unlabeled data to discover patterns and structures (you have to figure out what something is by grouping things with common characteristics)

What is the difference between labeled data and unlabeled data?

labeled data: cat picture with the title "cat"
unlabeled data: cat picture with no title

In supervised machine learning what’s the difference between regression problems and classification problems?

regression problem: predicts a continuous number (eg. what will the temperature be tomorrow)
classification problem: predicts a category (eg. will it be hot or cold tomorrow)

What is the difference between dimension reduction and clustering in unsupervised machine learning?

-dimension reduction: reducing the number of feature variables (independent variables) so you don’t overfit and use only the most important variables to explain the outcome (similar to parsimony)

-clustering: grouping observations based on common characteristics

What is the difference between deep learning and reinforcement learning?

deep learning uses neural networks with many hidden layers to learn patterns from lots of examples, often labeled data (deep learning is like teaching the robot to learn on its own by looking at lots of examples).
reinforcement learning learns by interacting with an environment and receiving rewards or penalties for its actions, recording those rewards or penalties, then adjusting on the next interaction

What is generalization in machine learning algorithms?

how well a model explains the training data and applies this knowledge to new data

a model that doesn’t explain the training data very well is considered underfit

a model that explains the training data too well (it learns the noise too) is considered overfit, so it predicts poorly on new data

The dataset for machine learning models is usually divided into 3 samples. What are the 3 samples and what is each used for?

training sample: used to find the relationships (train the model); in-sample data
validation sample: used to validate and fine tune the model; in-sample data
test sample: used to test the model on new data; usually a small portion of the total data set; out-of-sample data
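
A minimal sketch of the three-way split, assuming a 60/20/20 division and a random shuffle. The card does not give proportions, so the fractions and the function name `split_dataset` are only illustrative.

```python
import random


def split_dataset(rows, train_frac=0.6, validation_frac=0.2, seed=0):
    """Shuffle rows and split them into training, validation, and test samples."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable

    n_train = round(len(shuffled) * train_frac)
    n_validation = round(len(shuffled) * validation_frac)

    train = shuffled[:n_train]                             # in-sample: fit the model
    validation = shuffled[n_train:n_train + n_validation]  # in-sample: tune the model
    test = shuffled[n_train + n_validation:]               # out-of-sample: final check
    return train, validation, test


train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))
```

Every row lands in exactly one sample, so the test sample stays unseen while the model is trained and tuned.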

What is the difference between bias errors, variance errors, and base errors?

bias error: oversimplifying a model, causing underfitting (often due to not enough independent variables that explain the dependent variable)
variance error: making the model too complex, causing overfitting (often due to too many independent variables, so the model explains the training data too well)
base error: error due to randomness in the data; it remains even in a model that is a good fit/robust, with a good balance between bias error and variance error.

What are two methods for addressing overfitting models? CC

complexity reduction: limiting the number of features (independent variables) and penalizing algorithms that are too complex (achieved by including only parameters that reduce out-of-sample errors)
cross validation: divides the data into training, validation, and test samples, then checks how well the model generalizes to the samples it has not seen

What is penalized regression in supervised machine learning, and what is noise?

-Penalized regression: technique used in machine learning to prevent models from becoming too complex and overfitting the training data. Overfitting happens when a model learns not just the underlying patterns but also the noise in the data, which makes it perform poorly on new, unseen data.

-Noise: random, irrelevant, or erroneous information that doesn’t represent the true underlying patterns you’re trying to learn (eg. measurement errors when collecting data, outliers, etc.)

How does penalized regression deal with overly complex models or excessively large coefficients on some features?

Penalized regression addresses this by adding a penalty term to the loss function, discouraging the model from relying too heavily on any single feature or using too many features unnecessarily.
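
A rough sketch of what "adding a penalty term to the loss function" can look like. The card does not say which penalty is used, so this assumes a sum-of-squared-errors loss with a lasso-style penalty (lambda times the sum of absolute coefficients); `penalized_loss` and the sample numbers are made up for illustration.

```python
def penalized_loss(y_true, y_pred, coefficients, lam):
    """Sum of squared errors plus lam * sum of absolute coefficient values."""
    sse = sum((actual - predicted) ** 2 for actual, predicted in zip(y_true, y_pred))
    penalty = lam * sum(abs(b) for b in coefficients)  # assumed L1 (lasso-style) penalty
    return sse + penalty


y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 2.0]
coefficients = [0.5, -1.5]

print(penalized_loss(y_true, y_pred, coefficients, lam=0.0))  # no penalty: fit error only
print(penalized_loss(y_true, y_pred, coefficients, lam=2.0))  # same fit, larger total loss
```

With the same fit, a larger lambda makes big coefficients more expensive, which pushes the model toward smaller or fewer coefficients.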

How does the lasso work, and what happens to coefficients as the lasso penalty increases?

the lasso adds a penalty based on the sum of the absolute values of the coefficients, so as the penalty increases the coefficients decrease, shrinking the effect of each independent variable
the model first estimates coefficients; as the lasso penalty increases, the coefficient on each independent variable shrinks, eventually reducing some of the less important coefficients all the way to 0
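
One way to see the shrinkage is soft thresholding: each coefficient moves toward 0 by the penalty amount and stops at 0. This is only a sketch, and it assumes the special case where the features are standardized and uncorrelated, so each coefficient can be shrunk on its own. The starting coefficients are made up.

```python
def soft_threshold(coefficient, lam):
    """Shrink a coefficient toward 0 by lam; small coefficients become exactly 0."""
    magnitude = abs(coefficient) - lam
    if magnitude <= 0:
        return 0.0  # the penalty outweighs this coefficient, so it drops out
    return magnitude if coefficient > 0 else -magnitude


unpenalized = [3.0, -1.5, 0.4]  # hypothetical coefficients before any penalty
for lam in (0.0, 0.5, 2.0):
    print(lam, [soft_threshold(b, lam) for b in unpenalized])
```

As lam grows, the smallest coefficient hits 0 first, then the next smallest, which is why the lasso also works as feature selection.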

What is support vector machine in supervised machine learning?

a model that classifies an image or data point into one of 2 classes
eg. comparing apples to oranges: you graph apples on one side and oranges on the other side. Draw a line down the middle called the separating hyperplane. Then draw 2 more lines parallel to the hyperplane, one through the closest apples and one through the closest oranges (those closest points are the support vectors). When the machine is given a new fruit, it checks which side of the hyperplane the fruit falls on.

What happens if a data point falls within the margin, the space between the hyperplane and the support vectors?

a point within the margin is in a more uncertain area, so the model is less confident about its classification.
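
A small sketch of checking whether a new fruit falls inside the margin, assuming a linear SVM scaled the usual way: the hyperplane is where weights·x + bias = 0 and the margin edges are where it equals +1 and -1. The weights and bias below are made-up numbers, not a fitted model.

```python
def decision_value(point, weights, bias):
    """Signed score: 0 on the hyperplane, +1 / -1 on the two margin edges."""
    return sum(w * x for w, x in zip(weights, point)) + bias


def describe(point, weights, bias):
    """Return the predicted class and whether the point sits inside the margin."""
    value = decision_value(point, weights, bias)
    label = "apple" if value >= 0 else "orange"
    inside_margin = abs(value) < 1.0  # closer to the hyperplane than the support vectors
    return label, inside_margin


weights, bias = (1.0, 0.0), -5.0  # hypothetical hyperplane: the vertical line x = 5

print(describe((8.0, 2.0), weights, bias))  # well on the apple side
print(describe((5.5, 2.0), weights, bias))  # apple side, but inside the margin
print(describe((2.0, 1.0), weights, bias))  # well on the orange side
```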

What is k nearest neighbor in supervised machine learning?

classifies a data point (or predicts its value) based on its nearest neighboring data points. It means follow the majority of what your neighbor data points are doing.
eg. you see a bunch of kids playing soccer and basketball on the playground. Pick a number (k): let’s say k = 3. You check what the 3 closest kids are playing. Count the games: if 2 kids are playing soccer and 1 kid is playing basketball, you choose soccer because more kids near you are playing it.
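
The playground example can be sketched in a few lines, assuming straight-line (Euclidean) distance and made-up positions for the kids. `knn_classify` is an illustrative name, not a library function.

```python
import math
from collections import Counter


def knn_classify(new_point, labeled_points, k=3):
    """Label new_point by majority vote among its k closest labeled points."""
    by_distance = sorted(labeled_points, key=lambda item: math.dist(item[0], new_point))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# (position on the playground, game being played); positions are hypothetical
playground = [
    ((1.0, 1.0), "soccer"),
    ((1.5, 2.0), "soccer"),
    ((2.0, 1.0), "basketball"),
    ((8.0, 8.0), "basketball"),
    ((9.0, 7.5), "basketball"),
]

print(knn_classify((1.2, 1.4), playground, k=3))
```

Here the 3 closest kids are 2 soccer players and 1 basketball player, so the new kid is labeled soccer.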

What is classification & regression tree in supervised learning (CART)?

CART is like a decision tree that helps you figure things out by asking yes/no questions until you reach an answer.

eg.
1. Is it sunny?
• Yes → Go to the next question
• No → Stay inside

2. Is it hot outside?
• Yes → Play outside with water balloons
• No → Play outside with a ball
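
The same tree written as code, just to show the yes/no structure. Note the assumption: a real CART algorithm learns its questions (splits) from data, while this sketch hard-codes the two questions from the example.

```python
def choose_activity(is_sunny, is_hot):
    """Walk the example tree: first question is sun, second question is heat."""
    if not is_sunny:
        return "Stay inside"
    if is_hot:
        return "Play outside with water balloons"
    return "Play outside with a ball"


print(choose_activity(is_sunny=True, is_hot=True))
print(choose_activity(is_sunny=True, is_hot=False))
print(choose_activity(is_sunny=False, is_hot=True))
```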

What is ensemble learning in supervised machine learning?

Ensemble learning: like having a team of experts instead of relying on just one person’s opinion. The idea is to combine multiple models (called weak learners) to create a stronger, more accurate model. Working together, these models usually make better predictions than any single model could on its own!

What are 3 types of ensemble learning techniques for supervised machine learning? VBR

voting classifiers: following the majority (eg. 4 models say default and 3 models say no default, so you go with default)
bagging (bootstrap aggregating): training models independently on different random subsets of the data (bootstrap samples); each model says yes bankruptcy or no bankruptcy and you go with the majority
random forest: an algorithm that combines multiple decision trees, each trained on a random subset of the data and features, to make predictions
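
A short sketch of the two building blocks behind these techniques: a majority vote across model predictions, and a bootstrap sample (drawing rows at random with replacement) of the kind bagging trains each model on. The function names are illustrative and no real models are trained here.

```python
import random
from collections import Counter


def majority_vote(predictions):
    """Return the prediction that the most models agree on."""
    return Counter(predictions).most_common(1)[0][0]


def bootstrap_sample(rows, seed):
    """Draw len(rows) rows at random WITH replacement (some repeat, some are left out)."""
    rng = random.Random(seed)
    return rng.choices(rows, k=len(rows))


votes = ["default"] * 4 + ["no default"] * 3  # the 4-versus-3 example from the card
print(majority_vote(votes))

rows = list(range(10))
print(bootstrap_sample(rows, seed=1))  # what one bagged model would train on
```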

What is principal component analysis in unsupervised machine learning, and what can’t principal component analysis be used for?

takes many correlated features and combines them into just a few composite variables. The composite variable with the most explanatory power (aka line of best fit) is graphed first, then the less explanatory composite variables are plotted on the same graph as the observations. Composite variables should be uncorrelated with each other, so when the 2nd composite variable is graphed it will be perpendicular to the first, never parallel
(can’t be used for regression problems on its own, because PCA has no dependent variable to predict)

What are eigenvectors and eigenvalues?

eigenvectors (direction): tell you the best way to combine independent variables into a composite variable (eg. combine height and weight because they change together)
eigenvalues (importance): tell you how much of the total variation each composite variable (eigenvector) explains. The higher the eigenvalue, the more of the variation in the data that composite variable explains

What is projection error in principal component analysis?

perpendicular distance between an observation and the most explanatory composite variable (aka line of best fit)

What is a scree plot?

graph of the explanatory power (share of total variance explained) of each composite variable, ordered from most explanatory to least explanatory. Most people keep enough composite variables to explain around 80-90% of the total variance
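
The numbers behind a scree plot can be sketched without any plotting, assuming you already have the eigenvalue of each composite variable. The eigenvalues below are made up, and the 80% target is just the low end of the range on the card.

```python
def explained_variance(eigenvalues):
    """Return each component's share of total variance and the running total."""
    ordered = sorted(eigenvalues, reverse=True)  # most explanatory first
    total = sum(ordered)
    proportions = [value / total for value in ordered]

    cumulative, running = [], 0.0
    for share in proportions:
        running += share
        cumulative.append(running)
    return proportions, cumulative


def components_needed(eigenvalues, target=0.8):
    """Smallest number of components whose cumulative share reaches the target."""
    _, cumulative = explained_variance(eigenvalues)
    for count, share in enumerate(cumulative, start=1):
        if share >= target:
            return count
    return len(cumulative)


eigenvalues = [4.0, 3.0, 2.0, 1.0]  # hypothetical eigenvalues
print(explained_variance(eigenvalues))
print(components_needed(eigenvalues, target=0.8))
```

With these eigenvalues the first two components explain about 70% of the variance, so a third is needed to pass 80%.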

What is clustering for unsupervised machine learning?

process of organizing observations into groups that share common features.
PCA is different because it focuses on reducing features into composite variables rather than grouping observations

What is k means clustering in unsupervised machine learning?

- process of organizing observations into k clusters that don’t overlap. In other words, you’re trying to minimize the distance between each observation and the centroid (center) of its cluster, and maximize the distance between cluster 1 and cluster 2.

What are the steps in k-means clustering?

1. Choose k, the number of clusters, and place k starting centroids
2. Assign each data point to the cluster with the closest centroid
3. Find the average of cluster 1, and the average of cluster 2.
4. Reposition the cluster 1 centroid to its average point, and reposition the cluster 2 centroid to its average point.
5. Repeat steps 2-4 over & over until the centroids don’t need to be repositioned.
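
The steps above can be sketched as a short loop. This assumes the starting centroids are handed in by the caller (step 1) and that distance is straight-line distance. The points and starting centroids are made up.

```python
import math


def k_means(points, centroids, max_iterations=100):
    """Repeat assign / average / reposition until the centroids stop moving."""
    centroids = list(centroids)
    for _ in range(max_iterations):
        # Step 2: put each point in the cluster of its closest centroid
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)

        # Steps 3-4: move each centroid to the average of its cluster
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(axis) / len(cluster) for axis in zip(*cluster)))
            else:
                new_centroids.append(old)  # empty cluster: leave its centroid in place

        # Step 5: stop once no centroid needed repositioning
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```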

What is hierarchical clustering?

- similar to k-means clustering, it organizes data into clusters, except the # of clusters isn’t predetermined.

What are the 2 types of hierarchical clustering?

- agglomerative clustering (bottom up clustering): starts with each observation as its own cluster, merges the two closest clusters into 1, then merges the next 2 closest clusters into 1, and so on until everything is in one cluster.
- divisive clustering (top down clustering): starts with one big cluster containing all the observations, then the cluster is broken down into 2 clusters, then 3 clusters and so on.
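
A bare-bones sketch of the bottom up version on one-dimensional data. It assumes the distance between two clusters is the distance between their closest members (single linkage), which is one common choice and not something the card specifies. The values are made up.

```python
def agglomerative(values):
    """Merge the two closest clusters repeatedly; return the list of merges."""
    clusters = [[v] for v in values]  # every observation starts as its own cluster
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # assumed single linkage: gap between the closest members
                distance = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or distance < best[0]:
                    best = (distance, i, j)

        distance, i, j = best
        history.append((clusters[i], clusters[j], distance))
        merged = clusters[i] + clusters[j]
        clusters = [c for index, c in enumerate(clusters) if index not in (i, j)]
        clusters.append(merged)
    return history


history = agglomerative([1.0, 2.0, 6.0, 7.5, 20.0])
for left, right, distance in history:
    print(left, "+", right, "at distance", distance)
```

Each merge happens at a larger distance than the one before, so you can stop early at whatever number of clusters makes sense.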

What is a dendrogram in hierarchical clustering?

- tree-like structure that shows how data points are grouped together at different levels of similarity
- the horizontal line connecting a & b is called an arch
- the vertical lines are called dendrites; their height shows the distance between a & b
- eg. points on a graph a, b, c, d, e, f, g, h: cluster a & b, cluster c & d, cluster e & f, cluster g & h, etc.

What are neural networks and their 3 layers?

- machine learning algorithms modeled after the human brain
1. Input layer (data received) (eg. friends look at a puzzle piece and pass it along)
2. Hidden layer (where learning takes place) (figure out how puzzle pieces fit together, talk to each other, share ideas, etc)
3. Output layer (exports information) (makes final guess on puzzle, repeats until they get it right)

What’s the difference between neural networks and deep learning?

- deep learning has many more hidden layers (huge team of friends with many groups, each learning a small part of the puzzle) (at least 3 hidden layers, usually 10-20 hidden layers)

What size of data set is support vector machine best for, is SVM affected by outliers, and is SVM for supervised or unsupervised learning?

- best suited for small to medium size data sets
- unaffected by outliers that plot beyond the support vectors
- supervised learning

What is the formula for calculating the new weight in neural networks?

new weight = old weight - (learning rate * partial derivative of the total error with respect to the old weight)
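
A tiny sketch of the update rule in action. A real network has many weights and gets its partial derivatives from backpropagation; this assumes a made-up one-weight error function, error = (weight - 3)^2, whose derivative is 2 * (weight - 3).

```python
def update_weight(old_weight, learning_rate, partial_derivative):
    """new weight = old weight - (learning rate * partial derivative)."""
    return old_weight - learning_rate * partial_derivative


weight = 0.0
for _ in range(50):
    gradient = 2 * (weight - 3.0)  # derivative of the hypothetical error (weight - 3)^2
    weight = update_weight(weight, learning_rate=0.1, partial_derivative=gradient)

print(weight)  # ends up very close to 3, where the error is smallest
```

The learning rate controls the step size: too small and learning is slow, too large and the weight can overshoot the best value.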

What is the formula for calculating IDF (inverse document frequency)?

DF = (number of documents containing the particular word) / (total number of documents in the corpus)
IDF = log (1/DF)
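
A quick worked sketch of the formula, assuming DF is the share of documents in the corpus that contain the word and that the log is the natural log (the card does not say which base). The four-sentence corpus is made up.

```python
import math


def document_frequency(word, documents):
    """Share of documents that contain the word at least once."""
    containing = sum(1 for doc in documents if word in doc.lower().split())
    return containing / len(documents)


def inverse_document_frequency(word, documents):
    """IDF = log(1 / DF). Assumes the word appears in at least one document."""
    return math.log(1 / document_frequency(word, documents))


corpus = [
    "the market rallied",
    "the bond market fell",
    "earnings beat estimates",
    "the fed held rates",
]

for word in ("the", "market", "earnings"):
    print(word, document_frequency(word, corpus), inverse_document_frequency(word, corpus))
```

Common words like "the" get a high DF and a low IDF, while rarer words like "earnings" get a high IDF.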

(34 cards)