Machine Learning Flashcards
(24 cards)
What is Machine Learning?
Machine Learning lets computers learn patterns from data and make predictions or decisions without being explicitly programmed for each task.
What are features in ML?
Features are the input variables (columns) used to make predictions.
What is the label in ML?
The label is the target variable you want to predict.
What are observations?
Observations are the rows or records in your dataset.
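Example (a minimal sketch, assuming pandas is installed; the housing data and column names are made up for illustration): features, label, and observations in one small table.

```python
import pandas as pd

# Hypothetical toy dataset: each row is one observation (a house).
df = pd.DataFrame({
    "area_sqft": [850, 1200, 1500, 2000],
    "bedrooms": [2, 3, 3, 4],
    "price": [150000, 210000, 250000, 330000],
})

X = df[["area_sqft", "bedrooms"]]  # features: the input columns
y = df["price"]                    # label: the column to predict

n_observations, n_features = X.shape
print(n_observations, "observations,", n_features, "features")
```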
What are the three types of Machine Learning?
Supervised Learning, Unsupervised Learning, Reinforcement Learning.
What is Supervised Learning?
A type of ML where the label (output) is known and provided during training.
What is a Regression problem?
A supervised learning task where the label is continuous (e.g., price, temperature).
Give examples of regression algorithms.
Linear Regression, Support Vector Regressor, Decision Tree Regressor, Random Forest Regressor.
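Example (a minimal sketch, assuming scikit-learn and NumPy are installed; the toy data follows y = 2x + 1 and is invented for illustration): fitting Linear Regression to a continuous label.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data generated from y = 2x + 1, so the label is continuous.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()
model.fit(X, y)

# Predict the label for an unseen feature value.
prediction = model.predict(np.array([[5.0]]))[0]
print(round(prediction, 2))
```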
What is a Classification problem?
A supervised learning task where the label is categorical (e.g., spam or not spam, class A/B/C).
Give examples of classification algorithms.
Logistic Regression, Decision Tree Classifier, Random Forest Classifier, K-Nearest Neighbors (KNN), Naive Bayes, Support Vector Machine (SVM).
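Example (a minimal sketch, assuming scikit-learn and NumPy are installed; the one-feature dataset is invented for illustration): fitting Logistic Regression to a two-class label.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: small feature values belong to class 0, large ones to class 1.
X = np.array([[1.0], [2.0], [3.0], [8.0], [9.0], [10.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

# Predict the class of two unseen observations.
predicted = clf.predict(np.array([[2.5], [9.5]]))
print(predicted)
```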
What is Unsupervised Learning?
A type of ML where the model learns patterns or structure from data without known labels.
Give examples of unsupervised algorithms.
K-Means and Hierarchical Clustering (clustering), the Apriori algorithm (association rule mining), and anomaly detection methods such as Isolation Forest.
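Example (a minimal sketch, assuming scikit-learn and NumPy are installed; the points are invented for illustration): K-Means grouping unlabeled points into two clusters.

```python
import numpy as np
from sklearn.cluster import KMeans

# Six unlabeled points forming two visibly separate groups.
points = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.8, 8.1],
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(points)

# The cluster numbers themselves are arbitrary; only the grouping matters.
print(cluster_ids)
```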
What is Reinforcement Learning?
A type of ML where an agent learns by trial and error, interacting with an environment and receiving rewards or penalties for its actions.
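Example (a minimal sketch using only the standard library; the two-action "bandit" environment and its reward probabilities are hypothetical, and epsilon-greedy is just one simple strategy): learning which action pays off through rewards and trial and error.

```python
import random

random.seed(0)

# Hypothetical environment: each action pays reward 1 with a fixed probability.
reward_probability = [0.2, 0.8]

estimates = [0.0, 0.0]  # running average reward observed per action
counts = [0, 0]         # how often each action was tried
epsilon = 0.1           # share of steps spent exploring at random

for step in range(2000):
    if random.random() < epsilon:
        action = random.randrange(2)              # explore: try a random action
    else:
        action = estimates.index(max(estimates))  # exploit: best estimate so far
    reward = 1.0 if random.random() < reward_probability[action] else 0.0
    counts[action] += 1
    # Incremental update of the average reward for the chosen action.
    estimates[action] += (reward - estimates[action]) / counts[action]

best_action = estimates.index(max(estimates))
print(best_action, counts)
```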
What is the first step in the ML pipeline?
Identify the problem type: supervised, unsupervised, or reinforcement.
How can you gather data for ML?
Through surveys, company databases, web scraping, or interviews.
What happens during data preprocessing?
Clean column names, handle missing values (drop or impute them), handle outliers, and remove duplicate rows.
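Example (a minimal sketch, assuming pandas is installed; the data, column names, and the 0 to 120 age range are assumptions made for illustration): the preprocessing steps applied with pandas.

```python
import pandas as pd

# Hypothetical raw data with a messy column name, a missing value,
# a duplicate row, and an implausible age.
raw = pd.DataFrame({
    " Customer Name ": ["Ana", "Ben", "Ben", None, "Eve"],
    "Age": [34, 29, 29, 41, 230],
})

df = raw.copy()

# Clean column names: trim spaces, lowercase, use underscores.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

df = df.dropna()                    # drop rows with missing values
df = df.drop_duplicates()           # drop exact duplicate rows
df = df[df["age"].between(0, 120)]  # keep ages in an assumed valid range

print(df)
```

Dropping rows is the simplest option; imputing missing values or capping outliers may suit a real dataset better.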
What tools are used in data preprocessing?
Pandas, Excel.
What is the purpose of Exploratory Data Analysis (EDA)?
To understand data distribution, structure, and spot issues before modeling.
What are common EDA methods?
df.head(), df.info(), df.describe(), df.shape, value_counts(), visualization.
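Example (a minimal sketch, assuming pandas is installed; the DataFrame is invented for illustration): the EDA calls run on a small table.

```python
import pandas as pd

# Hypothetical dataset used only to demonstrate the calls.
df = pd.DataFrame({
    "city": ["Paris", "Lyon", "Paris", "Nice", "Paris"],
    "sales": [120.0, 80.0, 150.0, 60.0, 130.0],
})

print(df.head())       # first rows
df.info()              # column types and non-null counts (prints directly)
print(df.describe())   # summary statistics for numeric columns
print(df.shape)        # (rows, columns)

city_counts = df["city"].value_counts()  # frequency of each category
print(city_counts)
```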
How do you split data for training and testing?
Use train_test_split() from sklearn.model_selection.
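Example (a minimal sketch, assuming scikit-learn and NumPy are installed; the arrays and the 80/20 split are illustrative choices): holding out part of the data for testing.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten observations with two features each, plus one label per observation.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```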
How do you evaluate a regression model?
Using metrics like mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE).
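Example (a minimal sketch, assuming scikit-learn is installed; the true and predicted values are invented for illustration): computing MAE, MSE, and RMSE.

```python
import math
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]  # errors: -1, 0, +1, +2

mae = mean_absolute_error(y_true, y_pred)  # mean of |error|  -> 1.0
mse = mean_squared_error(y_true, y_pred)   # mean of error^2  -> 1.5
rmse = math.sqrt(mse)                      # square root of MSE, same units as the label

print(mae, mse, round(rmse, 3))
```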
How do you evaluate a classification model?
Using accuracy, precision, recall, F1-score, and a confusion matrix.
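Example (a minimal sketch, assuming scikit-learn is installed; the true and predicted labels are invented for illustration): computing the classification metrics.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

accuracy = accuracy_score(y_true, y_pred)    # share of correct predictions
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
matrix = confusion_matrix(y_true, y_pred)

print(accuracy, precision, recall, f1)
print(matrix)
```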
What can you do if your model performs poorly?
Get more data, clean data better, try feature engineering, tune hyperparameters, or switch models.
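Example (a minimal sketch, assuming scikit-learn is installed; the built-in iris dataset, the KNN model, and the candidate values are illustrative choices): tuning one hyperparameter with a cross-validated grid search.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate values for the number of neighbors; each is scored with 5-fold CV.
param_grid = {"n_neighbors": [1, 3, 5, 7]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Swapping KNeighborsClassifier for another estimator, with its own parameter grid, is one way to try a different model under the same search.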