Technical Flashcards

(89 cards)

1
Q

What Are the Different Types of Machine Learning?

A
  1. Supervised Learning
  2. Unsupervised Learning
  3. Reinforcement Learning
2
Q

What is:

Supervised Learning

A

In supervised machine learning, a model makes predictions or decisions based on past or labeled data.

Labeled data refers to sets of data that are given tags or labels, and thus made more meaningful.

3
Q

What is:

Unsupervised Learning

A

In unsupervised learning, we don’t have labeled data. A model can identify patterns, anomalies, and relationships in the input data.

4
Q

How can the model learn using Reinforcement Learning?

A

Using reinforcement learning, the model learns from the rewards it receives for its previous actions.

5
Q

What is Overfitting?

A

Overfitting occurs when a model learns the training set too well, treating random fluctuations in the training data as real patterns. Those patterns don't apply to new data, so they hurt the model's ability to generalize.

6
Q

How can you avoid Overfitting?

A
  • Regularization
  • Using a simpler model
  • Cross-validation methods
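
As a rough illustration of cross-validation, the sketch below splits sample indices into k folds so that every sample is used for validation exactly once. The function name `k_fold_indices` and the sample counts are made up for this example; it does no shuffling and is not any particular library's API.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# Hypothetical usage: 10 samples, 3 folds.
folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print(len(train), "train /", len(validation), "validation")
```

A model would be trained on each `train` part and scored on the matching `validation` part; a large gap between training and validation scores is a sign of overfitting.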
7
Q

What is Regularization?

A

It adds a cost (penalty) term on the model's feature weights to the objective function, which discourages overly complex fits.
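
One common form is L2 (ridge) regularization. The sketch below assumes a linear model with a mean-squared-error objective; the function name `ridge_cost`, the data, and the penalty strength `lam` are all made up for illustration.

```python
def ridge_cost(weights, bias, xs, ys, lam):
    """Mean squared error plus an L2 penalty on the weights."""
    n = len(xs)
    mse = sum(
        (sum(w * x for w, x in zip(weights, row)) + bias - y) ** 2
        for row, y in zip(xs, ys)
    ) / n
    # The penalty grows with the size of the weights; the bias is left out here.
    penalty = lam * sum(w * w for w in weights)
    return mse + penalty


xs = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]  # made-up feature rows
ys = [5.0, 4.0, 9.0]                        # made-up targets
weights, bias = [1.0, 2.0], 0.0             # fits the data above exactly

unpenalized = ridge_cost(weights, bias, xs, ys, lam=0.0)
penalized = ridge_cost(weights, bias, xs, ys, lam=0.1)
print(unpenalized, penalized)
```

With `lam=0.0` the cost is just the fitting error; with a positive `lam` the same weights cost more, so an optimizer is pushed toward smaller weights.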

8
Q

What is a ‘Training Set’ in a Machine Learning model?

A

It is a labeled dataset used to train the model by providing examples for it to learn patterns and relationships.

9
Q

What is a ‘Test Set’ in a Machine Learning model?

A

It is a held-out dataset used to evaluate the model's performance. The model makes predictions without seeing the labels, and those predictions are then compared with the true labels.

10
Q

What is a typical split ratio for Training and Test sets?

A

Usually 70% of the data is used for training and 30% for testing, though the ratio can vary with the size of the dataset and the needs of the project.
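
A minimal sketch of a 70/30 split in plain Python. The helper name `split_train_test`, the seed, and the toy data are assumptions for this example, not a library API.

```python
import random


def split_train_test(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows, then cut off the test portion."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(10))  # stand-in for 10 labeled examples
train_rows, test_rows = split_train_test(data)
print(len(train_rows), len(test_rows))
```

Shuffling before cutting avoids a test set that only contains the last rows of an ordered file.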

11
Q

Why should you separate the Test Set before training the model?

A

To avoid biased testing results and ensure the model is evaluated on unseen data for accurate performance measurement.

12
Q

How do you handle missing or corrupted data in a dataset?

A

By dropping the affected rows or columns, or by replacing the bad values with a placeholder value. In Pandas, isnull() finds missing values, dropna() drops them, and fillna() fills them in.

13
Q

When the training set is small, what type of model tends to work better?

A

A model with high bias and low variance, because it is less likely to overfit.

14
Q

Which classifier works best when the training set is large?

A

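No single classifier is always best, but low-bias, high-variance models (such as KNN or decision trees) tend to benefit most from a large training set. Naive Bayes, a high-bias, low-variance model, is usually the better fit when data is scarce.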

15
Q

What is a Confusion Matrix?

A

A table used to measure the performance of an algorithm by comparing actual and predicted values in supervised learning.

16
Q

In a Confusion Matrix, what are the two parameters?

A

Actual and Predicted.

17
Q

How is accuracy calculated using a Confusion Matrix?

A

Accuracy = (Sum of diagonal values) / (Total observations)
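
The formula can be sketched in a few lines of Python. The matrix below is made up for illustration, with rows as actual classes and columns as predicted classes.

```python
def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all observations."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical 2x2 confusion matrix: rows = actual, columns = predicted.
matrix = [
    [50, 10],
    [5, 35],
]
acc = accuracy(matrix)
print(acc)  # (50 + 35) / 100 = 0.85
```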

18
Q

What is a False Positive?

A

A case where the model predicts a positive outcome, but the actual outcome is negative.

19
Q

What is a False Negative?

A

A case where the model predicts a negative outcome, but the actual outcome is positive.

20
Q

What is the total observation count in a confusion matrix with values 12, 3, …?

A

The sum of all the values in the matrix.

21
Q

What are the three stages of building a machine learning model?

A

Model Building, Model Testing, and Applying the Model.

22
Q

What happens during the ‘Model Building’ stage?

A

Choose a suitable algorithm and train it according to the requirement.

23
Q

What is done in the ‘Model Testing’ stage?

A

Check the accuracy of the model using test data.

24
Q

What is done in the ‘Applying the Model’ stage?

A

Make changes after testing and deploy the final model for real-time projects, while periodically checking and updating it.

25
What is Deep Learning?
A subset of machine learning that uses artificial neural networks to enable systems to learn like humans, with multiple 'deep' layers.
26
What is a key difference between machine learning and deep learning regarding feature engineering?
In machine learning, features are manually selected, while in deep learning, the model automatically determines important features.
27
How much training data does machine learning typically need?
A small amount of data.
28
How much training data does deep learning typically need?
A large amount of data.
29
What type of systems does deep learning require?
High-end machines with significant computing power.
30
How does machine learning typically solve problems?
By breaking the problem into parts and solving them individually before combining the results.
31
How does deep learning typically solve problems?
In an end-to-end manner.
32
What are some business applications of supervised machine learning?
Email Spam Detection, Healthcare Diagnosis, Sentiment Analysis, and Fraud Detection.
33
How is supervised machine learning used in Email Spam Detection?
By training a model on labeled emails categorized as spam or not spam.
34
How is supervised machine learning applied in Healthcare Diagnosis?
By training a model on labeled images to detect diseases.
35
What is Sentiment Analysis?
Using algorithms to analyze documents and determine if their sentiment is positive, neutral, or negative.
36
How does supervised machine learning help in Fraud Detection?
By training a model to recognize suspicious patterns to identify possible fraud cases.
37
What is Semi-supervised Machine Learning?
A learning method where the training data contains a small amount of labeled data and a large amount of unlabeled data.
38
How does supervised learning differ from semi-supervised learning?
Supervised learning uses completely labeled data, while semi-supervised learning uses a mix of labeled and mostly unlabeled data.
39
What are the two main techniques used in Unsupervised Machine Learning?
Clustering and Association.
40
What is Clustering in Unsupervised Learning?
Dividing data into subsets (clusters) where data points in each cluster are similar to each other.
41
What is an example use case of Clustering?
Grouping customers based on purchasing behavior for targeted marketing.
42
What is Association in Unsupervised Learning?
Identifying patterns of association between different variables or items.
43
What is an example of Association rule application?
E-commerce websites suggesting other items based on your previous purchases and other customers' habits.
44
What is the difference between Supervised and Unsupervised Machine Learning?
Supervised learning uses labeled data for training, while unsupervised learning uses unlabeled data and lets the algorithm find patterns on its own.
45
What is Inductive Machine Learning?
A learning method that observes instances and draws conclusions from them based on defined principles.
46
What is an example of Inductive Machine Learning?
Explaining to a child to avoid fire by showing a video where fire causes damage.
47
What is Deductive Machine Learning?
A learning method that draws conclusions from direct experience.
48
What is an example of Deductive Machine Learning?
Letting a child touch fire, and after getting burned, they learn it’s dangerous.
49
What type of learning is K-Means?
Unsupervised learning.
50
What type of algorithm is K-Means?
A clustering algorithm.
51
What type of learning is KNN (K-Nearest Neighbors)?
Supervised learning.
52
What type of algorithm is KNN?
A classification algorithm.
53
How does K-Means work?
It groups data points into K clusters where points within each cluster are similar.
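
A minimal sketch of the usual iterative procedure (Lloyd's algorithm), restricted to 1-D points to keep it short. The function name `k_means_1d`, the starting centroids, and the data are made up for illustration.

```python
def k_means_1d(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids)  # [2.0, 11.0]
print(clusters)   # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```

Here K is 2 because two starting centroids are passed in. Real implementations usually pick the starting centroids randomly and stop once assignments no longer change.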
54
How does KNN work?
It classifies an unlabeled observation based on the majority class of its K nearest neighbors.
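
The majority vote can be sketched as follows, assuming Euclidean distance. The function name `knn_predict`, the points, and the labels are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """training is a list of (point, label) pairs; query is a point."""
    # Keep the k training examples closest to the query.
    neighbors = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    # Return the most common label among those neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


training = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 5.5), "B"), ((6.5, 7.0), "B"),
]
label_near_a = knn_predict(training, (2.0, 2.0), k=3)
label_near_b = knn_predict(training, (6.0, 5.0), k=3)
print(label_near_a, label_near_b)  # A B
```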
55
Why is the Naive Bayes Classifier called 'naive'?
Because it assumes that all features are independent of each other given the class label.
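
Written out, the independence assumption lets the classifier multiply per-feature probabilities. Here y is the class label and x_1, ..., x_n are the features (symbols chosen for this note):

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```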
56
Give an example explaining the 'naive' assumption in Naive Bayes.
A fruit might be classified as a cherry if it’s red and round, assuming these features are independent of each other, even if other fruits share them too.
57
When should you use Classification over Regression?
When your target variable is categorical, such as predicting yes/no, gender, or animal breed.
58
When should you use Regression over Classification?
When your target variable is continuous, like estimating sales, prices, or rainfall.
59
What is a Random Forest?
A supervised machine learning algorithm that builds multiple decision trees during training and outputs the majority decision for classification problems.
60
What is Bias in a Machine Learning model?
The error introduced when a model makes assumptions about data, causing predicted values to be far from actual values.
61
What issue does High Bias cause?
Underfitting — the model misses important relationships between features and target outputs.
62
What is Variance in a Machine Learning model?
The amount a model’s predictions would change if trained on different data.
63
What issue does High Variance cause?
Overfitting — the model captures random noise in the training data instead of the actual pattern.
64
What is the trade-off between Bias and Variance?
Making a model more complex reduces bias but increases variance. The goal is to balance both to minimize total error.
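
For squared-error loss, the total error being balanced can be written as a decomposition. Here \hat{f} is the trained model and \sigma^2 is the irreducible noise in the data (symbols chosen for this note):

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \mathrm{Bias}\big[\hat{f}(x)\big]^2
  + \mathrm{Var}\big[\hat{f}(x)\big]
  + \sigma^2
```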
65
What happens in a model with High Bias and Low Variance?
It will be consistent but inaccurate on average (underfitting).
66
What happens in a model with Low Bias and High Variance?
It will be accurate on training data but inconsistent across different datasets (overfitting).
67
What is Precision in a classification model?
The ratio of true positive predictions to the total predicted positives. Precision = TP / (TP + FP)
68
What is Recall in a classification model?
The ratio of true positive predictions to the actual total positives. Recall = TP / (TP + FN)
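
A small worked example of precision and recall side by side. The counts are made up for illustration.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn)


tp, fp, fn = 40, 10, 20  # hypothetical counts from a confusion matrix
p = precision(tp, fp)    # 40 / 50
r = recall(tp, fn)       # 40 / 60
print(round(p, 3), round(r, 3))  # 0.8 0.667
```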
69
What is a Decision Tree Classification?
A supervised algorithm that builds a tree-like model by splitting the dataset into subsets based on feature values, handling both categorical and numerical data.
70
How does a Decision Tree work?
It breaks the dataset into smaller subsets recursively, developing a tree structure with decision nodes and branches based on feature conditions.
71
What is Pruning in Decision Trees?
A technique to reduce the size of decision trees by removing sections that provide little predictive power, to reduce complexity and prevent overfitting.
72
How is Pruning performed in a Decision Tree?
It can be done top-down from the root or bottom-up starting from the leaf nodes.
73
What is Reduced Error Pruning?
A pruning method where nodes are replaced with their most popular class starting at the leaves, and the change is kept if accuracy is not affected.
74
What are the advantages of Pruning?
It simplifies the model and improves speed while reducing overfitting.
75
What is Logistic Regression?
A classification algorithm that predicts a binary outcome (0 or 1) based on independent variables.
76
How does Logistic Regression decide between 0 and 1?
By applying a threshold, typically 0.5, to the predicted probability: values above 0.5 are considered 1, and values below 0.5 are considered 0.
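
A minimal sketch of the thresholding step, assuming a model that has already been trained. The weights, bias, and feature values are made up, and treating a probability of exactly 0.5 as class 1 is a choice made for this example.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(features, weights, bias, threshold=0.5):
    z = sum(w * x for w, x in zip(weights, features)) + bias
    probability = sigmoid(z)
    # Assumption: a probability exactly at the threshold counts as class 1.
    return (1 if probability >= threshold else 0), probability


weights, bias = [0.8, -0.4], -1.0  # hypothetical trained parameters
class_a, prob_a = predict_class([3.0, 1.0], weights, bias)  # z = 1.0
class_b, prob_b = predict_class([0.5, 2.0], weights, bias)  # z = -1.4
print(class_a, round(prob_a, 3))
print(class_b, round(prob_b, 3))
```

The threshold can be moved away from 0.5 when false positives and false negatives have different costs.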
77
What is the K Nearest Neighbor (KNN) Algorithm?
A classification algorithm that assigns a new data point to the class most common among its K nearest neighbors.
78
What happens when multiple classes are found in KNN's nearest neighbors?
The new data point is assigned to the class with the majority vote among the K neighbors.
79
How is the value of 'K' chosen in KNN?
K is a positive integer, often odd to avoid tied votes in two-class problems, and is usually selected by experimentation or cross-validation.
80
Give a real-world example of KNN classification.
Classifying a black ball based on whether its five nearest neighbors are more like tennis balls, basketballs, or footballs — and assigning it to the majority class.
81
What is a Type I Error in hypothesis testing?
When the null hypothesis is true, but we reject it.
82
What is a Type II Error in hypothesis testing?
When the null hypothesis is false, but we fail to reject it.
83
What is Correlation?
A measure of how strongly two random variables are related, with values ranging from -1 to +1.
84
What is the value range for Correlation?
Between -1 and +1.
85
What is Covariance?
A measure indicating the direction of the linear relationship between two random variables.
86
What is the value range for Covariance?
It can range from negative infinity to positive infinity (-∞ to +∞).
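
The difference in ranges comes from scaling: correlation is covariance divided by both standard deviations. The sketch below uses sample (n - 1) formulas and made-up data to show that rescaling a variable changes the covariance but not the correlation.

```python
import math


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
ys_scaled = [y * 100 for y in ys]  # same relationship, different units

print(covariance(xs, ys), covariance(xs, ys_scaled))    # grows with the units
print(correlation(xs, ys), correlation(xs, ys_scaled))  # stays the same
```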
87
What are Support Vectors in SVM?
Data points nearest to the hyperplane that influence its position and orientation in a Support Vector Machine.
88
Why are Support Vectors important in SVM?
Because removing them would alter the position of the hyperplane and change the model’s decision boundary.
89