Class Five Flashcards

1
Q

What is Boosting in machine learning?

A

Boosting is a machine learning technique that combines multiple weak learners (models) to create a strong learner. It sequentially trains models, giving more weight to misclassified instances to improve overall prediction accuracy.
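The sequential re-weighting idea can be illustrated with a minimal AdaBoost-style sketch in pure Python. The toy 1-D dataset, the decision-stump weak learners, and the 3-round setup are all invented for illustration:

```python
import math

# Toy 1-D dataset: x values with labels in {-1, +1} (illustrative only)
X = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
y = [1, 1, 1, -1, 1, -1]

def stump(threshold, sign):
    """Weak learner: predicts `sign` left of threshold, -sign to the right."""
    return lambda x: sign if x < threshold else -sign

# Pool of candidate weak learners
stumps = [stump(t, s) for t in [1.0, 2.0, 3.0, 4.0, 5.0] for s in (1, -1)]

weights = [1.0 / len(X)] * len(X)  # start with uniform instance weights
ensemble = []                      # list of (alpha, learner) pairs

for _ in range(3):                 # 3 boosting rounds
    # Pick the stump with the lowest weighted training error
    best = min(stumps, key=lambda h: sum(
        w for w, xi, yi in zip(weights, X, y) if h(xi) != yi))
    err = sum(w for w, xi, yi in zip(weights, X, y) if best(xi) != yi)
    alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))
    ensemble.append((alpha, best))
    # Up-weight misclassified instances, down-weight correct ones, renormalize
    weights = [w * math.exp(-alpha * yi * best(xi))
               for w, xi, yi in zip(weights, X, y)]
    total = sum(weights)
    weights = [w / total for w in weights]

def predict(x):
    """Strong learner: weighted vote of the weak learners."""
    score = sum(a * h(x) for a, h in ensemble)
    return 1 if score >= 0 else -1
```

No single stump classifies this toy pattern correctly, but the weighted vote of three stumps does, which is exactly the "weak learners combined into a strong learner" effect the definition describes.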

2
Q

What is Gradient Boosting?

A

Gradient Boosting is a boosting algorithm that builds an ensemble of weak prediction models in a stage-wise manner, where each new model corrects the errors made by the previous models by minimizing a loss function using gradient descent.
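For squared loss, the negative gradient is simply the residual, so each stage fits a small model to the current residuals. A minimal sketch in pure Python (the toy data, the `fit_stump` helper, and the learning rate of 0.5 are invented for illustration):

```python
# Toy regression data (illustrative only)
X = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.1, 3.9, 5.2]

def fit_stump(X, resid):
    """Fit a regression stump minimizing squared error on the residuals."""
    best = None
    for t in X:
        left = [r for x, r in zip(X, resid) if x < t]
        right = [r for x, r in zip(X, resid) if x >= t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm

lr = 0.5                 # learning rate (shrinkage)
F0 = sum(y) / len(y)     # stage 0: predict the mean
pred = [F0] * len(X)
for _ in range(20):      # 20 boosting stages
    # Negative gradient of squared loss = residual
    resid = [yi - pi for yi, pi in zip(y, pred)]
    h = fit_stump(X, resid)
    # Each new stump nudges predictions toward the remaining errors
    pred = [pi + lr * h(xi) for pi, xi in zip(pred, X)]
```

After 20 stages the training error has shrunk far below the variance of `y`, showing how each stage corrects the errors left by the previous ones.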

3
Q

What are the advantages of Gradient Boosting?

A

Advantages of Gradient Boosting include high prediction accuracy, the ability to model complex, non-linear relationships in the data, and the capacity to capture interactions between features.

4
Q

What are the limitations of Gradient Boosting?

A

Limitations of Gradient Boosting include potential overfitting if the model is too complex, sensitivity to noisy data, and longer training time compared to other algorithms.

5
Q

What is Support Vector Machine (SVM)?

A

Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression tasks. It finds an optimal hyperplane that separates data points of different classes with the maximum margin.
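The margin idea can be sketched with a linear SVM trained by subgradient descent on the hinge-loss objective 0.5·||w||² + C·Σ hinge. This is a pure-Python illustration on invented 2-D toy data, not a production solver (real implementations use specialized QP or SMO solvers):

```python
# Linearly separable 2-D toy data, labels in {-1, +1} (illustrative only)
X = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5)]
y = [-1, -1, -1, 1, 1, 1]

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=2000):
    """Minimize 0.5*||w||^2 + C*sum(hinge losses) by subgradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        gw = [w[0], w[1]]  # gradient of the regularizer 0.5*||w||^2
        gb = 0.0
        for (x1, x2), yi in zip(X, y):
            margin = yi * (w[0] * x1 + w[1] * x2 + b)
            if margin < 1:  # point inside the margin: add hinge subgradient
                gw[0] -= C * yi * x1
                gw[1] -= C * yi * x2
                gb -= C * yi
        w = [w[0] - lr * gw[0], w[1] - lr * gw[1]]
        b -= lr * gb
    return w, b

w, b = train_linear_svm(X, y)

def predict(x1, x2):
    """Classify by which side of the hyperplane w.x + b = 0 the point falls on."""
    return 1 if w[0] * x1 + w[1] * x2 + b >= 0 else -1
```

Only the points closest to the boundary (the support vectors) end up with hinge subgradients near convergence; the rest of the data does not affect the final hyperplane.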

6
Q

What are the advantages of Support Vector Machines (SVM)?

A

Advantages of SVM include effectiveness in high-dimensional spaces, robustness against overfitting, and versatility through different kernel functions.

  • Easy training: the optimization problem is convex, so there are no local
    optima.
  • Scales well to high-dimensional data.
  • The trade-off between classifier complexity and error can be controlled
    explicitly.
7
Q

What are the limitations of Support Vector Machines (SVM)?

A

Limitations of SVM include sensitivity to the choice of kernel function and hyperparameters (efficiency depends heavily on the kernel chosen), computational complexity on large datasets, and difficulty handling noisy or overlapping data.

8
Q

What is the C parameter in SVM?

A
  • C controls the trade-off between training error and the flatness
    (simplicity) of the solution.
  • It determines how strongly outliers (margin violations) are penalized.
  • Aim: keep training error small while still generalizing well.
  • A larger C means less training error but risks losing generalization
    (overfitting).
  • A smaller C yields a flatter, simpler classifier, possibly at the cost of
    more training error.
  • Grid search can be used to estimate a good C.
  • An RBF-SVM has two parameters to tune: C and gamma (which sets the radius
    of the RBF kernel).
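A grid search over C can be sketched as follows: train one model per candidate C and keep the value that scores best on held-out data. The compact linear-SVM trainer, the toy train/validation split, and the candidate grid are all invented for illustration (for an RBF-SVM the same loop would run over a 2-D grid of C and gamma):

```python
def train(X, y, C, lr=0.01, epochs=1000):
    """Linear SVM via subgradient descent on 0.5*||w||^2 + C*sum(hinge)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        gw, gb = [w[0], w[1]], 0.0
        for (x1, x2), yi in zip(X, y):
            if yi * (w[0] * x1 + w[1] * x2 + b) < 1:
                gw[0] -= C * yi * x1
                gw[1] -= C * yi * x2
                gb -= C * yi
        w[0] -= lr * gw[0]
        w[1] -= lr * gw[1]
        b -= lr * gb
    return w, b

# Toy train / validation split (illustrative only)
X_train = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
y_train = [-1, -1, -1, 1, 1, 1]
X_val, y_val = [(1, 1), (4, 4)], [-1, 1]

best_C, best_acc = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:   # grid of candidate C values
    w, b = train(X_train, y_train, C)
    acc = sum((1 if w[0] * x1 + w[1] * x2 + b >= 0 else -1) == yi
              for (x1, x2), yi in zip(X_val, y_val)) / len(y_val)
    if acc > best_acc:             # keep the C with the best held-out accuracy
        best_C, best_acc = C, acc
```

In practice the grid is evaluated with cross-validation rather than a single validation split, but the selection logic is the same.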
9
Q

What is sampling error?

A

Sampling error is the difference between the characteristics observed in a sample and the true characteristics of the population it represents. It arises due to random sampling variation.
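This is easy to see in simulation: draw a random sample from a known population and compare the sample mean with the population mean. The population parameters (mean 50, standard deviation 10) and sizes below are invented for illustration:

```python
import random

random.seed(0)  # fixed seed so the demo is reproducible

# Synthetic population with known mean 50 and standard deviation 10
population = [random.gauss(50, 10) for _ in range(100_000)]
pop_mean = sum(population) / len(population)

# One random sample of size 100
sample = random.sample(population, 100)
sample_mean = sum(sample) / len(sample)

# Sampling error: the gap between sample estimate and population value
sampling_error = sample_mean - pop_mean
```

Re-running with a different seed gives a different `sampling_error` each time, which is exactly the random sampling variation the definition refers to.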

10
Q

What is sampling bias?

A

Sampling bias occurs when the sample used in a study or analysis is not representative of the entire population, leading to systematic errors and inaccurate generalizations.

11
Q

What are Type 1 and Type 2 errors?

A

Type 1 error, also known as a false positive, occurs when a true null hypothesis is incorrectly rejected. Type 2 error, or false negative, happens when a false null hypothesis is incorrectly retained.

12
Q

What is a p-value?

A

A p-value is a statistical measure that helps determine the significance of results in hypothesis testing. It is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true.
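A p-value can be estimated directly with a permutation test: shuffle the group labels many times and count how often the shuffled difference in means is at least as large as the observed one. The two toy groups below are invented for illustration:

```python
import random

random.seed(1)  # fixed seed so the demo is reproducible

# Toy measurements for two groups (illustrative only)
group_a = [5.1, 4.8, 5.5, 5.2, 4.9, 5.3]
group_b = [4.2, 4.5, 4.0, 4.4, 4.6, 4.1]
observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)

pooled = group_a + group_b
n = len(group_a)
count = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)  # break any real group structure
    diff = sum(pooled[:n]) / n - sum(pooled[n:]) / n
    if diff >= observed:    # shuffled result at least as extreme as observed
        count += 1

# Estimated one-sided p-value under the null of "no group difference"
p_value = count / trials
```

Here every value in `group_a` exceeds every value in `group_b`, so very few random relabelings reproduce the observed gap and the estimated p-value is small.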

13
Q

What are the limitations of p-values?

A

Limitations of p-values include reliance on arbitrary thresholds for significance, susceptibility to sample size effects, and potential misinterpretation leading to erroneous conclusions.

14
Q

How can sampling errors be reduced?

A

Sampling errors can be reduced by increasing the sample size, ensuring random sampling, and minimizing non-response rates to obtain a more representative sample of the population.
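The sample-size effect can be demonstrated by simulation: repeatedly sample from a fixed population at two different sizes and compare the average absolute error of the sample mean. The population parameters and the `mean_abs_error` helper are invented for illustration:

```python
import random

random.seed(2)  # fixed seed so the demo is reproducible

# Synthetic population with known mean 100 and standard deviation 15
population = [random.gauss(100, 15) for _ in range(50_000)]
pop_mean = sum(population) / len(population)

def mean_abs_error(n, trials=200):
    """Average |sample mean - population mean| over repeated samples of size n."""
    errs = []
    for _ in range(trials):
        s = random.sample(population, n)
        errs.append(abs(sum(s) / n - pop_mean))
    return sum(errs) / trials

small_n_error = mean_abs_error(25)   # small samples: large typical error
large_n_error = mean_abs_error(400)  # large samples: much smaller error
```

Since the standard error of the mean shrinks like 1/sqrt(n), quadrupling the sample size roughly halves the typical sampling error.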

15
Q

How can sampling bias be addressed?

A

Sampling bias can be addressed by using appropriate sampling techniques (e.g., stratified sampling), ensuring diverse and unbiased participant selection, and accounting for potential biases in data analysis.
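Stratified sampling can be sketched in a few lines: split the population into strata and sample from each in proportion to its size, so minority groups are represented correctly. The 90/10 toy population and the `stratified_sample` helper are invented for illustration:

```python
import random

random.seed(3)  # fixed seed so the demo is reproducible

# Toy population: 90% in stratum "A", 10% in stratum "B" (illustrative only)
population = [("A", i) for i in range(900)] + [("B", i) for i in range(100)]

def stratified_sample(population, n):
    """Draw a sample whose strata proportions match the population's."""
    strata = {}
    for group, item in population:
        strata.setdefault(group, []).append((group, item))
    sample = []
    for group, members in strata.items():
        # Allocate this stratum's share of the sample proportionally
        k = round(n * len(members) / len(population))
        sample.extend(random.sample(members, k))
    return sample

sample = stratified_sample(population, 50)
```

A purely random sample of 50 could easily contain too few "B" members by chance; stratification guarantees each stratum its proportional share.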

16
Q

How can Type 1 and Type 2 errors be controlled?

A

Type 1 and Type 2 errors can be controlled by adjusting the significance level (alpha) for hypothesis testing, increasing the sample size to improve statistical power, and conducting thorough statistical analyses to minimize errors.
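The sample-size lever can be shown by simulation: at a fixed significance level, a larger sample rejects a false null hypothesis far more often (higher power, i.e. fewer Type 2 errors). The biased-coin setup, the `power` helper, and the one-sided z-test approximation are invented for illustration:

```python
import random

random.seed(4)  # fixed seed so the demo is reproducible

ALPHA_Z = 1.645  # one-sided z critical value for alpha = 0.05 (Type 1 control)

def power(n, p_true=0.6, trials=1000):
    """Fraction of experiments rejecting H0: p = 0.5 when the coin is biased."""
    rejections = 0
    for _ in range(trials):
        heads = sum(random.random() < p_true for _ in range(n))
        # Normal-approximation z-statistic under H0: p = 0.5
        z = (heads - 0.5 * n) / (0.25 * n) ** 0.5
        if z > ALPHA_Z:
            rejections += 1
    return rejections / trials

low_n_power = power(20)    # small sample: the bias is often missed
high_n_power = power(200)  # large sample: the bias is almost always detected
```

Both runs use the same alpha, so the Type 1 error rate is held fixed; only the Type 2 error rate improves as the sample grows.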