Advanced Flashcards
(18 cards)
What is machine learning in the context of building mathematical models?
Machine learning automates the building of mathematical models from data: rather than a person specifying the model's parameters by hand, an algorithm learns them from examples.
What is linear regression?
Linear regression is a linear model that fits a best-fit line (a plane or hyperplane when there are several inputs) through the data to describe the relationship between the independent variables and the dependent variable.
What is the equation of the best fit line in linear regression?
For two independent variables x1 and x2, the equation is Y = a*x1 + b*x2 + c, where a and b are the coefficients and c is the constant (intercept).
How does linear regression minimize errors?
It chooses the coefficients that minimize the sum of squared differences (residuals) between the predicted and observed values. This criterion is called ordinary least squares (OLS).
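A minimal pure-Python sketch of OLS for the two-variable equation Y = a*x1 + b*x2 + c, assuming small in-memory lists. The function name fit_two_feature_ols is hypothetical; it centers the data and solves the 2x2 normal equations with Cramer's rule instead of calling a library.

```python
def fit_two_feature_ols(x1, x2, y):
    """Return (a, b, c) minimizing sum((y - (a*x1 + b*x2 + c))**2)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Center every variable so the intercept drops out of the normal equations.
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(u * u for u in d1)
    s22 = sum(v * v for v in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * w for u, w in zip(d1, dy))
    s2y = sum(v * w for v, w in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("x1 and x2 are perfectly collinear; coefficients are not unique")
    # Solve [[s11, s12], [s12, s22]] @ [a, b] = [s1y, s2y] by Cramer's rule.
    a = (s1y * s22 - s12 * s2y) / det
    b = (s11 * s2y - s12 * s1y) / det
    # The intercept makes the fitted plane pass through the point of means.
    c = my - a * m1 - b * m2
    return a, b, c

x1 = [0, 1, 2, 3, 4]
x2 = [1, 0, 2, 1, 3]
y = [2 * u + 3 * v + 1 for u, v in zip(x1, x2)]  # made-up data from Y = 2*x1 + 3*x2 + 1
a, b, c = fit_two_feature_ols(x1, x2, y)
print(round(a, 6), round(b, 6), round(c, 6))
```

The zero-determinant check is the same problem as multicollinearity: when x1 and x2 are perfectly linearly related, no unique set of coefficients minimizes the error.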
How does linear regression automate model building?
It computes the coefficients of the best-fit line, the one with the minimum error, directly from the data, so nobody has to choose them by hand.
What is the train-validation split method?
The training set is split into two parts: one used to train the model and the other held out to validate its performance on data the model has not seen.
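A minimal sketch of a shuffled train-validation split, assuming the rows fit in a list; train_val_split, val_fraction and seed are hypothetical names chosen for illustration.

```python
import random

def train_val_split(rows, val_fraction=0.2, seed=0):
    """Shuffle a copy of rows and hold out val_fraction of them for validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]  # (train part, validation part)

train, val = train_val_split(range(10), val_fraction=0.2)
print(len(train), len(val))
```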
What is K-fold cross-validation?
K-fold cross-validation divides the training set into k folds. Each fold is used once for validation while the remaining k-1 folds are used for training, and the k validation scores are averaged.
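A minimal sketch of how the folds can be generated, assuming n samples indexed 0..n-1 and no shuffling; k_fold_indices is a hypothetical helper (scikit-learn's KFold covers this in practice).

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds over n samples.

    No shuffling is done here; in practice the data is usually shuffled first.
    """
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size

for train_idx, val_idx in k_fold_indices(10, 3):
    # Here a model would be trained on train_idx and scored on val_idx;
    # the k scores are then averaged.
    print("validate on", val_idx)
```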
What is correlation?
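Correlation is a statistical measure of the strength and direction of the linear relationship between two quantitative variables. The Pearson correlation coefficient ranges from -1 (perfect negative relationship) through 0 (no linear relationship) to +1 (perfect positive relationship).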
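A sketch of the Pearson coefficient computed from its definition (covariance divided by the product of standard deviations); the function name pearson is hypothetical. Python 3.10 and later also provide statistics.correlation for the same quantity.

```python
import math

def pearson(x, y):
    """Pearson correlation: sum of co-deviations over the product of deviation norms."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(round(pearson([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # perfectly linear, increasing
print(round(pearson([1, 2, 3], [3, 2, 1]), 6))        # perfectly linear, decreasing
```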
What is multicollinearity?
Multicollinearity occurs when two or more independent (predictor) variables are strongly linearly related, which makes it hard for a model to separate their individual effects on the target.
How can multicollinearity be detected?
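It can be detected using the Variance Inflation Factor (VIF). For predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing that predictor on all the other predictors; values above roughly 5 or 10 are commonly treated as a warning sign.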
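A sketch of VIF for the special case of exactly two predictors, assumed here to keep the example short: regressing one predictor on the other gives R^2 equal to the squared Pearson correlation r^2, so both predictors share VIF = 1 / (1 - r^2). The function name vif_two_predictors and the data are made up; with more predictors, R_j^2 would come from a multiple regression.

```python
import math

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when a model has exactly two of them."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    r = cov / math.sqrt(sum((a - m1) ** 2 for a in x1) * sum((b - m2) ** 2 for b in x2))
    if abs(r) >= 1:
        return math.inf  # perfect collinearity: the VIF is unbounded
    return 1 / (1 - r ** 2)  # R^2 of x1 regressed on x2 equals r^2

vif = vif_two_predictors([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(vif, 3))
```

Here r = 0.8, so VIF = 1 / 0.36, which is below the usual 5 to 10 warning range.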
How would you explain a Decision Tree to a non-tech person?
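A decision tree works like an upside-down flowchart that mimics how people make decisions: starting at the top (the root), it asks a series of simple questions about the data, each answer leads down a branch, and the path ends at a leaf that gives the final decision.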
What is pre-pruning in decision trees?
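Pre-pruning (early stopping) halts the tree's growth before it is fully grown by imposing limits during training, such as a maximum depth, a minimum number of samples per split or leaf, or a minimum impurity decrease.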
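To make the pre-pruning rules concrete, here is a compact sketch of a classification tree on a single numeric feature using Gini impurity. build_tree, depth_of, max_depth and min_samples_leaf are hypothetical names chosen to mirror common library parameters (scikit-learn's DecisionTreeClassifier exposes max_depth, min_samples_split, min_samples_leaf and min_impurity_decrease); the two marked checks are the pre-pruning step.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def build_tree(data, max_depth=3, min_samples_leaf=1, depth=0):
    """Grow a tree on (x, y) pairs with one numeric feature, stopping early."""
    labels = [y for _, y in data]
    majority = Counter(labels).most_common(1)[0][0]
    # Pre-pruning rule 1: stop at the depth limit or when the node is already pure.
    if depth >= max_depth or len(set(labels)) == 1:
        return {"leaf": majority}
    best = None
    for t in sorted({x for x, _ in data})[:-1]:  # skip the max so both sides are non-empty
        left = [(x, y) for x, y in data if x <= t]
        right = [(x, y) for x, y in data if x > t]
        # Pre-pruning rule 2: reject splits that leave too few samples in a child.
        if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
            continue
        score = (len(left) * gini([y for _, y in left])
                 + len(right) * gini([y for _, y in right])) / len(data)
        if best is None or score < best[0]:
            best = (score, t, left, right)
    if best is None:
        return {"leaf": majority}
    _, t, left, right = best
    return {
        "threshold": t,
        "majority": majority,
        "left": build_tree(left, max_depth, min_samples_leaf, depth + 1),
        "right": build_tree(right, max_depth, min_samples_leaf, depth + 1),
    }

def depth_of(node):
    """Number of splits on the longest root-to-leaf path."""
    if "leaf" in node:
        return 0
    return 1 + max(depth_of(node["left"]), depth_of(node["right"]))

data = [(1, "a"), (2, "b"), (3, "a"), (4, "b"), (5, "a"), (6, "b")]  # made-up data
print(depth_of(build_tree(data, max_depth=1)), depth_of(build_tree(data, max_depth=10)))
```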
What is post-pruning in decision trees?
Post-pruning lets the tree grow fully and then removes sub-trees that add little predictive value, for example by replacing a sub-tree with a leaf when that does not hurt performance on validation data.
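One concrete post-pruning technique is reduced-error pruning, sketched below on a hand-built tree; it is one option among several (scikit-learn, for instance, offers cost-complexity pruning through the ccp_alpha parameter instead). The tree, the validation data, and the helper names predict, errors and prune are all made up for illustration; each internal node is assumed to store the majority training label it would predict if turned into a leaf.

```python
def predict(node, x):
    """Follow thresholds down to a leaf and return its label."""
    while "leaf" not in node:
        node = node["left"] if x <= node["threshold"] else node["right"]
    return node["leaf"]

def errors(node, data):
    """Count misclassified (x, y) pairs."""
    return sum(predict(node, x) != y for x, y in data)

def prune(node, val_data):
    """Reduced-error pruning, bottom-up.

    val_data holds the validation (x, y) pairs that reach this node. A sub-tree
    is replaced by a leaf predicting its training majority whenever that does
    not increase the validation error.
    """
    if "leaf" in node:
        return node
    t = node["threshold"]
    left = prune(node["left"], [(x, y) for x, y in val_data if x <= t])
    right = prune(node["right"], [(x, y) for x, y in val_data if x > t])
    node = dict(node, left=left, right=right)  # copy, so the input tree is untouched
    leaf = {"leaf": node["majority"]}
    return leaf if errors(leaf, val_data) <= errors(node, val_data) else node

# Hand-built tree: the second split (x <= 7) is assumed to have fit noise.
tree = {
    "threshold": 5, "majority": "a",
    "left": {"leaf": "a"},
    "right": {
        "threshold": 7, "majority": "b",
        "left": {"leaf": "b"},
        "right": {"leaf": "a"},
    },
}
validation = [(2, "a"), (6, "b"), (8, "b")]
pruned = prune(tree, validation)
print(pruned)
```

The spurious sub-tree collapses to a single "b" leaf, while the root split survives because removing it would add validation errors.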
How does a Random Forest model differ from using ‘n’ decision trees?
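A Random Forest trains each tree on a bootstrap sample of the rows (drawn with replacement) and, in the standard formulation, considers only a random subset of the features (columns) at each split, so the trees differ from one another and their combined votes or averages have lower variance. Training ‘n’ decision trees on the same data with all features would produce essentially identical trees, so combining them would add little.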
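A minimal sketch of only the two sources of randomness in a Random Forest (tree training itself is omitted). The helper names bootstrap_rows and split_features are hypothetical, and the subset size of sqrt(n_features) is assumed as a common choice for classification.

```python
import math
import random

def bootstrap_rows(n_rows, rng):
    """Draw n_rows row indices with replacement: one tree's bootstrap sample."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def split_features(n_features, rng):
    """Draw a random feature subset to consider at a single split."""
    k = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(42)
for tree_id in range(3):
    rows = bootstrap_rows(10, rng)  # some rows repeat, others are left out
    feats = split_features(9, rng)  # a fresh subset would be drawn at every split
    print(f"tree {tree_id}: rows={rows} first-split features={feats}")
```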
What is the elbow method in K-Means clustering?
The elbow method runs K-Means for a range of K values, computes the within-cluster sum of squares (WCSS) for each, and plots WCSS against K. The optimal K is taken at the "elbow", the point after which adding more clusters reduces WCSS only slightly.
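A minimal sketch of the elbow computation, assuming one-dimensional data so that K-Means (Lloyd's algorithm) fits in a few lines; kmeans_1d, its deterministic initialization, and the data are made up for illustration.

```python
def kmeans_1d(points, k, max_iter=100):
    """Lloyd's K-Means on 1-D data; returns (centers, wcss)."""
    pts = sorted(points)
    if k == 1:
        centers = [sum(pts) / len(pts)]
    else:
        # Deterministic init: spread the initial centers across the sorted data.
        centers = [pts[i * (len(pts) - 1) // (k - 1)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            clusters[nearest].append(p)
        # Move each center to its cluster mean; keep the old center if the cluster is empty.
        new_centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    # WCSS: squared distance from each point to its nearest center, summed.
    wcss = sum(min((p - c) ** 2 for c in centers) for p in pts)
    return centers, wcss

data = [1.0, 1.1, 1.2, 5.0, 5.1, 5.2, 9.0, 9.1, 9.2]
wcss_by_k = {k: kmeans_1d(data, k)[1] for k in range(1, 6)}
for k, w in wcss_by_k.items():
    print(k, round(w, 3))
```

On this made-up data with three tight groups, WCSS falls from about 96 at K=1 to about 24 at K=2 and about 0.06 at K=3, so the curve bends sharply at K=3.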
Why is understanding the bias-variance trade-off important?
It guides how complex a model should be: a model that is too simple underfits (high bias), one that is too complex overfits (high variance), and the best performance on unseen data lies between the two.
Is the bias-variance trade-off applicable to Deep Neural Networks?
Yes, but not always in its classical form: large neural networks trained on enough data with suitable regularization can often grow in complexity without the rise in test error that the classical trade-off predicts.
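How do neural networks compare to other non-linear ML algorithms in terms of decision boundaries?
Neural networks can learn highly complex, smooth non-linear decision boundaries because they stack layers of non-linear transformations, whereas decision trees partition the feature space with axis-aligned splits, producing piecewise, box-shaped boundaries.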