ML-Midterm Flashcards
(37 cards)
What is Machine Learning?
Machine learning is about modeling data with a specific hypothesis function, ideally in the form of a probability density function that describes the data.
What are supervised learning and unsupervised learning?
Supervised learning trains on labeled data; unsupervised learning trains on data without labels.
What is a model?
A model is an approximation of a system that can be used to predict its behaviour.
gradient descent
Gradient descent finds parameters that minimize the loss on the training data by repeatedly stepping against the gradient of the loss.
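A minimal sketch of the idea, assuming a linear model and a mean squared error loss (neither is specified on the card). The function name, learning rate, and toy data are hypothetical choices for illustration.

```python
import numpy as np

def gradient_descent(X, y, lr=0.5, steps=500):
    """Fit weights w for the linear model X @ w by minimizing mean squared error."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        residual = X @ w - y                 # prediction error on the training data
        grad = 2 * X.T @ residual / len(y)   # gradient of the MSE loss w.r.t. w
        w -= lr * grad                       # step against the gradient
    return w

# Toy data generated from y = 1 + 2x (first column of X is the bias term)
x = np.linspace(0, 1, 20)
X = np.column_stack([np.ones_like(x), x])
y = 1 + 2 * x
w = gradient_descent(X, y)   # should approach [1, 2]
```

The learning rate matters: too small and convergence is slow, too large and the updates diverge.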
Explain the k-fold cross-validation procedure and what it is used for.
Procedure: divide the data into k equal-sized sets, choose 1 set as the validation set and the other k-1 as the training set, then switch the validation set and repeat until each of the k sets has been used for validation once.
It is used to validate the accuracy of the trained model on data it was not trained on.
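The splitting step can be sketched as follows. This is an illustrative helper (the name `k_fold_indices` and the shuffling seed are assumptions), not a replacement for a library implementation.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs, one per fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)   # k nearly even sets
    for i in range(k):
        val_idx = folds[i]                                   # 1 set for validation
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])  # other k-1 sets for training
        yield train_idx, val_idx

splits = list(k_fold_indices(n_samples=10, k=5))
```

In practice the model is trained and scored once per split, and the k validation scores are averaged.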
What are the training set, testing set and validation set?
Training set: this data set is used to adjust the weights of the neural network.
Validation set: this data set is used to detect and minimize overfitting. You can then use this information to tune your hyperparameters.
Testing set: this data set is used only for testing the final solution, in order to confirm the actual predictive power of the network.
What is a hyperparameter?
A parameter of the training algorithm, such as the learning rate, momentum, or maximum number of iterations. It is set before training rather than learned from the data.
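What is information leakage or information contamination?
Information from the test data leaks into training, for example when the same data is used both to train the model and to test it. The test score then overestimates how well the model generalizes.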
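What is a support vector machine?
A maximum margin classifier: it chooses the separating hyperplane with the largest margin to the closest points of each class.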
What is the hyperplane in an SVM?
The hyperplane is the dividing or separating plane between the two classes.
Why is an SVM named after support vectors?
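The support vectors are the training points closest to the separating hyperplane. They alone determine its position and the margin.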
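What is a soft margin classifier?
It allows some data points to fall inside the margin or on the wrong side of the hyperplane, so classes that overlap can still be separated.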
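Why use a kernel? What is the kernel trick?
Kernel: transforms the data to a higher-dimensional space where it is linearly separable.
Kernel trick: evaluates the kernel function on the original inputs to avoid explicitly calculating the dot product in the feature space.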
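A small numerical check of the kernel trick, assuming a degree-2 homogeneous polynomial kernel on 2-D inputs. The feature map `phi` and the sample vectors are hypothetical and chosen only for illustration.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D input (hypothetical helper)."""
    x1, x2 = v
    return np.array([x1 * x1, np.sqrt(2) * x1 * x2, x2 * x2])

def poly_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: k(a, b) = (a . b)^2."""
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

explicit = np.dot(phi(a), phi(b))  # dot product computed in the feature space
implicit = poly_kernel(a, b)       # same value, computed in the input space
```

Both give 121 here. For higher degrees or an RBF kernel the feature space becomes very large or infinite, which is where skipping the explicit map pays off.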
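What is the purpose of regularization? Give an example and explain why it helps against overfitting.
To prevent overfitting.
Example: in ridge regression we add L2 regularization to control the complexity of the model. The penalty shrinks all weights toward zero, so features with little influence on the model end up with weights close to zero and the model cannot fit noise as easily.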
What is batch gradient descent?
Batch gradient descent uses all the training data for each weight update.
What is stochastic gradient descent?
Stochastic gradient descent uses only a single data point at a time for each weight update.
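The difference between the two variants can be sketched side by side, assuming a linear model with squared error loss. The function names, learning rates, and noise-free toy data are assumptions made for illustration.

```python
import numpy as np

def batch_step(w, X, y, lr):
    """One update from the gradient averaged over the whole training set."""
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def sgd_epoch(w, X, y, lr, rng):
    """One pass over the data, updating after every single example."""
    for i in rng.permutation(len(y)):
        grad_i = 2 * X[i] * (X[i] @ w - y[i])   # gradient from one example only
        w = w - lr * grad_i
    return w

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
X = np.column_stack([np.ones_like(x), x])
y = 1 + 2 * x                                   # noise-free toy target

w_batch = np.zeros(2)
w_sgd = np.zeros(2)
for _ in range(500):
    w_batch = batch_step(w_batch, X, y, lr=0.5)       # 1 update per pass over the data
    w_sgd = sgd_epoch(w_sgd, X, y, lr=0.1, rng=rng)   # 20 updates per pass over the data
```

On noisy data the stochastic updates fluctuate around the minimum, so a decaying learning rate is commonly used.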
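How does momentum help in linear regression?
Momentum accumulates past gradients, which speeds up convergence and damps oscillations. The squared-error loss of linear regression is convex, so there the benefit is faster convergence; in non-convex problems momentum can also help to overcome a local minimum.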
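A minimal sketch of gradient descent with a momentum term on a linear regression loss, assuming the classical velocity update. The function names, the coefficient `beta=0.9`, and the toy data are hypothetical choices.

```python
import numpy as np

def fit(X, y, lr, beta, steps):
    """Gradient descent on MSE with a velocity term; beta=0 is plain gradient descent."""
    w = np.zeros(X.shape[1])
    v = np.zeros_like(w)
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        v = beta * v - lr * grad   # velocity accumulates past gradients
        w = w + v
    return w

def mse(w, X, y):
    """Mean squared error of the linear model X @ w."""
    return np.mean((X @ w - y) ** 2)

x = np.linspace(0, 1, 20)
X = np.column_stack([np.ones_like(x), x])
y = 1 + 2 * x

# Same learning rate and step budget, with and without momentum
loss_plain = mse(fit(X, y, lr=0.1, beta=0.0, steps=100), X, y)
loss_momentum = mse(fit(X, y, lr=0.1, beta=0.9, steps=100), X, y)
```

On this toy problem the run with momentum reaches a lower loss within the same number of steps.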
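What is ridge regression?
Linear regression with L2 regularization added to the loss as a penalty term. When updating the weights, the penalty shrinks them a little at every step, which is why it is also called weight decay.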
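Ridge regression also has a closed-form solution, sketched here under the assumption of no separate bias term. The helper name `ridge_fit`, the penalty strength `lam`, and the synthetic data are illustrative.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: w = (X^T X + lam * I)^(-1) X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

w_ols = ridge_fit(X, y, lam=0.0)      # lam = 0 recovers ordinary least squares
w_ridge = ridge_fit(X, y, lam=10.0)   # larger lam shrinks the weights
```

Comparing `np.linalg.norm(w_ridge)` with `np.linalg.norm(w_ols)` shows the shrinkage caused by the penalty.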
What is a random variable?
A variable whose value is different every time it is observed.
Its values follow a specific probability density function.
Different distributions: unimodal, bimodal, multimodal?
One peak, two peaks, and multiple peaks, respectively.
What is Bayes' theorem?
p(x|y) = p(y|x)p(x)/p(y)
Prior knowledge: p(x)
Likelihood of the observed evidence y: p(y|x)
Posterior distribution: p(x|y)
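A worked numerical example of the formula, using made-up numbers for a diagnostic test. Here x means "has the condition" and y means "test is positive"; all three probabilities are hypothetical.

```python
# Hypothetical numbers for illustration only.
prior = 0.01                 # p(x): probability of the condition before seeing evidence
likelihood = 0.95            # p(y|x): probability of a positive test given the condition
false_positive_rate = 0.05   # p(y|not x): probability of a positive test without it

# p(y): total probability of the evidence, summed over both cases of x
evidence = likelihood * prior + false_positive_rate * (1 - prior)

# p(x|y) by Bayes' theorem
posterior = likelihood * prior / evidence
```

The posterior is about 0.16: even after a positive test the condition stays unlikely, because the prior is so small.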
Explain the maximum likelihood principle.
Given a parameterized hypothesis function p(y|x; w), we choose as parameters w the values that make the observed data y most likely under this assumption.
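What is the likelihood function? What is maximum (log-)likelihood?
The likelihood function is the probability of the observed data viewed as a function of the parameters w. Under a Gaussian model with constant variance, maximizing the log-likelihood function is equivalent to minimizing a quadratic error term.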
Why is LMS regression equivalent to MLE for Gaussian data?
Because the mean depends linearly on the input and the variance is constant, so the log-likelihood is a constant minus a scaled sum of squared errors.
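The equivalence can be written out in a few lines. This sketch assumes n independent samples (x_i, y_i), a mean w^T x_i, and a fixed variance sigma^2; the symbols n and sigma are introduced here and do not appear on the cards.

```latex
\begin{align*}
p(y_i \mid x_i; w) &= \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left(-\frac{(y_i - w^\top x_i)^2}{2\sigma^2}\right) \\
\log L(w) &= \sum_{i=1}^{n} \log p(y_i \mid x_i; w)
    = -\frac{n}{2}\log(2\pi\sigma^2)
      - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - w^\top x_i)^2 \\
\arg\max_w \log L(w) &= \arg\min_w \sum_{i=1}^{n} (y_i - w^\top x_i)^2
\end{align*}
```

The first term of the log-likelihood does not depend on w and the factor 1/(2 sigma^2) is a positive constant, so only the sum of squared errors is left to minimize.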