machine learning Flashcards
(35 cards)
What is accuracy?
Correct predictions / total predictions = (TP + TN) / (TP + TN + FP + FN)
what is precision?
TP/(TP+FP)
What is recall?
TP/(TP+FN)
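A minimal sketch of accuracy, precision and recall computed from confusion counts, assuming binary labels encoded as 0/1. The helper name `confusion_counts` and the sample labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

# Illustrative labels only (not real data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 0, 0, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted positives that are real
recall = tp / (tp + fn)                     # share of real positives that were found
```

Precision and recall share the numerator TP and differ only in the denominator: false positives for precision, false negatives for recall.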
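What is ROC AUC?
ROC curve: receiver operating characteristic curve. A plot of the true positive rate (TPR, y-axis) against the false positive rate (FPR, x-axis) across classification thresholds; it helps choose the best threshold.
AUC: area under the ROC curve. Higher is better (0.5 corresponds to random guessing, 1.0 to a perfect ranking), so it helps compare classifiers.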
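One way to see how the curve and the area relate is to sweep the threshold over every distinct score and integrate with the trapezoid rule. This is a sketch under simple assumptions (binary 0/1 labels, at least one positive and one negative); `roc_points` and `auc` are illustrative names, not a library API.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points obtained by lowering the threshold over each distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under a curve given as (x, y) points ordered by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Illustrative labels and scores only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = roc_points(y_true, scores)
roc_auc = auc(points)  # 0.75 for this toy example
```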
What is entropy?
H(X) = sum_x p(x) * log(1/p(x)) = -sum_x p(x) * log(p(x))
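A small sketch of the entropy formula for a discrete distribution. The card does not fix the logarithm base; base 2 (bits) is assumed here, and outcomes with zero probability are skipped because they contribute nothing to the sum.

```python
import math

def entropy(probs, base=2):
    """Shannon entropy: sum of p * log(1/p) over outcomes with p > 0."""
    return sum(p * math.log(1 / p, base) for p in probs if p > 0)

fair_coin = entropy([0.5, 0.5])  # maximal uncertainty for two outcomes
certain = entropy([1.0, 0.0])    # no uncertainty at all
biased = entropy([0.9, 0.1])     # somewhere in between
```

The more evenly the probability is spread, the higher the entropy.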
What does linear regression solve?
Regression problems (predicting a continuous target)
How does linear regression work?
It works by fitting a linear equation to observed data. The steps to perform linear regression are:
- First, the sum of squared residuals is calculated.
- Then, this sum is minimized to find the best fit line.
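The two steps above can be sketched for the single-feature case, where minimizing the sum of squared residuals has a closed-form solution. The function names and the data points are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_residuals(xs, ys, slope, intercept):
    """The quantity that linear regression minimizes."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Illustrative data only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

slope, intercept = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, slope, intercept)
# Any other line, e.g. one with a perturbed slope, has a larger residual sum.
worse = sum_squared_residuals(xs, ys, slope + 0.5, intercept)
```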
What are the parameters of linear regression?
- fit_intercept: Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations.
- normalize: Ignored when fit_intercept is set to False. If True, the regressors X are normalized before regression by subtracting the mean and dividing by the l2-norm. (scikit-learn removed this parameter in version 1.2.)
What does Naive Bayes solve?
classification problems
How does Naive Bayes work?
It works based on Bayes’ theorem with the assumption of independence between every pair of features. Naive Bayes classifiers work well in many real-world situations such as document classification and spam filtering.
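A toy sketch of the idea: the posterior for each class is proportional to the class prior times the product of per-feature likelihoods, which is where the independence assumption enters. All probabilities below are made up for illustration; a real classifier would estimate them from training data.

```python
import math

# Hypothetical priors P(class) and likelihoods P(word | class).
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.30, "meeting": 0.02},
    "ham": {"free": 0.03, "meeting": 0.20},
}

def posterior(words):
    """P(class | words), treating the words as independent given the class."""
    # Work in log space so that products of many small numbers do not underflow.
    log_scores = {
        c: math.log(priors[c]) + sum(math.log(likelihoods[c][w]) for w in words)
        for c in priors
    }
    total = sum(math.exp(s) for s in log_scores.values())
    return {c: math.exp(s) / total for c, s in log_scores.items()}

p_free = posterior(["free"])        # leans towards "spam"
p_meeting = posterior(["meeting"])  # leans towards "ham"
```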
What are the parameters of NB?
- priors: Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.
- var_smoothing: Portion of the largest variance of all features that is added to variances for calculation stability.
What does SVM solve?
regression and classification problems
How does SVM work?
It works by finding a hyperplane in an N-dimensional space (N is the number of features) that separates the classes with the largest possible margin.
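To make the hyperplane concrete, here is a sketch of how a point is classified once a hyperplane is known. It does not train anything: the weight vector `w` and offset `b` are hypothetical values chosen by hand, whereas a real SVM would learn them from data.

```python
import math

# Hypothetical hyperplane w·x + b = 0 in two dimensions.
w = [1.0, -1.0]
b = 0.0

def decision(x):
    """Signed score: positive on one side of the hyperplane, negative on the other."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(x):
    """Class label +1 or -1, depending on the side of the hyperplane."""
    return 1 if decision(x) >= 0 else -1

# Distance between the planes w·x + b = +1 and w·x + b = -1 (the margin width).
margin_width = 2 / math.sqrt(sum(wi * wi for wi in w))

labels = [classify(p) for p in [(3.0, 1.0), (1.0, 3.0)]]
```

A smaller norm of `w` means a wider margin, which is why SVM training minimizes that norm subject to the classification constraints.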
What are the parameters in SVM?
- C: Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. A larger C gives a narrower margin because misclassifications are penalized more heavily.
- kernel: Specifies the kernel type to be used in the algorithm. It could be ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’ or a callable.
- degree: Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels.
- gamma: Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.
What is Logistic Regression used for?
binary classification problems
How does Logistic regression work?
It works by using a logistic function to model a binary dependent variable.
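A minimal sketch of the prediction side of logistic regression: a linear score is passed through the logistic (sigmoid) function to get a probability, which is then thresholded. The coefficients are hypothetical; fitting them to data is not shown.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Hypothetical coefficients and intercept.
weights = [2.0, -1.0]
bias = -0.5

def predict_proba(x):
    """Estimated probability that x belongs to the positive class."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

def predict(x, threshold=0.5):
    """Binary decision obtained by thresholding the probability."""
    return 1 if predict_proba(x) >= threshold else 0

on_boundary = predict_proba([1.0, 1.5])  # linear score is exactly 0
negative = predict([0.0, 1.0])           # linear score is -1.5
```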
What are the parameters of Logistic Regression?
- penalty: Used to specify the norm used in the penalization. The ‘newton-cg’, ‘sag’ and ‘lbfgs’ solvers support only l2 penalties. ‘elasticnet’ is only supported by the ‘saga’ solver.
- C: Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
- fit_intercept: Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.
What is K-Means used for?
clustering problems
How does K-Means work?
It works by partitioning n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster.
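The partitioning can be sketched as two alternating steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. This is a simplified version that assumes 2-D points, a fixed number of iterations, and the first k points as initial centroids (a deterministic stand-in for a smarter initialization); `kmeans` and `sq_dist` are illustrative names.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, n_iter=10):
    """Alternate assignment and update steps for a fixed number of iterations."""
    centroids = [points[i] for i in range(k)]  # simplistic initialization
    assignments = []
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        assignments = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # leave the centroid in place if its cluster is empty
                centroids[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, assignments

# Illustrative data: two well-separated groups.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignments = kmeans(points, k=2)
```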
What are the parameters of K-Means?
- n_clusters: The number of clusters to form as well as the number of centroids to generate.
- init: Method for initialization, defaults to ‘k-means++’.
- n_init: Number of times the k-means algorithm will be run with different centroid seeds.
What is DBSCAN used for?
clustering problems
What is the mechanism of DBSCAN?
It works by defining a cluster as a maximal set of density-connected points. It discovers clusters of arbitrary shape in spatial databases with noise.
What are the parameters of DBSCAN?
- eps: The maximum distance between two samples for one to be considered as in the neighborhood of the other.
- min_samples: The number of samples (or total weight) in a neighborhood for a point to be considered as a core point.
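A compact sketch of how eps and min_samples drive the clustering, assuming Euclidean distance and a brute-force neighbor search (fine for a handful of points, too slow for large data). The function name `dbscan` and the sample points are illustrative, not a library API.

```python
def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    def neighbors(i):
        # Indices of all points within eps of point i (point i itself included).
        return [
            j for j, q in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2
        ]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisionally noise; may turn into a border point later
            continue
        cluster += 1  # point i is a core point, so it starts a new cluster
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)  # j is also a core point, so expand through it
    return labels

# Illustrative data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
```

Unlike k-means, the number of clusters is not given up front: it falls out of the density parameters, and isolated points are labeled as noise.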
What are bagging and boosting used for?
both regression and classification problems