Test 2 Flashcards

1
Q

What is a state space search?

A

It is a process in which successive states are considered with the intention of finding the goal state.

2
Q

In a state space search, what are the variables and functions?

A

S: all possible states
A: all possible actions
Actions(s): the actions allowed when the state is s
Results(s, a): the state that results when action a is taken in state s
Costs(s, a): the cost of doing a in state s

3
Q

What are three examples of state space search algorithms?

A

Depth-first search
Breadth-first search
A* (heuristic) search

4
Q

In depth-first search, how does the algorithm proceed?

A

The root node of the search tree is selected first, then each branch is explored as deeply as possible, in order, before backtracking, until the goal state is found.

5
Q

breadth-first search

A

It explores all nodes at each depth level before moving on to the next level.

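A minimal sketch (not part of the original deck) contrasting the two searches on an invented toy graph; the node names and edges are assumptions for illustration only.

from collections import deque

graph = {
    "S": ["A", "B"],
    "A": ["C", "D"],
    "B": ["E"],
    "C": [], "D": ["G"], "E": ["G"], "G": [],
}

def dfs(start, goal):
    # LIFO stack: the most recently discovered node is expanded first.
    stack, visited = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in visited:
            visited.add(node)
            stack.extend(graph[node])
    return False

def bfs(start, goal):
    # FIFO queue: all nodes at one depth are expanded before the next depth.
    queue, visited = deque([start]), {start}
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph[node]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return False

print(dfs("S", "G"), bfs("S", "G"))  # True True
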
6
Q

Pros and cons of depth-first search

A

Pro: low memory requirement.
Con: slow, and it may not find a solution (it can get stuck going down very deep or infinite branches).

7
Q

Pros and cons of breadth-first search

A

Pro: guarantees a solution if one exists.
Con: high memory cost.

8
Q

What is A* (heuristic) search?

A

It is an informed search algorithm that aims to minimize the total cost (and, in practice, memory and time) of getting from the start node to the goal node.

9
Q

In A* (heuristic) search the cost formula is f(n) = g(n) + h(n). What do g(n) and h(n) mean?

A

h(n) represents the heuristic function, in this case the estimated cost from n to the goal.

g(n) represents the cost of the path from the start node to n (the cost of the steps taken so far).

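A minimal sketch of A* under the f(n) = g(n) + h(n) rule from the card above; the graph, step costs and heuristic values are invented for illustration.

import heapq

graph = {              # node -> [(neighbour, step cost), ...]
    "S": [("A", 1), ("B", 4)],
    "A": [("B", 2), ("G", 5)],
    "B": [("G", 1)],
    "G": [],
}
h = {"S": 3, "A": 2, "B": 1, "G": 0}   # heuristic: estimated cost to the goal

def a_star(start, goal):
    frontier = [(h[start], 0, start)]   # (f, g, node); lowest f expanded first
    best_g = {start: 0}
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if node == goal:
            return g                    # cost of the cheapest path found
        for nxt, cost in graph[node]:
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + h[nxt], new_g, nxt))
    return None

print(a_star("S", "G"))  # 4  (S -> A -> B -> G)
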
10
Q

In machine learning, when the classes are unknown, what type of classification is used?

A

clustering

11
Q

Check distance formulas

A

Check them, e.g. Euclidean distance d(x, y) = sqrt(sum_i (x_i - y_i)^2) and Manhattan distance d(x, y) = sum_i |x_i - y_i|.

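A minimal sketch of the two most common distance formulas referred to above (Euclidean and Manhattan); the example points are arbitrary.

import math

def euclidean(x, y):
    # sqrt of the sum of squared coordinate differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    # sum of absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(x, y))

print(euclidean([0, 0], [3, 4]))   # 5.0
print(manhattan([0, 0], [3, 4]))   # 7
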
12
Q

What is cross-validation?

A

A way of training and testing your classification method in machine learning by splitting the data into training and testing subsets.

13
Q

What are the four types of cross-validation?

A

Leave-one-out
Bootstrap
n-fold
Split test

14
Q

Quickly describe the cross-validation methods.

A

Split test: half the set is used for training, the rest for testing.

Bootstrap: random data points are selected (with replacement) to build the training and testing sets.

n-fold: the data is split into successive folds, and each fold in turn is used for testing against the rest.

Leave-one-out: a single data point is used for testing and the rest for training, repeated for each point.

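A minimal sketch of two of the cross-validation schemes above, assuming scikit-learn is available; the k-NN classifier and the iris dataset are placeholders, not part of the deck.

from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
clf = KNeighborsClassifier()

# n-fold: each of the 5 folds is held out for testing against the rest.
nfold_scores = cross_val_score(clf, X, y, cv=KFold(n_splits=5, shuffle=True))

# leave-one-out: a single data point is used for testing in each round.
loo_scores = cross_val_score(clf, X, y, cv=LeaveOneOut())

print(nfold_scores.mean(), loo_scores.mean())
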
15
Q

In a confusion matrix, which axis must sum to 100%?

A

vertical

16
Q

Formula for the true positive rate

A

TPR = TP/P = 1-FNR

17
Q

True negative rate formula

A

TNR = TN/N

18
Q

The positive predictive value (PPV) formula

A

PPV = TP/PP = 1- FDR

FDR - False discovery rate

19
Q

F1 score formula

A

F1 = 2 * PPV * TPR / (PPV + TPR)

20
Q

Accuracy

A

Acc = (TP + TN) / (P + N)

21
Q

What is PP ?

A

The total number of instances predicted positive: TP + FP.

22
Q

What is P?

A

The total number of actual positives: TP + FN.

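A minimal sketch tying the rate formulas from the previous cards to the raw confusion-matrix counts; the counts used in the example are invented.

def rates(TP, FP, TN, FN):
    P, N, PP = TP + FN, TN + FP, TP + FP
    TPR = TP / P                      # true positive rate (recall)
    TNR = TN / N                      # true negative rate
    PPV = TP / PP                     # positive predictive value (precision)
    ACC = (TP + TN) / (P + N)         # accuracy
    F1 = 2 * PPV * TPR / (PPV + TPR)  # F1 score
    return TPR, TNR, PPV, ACC, F1

print(rates(TP=40, FP=10, TN=45, FN=5))
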
23
Q

What is the complete machine learning system (pipeline)?

A

sample → extraction(1) → classifier(1) → evaluation(1, 2) → decision

1 - learning
2 - reporting

24
Q

There are two types of models in machine learning. What are they? Describe them.

A

Discriminative and generative models.

Discriminative: they focus on the distinction among classes, learning decision boundaries (e.g., k-NN, SVM, logistic regression).

Generative: they model how the data is distributed throughout the feature space, focusing on the characteristics of each class under an assumed model.

25
Q

What is gradient descent and what is its goal?

A

Gradient descent is an iterative optimization algorithm: it takes steps along the negative gradient (derivative) of a loss function, typically the MSE, with the goal of finding its minimum.

26
Q

Name and explain the different gradient descent variants.

A

Stochastic GD: one sample is chosen at random at each step. It is fast and good for redundant data.

Batch GD: all samples are used in each iteration.

Mini-batch GD: a subset of the samples is used in each iteration.

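A minimal sketch of batch vs. stochastic gradient descent for a 1-D linear regression on the MSE; the toy data, learning rates and epoch counts are assumptions.

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 100)
y = 3.0 * X + 1.0 + rng.normal(0, 0.1, 100)

def batch_gd(lr=0.1, epochs=200):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        err = (w * X + b) - y               # uses ALL samples per iteration
        w -= lr * 2 * np.mean(err * X)
        b -= lr * 2 * np.mean(err)
    return w, b

def stochastic_gd(lr=0.05, epochs=20):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):   # ONE random sample per step
            err = (w * X[i] + b) - y[i]
            w -= lr * 2 * err * X[i]
            b -= lr * 2 * err
    return w, b

print(batch_gd())       # approximately (3.0, 1.0)
print(stochastic_gd())  # approximately (3.0, 1.0)
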
27
Q

How does the step size (learning rate) affect linear regression fitting?

A

See the images on step size: a step size that is too small makes convergence very slow, while one that is too large makes the updates overshoot the minimum, so the loss may oscillate or diverge.

28
Q

Name the regularized linear models:

A

Ridge regression and Lasso.

29
Q

How does ridge regression work?

A

A linear regression model depends entirely on the training data, but the training set is not perfectly representative of the underlying samples. Ridge regression therefore adds a small amount of bias (a penalty) to the loss function, so the fitted line does not align with the training set perfectly, deliberately making the model slightly worse during training. Over successive fits this generalizes better to the testing data than the plain fit. It is a way to avoid overfitting: we do not want the training MSE to be zero all the time.

30
Q

In ridge regression, what is the purpose of the regularization parameter (often written lambda), what does it do, and how does increasing it affect the slope?

A

It makes sure that the MSE term and the slope term are on the same scale, and it controls the severity of the penalty added to the MSE. Increasing it decreases the slope (shrinks the coefficients toward zero).

31
Q

Describe Lasso.

A

Lasso: least absolute shrinkage and selection operator regression. The penalty is not the square but the absolute value of the coefficients. It is very similar to ridge, but it can shrink a slope all the way to 0.

32
Q

In Lasso, why is a zero slope useful?

A

To erase the contribution of useless parameters from the determination of y (in effect, automatic feature selection).

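A minimal sketch, assuming scikit-learn, showing the point of the last two cards: Lasso can shrink a useless feature's slope exactly to zero, while Ridge only reduces it; the data and alpha values are invented.

import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2.0 * X[:, 0] + rng.normal(0, 0.1, 100)   # feature 1 is useless

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print(ridge.coef_)   # both coefficients shrunk, neither exactly 0
print(lasso.coef_)   # the useless coefficient is typically exactly 0.0
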
33
Q

What is early stopping?

A

It is an ML technique used to avoid overfitting and underfitting: it finds the balance between them by determining the number of training steps to take. A loss vs. iteration plot is usually created. The loss on the training set always decreases with the number of iterations, so at each step the model should also be evaluated on the testing data; when the loss on the testing dataset starts to increase, training should stop.

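A minimal sketch of the early-stopping rule described above; the loss curves are synthetic and the patience value is an assumption, just to show the stopping logic.

train_loss = [1 / (i + 1) for i in range(50)]               # always decreasing
val_loss = [(i - 20) ** 2 / 400 + 0.1 for i in range(50)]   # U-shaped held-out loss

best, best_iter, patience = float("inf"), 0, 5
for i, loss in enumerate(val_loss):
    if loss < best:
        best, best_iter = loss, i                # held-out loss still improving
    elif i - best_iter >= patience:              # no improvement for `patience` steps
        break
print("stop training at iteration", best_iter)   # 20, the validation minimum
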
34
Q

Logistic regression

A

It predicts whether something is true or false (0 or 1); the cost function takes both options into account.

35
Q

Is classification the same as regression?

A

No, it is not: regression predicts a continuous value, while classification predicts a discrete class.

36
Q

In a decision tree (CART), what are the splitting criteria (impurity measures)?

A

Gini and entropy.

37
Q

Gini formula

A

Gini = 1 - sum_i(p_i^2)  (one minus the sum of the squared class probabilities)

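A minimal sketch of the Gini impurity formula applied to the class labels of one tree node; the labels are made up.

from collections import Counter

def gini(labels):
    # 1 - sum of squared class probabilities in this node
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(["a", "a", "b", "b"]))  # 0.5  (maximally mixed, two classes)
print(gini(["a", "a", "a", "a"]))  # 0.0  (pure node)
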
38
Q

Loss in a decision tree (split cost) formula

A

Go and check it: the cost of a split is the weighted average of the child-node impurities, e.g. for a binary split L = (n_left/n)*G_left + (n_right/n)*G_right.

39
Q

In classification you have:

A

Decision trees, k-NN, Voronoi, SVM.

40
Q

What is the kernel trick in SVM?

A

It is a trick used when the data cannot be divided by a straight line (hyperplane) in the original space: the data is mapped into another (higher-dimensional) space where it can be divided, and the boundary is then mapped back.

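A minimal sketch, assuming scikit-learn, of the kernel trick in practice: an RBF-kernel SVM separates data with a circular class boundary that no straight line can separate; the data is invented.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (np.linalg.norm(X, axis=1) < 1.0).astype(int)   # circular class boundary

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)                # implicit higher-dimensional mapping

print(linear_svm.score(X, y))  # poor: no separating line exists
print(rbf_svm.score(X, y))     # close to 1.0
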
41
Q

What does parametric mean?

A

The number of parameters is fixed in order to predict the class (it does not grow with the amount of data).

42
Q

Parametric, compared to non-parametric, is:

A

Simpler, faster, and needs less data, but it is constrained and can give poor fits.

43
Q

What does generative mean in the parametric setting?

A

The a priori class probabilities (class-conditional densities) are Gaussian, and the number of parameters is fixed.

44
Q

What are Parzen windows, and what do they use?

A

A generative ML method in which the distribution of the data is modelled with Gaussians placed on the data points. It uses Silverman's rule to determine the width of the Gaussians.

45
Q

Describe the Bayes classifier.

A

A classification process based on maximizing the number of correct classifications. It is based on the theorem p(y|x)*p(x) = p(x|y)*p(y).

46
Q

What is the curse of dimensionality?

A

As the number of dimensions increases, the amount of data needed increases exponentially.

47
Q

Naive Bayes: explain it.

A

It is like the Bayes classifier, but the features are treated as independent; that is, the likelihood of a set of features is given by the product of the individual feature likelihoods: p(x|class) = p(x1|class)*p(x2|class)*...

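A minimal sketch of the naive Bayes idea using scikit-learn's GaussianNB (Gaussian per-feature likelihoods are an assumption of that estimator); the iris dataset is just a placeholder.

from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
clf = GaussianNB().fit(X, y)          # learns p(x_i | class) for each feature separately
print(clf.predict(X[:3]))
print(clf.predict_proba(X[:3]).round(2))
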
48
Q

In Bayes and naive Bayes the goal is to:

A

Maximize p(y|x).

49
Q

Combining classifiers: what are the types?

A

Combining the predicted labels (ŷ), combining the predicted probabilities p(y|x), and ensembles.

50
Q

In ensembles, what types exist?

A

Bagging (pasting), random forest, boosting, and stacking.

51
Q

Explain bagging (pasting).

A

The dataset is divided into subsets (sampled with replacement for bagging, without replacement for pasting) and a classifier is trained on each subset.

52
Q

Explain boosting.

A

The samples that were misclassified are selected, and the next round of classifiers is boosted (weighted) with those samples.

53
Q

Stacking

A

Train a classifier on the predictions of several classifiers.

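A minimal sketch, assuming scikit-learn, that runs the four ensemble types named above on a toy dataset; the base estimators and the iris dataset are placeholders.

from sklearn.datasets import load_iris
from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                              AdaBoostClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
models = {
    "bagging": BaggingClassifier(DecisionTreeClassifier()),
    "random forest": RandomForestClassifier(),
    "boosting": AdaBoostClassifier(),
    "stacking": StackingClassifier(
        estimators=[("tree", DecisionTreeClassifier()),
                    ("lr", LogisticRegression(max_iter=1000))],
        final_estimator=LogisticRegression(max_iter=1000)),
}
for name, model in models.items():
    print(name, model.fit(X, y).score(X, y))
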
54
Q

Unsupervised learning is:

A

Learning where no known classifications (labels) are available; the data itself is the only information.

55
Q

Unsupervised learning often has to reduce the number of features. This is done via:

A

Filter methods, wrapper methods, and embedded methods.

56
Q

In unsupervised learning, describe filter methods:

A

They filter out the features that are not distinctive (e.g., if two features are highly correlated, one of them is excluded).

57
Q

In unsupervised learning, describe the wrapper method.

A

It checks how each feature relates to the data (not the correlation between two features, as in the filter method). If the data on a feature is very widespread, then that feature is not good. The features are ranked.

58
Q

Embedded methods are:

A

There are two modes. In the forward mode, each feature is tested with the classifier and the best one is added to a list; the classifier is then run again with that feature combined with each of the remaining features, and so on, until the error no longer decreases.

59
Q

Clustering: what is it and how is it done?

A

The formation of groups through the adjustment of the groups' centroids over many iterations; commonly done with k-means.

60
Q

Describe k-means.

A

Minimization of the distance of the points to the centroids, which change with the iterations.

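A minimal sketch of the k-means loop described above (assign points to the nearest centroid, then move each centroid to the mean of its points); the data and the number of iterations are invented.

import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])

k = 2
centroids = X[rng.choice(len(X), k, replace=False)]   # random initialization
for _ in range(10):
    # assignment step: each point goes to its closest centroid
    labels = np.argmin(np.linalg.norm(X[:, None] - centroids[None], axis=2), axis=1)
    # update step: each centroid moves to the mean of its assigned points
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print(centroids)   # close to (0, 0) and (3, 3)
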
61
Q

Disadvantages of k-means

A

The centroid positions can depend on the random initial placement, and the number of clusters selected at the start can turn out to be a bad choice.

62
Q

Since the centroid values can vary with the initialization, what solution exists?

A

Hierarchical clustering.

63
Q

Describe hierarchical clustering.

A

The nested "bubble" diagram: clusters are built by successively merging the closest points/clusters (or splitting them), producing a hierarchy of nested groups (a dendrogram).

64
Q

What are Gaussian mixture models?

A

A Gaussian mixture model (GMM) is a probabilistic model that assumes that the instances were generated from a mixture of several Gaussian distributions whose parameters are unknown.

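A minimal sketch, assuming scikit-learn, of fitting a Gaussian mixture model and reading back the learned means and mixing weights; the data is invented.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

gmm = GaussianMixture(n_components=2).fit(X)
print(gmm.means_.round(1))    # approximately [0, 0] and [5, 5]
print(gmm.weights_.round(2))  # approximately [0.5, 0.5]
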
65
Q

What is stratified sampling?

A

It forces the sampled datasets (e.g., the train/test splits) to be as imbalanced as the data itself, i.e., each split keeps the class proportions of the full dataset.

66
Q

XAI (explainable AI)

A

The computer explains the reasons behind its decisions.

67
Q

Active learning

A

The computer selects the data and the user classifies (labels) it.

68
Q

Reinforcement learning

A

The ML model is trained with the new data it generates as it interacts with the environment, guided by rewards.