Exam Flashcards

Question

Bias

Answer 1

error due to over-simplification of the model (underfitting).

Answer 2

error due to over-complication. An overly complex model fails to generalize, so it cannot correctly predict the target variable on new data.

Answer 3

algorithm assigns each data point to a class based on the class of its nearest points
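
A minimal sketch of the nearest-neighbour idea described on this card, assuming Euclidean distance and a majority vote among k neighbours. The function name knn_predict and the toy data are hypothetical, chosen only for illustration.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point, paired with that point's label
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points and count how often each label occurs among them
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (assumed): two points of class "a" and two of class "b"
points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
labels = ["a", "a", "b", "b"]
prediction = knn_predict(points, labels, (1.2, 1.4), k=3)
```

With k=3 the query's three nearest points are two "a" points and one "b" point, so the vote goes to "a".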

Answer 4

provides information on different aspects of the classifier

Answer 5

proportion of correct positive (event) predictions to all positive predictions: TP / (TP + FP)

Answer 6

recall for class x indicates the proportion of correct positive predictions to all actual positive cases: TP / (TP + FN)

Answer 7

the harmonic mean of the precision and recall for each class

Answer 8

support indicates the number of each class we had in our testing data

Answer 9

number of correct predictions over the total number of predictions
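
The precision, recall, F1, support and accuracy cards can be tied together with one small calculation from confusion-matrix counts. This is a sketch for a single positive class; the function name and the counts are made up for illustration, and it assumes no denominator is zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the report metrics for the positive class from confusion-matrix counts."""
    precision = tp / (tp + fp)          # correct positives / all positive predictions
    recall = tp / (tp + fn)             # correct positives / all actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + fn + tn)           # correct / all predictions
    support = tp + fn                   # number of actual positives in the test data
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "support": support,
    }


# Assumed counts: 40 true positives, 10 false positives, 20 false negatives, 30 true negatives
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
```

For these counts precision is 0.8, recall is 40/60, accuracy is 0.7 and support is 60.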

Answer 10

the ROC curve is a plot that depicts how the true positive rate changes with respect to the false positive rate
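
One way to see where the points of a ROC curve come from is to sweep a decision threshold over the predicted scores and record the false positive rate and true positive rate at each step. The sketch below assumes binary labels (1 = positive) and at least one example of each class; roc_points and the sample scores are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per threshold, for binary labels (1 = positive)."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted positive
    # Use each distinct score as a threshold, from strictest to most lenient
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Assumed example: six test cases with their predicted scores
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(y_true, scores)
```

The curve starts at (0, 0) and ends at (1, 1); plotting the pairs in order gives the ROC curve.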

Answer 11

close to 0

Answer 12

close to 1

Answer 13

helps us bring all features into the same scale

Answer 14

the result will be often mapped to a binary outcome

Answer 15

supervised learning of the classification type

Answer 16

always be non-negative and always lie between 0 and 1

Answer 17

is the probability of that event divided by the probability of its complement: p / (1 - p).

Answer 18

the odds could be any non-negative value
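
A short sketch of the probability-to-odds conversion, showing why a probability stays between 0 and 1 while the odds can be any non-negative value. The function names are hypothetical.

```python
def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


def probability(odds_value):
    """Inverse conversion: recover the probability from the odds."""
    return odds_value / (1 + odds_value)


even_odds = odds(0.5)    # event and complement equally likely
high_odds = odds(0.8)    # event four times as likely as its complement
```

As p approaches 1 the odds grow without bound, which is why the odds have no upper limit.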

Answer 19

are the estimators of the model, also called the predicted weights or the coefficients for each of the features

Answer 20

understanding what the data entails and identifying anomalies, missing values, inconsistencies, etc.

Answer 21

activities include imputing missing values, removing missing values, addressing outliers, fixing variables that have inconsistent data

Answer 22

bringing data into a structured form used for the analysis

Answer 23

data may need to be transformed, rescaled, or normalized

Answer 24

if data is not provided to us, we need to collect it

Answer 25

each member of the population has the exact same probability of being selected in the sample

Answer 26

members of the population are selected based on a system (set of rules)

Answer 27

population is divided into homogeneous slices (strata). Within each slice, simple random sampling is performed and the results are combined (reduces sampling bias and improves the accuracy of sampling)

Answer 28

the population is divided into subgroups (clusters) such that each cluster is a good representative of the population; whole clusters are then randomly selected for the sample.
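
The first three sampling schemes on these cards can be sketched with the standard random module (cluster sampling is left out here). All function names, the "every step-th member" rule used for systematic sampling, and the toy population are assumptions made for illustration.

```python
import random


def simple_random_sample(population, n, rng):
    # Every member has the same chance of being selected
    return rng.sample(population, n)


def systematic_sample(population, step, rng):
    # Assumed rule: random starting point, then every `step`-th member
    start = rng.randrange(step)
    return population[start::step]


def stratified_sample(population, stratum_of, fraction, rng):
    # Group members into strata, then draw a simple random sample within each stratum
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample


rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = list(range(100))
srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
# Two strata (assumed): members below 50 and members from 50 upward
stratified = stratified_sample(population, lambda x: x < 50, 0.1, rng)
```

The stratified sample always contains five members from each stratum, whereas the simple random sample may over- or under-represent one of them.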

Answer 29

Q1 - 1.5IQR

Answer 30

Q3 + 1.5IQR

Answer 31

it is smaller than the lower fence or larger than the upper fence.
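
A sketch of the fence rule for outliers, using the standard definitions lower fence = Q1 - 1.5 × IQR and upper fence = Q3 + 1.5 × IQR. Quartile conventions differ between tools; this example assumes the "inclusive" method of statistics.quantiles, and the data values are made up.

```python
import statistics


def iqr_fences(values):
    """Return (lower fence, upper fence) based on the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(values):
    """Values smaller than the lower fence or larger than the upper fence."""
    lower, upper = iqr_fences(values)
    return [v for v in values if v < lower or v > upper]


data = [2, 4, 4, 5, 6, 7, 8, 9, 30]
lower_fence, upper_fence = iqr_fences(data)
flagged = outliers(data)
```

Here Q1 = 4 and Q3 = 8, so the fences are -2 and 14, and only the value 30 is flagged.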

Answer 32

we do one-hot encoding; the variables created by one-hot encoding are used in place of the categorical variable

Answer 33

each category of the categorical variable is assigned a number based on some order
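
The two encodings can be compared side by side in a small sketch. The helper names, the "is_" column prefix and the example categories are hypothetical; libraries such as pandas or scikit-learn offer their own encoders.

```python
def one_hot_encode(values):
    """Create one 0/1 indicator variable per category."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def ordinal_encode(values, order):
    """Replace each category by its position in the given order."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]


colors = ["red", "green", "red"]           # no natural order: one-hot
sizes = ["small", "large", "medium"]       # natural order: ordinal
one_hot = one_hot_encode(colors)
ordinal = ordinal_encode(sizes, order=["small", "medium", "large"])
```

One-hot encoding suits categories without an order, while ordinal encoding keeps the ranking small < medium < large as 0 < 1 < 2.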

Answer 34

is a mathematical relationship between the features of a problem and the target variable that is to be predicted.

Answer 35

is a parametric method, requires a response variable (target) and one or multiple predictor variables (features)

Answer 36

produces a line that minimizes the sum of squared errors

Answer 37

y is the actual value of the target variable; y-hat is the predicted value of the target variable

Answer 38

is the difference between y and y-hat
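
For a single feature, the line that minimizes the sum of squared errors has a closed-form solution. The sketch below assumes one predictor and made-up data; fit_line is a hypothetical helper, not a library function.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope = covariance of x and y divided by the variance of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = fit_line(x, y)
y_hat = [intercept + slope * xi for xi in x]          # predicted values
residuals = [yi - yh for yi, yh in zip(y, y_hat)]     # y minus y-hat
```

For a least-squares line with an intercept the residuals sum to zero, which is a quick sanity check on the fit.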

Answer 39

coefficient of determination

Answer 40

Mean Squared Error

Answer 41

Root Mean Squared Error

Answer 42

is an indicator of the goodness of our model's fit to the data. For a least-squares fit with an intercept it lies between 0 and 1 on the training data; a higher value is preferred

Answer 43

measure that evaluates the average of the squared deviations between the actual values of the target and the predicted values of the target. Smaller values of MSE are preferred; a value of 0 is ideal but rarely attainable in practice

Answer 44

RMSE is the square root of the MSE; it indicates the typical deviation of data points from the regression line, in the units of the target
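
The three fit measures can be computed from the actual and predicted values in a few lines. This is a sketch with made-up numbers; regression_metrics is a hypothetical helper, and R^2 is computed as 1 minus the ratio of the residual sum of squares to the total sum of squares.

```python
import math


def regression_metrics(y, y_hat):
    """Return R^2, MSE and RMSE for actual values y and predictions y_hat."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # squared errors
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)              # spread around the mean
    mse = ss_res / n
    return {"r2": 1 - ss_res / ss_tot, "mse": mse, "rmse": math.sqrt(mse)}


y = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(y, y_hat)
```

For these values R^2 is 0.9, MSE is 0.5 and RMSE is the square root of 0.5.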

Answer 45

explicitly accounts for the number of explanatory variables. It is common to use adjusted R^2 for model selection because it imposes a penalty for any additional explanatory variable that is included in the analysis. Only increases when a new variable is added to the model that contributes to the prediction.
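
The penalty can be seen in the usual formula, adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of explanatory variables. The sketch below assumes that formula; the function name and the numbers are illustrative only.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 for a model with n_features explanatory variables."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Same R^2 and sample size, but a different number of explanatory variables
with_two_features = adjusted_r2(0.9, n_samples=21, n_features=2)
with_five_features = adjusted_r2(0.9, n_samples=21, n_features=5)
```

With R^2 held at 0.9, moving from two to five variables lowers the adjusted value, so extra variables must raise R^2 enough to offset the penalty.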

Answer 46

the repeated splitting of nodes until we reach pure subsets is the building block of the classification and regression trees (CART) algorithm

Answer 47

the decision tree is a classification tree

Answer 48

the decision tree is a regression tree

Answer 49

measures the degree of impurity of a set of classes in the target variable
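
The card does not say which impurity measure it refers to, so the sketch below shows two common ones as assumptions: Gini impurity and entropy. Both are 0 for a pure node and largest when the classes are evenly mixed.

```python
from collections import Counter
import math


def gini_impurity(labels):
    """1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Negative sum of p * log2(p) over the class proportions p."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


pure_node = ["yes", "yes", "yes", "yes"]
mixed_node = ["yes", "yes", "no", "no"]
gini_pure, gini_mixed = gini_impurity(pure_node), gini_impurity(mixed_node)
entropy_mixed = entropy(mixed_node)
```

A tree-building algorithm prefers splits that lower the impurity of the resulting nodes.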

Answer 50

randomly pick k centroids from the sample points as initial cluster centers

Answer 51

assign each sample to the nearest centroid

Answer 52

move each centroid to the center (mean) of the samples that were assigned to it

Answer 53

repeat steps 2 and 3 until the cluster assignments stop changing or the maximum number of iterations is reached
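
The four steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: it assumes Euclidean distance, points given as tuples, a fixed random seed, and an early stop once the assignments no longer change. The name k_means and the toy points are hypothetical.

```python
import math
import random


def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: randomly pick k sample points as the initial centroids
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iter):
        # Step 2: assign each sample to the nearest centroid
        new_assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop early once the assignments no longer change
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 3: move each centroid to the mean of the samples assigned to it
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        # Step 4: the loop repeats steps 2 and 3 up to max_iter times
    return centroids, assignments


# Toy data (assumed): two well-separated groups of three points
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, assignments = k_means(points, k=2)
```

On this data the first three points end up in one cluster and the last three in the other.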

Answer 54

find the value of k, where the decrease in inertia slows down as k increases.

Answer 55

sum of squared distances between data points in each cluster and their cluster centre
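
Inertia is straightforward to compute once every point has a cluster, and comparing it across values of k is the basis of the elbow method. The sketch below uses made-up points and hand-picked centroids; the function name is hypothetical.

```python
import math


def inertia(points, centroids, assignments):
    """Sum of squared distances from each point to its assigned cluster center."""
    return sum(
        math.dist(p, centroids[a]) ** 2
        for p, a in zip(points, assignments)
    )


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
# k = 1: a single center at the overall mean
inertia_k1 = inertia(points, [(6.0, 0.0)], [0, 0, 0, 0])
# k = 2: one center per visible group
inertia_k2 = inertia(points, [(1.0, 0.0), (11.0, 0.0)], [0, 0, 1, 1])
```

Going from k = 1 to k = 2 cuts the inertia from 104 to 4; once further increases in k bring only small decreases, the elbow has been reached.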