Final Exam Flashcards
The classification tree algorithm:
- estimates how likely a data point is to be a member of one group or the other depending on which group the data points nearest to it are in
- uses a tree-like structure to illustrate the choices available for each possible decision and its estimated outcome by showing them as separate branches of the tree
- predicts the probability that an instance is a member of a certain class by basing the technique on Bayes' theorem
- utilizes an equation based on ordinary least squares regression that can predict the probability of the possible categorical outcomes
2
The naive Bayes classification algorithm:
- estimates how likely a data point is to be a member of one group or the other depending on which group the data points nearest to it are in
- uses a tree-like structure to illustrate the choices available for each possible decision and its estimated outcome by showing them as separate branches of the tree
- predicts the probability that an instance is a member of a certain class by basing the technique on Bayes' theorem
- utilizes an equation based on ordinary least squares regression that can predict the probability of the possible categorical outcomes
3
The kNN classification algorithm:
- estimates how likely a data point is to be a member of one group or the other depending on which group the data points nearest to it are in
- uses a tree-like structure to illustrate the choices available for each possible decision and its estimated outcome by showing them as separate branches of the tree
- predicts the probability that an instance is a member of a certain class by basing the technique on Bayes' theorem
- utilizes an equation based on ordinary least squares regression that can predict the probability of the possible categorical outcomes
1
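The nearest-neighbor idea in the card above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `knn_predict`, the Euclidean distance, the default k=3, and the toy data are all assumptions made for the example.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # train is a list of (features, label) pairs; distance is Euclidean.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    label, count = Counter(nearest_labels).most_common(1)[0]
    # Return the winning class and the share of the k neighbors that voted for it.
    return label, count / k

# Hypothetical toy data: two numeric features, two classes.
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.9), "B")]
label, share = knn_predict(train, (1.1, 1.0), k=3)
```

The share of neighbors in the winning class serves as a rough estimate of how likely the point is to belong to that class.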
Classification algorithms that do not use assumptions about the structure of the data are ___ algorithms
data driven
A good use of a classification algorithm would be:
- estimating the net profit for dishwashers for a major manufacturer
- identifying the seasonal sales for wood stoves over the last 3 years
- forecasting sales for a new product
- upselling or cross-selling to customers through an online store when a customer makes a purchase
4
in a CART model classification rules are extracted from
the decision tree
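One way to picture rule extraction is to walk every root-to-leaf path and join the branch tests along the way. The sketch below assumes a tree stored as nested dictionaries; the `extract_rules` helper, the attribute names, and the tree itself are hypothetical and only meant to illustrate the idea.

```python
def extract_rules(node, conditions=()):
    """Walk a decision tree and return one IF-THEN rule per leaf."""
    if not isinstance(node, dict):  # a leaf holds the predicted class
        lhs = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {lhs} THEN class = {node}"]
    rules = []
    for test, child in node.items():
        # Each key is a branch test; carry it down the path to the leaf.
        rules.extend(extract_rules(child, conditions + (test,)))
    return rules

# Hypothetical tree: keys are branch tests, non-dict values are leaves.
tree = {
    "income <= 50": {"owns_home = yes": "buyer", "owns_home = no": "non-buyer"},
    "income > 50": "buyer",
}
rules = extract_rules(tree)
for rule in rules:
    print(rule)
```

Each leaf yields exactly one rule, so a tree with three leaves produces three rules.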
The kNN technique is what type of technique?
a classification technique
In setting up the kNN model:
- the user allows XLMiner to select the optimal value of k
- the optimal k is set by the user at 10
- the data is normalized in order to take into account the categorical variables
- it is necessary to set an optimal value for k
1
Below are the 8 actual values of the target variable in the training partition:
(0,0,0,1,1,1,1,1)
What is the entropy of the target variable?
- -(5/8) log2(5/8) - (3/8) log2(3/8)
- (5/8) log2(5/8) - (3/8) log2(3/8)
- -(3/8) log2(3/8) + (5/8) log2(3/8)
- -(5/8) log2(3/8) + log2(5/8)
1
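The entropy in the card above can be checked numerically. This is a small sketch using the standard Shannon entropy formula in bits; the helper name `entropy` is an assumption for the example.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(values)
    # Sum -p * log2(p) over the observed class proportions.
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

target = (0, 0, 0, 1, 1, 1, 1, 1)
h = entropy(target)  # -(5/8)*log2(5/8) - (3/8)*log2(3/8), roughly 0.954 bits
```

A value close to 1 bit reflects that the two classes are fairly evenly mixed (3 versus 5).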
Classification problems are distinguished from estimation problems in that
- classification problems require the output attribute to be numerical
- classification problems require the output attribute to be categorical
- classification problems do not allow an output attribute
- classification problems are designed to predict future outcomes
2
Which statement is true about the decision tree attribute selection process:
- a categorical attribute may appear in a tree node several times but a numeric attribute may appear at most once
- a numeric attribute may appear in several tree nodes but a categorical attribute may appear at most once
- both numeric and categorical may appear in several tree nodes
- numeric and categorical attributes may appear in at most 1 tree node
2
What is the ensemble enhancement that is a method of creating pseudo-data from the data in an original data set?
- partitioning
- overfitting
- sampling
- bagging
bagging
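Bagging is usually described as building each base model on a bootstrap sample, that is, a pseudo-data set drawn with replacement from the original records. The sketch below only shows the resampling step; the helper name `bootstrap_sample`, the seed, and the toy data are assumptions for illustration.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) records with replacement to form one pseudo-data set."""
    return [rng.choice(data) for _ in data]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
original = [10, 20, 30, 40, 50]
# Three pseudo-data sets; in bagging, one base model would be fit to each.
samples = [bootstrap_sample(original, rng) for _ in range(3)]
```

Because records are drawn with replacement, a given sample typically repeats some records and omits others.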
What is the ensemble enhancement that is an iterative technique that adjusts the weight of any record based upon the last classification?
- bootstrapping
- boosting
- sampling
- bagging
boosting
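The reweighting step can be sketched with an AdaBoost-style update, which is one common form of boosting and an assumption here, since the card does not name a specific algorithm. The function name `reweight` and the toy weights are hypothetical.

```python
import math

def reweight(weights, correct):
    """One AdaBoost-style update: raise the weights of misclassified records."""
    # Weighted error of the last classifier; assumes 0 < error < 1.
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1 - error) / error)
    updated = [w * math.exp(-alpha if ok else alpha)
               for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated]  # renormalize so the weights sum to 1

weights = [0.25, 0.25, 0.25, 0.25]
correct = [True, True, True, False]  # the last record was misclassified
new_weights = reweight(weights, correct)
```

After the update, the misclassified record carries more weight than the others, so the next classifier in the sequence concentrates on it.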
What is the most often used ensemble enhancement?
bagging
What are the 3 most popular methods for creating ensembles?
- sampling, summarizing, random forest
- bagging, boosting, random forest
- bagging, boosting, clustering
- overfitting, clustering, sampling
2
What is one benefit of using an ensemble model?
- it better establishes the relationship between one dependent variable and multiple independent variables
- it strengthens the relationship between the multiple independent variables
- it reduces the number of errors that result
- it is more efficient at adding and removing predictors
3
What is the most common use of clustering algorithms?
- to minimize variance and bias error
- to segment customers
- to determine how effectively the model can reorder the data set
- to validate the data set
2
In logit, p/(1-p) represents:
the odds of success
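A quick numeric check of the odds and the log-odds may help. The function names `odds` and `logit` are chosen for this sketch only.

```python
import math

def odds(p):
    """Odds of success for a probability p, with 0 <= p < 1."""
    return p / (1 - p)

def logit(p):
    """Natural log of the odds; requires 0 < p < 1."""
    return math.log(odds(p))

o = odds(0.8)   # 0.8 / 0.2, about 4 (four successes per failure)
z = logit(0.5)  # odds of 1, so the log-odds are 0
```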
In a naive Bayes model it is necessary that:
- all attributes are categorical
- the data is partitioned into 3 parts (training, validation, scoring)
- cutoff values are set to less than 0.75
- the target variable is continuous
1 (e.g., gender, blood type); continuous variables cannot be used as-is and must first be binned into categories
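With categorical attributes, naive Bayes reduces to counting. The sketch below is a minimal, unsmoothed version under that assumption; the helper names `train_nb` and `predict_nb` and the toy records (gender, blood type) are hypothetical.

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts

def predict_nb(model, row):
    """Return posterior class probabilities via Bayes' theorem."""
    class_counts, value_counts = model
    n = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / n  # prior P(class)
        for i, value in enumerate(row):
            # P(value | class); no smoothing, to keep the sketch short
            score *= value_counts[(i, label)][value] / count
        scores[label] = score
    total = sum(scores.values())  # assumes at least one class has a nonzero score
    return {label: s / total for label, s in scores.items()}

# Hypothetical training records: (gender, blood type) -> class
rows = [("F", "A"), ("F", "B"), ("M", "A"), ("M", "B"), ("F", "A"), ("M", "A")]
labels = ["yes", "yes", "yes", "no", "no", "no"]
posterior = predict_nb(train_nb(rows, labels), ("F", "A"))
```

The "naive" part is the multiplication of per-attribute probabilities, which treats the attributes as independent within each class.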
Generally, an ensemble method works better if the individual base models have _____
Assume each individual base model has accuracy greater than 50%.
- less correlation among predictors
- high correlation among predictors
-correlation does not have any impact on ensemble output
-none of the above
1
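A small calculation shows why low correlation helps. The sketch assumes three base models, each with accuracy 0.7, combined by majority vote, and compares the idealized case of fully independent errors with the case of perfectly correlated models. The function name and the numbers are assumptions for illustration.

```python
def majority_vote_accuracy(p):
    """Accuracy of a 3-model majority vote when the models err independently."""
    # Correct when all three are right, or exactly two of the three are right.
    return p**3 + 3 * p**2 * (1 - p)

independent = majority_vote_accuracy(0.7)  # about 0.784
identical = 0.7  # perfectly correlated models always vote together, so no gain
```

Real base models fall between these two extremes, which is why less correlated models tend to give the ensemble more room to improve.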
A dendrogram is used with which analytics algorithm?
- text mining
- clustering
- ensemble models
- all of the above
clustering
What is a bootstrap?
- it is a procedure that allows data scientists to reduce the dimensions of the training data set
- it is one of many classification-type algorithms
- it is a procedure for aggregating many attributes into a few attributes
- it is based on repeatedly and systematically sampling with replacement from the data
4
What is clustering?
- ensemble algorithm for improving the accuracy of classification models
- could be thought of as a set of nested algorithms whose purpose is to choose weak learners
- it is the process of grouping the data into classes or clusters so that objects within a cluster have high similarity in comparison to one another
- none of the above
3
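As one concrete instance of grouping similar objects, here is a minimal k-means sketch on one-dimensional data. The function name `kmeans_1d`, the starting centroids, and the data points are assumptions for the example; production k-means implementations handle initialization and convergence more carefully.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D data: assign to nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for x in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
```

Points within each resulting cluster are close to one another and far from the points in the other cluster.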
Which of the following is not a type of clustering?
- k-means
- hierarchical
- agglomerative
- splitting
4