turning term exam 1 Flashcards
(40 cards)
data mining is a
process
the following stage in data mining involves digging beneath the surface to uncover the structure of the business problem and the data that are available, and then matching them to one or more data mining tasks for which we may have substantial science and technology to apply
data understanding
the following is not an example of machine learning tasks
calculating the annual profit or loss
"a computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E"
the training data is referred to as
E
CRISP-DM is a codification for the data mining process which starts with the following stage
business and data understanding
supervised machine learning methods include the following
none of the above
business data science requires the combination of the know-how in the following
business domain expertise, mathematics, statistics, computer science
the diagram above depicts the following type of machine learning systems
unsupervised
the following machine learning algorithm attempts to find associations between entities based on transactions involving them
co-occurrence grouping
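A minimal sketch of the idea behind co-occurrence grouping, assuming the simplest possible approach: count how often each pair of items appears in the same transaction. The transactions and the `pair_counts` helper are made up for illustration; real association mining methods add measures such as support and confidence.

```python
from collections import Counter
from itertools import combinations

def pair_counts(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for items in transactions:
        # Deduplicate and sort so each pair has one canonical form.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts

# Hypothetical shopping baskets (illustrative data only).
transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]

counts = pair_counts(transactions)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

In this toy data, bread and butter co-occur more often than any other pair, which is the kind of association the task is meant to surface.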
the following is not mentioned in chapter 1 of the data science for business book
IBM
in the diagrams the amount of shading corresponds to
total entropy
based on the diagrams, which single attribute would you select to split between edible and poisonous mushrooms
spore print color
entropy is a measure of
disorder (impurity)
a basket contains 10 apples and nothing else; a bowl contains 5 cherries and nothing else. the entropy values of the set of apples in the basket and the set of cherries in the bowl are
0 and 0 respectively
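The answer above can be checked with a short sketch of Shannon entropy, entropy = -sum(p_i * log2(p_i)) over the classes present in a set. The `entropy` function and the `mixed` set are illustrative additions, and the function assumes a non-empty list of labels.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty collection of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Sum -p * log2(p) over every class that appears in the set.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

basket = ["apple"] * 10                      # only apples: a pure set
bowl = ["cherry"] * 5                        # only cherries: a pure set
mixed = ["apple"] * 5 + ["cherry"] * 5       # hypothetical 50/50 mix

print(entropy(basket), entropy(bowl), entropy(mixed))
```

A set with a single class has entropy 0 regardless of its size, while an even two-class mix reaches the two-class maximum of 1 bit.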
a supervised segmentation with tree structured modeling can be done by recursively selecting the best attribute from multiple attributes based on their
information gain
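One way to picture the selection step is the sketch below, which computes information gain as the parent set's entropy minus the size-weighted entropy of the child sets produced by a split. The attribute names and rows are invented for illustration, and a full tree learner would repeat this choice recursively on each child set.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty collection of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Parent entropy minus the weighted entropy of the children after a split."""
    parent = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted

def best_attribute(rows, attributes, target):
    """Pick the attribute whose split yields the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Hypothetical loan records (illustrative data only).
rows = [
    {"employed": "yes", "owns_home": "yes", "write_off": "no"},
    {"employed": "yes", "owns_home": "no", "write_off": "no"},
    {"employed": "no", "owns_home": "yes", "write_off": "yes"},
    {"employed": "no", "owns_home": "no", "write_off": "yes"},
]

choice = best_attribute(rows, ["employed", "owns_home"], "write_off")
print(choice)
```

In this toy data, splitting on `employed` produces two pure children, so it captures all of the available information, while `owns_home` leaves both children as mixed as the parent.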
in the diagram which are considered as nodes
all of the above (employed; balance and age; class write off and class not write off)
the following is an alternative to the entropy measure of information
gini impurity
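For comparison with entropy, here is a small sketch of Gini impurity using the common definition 1 - sum(p_i^2). The function name and the example sets are illustrative assumptions.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a non-empty collection of class labels."""
    total = len(labels)
    # One minus the sum of squared class proportions.
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

pure = ["edible"] * 10                           # single class
mixed = ["edible"] * 5 + ["poisonous"] * 5       # hypothetical 50/50 mix

print(gini_impurity(pure), gini_impurity(mixed))
```

Like entropy, the measure is 0 for a pure set and largest for an even mix, although its two-class maximum is 0.5 instead of 1.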
in a decision tree a terminal node is also known as a
leaf
the cellular phone churn prediction problem discussed toward the end of chapter 3 uses a historical data set of 20,000 customers. to measure the accuracy of the tree model, the authors used a training set consisting of
50% customers who churned and 50% customers who did not churn
an instance is also called a
feature vector
the objective function of support vector machines is based on the idea that
the wider the bar is between the classes, the better
with linear regressions the goal is to find a model that gives the
minimum sum of squared errors
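As a rough illustration of that goal, the sketch below fits a one-variable line with the closed-form least-squares solution and checks that nudging the parameters does not lower the sum of squared errors. The data points and function names are made up for the example.

```python
def fit_line(xs, ys):
    """Closed-form least-squares slope and intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical observations (illustrative data only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

slope, intercept = fit_line(xs, ys)
best = sum_squared_errors(xs, ys, slope, intercept)
print(slope, intercept, best)
```

Any other slope or intercept gives a sum of squared errors at least as large as `best` on this data, which is what "minimum" means in the answer above.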
a model is a BLANK of reality created to serve a purpose
simplified representation
the following describes the parametric learning approach
start by specifying the structure of the model and then continue with calculating the best parameter values given a particular set of training data