Chapter 2 Flashcards
(49 cards)
Data science is a concept used to tackle big data and includes data cleansing, preparation, processing, and ______.
analysis
A data scientist gathers data from multiple sources and applies ______ ______ to extract critical information from the collected data sets.
machine learning
The key idea is to convert a real-world problem into a well-defined ______ ______ problem, so that it can be solved using machine learning.
data science
Data mining is the process of analyzing vast amounts of data from various sources to extract ______ ______.
useful information
Data mining is done through the discovery of previously unknown patterns, correlations, and ______, which can then be used to predict future outcomes.
anomalies
In binary classification, there are only ______ possible classes (or labels) for each instance.
two
An example of binary classification is ______ ______, where an email is either spam or not spam.
Spam detection
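To make the "exactly two labels" idea concrete, here is a minimal sketch of a binary classifier. It assumes a logistic-regression-style model whose weights have already been learned. The features, `WEIGHTS`, and `BIAS` are hypothetical values chosen for illustration, not taken from any real spam filter.

```python
import math

def sigmoid(z: float) -> float:
    """Squash a real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify_email(features: list[float], weights: list[float], bias: float) -> str:
    """Return one of exactly two labels: 'spam' or 'not spam'."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    p_spam = sigmoid(score)
    # Binary decision: threshold a single probability at 0.5
    return "spam" if p_spam >= 0.5 else "not spam"

# Hypothetical features: [count of "free", count of "winner", number of links]
# Hypothetical weights; a real model would learn these from labeled emails
WEIGHTS = [1.5, 2.0, 0.3]
BIAS = -3.0

print(classify_email([2, 1, 4], WEIGHTS, BIAS))  # score = 3.2  -> spam
print(classify_email([0, 0, 1], WEIGHTS, BIAS))  # score = -2.7 -> not spam
```

Because there are only two classes, one probability (of "spam") is enough; the other class gets the remainder.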
In multiclass classification, there are more than ______ possible classes for each instance.
two
An example of multiclass classification is ______ ______, where handwritten digits (0-9) need to be classified into one of the ten classes.
Digit recognition
The key difference between binary and multiclass classification regarding the number of classes is that binary classification has exactly 2 classes, while multiclass classification has ______ or more.
3
In multiclass classification, the output is a ______ of ______, with each probability corresponding to a different class.
vector,probabilities
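One common way to produce such a probability vector (assumed here; the card does not name a method) is the softmax function, which turns one raw score per class into probabilities that sum to 1. The scores below are hypothetical outputs of a digit recognizer for classes 0 to 9.

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Convert raw class scores into a probability vector that sums to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for digit classes 0..9
logits = [0.1, 0.2, 0.0, 0.3, 0.1, 0.0, 0.2, 4.0, 0.1, 1.0]
probs = softmax(logits)

# The predicted class is the index of the largest probability
predicted_digit = max(range(len(probs)), key=lambda k: probs[k])
print(len(probs))        # one probability per class -> 10
print(predicted_digit)   # -> 7
```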
Supervised learning discovers patterns in the data that relate data attributes with a ______ (______) attribute.
target,class
In supervised learning, each data point in the training set has an associated ______ or ______.
label,output
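To illustrate learning from labeled data, the sketch below uses a 1-nearest-neighbour classifier. This technique is chosen here for brevity and is not one named in this chapter; the points and labels are hypothetical toy data.

```python
import math

# Labeled training set: each point is paired with its known label
# (hypothetical toy data)
training_data = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((5.0, 5.0), "B"),
    ((6.0, 5.5), "B"),
]

def predict(point: tuple[float, float]) -> str:
    """1-nearest-neighbour: copy the label of the closest training point."""
    _, nearest_label = min(training_data, key=lambda pair: math.dist(point, pair[0]))
    return nearest_label

print(predict((1.2, 1.4)))  # closest to the "A" examples -> A
print(predict((5.5, 4.8)))  # closest to the "B" examples -> B
```

The labels are what make this supervised: without them, `predict` would have nothing to copy.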
Two examples of tasks in supervised learning are ______ and ______.
Classification,Regression
Unsupervised learning involves training a model on a dataset without any ______ ______.
labeled responses
The goal of unsupervised learning is to discover underlying ______, ______, or ______ in the data.
patterns,structures,relationships
Two examples of tasks in unsupervised learning are ______ and ______ ______.
Clustering,Dimensionality Reduction
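Clustering can be sketched with k-means (Lloyd's algorithm), which alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. No labels are used. This sketch assumes 2-D points and, for simplicity, initialises the centroids with the first k points; practical implementations usually use random or k-means++ initialisation.

```python
import math

def kmeans(points, k, iterations=10):
    """Group unlabeled 2-D points into k clusters (Lloyd's algorithm)."""
    # Simple deterministic initialisation: the first k points become centroids
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Hypothetical unlabeled data with two obvious groups
points = [(1, 1), (8, 8), (1.5, 2), (9, 8.5), (0.5, 1.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(centroids)  # -> [(1.0, 1.5), (8.5, 8.5)]
```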
Supervised learning has a ______ mechanism, while unsupervised learning does not.
feedback
Supervised learning is used for ______, while unsupervised learning is used for ______.
prediction,analysis
Algorithms for supervised learning include decision trees, logistic regression, and ______ ______ ______.
support vector machines
Algorithms for unsupervised learning include k-means clustering, hierarchical clustering, and ______ ______.
Apriori algorithm
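The Apriori algorithm finds itemsets that occur together often, using the property that every subset of a frequent itemset must itself be frequent. The sketch below covers only the frequent-itemset stage (not rule generation), with `min_support` given as an absolute count; the baskets are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    counts = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        counts = {c: support(c) for c in candidates}
        current = {c: n for c, n in counts.items() if n >= min_support}
    return frequent

baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]
frequent = apriori(baskets, min_support=3)
```

With `min_support=3`, {bread, milk} and {bread, butter} (3 baskets each) are frequent, while {milk, butter} (2 baskets) is not. The three-item candidate is therefore pruned without being counted.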
A ______ problem in data science asks ‘How much? How many?’
Regression
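A regression model answers "how much?" with a number rather than a class. Below is a minimal sketch of simple linear regression fitted by ordinary least squares; the spend/units data are hypothetical.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend (x) vs. units sold (y)
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [3.1, 4.9, 7.2, 8.8, 11.0]
slope, intercept = fit_line(spend, units)

# "How many units at spend 6?" -> a continuous number, not a class label
prediction = slope * 6.0 + intercept
print(round(slope, 2), round(intercept, 2), round(prediction, 2))  # -> 1.97 1.09 12.91
```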
A ______ problem in data science asks ‘Is it type A or type B or type C?’
Classification
A ______ problem in data science asks ‘How is the data organized?’
Clustering