1.1.2 Introduction to Data Science - Business Problems and Data Science Solutions Flashcards

Question 1

Q

Describe when each type of data mining algorithm, such as classification, regression, similarity matching, clustering, co-occurrence grouping, profiling, link-prediction, data reduction, and causal modeling, should be used.

Answer

A

Classification: When you want to predict which class an individual belongs to
Regression: When you want to estimate or predict a numerical variable for each individual
Similarity matching: When you want to identify similar individuals based on data known about them
Clustering: When you want to group individuals in a population by their similarity, without any specific target or purpose driving the grouping
Co-occurrence grouping: When you want to find associations between entities based on transactions involving them
Profiling: When you want to characterize the typical behavior of an individual, group, or population
Link-prediction: When you want to predict connections between data items
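Data reduction: When you want to replace a large data set with a smaller one that still contains much of the important information, because the smaller set is easier to process or reveals the information more clearly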
Causal modeling: When you want to understand what events or actions actually influence others
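Co-occurrence grouping is the least obvious task on this list. Below is a minimal sketch that counts how often item pairs appear in the same transaction and reports each pair's support and lift (lift above 1 means the pair co-occurs more often than independence would predict). The shopping baskets are hypothetical illustration data.

```python
from collections import Counter
from itertools import combinations


def pair_stats(transactions):
    """Return {(a, b): (support, lift)} for every item pair seen together."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicate items within one transaction
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    stats = {}
    for (a, b), together in pair_counts.items():
        support = together / n
        # lift compares observed co-occurrence with what independence would give
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        stats[(a, b)] = (support, lift)
    return stats


# Hypothetical transactions
baskets = [
    ["beer", "chips"],
    ["beer", "chips", "salsa"],
    ["milk", "bread"],
    ["beer", "chips"],
]
stats = pair_stats(baskets)
for pair, (support, lift) in sorted(stats.items()):
    print(pair, round(support, 2), round(lift, 2))
```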

Question 2

Q

Explain the differences between regression and classification.

Answer

A

Regression estimates or predicts the numerical value of a variable (numeric target); classification predicts which class an item (e.g., a person or a company) belongs to, so its target is categorical
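To make the contrast concrete, the sketch below uses the same input (months as a customer) for two different targets: a numeric one (monthly spend), fitted with ordinary least squares, and a categorical one (leave/stay), handled by a simple nearest-class-mean classifier. Both the data and the choice of these two techniques are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Regression: ordinary least squares for y = a + b*x (numeric target)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def fit_centroids(xs, labels):
    """Classification: learn the mean input value of each class (categorical target)."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: sum(v) / len(v) for label, v in groups.items()}


def predict_class(centroids, x):
    """Assign x to the class whose mean is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# Hypothetical data: months as customer -> monthly spend / churn label
months = [1, 2, 3, 4, 5]
spend = [20, 30, 40, 50, 60]
churn = ["leave", "leave", "stay", "stay", "stay"]

a, b = fit_line(months, spend)
print(a + b * 6)                                       # predicted spend: 70.0
print(predict_class(fit_centroids(months, churn), 2.5))  # predicted class: leave
```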

Question 3

Q

Contrast supervised learning with unsupervised learning.

Answer

A

Supervised learning has a specific target (e.g., will a customer leave after the contract expires?), while unsupervised learning has no specific target or purpose stated (e.g., do customers fall into different groups?)
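The supervised case can be sketched with a 1-nearest-neighbor classifier: it only works because each historical customer carries a label ("leaves"/"stays"), which is exactly the specific target described above. The two features and all values are hypothetical, and the features are left unscaled for brevity.

```python
import math


def nearest_neighbor_predict(train, query):
    """train: list of (features, label) pairs; the labels make this supervised."""
    _, label = min(train, key=lambda pair: math.dist(pair[0], query))
    return label


# Hypothetical features: (years as customer, monthly minutes used)
train = [
    ((2.0, 80.0), "leaves"),
    ((1.0, 60.0), "leaves"),
    ((6.0, 300.0), "stays"),
    ((8.0, 250.0), "stays"),
]
print(nearest_neighbor_predict(train, (7.0, 280.0)))  # stays
print(nearest_neighbor_predict(train, (1.5, 65.0)))   # leaves
```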

Question 4

Q

List the algorithms that can be used for supervised and unsupervised learning.

Answer

A

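Supervised: classification, regression, causal modeling
Unsupervised: clustering, co-occurrence grouping, profiling
Either: similarity matching, link prediction, data reduction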
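For the unsupervised side, here is a minimal k-means sketch: it receives only points, no labels, and discovers groups on its own. Initializing with the first k points and capping at 20 iterations are simplifying assumptions; practical implementations use smarter initialization.

```python
import math


def kmeans(points, k, iterations=20):
    """Group points into k clusters using only their coordinates (no target)."""
    centroids = list(points[:k])  # naive initialization: first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                # mean of each coordinate across the cluster's members
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's old centroid
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D customer data with two visible groups
points = [(1, 1), (9, 9), (1.5, 2), (8, 8.5), (2, 1), (9, 8)]
centroids, clusters = kmeans(points, 2)
print(clusters)
```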

Question 5

Q

Contrast data mining with the use of data mining results.

Answer

A

Data mining: analyzing historical data to produce a model (or patterns)
Use of data mining results: applying the model to new data, e.g., to predict outcomes for new cases
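The two phases can be kept as separate functions, as in this sketch: the mining step searches labeled history for the best single-threshold rule, and the use step applies that learned threshold to cases it has never seen. The threshold rule (a decision stump) and the data are illustrative assumptions, not a prescribed method.

```python
def mine_threshold_model(history):
    """Data mining: learn the threshold that best separates labeled history.
    history: list of (value, is_positive) pairs; rule predicts positive if value >= threshold."""
    best_threshold, best_correct = None, -1
    for threshold in sorted({value for value, _ in history}):
        correct = sum((value >= threshold) == label for value, label in history)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def apply_model(threshold, new_values):
    """Use of the data mining result: score new, previously unseen cases."""
    return [value >= threshold for value in new_values]


# Hypothetical labeled history
history = [(5, False), (12, False), (30, True), (45, True), (18, False), (60, True)]
model = mine_threshold_model(history)
print(model)                             # 30
print(apply_model(model, [10, 35, 70]))  # [False, True, True]
```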

Question 6

Q

List and describe the steps used in Cross Industry Standard Process for Data Mining (CRISP-DM).

Answer

A

1. Business Understanding - frame the business problem as one or more data science problems
2. Data Understanding - assess the strengths and weaknesses of the available data, including the costs and benefits of obtaining it
3. Data Preparation - manipulate and convert the data (e.g., into tabular form) so that it produces better results
4. Modeling - apply data mining techniques to produce a model or pattern that captures regularities in the data
5. Evaluation - assess the data mining results for validity before moving on
6. Deployment - put the data mining results into real use to obtain a return on investment

Question 7

Q

Explain the reason for having an iterative process involved in CRISP-DM.

Question 8

Q

Describe the characteristics of credit card and Medicare fraud.

Answer

A

Nearly all credit card fraud is eventually detected (by the customer or the credit card company), so credit card transactions have reliable labels, which makes supervised techniques a good fit.

Medicare fraud is more complicated: fraudulent claims are not reliably caught, so trustworthy labels are scarce, and unsupervised approaches such as profiling and clustering are needed.
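A profiling approach of the kind mentioned above can be sketched without any fraud labels: describe each provider's typical behavior (here, a hypothetical average billed amount per claim) and flag providers far from the population's norm. Using a median/MAD score and a cutoff of 5.0 are assumptions chosen because they are robust to the very outliers being sought; flagged providers are leads for investigation, not confirmed fraud.

```python
from statistics import median


def flag_unusual(profile, cutoff=5.0):
    """profile: {entity_id: typical value}. Flags entities whose value is far
    from the population median, measured in median absolute deviations (MAD)."""
    values = list(profile.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread, so no meaningful notion of "unusual"
    return sorted(eid for eid, v in profile.items() if abs(v - center) / mad > cutoff)


# Hypothetical average billed amount per claim, by provider
avg_claim = {"prov_A": 100, "prov_B": 110, "prov_C": 95,
             "prov_D": 105, "prov_E": 98, "prov_F": 500}
print(flag_unusual(avg_claim))  # ['prov_F']
```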

Question 9

Q

List the reasons for deploying the data mining system itself rather than the models produced by a data mining system.

Answer

A

The world may change faster than the data science team can adapt, as with fraud and intrusion detection, so the system must rebuild its models automatically.
A business may have too many modeling tasks for its data science team to curate each model manually.
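The idea of deploying the mining system rather than a frozen model can be sketched as a wrapper that re-runs the mining step whenever new labeled data arrives. The class name, the sliding window, and the midpoint-threshold mining rule are all hypothetical choices for illustration; any function that turns labeled history into a model could be plugged in.

```python
def mine_threshold(history):
    """Hypothetical mining step: midpoint between mean negative and mean positive value.
    Assumes history contains both classes."""
    pos = [v for v, label in history if label]
    neg = [v for v, label in history if not label]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


class SelfUpdatingDetector:
    """Deploys the mining procedure itself: each new labeled batch triggers re-mining."""

    def __init__(self, mine, window=100):
        self.mine = mine      # function: labeled history -> threshold model
        self.window = window  # keep only the most recent examples (assumption)
        self.history = []
        self.model = None

    def add_labeled(self, examples):
        self.history = (self.history + list(examples))[-self.window:]
        self.model = self.mine(self.history)

    def predict(self, value):
        if self.model is None:
            raise RuntimeError("no labeled data has been added yet")
        return value >= self.model


detector = SelfUpdatingDetector(mine_threshold, window=4)
detector.add_labeled([(10, False), (20, False), (80, True), (90, True)])
print(detector.model, detector.predict(52))  # 50.0 True
# The pattern shifts; the system re-mines instead of waiting for a person.
detector.add_labeled([(40, False), (50, False), (55, True), (65, True)])
print(detector.model, detector.predict(52))  # 52.5 False
```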

Brainscape's Knowledge Genome™
