Module 3 Flashcards

1
Q

What are some PROs & CONs of KDD / Data Mining?

A

o With data mining, you may find a correlation without understanding why it exists
o Pros
• No need to have a hypothesis first
• Not dependent on single expert
• Can process far more information than any human could, which enables the use of a more comprehensive data set
• Enables machine-aided predictions
• Can reduce data complexity prior to human analysis
o Cons
• Depends heavily on the data set used
• Noise in the data set can lead to misleading results
• Based on historical data
• If the future context changes, performance can drop
• The underlying basic rule may never be discovered
• Results are more complex to understand
• Issue of mistrust: people often don’t trust the results, because neither the computer nor the scientist really knows why the pattern holds

2
Q

What are the key objectives of your BI solution?

A
  • Make information available faster
  • Reduce the attention span required from the user
  • Make information as accessible as possible
  • Minimize the attention users have to invest
3
Q

What are the primary functions of the BI Frontend?

A
  • Hides some of the technical complexity- e.g., the star schema
  • Helps users find what they are looking for
  • Automated suggestions- what makes sense to analyze next
  • Optimal visualization
  • Where to investigate next
  • Formatting and selling
4
Q

For what could you use clustering?

A

o Customer Segmentation (e.g., how similar are transactions?)
o Behavioral Analysis
o Batch Failure Analysis
o Impact of Customer Incentives
o Patient recruitment optimization
o Often used as part of predictive analytics- which clusters are successful?
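To make the customer segmentation use case concrete, here is a minimal k-means sketch in plain Python. The customer data (annual spend, visits per month) and the function name `kmeans` are hypothetical, and k-means is just one common clustering technique, not necessarily the one the course uses.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]

def kmeans(points: List[Point], k: int, iterations: int = 20) -> List[int]:
    """Assign each point to one of k clusters with basic k-means (Lloyd's algorithm)."""
    # Deterministic start for reproducibility: use the first k points as initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Hypothetical customers: (annual spend, visits per month)
customers = [(200, 1), (220, 2), (5000, 10), (5200, 12), (180, 1), (4900, 11)]
labels = kmeans(customers, k=2)
print(labels)
```

The sketch separates low spenders from high spenders. In practice you would standardize the features first, since unscaled spend dominates the distance here.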

5
Q

How do you differentiate the BI frontend from the DWH?

A
  • Demarcation line is at the reporting layer and logical mapping.
  • Logical Mapping- Translates the technical keys into language business users will understand.
  • DWH- Stores, aggregates, and brings the data together
  • BI Frontend- User interface for end users and IT staff; allows them to interact with the DWH and derive meaningful insight from it
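The logical mapping idea can be sketched as a small translation layer: the fact table holds only technical surrogate keys, and the mapping resolves them through dimension tables and renames columns into business terms. All table names, keys and labels below are hypothetical, chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical star schema: the fact table stores only surrogate keys and measures.
fact_sales = [
    {"product_key": 101, "region_key": 7, "revenue": 1200.0},
    {"product_key": 102, "region_key": 7, "revenue": 800.0},
    {"product_key": 101, "region_key": 9, "revenue": 450.0},
]

# Hypothetical dimension tables: surrogate key -> business-friendly label.
dim_product = {101: "Espresso Machine", 102: "Coffee Grinder"}
dim_region = {7: "North", 9: "South"}

# Logical mapping: technical column name -> business term shown in the frontend.
logical_names = {"product_key": "Product", "region_key": "Region", "revenue": "Revenue"}

def to_business_view(rows: List[dict]) -> List[Dict[str, object]]:
    """Resolve surrogate keys via the dimensions and rename columns for business users."""
    lookups = {"product_key": dim_product, "region_key": dim_region}
    view = []
    for row in rows:
        translated = {}
        for column, value in row.items():
            label = logical_names[column]
            # Replace a technical key with its dimension label; keep measures unchanged.
            translated[label] = lookups[column][value] if column in lookups else value
        view.append(translated)
    return view

for record in to_business_view(fact_sales):
    print(record)
```

The DWH keeps storing keys and measures; only the frontend's mapping layer decides what the business user sees.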
6
Q

What are the 3 main types of BI frontends?

A
  • Dashboards- Trigger action; mainly for executives and upper-level management. Use dashboards whenever you need a visually engaging experience, e.g., traffic-light indicators.
  • Discovery & Analysis- Deliver engaging information to users when they need it; track key performance indicators.
  • Reporting- Share information with those who want to consume it.
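The "traffic light" idea behind dashboards can be sketched as a function that maps a KPI to a status color. The thresholds (green at or above target, yellow within 90% of target, red below that) and the sample KPIs are assumptions for illustration only.

```python
from typing import Dict, Tuple

def traffic_light(actual: float, target: float, warn_ratio: float = 0.9) -> str:
    """Map a KPI to a dashboard color, assuming higher values are better.

    Assumed rule: green at/above target, yellow at/above warn_ratio * target, else red.
    """
    if actual >= target:
        return "green"
    if actual >= warn_ratio * target:
        return "yellow"
    return "red"

# Hypothetical KPIs: name -> (actual, target)
kpis: Dict[str, Tuple[float, float]] = {
    "Revenue": (1_050_000, 1_000_000),
    "Customer satisfaction": (4.1, 4.5),
    "On-time delivery %": (78, 95),
}

for name, (actual, target) in kpis.items():
    print(f"{name}: {traffic_light(actual, target)}")
```

An executive scanning the dashboard only needs to act on the red and yellow items, which is the "trigger action" role described above.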
7
Q

What would you consider when deciding on your BI strategy? (Single vs. multiple breed approach)

A
  • Price
  • Plus all the other criteria highlighted in blue in the course material. Important!

8
Q

What is KDD (Knowledge Discovery in Databases)?

A

• Finding patterns or correlations in data: the non-trivial process of identifying valid, novel, and potentially useful patterns in data. Data mining is one step in the KDD process.
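To show where data mining sits inside KDD, here is a toy pipeline that follows the commonly cited five-step breakdown (selection, preprocessing, transformation, data mining, interpretation/evaluation). The data set, column names and the 0.8 threshold are hypothetical, and a Pearson correlation stands in for the mining step only as a simple example.

```python
import math
from statistics import mean
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]

def select(raw: List[dict]) -> List[Row]:
    """1. Selection: keep only the attributes relevant to the question."""
    return [{"ad_spend": r.get("ad_spend"), "sales": r.get("sales")} for r in raw]

def preprocess(rows: List[Row]) -> List[Row]:
    """2. Preprocessing: drop incomplete records (a very simple form of cleaning)."""
    return [r for r in rows if r["ad_spend"] is not None and r["sales"] is not None]

def transform(rows: List[Row]) -> List[Row]:
    """3. Transformation: min-max scale every attribute to the range [0, 1]."""
    scaled = [dict(r) for r in rows]
    for col in rows[0]:
        lo = min(r[col] for r in rows)
        hi = max(r[col] for r in rows)
        span = (hi - lo) or 1.0  # avoid dividing by zero for constant columns
        for r in scaled:
            r[col] = (r[col] - lo) / span
    return scaled

def mine(rows: List[Row]) -> float:
    """4. Data mining: Pearson correlation between ad spend and sales."""
    xs = [r["ad_spend"] for r in rows]
    ys = [r["sales"] for r in rows]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def interpret(r: float, threshold: float = 0.8) -> str:
    """5. Interpretation/evaluation: decide whether the pattern is worth reporting."""
    if abs(r) >= threshold:
        return f"Strong correlation (r = {r:.3f}): worth investigating further"
    return f"Weak correlation (r = {r:.3f}): probably not useful"

# Hypothetical weekly records; week 3 is missing its ad spend.
raw_data = [
    {"week": 1, "ad_spend": 10, "sales": 105, "region": "N"},
    {"week": 2, "ad_spend": 20, "sales": 198, "region": "S"},
    {"week": 3, "ad_spend": None, "sales": 150, "region": "N"},
    {"week": 4, "ad_spend": 30, "sales": 310, "region": "S"},
    {"week": 5, "ad_spend": 40, "sales": 395, "region": "N"},
]

corr = mine(transform(preprocess(select(raw_data))))
print(interpret(corr))
```

Min-max scaling does not change a Pearson correlation; it is included only to show the transformation step, which matters for distance-based methods such as clustering. As the pros/cons card notes, a strong correlation still says nothing about why it exists.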

9
Q

How is the process of gaining insight different between data mining and a traditional hypothesis-driven approach?

A
  • Traditional approach- always start with a hypothesis
  • Data mining approach- no hypothesis at the beginning

10
Q

What does data mining do?

A
  • Groups of data records- Cluster analysis
  • Unusual Records- Anomaly Detection
  • Dependencies- Association Rule Mining
  • Results can be used as input for further analysis in machine learning and predictive analytics.
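Two of the tasks above can be sketched in a few lines each: anomaly detection via z-scores, and association rule mining restricted to item pairs. This is a brute-force simplification (not Apriori or any specific library's algorithm), and the data, function names and thresholds are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev
from typing import List, Set, Tuple

def anomalies(values: List[float], threshold: float = 2.0) -> List[int]:
    """Anomaly detection: indices whose absolute z-score exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

def pair_rules(baskets: List[Set[str]], min_support: float,
               min_confidence: float) -> List[Tuple[str, str, float, float]]:
    """Association rules A -> B over item pairs, filtered by support and confidence."""
    n = len(baskets)
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        both = sum(1 for t in baskets if a in t and b in t)
        support = both / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            lhs_count = sum(1 for t in baskets if lhs in t)
            confidence = both / lhs_count  # P(rhs in basket | lhs in basket)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

# Hypothetical transaction amounts: one unusual record.
amounts = [52, 48, 50, 51, 49, 500, 47, 53]
print(anomalies(amounts))

# Hypothetical shopping baskets.
baskets = [{"bread", "butter"}, {"bread", "butter", "milk"},
           {"bread", "milk"}, {"butter", "bread"}, {"coffee"}]
for lhs, rhs, support, confidence in pair_rules(baskets, 0.4, 0.8):
    print(f"{lhs} -> {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

Here the 500 purchase is flagged as unusual, and the rules found are "butter -> bread" and "milk -> bread". Note that a single extreme value inflates the standard deviation, so z-scores on small samples are only a rough first filter.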
11
Q

When would you use data mining instead of the traditional approach?

A

• You have statistically relevant data volumes
• Data likely to contain interesting correlations (cause and effect)
• Decent data quality
• Problem is too complex to formulate hypothesis that can be validated with a reasonable resource investment
• Cause and effect change fast (manual analysis becomes impractical)
