Midterm Flashcards

(54 cards)

1
Q

Data Analytics

A

The process of evaluating data with the purpose of drawing conclusions to address business questions

2
Q

Big Data

A

Data sets that are too large and complex for a business’s existing systems to handle

3
Q

IMPACT cycle steps

A

Identify the business questions: what is our target?
Master the data: ETL, how to access the data, reliability, data normalization, clean data
Perform the test plan: pick methods to approach the data
Address and refine results: find reasons why the results happened
Communicate insights
Track outcomes

4
Q

8 methods of approaching data

A

Classification
Regression
Clustering
Similarity matching
Profiling: characterizing typical behavior of a group
Link prediction: predicting a relationship between two items
Data reduction: focusing only on critical items
Co-occurrence grouping: “customers also bought…”

5
Q

4 V’s of Big Data

A
Volume = amount of data (size of the population)
Velocity = speed at which data is generated and changes
Variety = different types of data, structured or unstructured
Veracity = reliability and accuracy of the data
6
Q

5 steps of ETL process (mastering the data)

A

Determine the purpose and scope of data request
Obtain the data: submit a data request form or obtain it yourself
Validate the data for integrity and completeness
Clean the data: find out why data is missing
Load the data in preparation for data analysis

7
Q

4 steps of validating data

A

Compare number of extracted records to number of database records
Calculate and compare descriptive statistics for numeric fields (minimum, maximum, average, median)
Convert and validate Date/Time fields
Compare string limits for text fields
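
The four checks above can be sketched in plain Python. This is only an illustration under stated assumptions: the sample records, the field names (`amount`, `date`, `name`), the `source_count` value, and the 20-character limit are all hypothetical.

```python
from datetime import datetime
from statistics import mean, median

# Hypothetical extract; in practice these rows come from the data request.
extracted = [
    {"amount": 120.0, "date": "2023-01-15", "name": "Acme Co"},
    {"amount": 75.5, "date": "2023-02-03", "name": "Baker LLC"},
    {"amount": 310.0, "date": "2023-02-28", "name": "Cole Inc"},
]
source_count = 3  # assumed record count reported by the source database


def validate(rows, expected_count, max_name_len=20):
    amounts = [r["amount"] for r in rows]
    return {
        # 1. Extracted record count vs. database record count
        "count_matches": len(rows) == expected_count,
        # 2. Descriptive statistics to compare against the source system
        "stats": {
            "min": min(amounts),
            "max": max(amounts),
            "avg": mean(amounts),
            "med": median(amounts),
        },
        # 3. Date fields must convert; strptime raises ValueError otherwise
        "dates": [datetime.strptime(r["date"], "%Y-%m-%d").date() for r in rows],
        # 4. Text fields must fit within the expected string limit
        "names_within_limit": all(len(r["name"]) <= max_name_len for r in rows),
    }


report = validate(extracted, source_count)
print(report["count_matches"], report["stats"], report["names_within_limit"])
```

The statistics are meant to be compared against the same figures computed in the source system; a mismatch suggests the extract is incomplete or altered.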

8
Q

Ways to deal with missing data

A

Drop records with missing data (as long as too many aren’t dropped)
Replace the missing data with the average
Impute: assume someone in a cluster has the same data as others in the cluster
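
A minimal sketch of the three options, assuming a hypothetical `income` field and a `cluster` label that was assigned earlier. Imputing with the cluster average is one possible reading of "same data as others in the cluster".

```python
from statistics import mean

# Hypothetical records: 'cluster' is a previously assigned group label.
records = [
    {"cluster": "A", "income": 50.0},
    {"cluster": "A", "income": None},
    {"cluster": "A", "income": 54.0},
    {"cluster": "B", "income": 90.0},
    {"cluster": "B", "income": None},
]


def drop_missing(rows):
    """Option 1: drop records with a missing value."""
    return [r for r in rows if r["income"] is not None]


def fill_with_average(rows):
    """Option 2: replace missing values with the overall average."""
    avg = mean([r["income"] for r in rows if r["income"] is not None])
    return [{**r, "income": avg if r["income"] is None else r["income"]} for r in rows]


def impute_by_cluster(rows):
    """Option 3: replace missing values with the average of the same cluster."""
    known = {}
    for r in rows:
        if r["income"] is not None:
            known.setdefault(r["cluster"], []).append(r["income"])
    cluster_avg = {c: mean(v) for c, v in known.items()}
    return [
        {**r, "income": cluster_avg[r["cluster"]] if r["income"] is None else r["income"]}
        for r in rows
    ]


dropped = drop_missing(records)
averaged = fill_with_average(records)
imputed = impute_by_cluster(records)
print(len(dropped), averaged[1]["income"], imputed[1]["income"])
```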

9
Q

3 types of missing data

A

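MCAR (missing completely at random): no pattern to the missing data
MAR (missing at random): a pattern within the missing data that relates to other observed fields but doesn’t directly impact the analysis
MNAR (missing not at random): a pattern tied to the missing values themselves that can directly impact the analysis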

10
Q

flat file

A

A means of storing data in one place (e.g., an Excel spreadsheet) as opposed to multiple tables

11
Q

4 ways to clean data

A

Remove headings and subtotals
Clean leading zeroes
Format negative numbers consistently (e.g., parentheses vs. a minus sign)
Correct inconsistencies across data
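
The first three cleaning steps can be sketched as follows. The sample export rows are hypothetical, and the sketch assumes invoice IDs are purely numeric so that leading zeroes can safely be dropped; correcting inconsistencies across data depends on the data set and is not shown.

```python
# Hypothetical raw export rows (as text), including a heading and a subtotal.
raw_rows = [
    ["Invoice", "Amount"],     # heading row
    ["00123", "1,250.00"],
    ["00124", "(300.00)"],     # parentheses denote a negative amount
    ["Subtotal", "950.00"],    # subtotal row
    ["00125", "75.50"],
]


def parse_amount(text):
    """Convert accounting-style text such as '(300.00)' to a signed float."""
    text = text.strip().replace(",", "")
    if text.startswith("(") and text.endswith(")"):
        return -float(text[1:-1])
    return float(text)


def clean(rows):
    cleaned = []
    for invoice, amount in rows:
        # Remove headings and subtotals: keep only rows with a numeric invoice ID
        if not invoice.strip().isdigit():
            continue
        cleaned.append({
            "invoice": int(invoice),           # int() drops leading zeroes
            "amount": parse_amount(amount),    # parentheses become a minus sign
        })
    return cleaned


cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```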

12
Q

Similarity matching

A

Identifying similar individuals based on data known about them
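
One simple way to illustrate this is to treat each individual as a point and pick the nearest neighbor. The customers, their attributes, and the choice of Euclidean distance are assumptions made for this sketch, not a method the card prescribes.

```python
from math import dist

# Hypothetical customers described by (age, purchases per year)
customers = {
    "Ana": (34, 12),
    "Ben": (36, 11),
    "Cara": (58, 2),
}


def most_similar(name, people):
    """Return the other individual with the smallest Euclidean distance."""
    target = people[name]
    others = [p for p in people if p != name]
    return min(others, key=lambda p: dist(target, people[p]))


match = most_similar("Ana", customers)
print(match)
```

In practice the attributes would usually be rescaled first so that one field (such as age) does not dominate the distance.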

13
Q

Clustering

A

Dividing individuals into groups in a useful or meaningful way

14
Q

Co-occurrence grouping

A

Discovering associations between individuals based on transactions involving them
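
A minimal sketch of the idea: count how often pairs of items appear in the same transaction. The baskets below are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, each a set of purchased items
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"jam", "milk"},
]

pair_counts = Counter()
for basket in transactions:
    # sorted() gives each pair a stable (alphabetical) order
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```

The most frequent pair is the basis for a "customers also bought…" suggestion.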

15
Q

Profiling

A

Characterizing typical behavior of an individual, group, or population
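
Profiling is often done with summary statistics. The sketch below builds a profile from hypothetical expense claims and flags values far from it; the three-standard-deviation threshold is an assumption chosen for illustration.

```python
from statistics import mean, stdev

# Hypothetical monthly expense claims for one employee
claims = [210.0, 190.0, 205.0, 195.0, 200.0]

# The "profile" is a summary of typical behavior
profile = {"mean": mean(claims), "stdev": stdev(claims)}


def is_unusual(value, profile, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    return abs(value - profile["mean"]) > threshold * profile["stdev"]


print(profile["mean"], is_unusual(650.0, profile), is_unusual(205.0, profile))
```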

16
Q

Link prediction

A

Predicting a relationship between two data items

17
Q

Unsupervised approach

A

Exploring the data for potential patterns of interest, no specific target
Examples: profiling, co-occurrence grouping, data reduction, fuzzy clustering

18
Q

Supervised approach

A

Using historical data to predict a future outcome

Examples: classification, similarity matching, regression, QCA (qualitative comparative analysis)

19
Q

Target

A

An expected attribute or value that we want to evaluate

20
Q

Class

A

A manually assigned category applied to a record based on an event

21
Q

Fuzzy cluster

A

Like a Venn diagram: some individuals fall between clusters or could belong to more than one cluster
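
To make the overlap concrete, the sketch below gives each individual a degree of membership in every cluster instead of a single label. The cluster centers and the inverse-distance weighting are assumptions chosen for simplicity; this is not a full fuzzy clustering algorithm.

```python
# Hypothetical one-dimensional cluster centers (e.g., annual spend in $000s)
centers = {"budget": 10.0, "premium": 50.0}


def memberships(value, centers):
    """Inverse-distance membership degrees that sum to 1 across clusters."""
    distances = {name: abs(value - c) for name, c in centers.items()}
    # A point sitting exactly on a center belongs fully to that cluster
    for name, d in distances.items():
        if d == 0:
            return {n: 1.0 if n == name else 0.0 for n in centers}
    weights = {name: 1.0 / d for name, d in distances.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


halfway = memberships(30.0, centers)   # equally close to both centers
leaning = memberships(20.0, centers)   # closer to "budget"
print(halfway, leaning)
```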

22
Q

Configurational vs Regression

A

Configurational: multiple different paths can lead to the same outcome (e.g., job desirability)
Regression: assumes there is only one path to the outcome (success)

23
Q

2 methods of data reduction

A

Data: decide which records to focus on and which to drop

Variable: decide which fields to focus on and which to drop
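
Both kinds of reduction can be sketched on a small list of records. The records, the 5,000 cutoff, and the choice of fields to keep are hypothetical.

```python
# Hypothetical transaction records
records = [
    {"id": 1, "region": "East", "amount": 12000.0, "memo": "n/a"},
    {"id": 2, "region": "West", "amount": 150.0, "memo": "n/a"},
    {"id": 3, "region": "East", "amount": 9800.0, "memo": "n/a"},
]

# Data (record) reduction: keep only high-value transactions
large = [r for r in records if r["amount"] >= 5000]

# Variable (field) reduction: keep only the fields needed for the analysis
keep_fields = ("id", "amount")
reduced = [{f: r[f] for f in keep_fields} for r in large]

print(reduced)
```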

24
Q

(Protiviti) 4 types of transformations

A

Customer Engagement
Digitizing Products and services
Better informed decisions
Performance measurement

25
(Protiviti) Digital maturity scale
1. Digital Skeptic
2. Digital Beginners (most companies)
3. Digital Followers (most companies)
4. Digital Expert
5. Digital Leader
26
(Protiviti) Process mining
Takes individual process steps, pulls them together, and organizes them visually. Started in 2008
27
(Protiviti) Happy path
Most common path for a transaction
28
(Protiviti) #1 risk
Existing operations not meeting performance expectations when competing against "born digital" firms
29
(EY) Data drives action steps
Opinion
Data
Metrics
Analysis
Insight
Action
30
(Cloud computing) Infrastructure as a Service
Cloud-based service: businesses can rent data resources in the cloud instead of creating their own
Pros: flexible, cost efficient, automatic backups
Cons: lower security, fewer internal resources
31
(Cloud computing) Platform as a Service
A third party provides the hardware and software
Examples: Azure, Amazon Web Services
Pros: includes IaaS, scalable, easy deployment
Cons: can't change languages, data migration issues, can't change vendors
32
(Cloud computing) Software as a Service
Cloud-based applications
Examples: EY's client portal, Oracle NetSuite, QuickBooks Enterprise
Pros: security, no installations, automatic backups, no commitments
Cons: vendor disappearance, only accessible through the internet, security breaches
33
Dark Data/Dark analytics
Unused collected data that is usually unstructured, difficult, and costly to analyze
Examples: audio, video, facial expressions
90% of company data is dark data
34
Dark data in accounting
Data collected from transactions, account details, and A/P. Retained for compliance reasons, but keeping it can create security risks
35
Purpose of a non-key attribute
To provide business information
36
Benefits of a normalized relational database
Completeness
No redundancy
Business rules are enforced
37
A data dictionary helps database administrators _____
Maintain databases
38
Data visualization
Means of communicating data, primarily through imagery, that is both readable and recognizable. Blends the art of design with the science of data.
39
Reasons to relate visualization to a story
Memorable: Stories make it easier for the audience to connect with and remember the information you are trying to convey.
Relatable: Stories lead to emotional coupling. Both the storyteller and the audience go through and relate to the same experience.
Lead to action: Research shows that storytelling can engage parts of the brain that lead to action.
40
2 types of data visualization
Exploratory: for problems that have not been defined; allows people to explore and understand data. Reader driven.
Explanatory: shows specific relationships in data and shows the audience what they need to know. Author driven.
41
Gestalt psychology
Describes how our mind organizes individual visual elements into groups to make sense of an entire visual. Used to highlight important patterns
42
Gestalt principles
Proximity
Similarity
Enclosure
Connection
Symmetry
Continuity/closure
Figure and ground
43
Tufte philosophy
“Excellence in statistical graphics consists of complex ideas communicated with clarity, precision and efficiency.”
44
Tufte principles
Data-ink ratio: maximize the share of ink that shows data by removing backgrounds, borders, redundant labels, and unnecessary colors
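
Tufte's ratio can be written as a formula. The form below restates his definition; the closer the ratio is to 1, the less non-data ink the graphic carries.

```latex
\[
\text{data-ink ratio} = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
\]
```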
45
Preattentive attributes: Emphasis
Form
Position
Motion
Color
46
Preattentive attributes: Quantity
Position (most accurate)
Length
Slope
Angle
Area
Volume
Color (least accurate)
47
Narrative framework for data stories
Visual design: showing your story
Messaging: telling your story
Interactivity: how you engage your audience
48
Custom development
No ERP packages were available, so software had to be custom developed.
Advantage: we get exactly what we ask for.
Disadvantage: some criteria may be missed; takes a long time.
49
Tightly Coupled
Software that works together, bundled into a package. 1990-2000
Advantage: designed to work together
Disadvantage: includes stuff we don't need; more costly and more risky
50
Loosely Coupled
Software that works together, but we can customize the bundle. 2000-present
Advantage: we get what we want; designed to work together
51
Best of Breed
We can choose the best software for each module. Disadvantage: not designed to work together.
52
3 types of software packages
Tier 1: Oracle/SAP. Native integration: CRM, SCM, and F&A are designed to work together. Disadvantage: it is a very long process.
Tier 2: Workday, JDA, Lawson. All SaaS; you are leasing the software instead of buying it.
Tier 3: Quicken/QuickBooks, Peachtree
53
3 types of software deviation approaches
Configuration: date, fiscal year. 100% of companies do this.
Customization lite: reports, screens. 95% of companies do this.
Customization heavy: changing the underlying code. 20% of companies like to do this. Very high risk; Greg will not work with companies who do it.
54
Service providers
BlueChip: PwC, KPMG, EY, Deloitte, IBM. Work with Tier 1 customers.
MidTier: HD, SAIC
Focused: Oracle Consulting, SAP Consulting
Foreign: Wipro, TCS. U.S.-based units (Wipro US and TCS US) have also been developed.