Data analytics lifecycle Flashcards

1
Q

What are the main reasons to use frameworks?

A

efficient use of time
nothing gets forgotten
scale projects

2
Q

why use frameworks in data science?

A

acts as a guide
ensures the focus is on data science (DS), not business intelligence (BI)
data science needs a collaborative approach

3
Q

what are the 2 key project roles that get a sponsor presentation?

A

Business user

project sponsor

4
Q

what are the 2 key project roles that get the code and technical documents?

A

data engineer

data scientist

5
Q

what are the 2 key project roles that get an analyst presentation?

A

BI analyst

Database administrator
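
The role-to-deliverable pairings in cards 3-5 can be kept as a small lookup table. This is only a study aid sketched in Python; the names `DELIVERABLE_AUDIENCE` and `deliverables_for` are made up for illustration and are not part of any framework.

```python
# Deliverables and the roles that receive them, as listed in cards 3-5.
# The structure and names here are illustrative assumptions.
DELIVERABLE_AUDIENCE = {
    "sponsor presentation": ["business user", "project sponsor"],
    "code and technical documents": ["data engineer", "data scientist"],
    "analyst presentation": ["BI analyst", "database administrator"],
}


def deliverables_for(role):
    """Return every deliverable whose audience includes the given role."""
    wanted = role.lower()
    return [
        deliverable
        for deliverable, roles in DELIVERABLE_AUDIENCE.items()
        if wanted in (r.lower() for r in roles)
    ]


print(deliverables_for("data engineer"))
```

The project manager appears in none of the three audiences on these cards, so `deliverables_for("project manager")` returns an empty list.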

6
Q

what are the 7 key project roles?

A
business user
project sponsor
project manager
BI analyst
data engineer
database administrator
data scientist
7
Q

what is the data analytics lifecycle? (6 phases)

A
discovery
data prep
model planning
model building 
communicate results
operationalise
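
One way to memorise the order is to number the phases, since other cards refer to them as "phase 2" and "phase 4". The sketch below is a hypothetical Python enum written for this deck; `Phase` and `next_phase` are illustrative names, and the strictly forward ordering is a simplifying assumption.

```python
from enum import IntEnum
from typing import Optional


class Phase(IntEnum):
    """The six lifecycle phases, numbered in the order given on the card."""
    DISCOVERY = 1
    DATA_PREP = 2
    MODEL_PLANNING = 3
    MODEL_BUILDING = 4
    COMMUNICATE_RESULTS = 5
    OPERATIONALISE = 6


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase after `current`, or None after the last phase."""
    if current is Phase.OPERATIONALISE:
        return None
    return Phase(current + 1)


print(Phase(2).name, Phase(4).name)
```

With this numbering, phase 2 is data prep and phase 4 is model building.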
8
Q

In discovery what are the seven main areas?

A
learn business domain
learn from the past
resources 
frame the problem
interviewing 
formulate initial hypothesis
identify data sources
9
Q

In discovery, learn the business domain: what do you not need to do?
A) determine the amount of domain knowledge needed
B) determine the general analytic problem
C) decide what technique to use
D) conduct research if you have no domain knowledge

A

C) decide what technique to use

10
Q

In discovery, learn from the past: what do you need to do?

A

find out whether there have been any previous attempts

find out why they failed

11
Q

who is a business user?

A

someone who benefits from end results

12
Q

who is the project sponsor?

A

person responsible for genesis of the project

13
Q

who is a project manager?

A

person who ensures key milestones are met

14
Q

who is the BI analyst?

A

business domain expert

15
Q

who is the data engineer?

A

person with deep technical skills

16
Q

who is the DBA?

A

database administrator: provisions and configures the database

17
Q

who is the DS?

A

data scientist: subject matter expert (SME) in analytical techniques who ensures the overall analytic objectives are met

18
Q

what is CRISP-DM?

A

Cross-Industry Standard Process for Data Mining

19
Q

what are the 6 phases of CRISP-DM?

A
business understanding
data understanding
data prep
modeling
evaluation
deployment
20
Q

In discovery, resources: what do you need to assess?

A

available tech
data
people
time

21
Q

In discovery, frame the problem: what are the objectives?

A

identify the goal
identify the failure criteria
identify the success criteria

22
Q

In discovery, formulate initial hypotheses: what do you need to do? (2)

A

gather and assess hypotheses

explore the data to inform discussions

23
Q

In discovery, identify data sources: what do you need to do? (4)

A

aggregate sources
review the raw data
determine the structures and tools
scope the kind of data needed

24
Q

How big is an analytical sandbox?

25
Q

In data prep, what are the phases? (5)

A

prepare sandbox
perform ELT
familiarise with the data
data conditioning
survey and visualise
26
Q

in model planning, what are the phases? (6)

A

determine methods
techniques and workflow
data exploration
variable selection
model selection
test & train
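
The "test & train" item refers to holding back part of the data for testing. A minimal sketch of such a split, using only the Python standard library; the function name, the 20% default and the fixed seed are assumptions chosen for illustration, not values from the course.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and hold back a fraction of them as a test set."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # First n_test shuffled rows become the test set, the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(range(10), test_fraction=0.2)
print(len(train), len(test))
```

The model is fitted on `train` only, and `test` is kept aside to check how well it generalises.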
27
Q

how much time is spent in data prep?
a) 50%
b) 60%
c) 70%

A

c) 70%
28
Q

what should you do in communicate results?

A

make recommendations
compare results
identify key findings
29
Q

what should you do in operationalise?

A

run a pilot
assess benefits
implement model
30
Q

why run a pilot?

A

to make sure the model is robust
31
Q

what type of tools can be used for phase 2 (data prep)?

A

SQL
Hadoop
MapReduce
32
Q

what type of tools can be used for phase 4 (model building)?

A

R
SQL
SAS