M3 U1 - Data Science Lifecycle - Q1 Flashcards

1
Q

Describe the major parts of the data science lifecycle

A
2
Q

Define Business understanding. Who’s involved?

A

This must include:

  • defining business and analytical objectives
  • identifying data sources.

Members involved: The client and the data science team are both involved in this step to ensure that the analytic solution meets the business objectives.

3
Q

Define Data Acquisition

A

This process involves obtaining data from various sources and may also require setting up a data collection task and infrastructure. Data preparation techniques are employed to ensure the data is useful for analysis.

4
Q

Define Data Preparation.

A

This is the process of cleaning and transforming raw data prior to processing and analysis. This needs to be done carefully as assumptions made here may influence, or even limit, the use of the data during analysis.
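
As a rough illustration of cleaning and transforming, here is a minimal sketch using only the Python standard library. The rows, the field names (`city`, `temp_c`) and the `prepare` helper are hypothetical. The choice to keep a row and mark an unparseable value as missing is an assumption, and it is exactly the kind of decision that can shape later analysis.

```python
# Hypothetical raw export: strings with stray whitespace, mixed case, and a bad value.
raw_rows = [
    {"city": " Cape Town ", "temp_c": "21.5"},
    {"city": "durban", "temp_c": "n/a"},
    {"city": "Pretoria", "temp_c": " 25 "},
]

def prepare(rows):
    """Clean text fields and parse numbers; unparseable numbers become None."""
    cleaned = []
    for row in rows:
        try:
            temp = float(row["temp_c"])  # float() tolerates surrounding whitespace
        except ValueError:
            temp = None  # Assumption: keep the row and mark the value as missing.
        cleaned.append({"city": row["city"].strip().title(), "temp_c": temp})
    return cleaned

prepared = prepare(raw_rows)
```

Dropping the bad row instead of keeping it would be equally defensible, but it would change the row count that every later step sees.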

5
Q

Define Data Exploration and Cleaning. (4)

A

Includes:

  • Identifying variables
  • Conducting univariate and multivariate analysis
  • Identifying outliers, anomalies and missing values
  • Feature creation and selection
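
A minimal sketch of the univariate and outlier/missing-value steps listed above, using only the standard library. The `order_values` column and the `explore` helper are hypothetical, and the 1.5 × IQR rule is one common outlier convention chosen for illustration, not the only option.

```python
import statistics

# Hypothetical toy column: order values, with None marking missing entries.
order_values = [12.0, 15.5, None, 14.0, 13.5, 250.0, 16.0, None, 15.0]

def explore(values):
    """Summarize one numeric variable: missing count, median, and IQR outliers."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    q1, _, q3 = statistics.quantiles(present, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in present if v < low or v > high]
    return {
        "missing": missing,
        "median": statistics.median(present),
        "outliers": outliers,
    }

summary = explore(order_values)
```

On this toy column the value 250.0 falls well outside the IQR fences and is the only value flagged.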
6
Q

What’s the purpose of Feature Engineering?

A

It’s needed to prepare datasets that are compatible with the chosen algorithms, and to improve model performance by using domain knowledge to capture the signal of interest in the features.
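
To make this concrete, here is a small sketch that turns raw fields into numeric features a model can consume. The records, field names and the `engineer` helper are hypothetical. The domain assumptions baked in are that weekend purchases behave differently and that amounts are better compared on a log scale.

```python
import math
from datetime import date

# Hypothetical raw records; the field names are illustrative assumptions.
raw = [
    {"date": date(2024, 3, 2), "amount": 100.0},  # a Saturday
    {"date": date(2024, 3, 4), "amount": 10.0},   # a Monday
]

def engineer(record):
    """Derive model-ready numeric features from one raw record."""
    return {
        # weekday() returns 0 for Monday through 6 for Sunday
        "is_weekend": int(record["date"].weekday() >= 5),
        # log scale compresses large amounts so they do not dominate
        "log_amount": math.log10(record["amount"]),
    }

features = [engineer(r) for r in raw]
```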

7
Q

Define generalize

A

A model's ability to match its training performance on unseen test data is referred to as its ability to generalize.
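
A toy sketch of what poor generalization can look like, assuming a 1-nearest-neighbour classifier on made-up one-dimensional data. The data, the deliberately mislabeled training point and both helper functions are hypothetical. The model memorizes the training set perfectly but does worse on unseen points.

```python
# Hypothetical (x, label) pairs; the point at x=3.0 carries a noisy label.
train = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
test = [(1.5, 0), (2.8, 0), (3.2, 0), (6.5, 1), (7.5, 1)]

def predict_1nn(train_data, x):
    """Return the label of the training point closest to x."""
    nearest = min(train_data, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def accuracy(train_data, data):
    """Fraction of points in data whose label the 1-NN model predicts correctly."""
    hits = sum(predict_1nn(train_data, x) == y for x, y in data)
    return hits / len(data)

train_acc = accuracy(train, train)  # the model has memorized these points
test_acc = accuracy(train, test)    # unseen points near the noisy label are missed
gap = train_acc - test_acc
```

A large gap between the two scores suggests the model has fit noise in the training data rather than the underlying pattern.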

8
Q

At what stage in the DS Lifecycle do you identify the business objectives of a data science project?

A

Business Understanding.

9
Q

The process of transforming raw data into informative properties that represent the business problem you are trying to solve is called:

A

Feature Engineering.

10
Q

What are the roles on a typical data science team? (7)

A
  • Data Scientist.
  • Data Engineer.
  • Solutions Architect.
  • Machine Learning (ML) Engineer.
  • Data/Business Analyst.
  • Software Engineer.
  • Domain Experts.
11
Q

Data scientist

A

This role involves solving business tasks using machine learning model development and statistical techniques. This individual identifies trends and patterns within the data and makes predictions based on trends. The data scientist will write code to support the data analysis and model building process.

12
Q

Data engineer

A

The Data Engineer specializes in data structures and algorithms, as well as in working with data through the operation of databases and other large repositories.

13
Q

Solutions architect

A

This is a customer-facing role that ensures end-to-end customer deployment for company-related data services. The Solutions Architect interacts with clients to design, coordinate, and execute solution prototypes.

14
Q

ML Engineer

A
  • Performs modeling and software engineering tasks.
  • Spends a considerable amount of time programming and creating ML solutions, but must also have strong statistical skills.
  • Differs from the data scientist in being further away from the domain side of the project.
15
Q

Data/business analyst

A
  • Has data gathering, analysis, and visualization skills.
  • Compared to data scientists, they are typically firmly rooted in the business domain and less technically proficient in systems programming and advanced machine learning.
  • Like the data scientist, she provides insights from data to inform decision making.
  • Develops key performance indicators and utilizes business intelligence and analytics tools.
16
Q

Software engineer

A

The Software Engineer handles the alignment between the business objectives and the solution, and is responsible for integrating the implemented data-driven system into the appropriate applications within the enterprise.

17
Q

Domain expert

A

Also known as subject matter experts, they are the actors who know the most about the problem on the business side. Their role is to define the framework for the data science project, and hence they are a key participant in the process. The domain expert will translate business needs and characteristics to the data scientists, and eventually assess the solution as successful or not from the perspective of whether it achieved the business objective.

18
Q

What’s involved in modeling?

A

This multi-step process involves feature engineering, algorithm selection, model training and evaluation.
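
The selection, training and evaluation steps can be sketched with the standard library alone. Everything here is an illustrative assumption: the toy `(x, y)` data, the two candidate "algorithms" (a mean baseline and a least-squares line), and the choice of mean squared error on a held-out validation set as the selection criterion. Feature engineering is omitted because the toy input is already numeric.

```python
import statistics

# Hypothetical (x, y) pairs, split into training and validation sets.
train = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
valid = [(5.0, 11.0), (6.0, 13.0)]

def fit_mean(data):
    """Baseline candidate: always predict the mean training target."""
    mean_y = statistics.fmean([y for _, y in data])
    return lambda x: mean_y

def fit_line(data):
    """Second candidate: ordinary least-squares line through the data."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in data) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)

def mse(model, data):
    """Mean squared error of a fitted model on a dataset."""
    return statistics.fmean([(model(x) - y) ** 2 for x, y in data])

candidates = {"mean": fit_mean, "line": fit_line}                    # algorithm selection
trained = {name: fit(train) for name, fit in candidates.items()}     # model training
scores = {name: mse(model, valid) for name, model in trained.items()}  # evaluation
best = min(scores, key=scores.get)
```

Scoring on data the models did not see during training is what makes the comparison a check on generalization rather than on memorization.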