Communicating Expectations to the Business Flashcards

1
Q

You are a data scientist looking at a set of historical data for possible use for model training. What quality would you be looking for in the data?

That it holds a minimum of 20,000 rows of data

That it contains data representative of the target population

That it contains personally identifiable information

That it is properly formatted as an OLAP cube

A

That it contains data representative of the target population
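
One rough way to check representativeness is to compare category shares in the candidate data against the shares expected in the target population. The sketch below is illustrative only: the function names, the vehicle categories, and the target shares are all hypothetical and do not come from the course material.

```python
from collections import Counter


def category_shares(values):
    """Return each category's share of the total as a fraction in [0, 1]."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def largest_share_gap(sample_values, population_shares):
    """Largest absolute difference between sample and population shares."""
    sample_shares = category_shares(sample_values)
    categories = set(sample_shares) | set(population_shares)
    return max(
        abs(sample_shares.get(c, 0.0) - population_shares.get(c, 0.0))
        for c in categories
    )


# Hypothetical data: a sample dominated by trucks, checked against an
# assumed target population made up mostly of compact cars.
sample = ["truck"] * 90 + ["compact"] * 10
target = {"compact": 0.6, "sedan": 0.3, "truck": 0.1}
gap = largest_share_gap(sample, target)
print(f"largest share gap: {gap:.2f}")
```

A large gap suggests the data may not represent the target population; what counts as too large is a project decision, not something this sketch settles.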

2
Q

You need data input on a project. When is the best time to bring in an SME?

At the end of the project

Only when problems arise

When verification is needed

At the beginning of the project

A

At the beginning of the project

3
Q

What kind of barrier does personally identifiable data present to data use?

Formatting

Speed of query

Legal

Licensing

A

Legal

4
Q

You are working on a project that will generate insights on potential popularity of compact cars. Your team has licensed a data set of automotive sales for training a model. Upon further inspection of the data, you find that it is made up of diesel truck sales. What is your next step?

Run the data set through a synthetic data generation utility

Transform the existing data set with discovery-transitioning-utils

Alter the model the solution will be using

Source a different set of data closer to the target population

A

Source a different set of data closer to the target population

5
Q

What should you do when facing a potential ethical barrier to using data in a solution?

Use EDA tools to identify alternatives

Consult a legal specialist

Locate another algorithm for the model

Find another data source

A

Consult a legal specialist

6
Q

What kind of tool allows you to anonymize data but maintain character and richness?

Data mining tools

Machine learning models

Synthetic data utilities

Exploratory data analysis tools

A

Synthetic data utilities

7
Q

Your team has reached the end of their exploratory data analysis and come to the conclusion that the data set will require significant cleaning and feature engineering. The funding for initial data analysis has been exhausted. What should you do?

Use a data set that is adjacent to the project’s domain

Ask the stakeholders for a go/no-go decision and budget for the next phase

Search for free, public data in the domain of the project

Locate a new data source from a data broker and restart exploratory data analysis

A

Ask the stakeholders for a go/no-go decision and budget for the next phase

8
Q

What is the minimum number of data sets typically needed for a data science/machine learning solution?

One for training, one for analysis

Two for training, two for analysis

Data set for analysis

Two for training, one for analysis

A

One for training, one for analysis
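
As a minimal sketch, one source table can be divided into the two sets. This assumes the "analysis" set means rows held back from training; the card itself does not define the term. The `split_rows` helper, the 20% fraction, and the seed are hypothetical choices made for illustration.

```python
import random


def split_rows(rows, holdout_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (training, holdout) lists."""
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Hypothetical stand-in for real records: 100 row identifiers.
rows = list(range(100))
train, holdout = split_rows(rows)
print(len(train), len(holdout))
```

Keeping the two sets disjoint means the held-back rows were never seen during training.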

9
Q

What does correlation indicate in a data science/machine learning solution?

The cross validation of a data point from another data set

The causation of one data point by a preceding data point

The probability of causation of a data point by another data point

The relationship between two data points

A

The relationship between two data points
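
For illustration, the Pearson coefficient is one common way to put a number on such a relationship; the card does not name a specific measure, so this choice is an assumption. The sample values below are invented. A value near 1 or -1 indicates a strong linear relationship and says nothing about causation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measurements, invented for this example.
engine_size = [1.0, 1.2, 1.6, 2.0, 2.5]
fuel_use = [4.8, 5.1, 6.0, 7.2, 8.4]
r = pearson_r(engine_size, fuel_use)
print(f"r = {r:.3f}")
```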

10
Q

What method do exploratory data analysis tools primarily use?

XML output

Visualization

Encoded data

CSV

A

Visualization
