DP-100: Communicating Expectations to the Business Flashcards
1

You are a data scientist looking at a set of historical data for possible use for model training. What quality would you be looking for in the data?

That it holds a minimum of 20,000 rows of data

That it contains data representative of the target population

That it contains personally identifiable information

That it is properly formatted as an OLAP cube

Answer: That it contains data representative of the target population
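One simple way to check representativeness, assuming you already know roughly how the target population breaks down by category, is to compare the category shares in the candidate data set against those target shares. The sketch below does this with the standard library only; the function names, the vehicle categories, and the figures are hypothetical and for illustration only.

```python
from collections import Counter


def category_shares(values):
    """Return each category's share of the total, as a fraction."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def max_share_gap(sample, target_shares):
    """Largest absolute difference between sample and target category shares.

    Categories that appear in only one of the two count as a share of 0 in the other.
    """
    sample_shares = category_shares(sample)
    categories = set(sample_shares) | set(target_shares)
    return max(
        abs(sample_shares.get(c, 0.0) - target_shares.get(c, 0.0))
        for c in categories
    )


# Hypothetical example: the target population is mostly compact-car buyers,
# but the licensed sales data is dominated by diesel trucks.
target = {"compact_car": 0.8, "suv": 0.2}
licensed_sales = ["diesel_truck"] * 90 + ["suv"] * 10

gap = max_share_gap(licensed_sales, target)
print(f"max share gap: {gap:.2f}")  # max share gap: 0.90
```

A large gap (here 0.90, driven by diesel trucks that do not appear in the target at all) signals that the data does not represent the target population. What counts as "too large" is a judgment call for the project, not a fixed rule.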

2

You need data input on a project. When is the best time to bring in a subject matter expert (SME)?

At the end of the project

Only when problems arise

When verification is needed

At the beginning of the project

Answer: At the beginning of the project

3

What kind of barrier does personally identifiable information (PII) present to using data?

Formatting

Speed of query

Legal

Licensing

Answer: Legal

4

You are working on a project that will generate insights on the potential popularity of compact cars. Your team has licensed a data set of automotive sales for training a model. On closer inspection, you find that the data consists of diesel truck sales. What is your next step?

Run the data set through a synthetic data generation utility

Transform the existing data set with discovery-transitioning-utils

Alter the model the solution will be using

Source a different set of data closer to the target population

Answer: Source a different set of data closer to the target population

5

What should you do when facing a potential ethical barrier to using data in a solution?

Use EDA tools to identify alternatives

Consult a legal specialist

Locate another algorithm for the model

Find another data source

Answer: Consult a legal specialist

6

What kind of tool allows you to anonymize data while preserving its character and richness?

Data mining tools

Machine learning models

Synthetic data utilities

Exploratory data analysis tools

Answer: Synthetic data utilities

7

Your team has finished its exploratory data analysis and concluded that the data set will require significant cleaning and feature engineering. The funding for the initial data analysis has been exhausted. What should you do?

Use a data set that is adjacent to the project's domain

Ask the stakeholders for a go/no-go decision and budget for the next phase

Search for public but free data in the domain of the project

Locate a new data source from a data broker and restart exploratory data analysis

Answer: Ask the stakeholders for a go/no-go decision and budget for the next phase

8

What is the minimum number of data sets typically needed for a data science/machine learning solution?

One for training, one for analysis

Two for training, two for analysis

Data set for analysis

Two for training, one for analysis

Answer: One for training, one for analysis
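In practice the two data sets are often carved out of one source by holding back part of it, so the model is analyzed on records it never saw during training. Reading "analysis" as a held-out evaluation set is an assumption here, and the function name `train_holdout_split` and the 20% holdout fraction are hypothetical choices. This is a minimal standard-library sketch; libraries such as scikit-learn provide `train_test_split` for the same purpose.

```python
import random


def train_holdout_split(rows, holdout_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (training set, held-out analysis set)."""
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy so the caller's data is not reordered
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


rows = list(range(100))  # stand-in for 100 records
train, holdout = train_holdout_split(rows)
print(len(train), len(holdout))  # 80 20
```

Shuffling before splitting matters when the source is ordered (for example, by date or region); otherwise the held-out set may not resemble the training set.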

9

What does correlation point to in a data science/machine learning solution?

The cross validation of a data point from another data set

The causation of one data point by a preceding data point

The probability of causation of a data point by another data point

The relationship between two data points

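Answer: The relationship between two data points (strictly, correlation measures how strongly two variables move together; it does not establish causation)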
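A common way to quantify this relationship is the Pearson correlation coefficient, which ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive linear relationship). The sketch below computes it from scratch; the variable names `ad_spend` and `sales` are hypothetical. Python 3.10 and later also offer `statistics.correlation` for the same calculation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to each variance).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 6, 8, 10]
print(pearson_r(ad_spend, sales))  # 1.0
```

A coefficient of 1.0 here only says the two series rise together in lockstep; it says nothing about whether one causes the other, which is why the causation-based options on this card are wrong.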

10

What method do exploratory data analysis tools primarily use?

XML output

Visualization

Encoded data

CSV

Answer: Visualization