Representing, Processing, and Preparing Data Flashcards Preview

Flashcards in Representing, Processing, and Preparing Data Deck (10)
1

You want to prototype quickly without writing code. Which tool is a good choice for exploring and working with data?

- AutoML
- Python and Pandas
- Spark
- Excel Spreadsheet

- Excel Spreadsheet

2

What is a weakness of mean substitution as an imputation technique for missing data?

- It reduces the strength of correlations that exist in the data.
- It increases the strength of correlations that exist in the data.
- It reduces bias in the data.
- It increases bias in the data.

- It reduces the strength of correlations that exist in the data.
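
A minimal pure-Python sketch of this weakening effect. The toy values and the helper `pearson` are assumptions made for illustration and are not part of the deck. Every imputed point sits at the observed mean regardless of its `x` value, which pulls the measured correlation toward zero.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy data: y is exactly 2 * x, so the true correlation is 1.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y_full = [2, 4, 6, 8, 10, 12, 14, 16]

# The same series with three values missing (None marks a gap).
y_missing = [2, None, 6, 8, None, 12, 14, None]

# Mean substitution: fill every gap with the mean of the observed values.
observed = [v for v in y_missing if v is not None]
mean_y = sum(observed) / len(observed)
y_imputed = [mean_y if v is None else v for v in y_missing]

r_full = pearson(x, y_full)
r_imputed = pearson(x, y_imputed)
print(f"correlation on complete data: {r_full:.3f}")
print(f"correlation after mean fill:  {r_imputed:.3f}")
```

On this toy series the correlation after mean substitution comes out lower than on the complete data.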

3

What is standardization applied to?

- Rows in a data set
- Individual features
- A feature vector
- A three-dimensional matrix

- Individual features
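
A short sketch of what "applied to individual features" means in practice, assuming a small made-up table where each row is a sample and each column is a feature. The function name `standardize_columns` is hypothetical. Each column gets its own mean and standard deviation, and the sketch assumes no feature is constant (a zero standard deviation would cause a division by zero).

```python
import statistics

def standardize_columns(rows):
    """Rescale each column (feature) to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))  # transpose: one tuple per feature
    scaled_columns = []
    for col in columns:
        mean = statistics.fmean(col)
        std = statistics.pstdev(col)  # population standard deviation
        scaled_columns.append([(v - mean) / std for v in col])
    # Transpose back so each inner list is a sample again.
    return [list(row) for row in zip(*scaled_columns)]

# Hypothetical data: two features on very different scales.
rows = [
    [1.0, 100.0],
    [2.0, 200.0],
    [3.0, 300.0],
]

scaled = standardize_columns(rows)
for row in scaled:
    print(row)
```

After scaling, both features end up on the same scale even though their raw magnitudes differed by a factor of 100.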

4

Which scaler subtracts the median from each data point?

- RobustScaler
- Max-abs scaler
- Min-max scaler
- StandardScaler

- RobustScaler
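
A pure-Python sketch of the idea behind robust scaling: subtract the median, then divide by the interquartile range. This mirrors the default behaviour of scikit-learn's `RobustScaler`, but the function `robust_scale`, the sample values, and the quartile convention (`method="inclusive"`) are assumptions chosen for this illustration.

```python
import statistics

def robust_scale(values):
    """Center on the median and scale by the interquartile range (IQR)."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - median) / iqr for v in values]

# Hypothetical feature with one extreme outlier.
feature = [1.0, 2.0, 3.0, 4.0, 100.0]

scaled = robust_scale(feature)
print(scaled)
```

The median value maps to 0, and the outlier does not distort the scaling of the other points because neither the median nor the IQR depends on it.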

5

Which of the following measures of dispersion is most robust (least vulnerable) to outliers?

- Range
- Inter-quartile range (IQR)
- Median
- Variance

- Inter-quartile range (IQR)
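
A small comparison, sketched with made-up numbers, of how range, variance, and IQR react when a single outlier is introduced. The helpers `iqr` and `value_range` and the quartile convention (`method="inclusive"`) are assumptions for this illustration.

```python
import statistics

def iqr(data):
    """Interquartile range: distance between the 25th and 75th percentiles."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1

def value_range(data):
    """Range: distance between the largest and smallest value."""
    return max(data) - min(data)

# Hypothetical samples that differ only in the last value.
clean = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = [1.0, 2.0, 3.0, 4.0, 500.0]

measures = [
    ("range", value_range),
    ("variance", statistics.pvariance),
    ("IQR", iqr),
]
for name, measure in measures:
    print(f"{name:>8}: clean={measure(clean):.2f}  with outlier={measure(with_outlier):.2f}")
```

Range and variance grow sharply once the outlier is present, while the IQR of these two samples is identical because it only looks at the middle half of the data.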

6

Which operation is helpful in simplifying the calculation of cosine similarity?

- Standardization
- Box-Cox transformation
- Power transformation
- Normalization

- Normalization
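
A sketch of why normalization helps, assuming "normalization" here means scaling each vector to unit L2 length. Once both vectors have length 1, the denominator of the cosine formula is 1, so the similarity reduces to a plain dot product. The vectors and helper names are hypothetical.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    """Scale a vector so that its Euclidean (L2) length is 1."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    """Full formula: dot product divided by the product of the lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical example vectors.
a = [3.0, 4.0]
b = [4.0, 3.0]

full = cosine_similarity(a, b)
shortcut = dot(l2_normalize(a), l2_normalize(b))  # no division needed afterwards
print(full, shortcut)
```

Both computations agree up to floating-point rounding, which is why vectors are often normalized once up front when many pairwise similarities are needed.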

7

Two vectors are oriented at 90 degrees to each other. What is their cosine similarity?

1
-1
90
0

0
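
A quick check of the three landmark cases, using hypothetical 2-D vectors: perpendicular vectors have a dot product of 0, so their cosine similarity is 0; vectors pointing the same way give 1, and opposite vectors give -1.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical vectors chosen so the geometry is easy to see.
perpendicular = cosine_similarity([2.0, 0.0], [0.0, 5.0])     # 90 degrees apart
same_direction = cosine_similarity([3.0, 4.0], [6.0, 8.0])    # 0 degrees apart
opposite = cosine_similarity([3.0, 4.0], [-3.0, -4.0])        # 180 degrees apart

print(perpendicular, same_direction, opposite)
```

Note that the lengths of the vectors do not matter here, only the angle between them.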

8

What is the practice of combining many disparate servers, each of limited capacity and running generic hardware, called?

- Vertical scaling
- Horizontal scaling
- Data warehousing
- Online analytical processing (OLAP)

- Horizontal scaling

9

What are the two sets of statistical tools that a data analyst can use?

- Descriptive statistics and inferential statistics
- Alternating statistics and data statistics
- Inferential statistics and data statistics
- Alternating statistics and descriptive statistics

- Descriptive statistics and inferential statistics

10

Which of the following is not a valid imputation technique to deal with missing data?

- Fill in the mean of the data set.
- Fill in values from within the range.
- Interpolate values using a model.
- Last observation carried forward.

- Fill in values from within the range.
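
For contrast, a minimal sketch of two of the valid techniques listed above, applied to a made-up ordered series. The function names are hypothetical, `None` marks a missing value, and straight-line interpolation stands in for "a model" as the simplest possible choice. Gaps before the first or after the last observed value are left as `None` by the interpolation, and leading gaps are left as `None` by the carry-forward.

```python
def locf(values):
    """Last observation carried forward: repeat the most recent known value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def interpolate_linear(values):
    """Fill interior gaps on a straight line between neighbouring known values."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled

# Hypothetical time-ordered readings with gaps.
series = [10.0, None, None, 16.0, None, 20.0]

print(locf(series))
print(interpolate_linear(series))
```

Both approaches derive the filled values from the observed data in a defined way, unlike picking arbitrary values that merely fall within the observed range.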