Combining and Shaping Data Flashcards Preview

Flashcards in Combining and Shaping Data Deck (10)
1

Some of the features of transactional processing and analytical processing are given below. Which is true?

Due to the sheer volume, variety, and velocity of the data that big companies generate, big data processing requires setting up a distributed cluster of multiple machines.

Transactional processing involves analyzing large batches of data whereas analytical processing involves analyzing individual entries in a data set.

Transactional processing is performed in a traditional relational database management system (RDBMS) whereas analytical processing is performed in a data warehouse.

Although the objectives of transactional processing and analytical processing are completely different, both of these objectives can be achieved by the same database system even with huge volumes of data.

Transactional processing is performed in a traditional relational database management system (RDBMS) whereas analytical processing is performed in a data warehouse.

2

What is the practice of combining many disparate servers, each of limited capacity and running generic hardware, called?

Vertical scaling

Horizontal scaling

Online analytical processing (OLAP)

Data warehousing

Horizontal scaling

3

You are operating on a stream of timestamped data using a stream processing system, and you want to divide your input data into fixed-size windows based on time intervals that overlap. Which window will you use?

Tumbling window

Sliding window

Count window

Global window

Sliding window
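
To make the difference between the two time-based windows concrete, here is a minimal sketch in plain Python. It assumes integer timestamps (for example, seconds) and windows aligned to multiples of the slide interval; the function name `window_starts` is made up for illustration and is not taken from any particular stream processing system.

```python
def window_starts(ts, size, slide):
    """Return the start of every window [start, start + size) that contains ts."""
    # Latest window start at or before ts, aligned to the slide interval.
    start = (ts // slide) * slide
    starts = []
    # Walk backwards while the window still covers ts.
    while start > ts - size:
        starts.append(start)
        start -= slide
    return sorted(starts)

# Tumbling window: slide equals size, so each event lands in exactly one window.
tumbling = window_starts(ts=25, size=10, slide=10)

# Sliding window: slide is smaller than size, so windows overlap
# and one event can belong to several of them.
sliding = window_starts(ts=25, size=10, slide=5)

print(tumbling)  # [20]
print(sliding)   # [20, 25]
```

With a slide of 5 and a size of 10, the event at time 25 falls into both [20, 30) and [25, 35), which is the overlap the question describes.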

4

Window operations can only be performed on what kinds of data?

Any kind of data

Data associated with timestamps

Streaming data

Batch data

Data associated with timestamps

5

Which of the following best describes the operation of a left outer join?

A. Each record in the right table will be present in the result, either with a matched record from the left table, or padded with null.

B. Each record in both the left and right tables will be present in the result, either with a matched record from the other table, or padded with null.

C. Each record in the left table will be present in the result, either with a matched record from the right table, or padded with null.

D. Each record in the left table will be present in the result, matched once with each record in the table on the right.

C. Each record in the left table will be present in the result, either with a matched record from the right table, or padded with null.
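
The behavior in option C can be sketched in plain Python as follows. The tables, field names, and the helper `left_outer_join` are hypothetical and exist only for illustration; `None` stands in for null.

```python
employees = [
    {"emp": "Ana", "dept_id": 1},
    {"emp": "Ben", "dept_id": 2},
    {"emp": "Cy", "dept_id": 9},   # no matching department
]
departments = [
    {"dept_id": 1, "dept": "Sales"},
    {"dept_id": 2, "dept": "Ops"},
    {"dept_id": 3, "dept": "Legal"},  # no matching employee
]

def left_outer_join(left, right, key, right_fields):
    """Keep every left row; pad the right-hand fields with None when unmatched."""
    # Index the right table by join key so lookups are cheap.
    by_key = {}
    for row in right:
        by_key.setdefault(row[key], []).append(row)

    result = []
    for left_row in left:
        matches = by_key.get(left_row[key])
        if matches:
            for right_row in matches:
                result.append({**left_row, **{f: right_row[f] for f in right_fields}})
        else:
            result.append({**left_row, **{f: None for f in right_fields}})
    return result

joined = left_outer_join(employees, departments, "dept_id", ["dept"])
for row in joined:
    print(row)
```

All three employees appear in the output; "Cy" is padded with `None`, and the unmatched "Legal" department from the right table is dropped.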

6

What best describes the operation of an inner join?

Each record in the left table will be present in the result, either with a matched record from the right table, or padded with null.

Each record in both the left and right tables will be present in the result, either with a matched record from the other table, or padded with null.

Each record in the tables that matches (joins) a record in the other table will be present in the result.
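
An inner join keeps only the rows that find a partner in the other table. A minimal sketch with made-up order and customer tuples (the data and names are assumptions for illustration):

```python
orders = [(101, "c1"), (102, "c2"), (103, "c4")]          # (order_id, customer_id)
customers = [("c1", "Ana"), ("c2", "Ben"), ("c3", "Cy")]  # (customer_id, name)

# Pair every order with every customer, keeping only pairs whose keys match.
inner = [
    (order_id, cust_id, name)
    for order_id, cust_id in orders
    for cid, name in customers
    if cust_id == cid
]

print(inner)  # [(101, 'c1', 'Ana'), (102, 'c2', 'Ben')]
```

Order 103 (customer "c4") and customer "c3" have no match, so neither appears in the result and nothing is padded with null.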

7

The cross join of a table containing N rows with itself will contain how many rows?

N

NxN

2N

1

NxN
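
A quick way to check the row count is to build the cross join explicitly. This sketch uses `itertools.product` from the Python standard library on a made-up three-row table:

```python
from itertools import product

rows = ["a", "b", "c"]  # a table with N = 3 rows

# A cross join pairs every row with every row of the other table;
# here the other table is the same table.
cross = list(product(rows, rows))

print(len(cross))  # 9, i.e. N x N
```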

8

When would you choose to represent data in wide format?

When the schema for your data is not defined up front and may change

When the schema, once defined, does not change

When you have dense data with a strict predefined schema

When your data is very small

When the schema for your data is not defined up front and may change
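
For reference, the same measurements can be stored in long form (one row per entity and attribute) or in wide form (one column per attribute). The sketch below converts a small long-form list into wide form; the sensor names, attributes, and the helper `to_wide` are hypothetical.

```python
# Long format: (entity, attribute, value)
long_rows = [
    ("s1", "temp", 21.5),
    ("s1", "humidity", 40),
    ("s2", "temp", 19.0),
]

def to_wide(rows):
    """Pivot (entity, attribute, value) rows into one dict of columns per entity."""
    columns = sorted({attr for _, attr, _ in rows})
    wide = {}
    for entity, attr, value in rows:
        # Start each entity with every column set to None, then fill in values.
        wide.setdefault(entity, dict.fromkeys(columns))[attr] = value
    return columns, wide

columns, wide = to_wide(long_rows)
print(columns)  # ['humidity', 'temp']
print(wide)     # {'s1': {'humidity': 40, 'temp': 21.5}, 's2': {'humidity': None, 'temp': 19.0}}
```

In wide form every entity carries every column, so attributes that were never recorded show up as `None`.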

9

Which of the following are valid techniques you might use to cope with the presence of outliers in the data set?

Set to mean and cap or floor outliers

Cap or floor outliers alone

Delete outlier values and cap or floor outliers

Delete outlier values, set to mean, and cap or floor outliers

Delete outlier values, set to mean, and cap or floor outliers
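
The three techniques can be compared side by side on a tiny sample. This is only a sketch: the data is invented, and flagging outliers with the 1.5 x IQR rule is an assumption made here for illustration, not something the card prescribes.

```python
from statistics import mean, quantiles

values = [10, 12, 11, 13, 12, 11, 95]

# Assumed detection rule: anything beyond 1.5 * IQR from the quartiles.
q1, _, q3 = quantiles(values, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(v):
    return v < low or v > high

# 1. Delete outlier values.
deleted = [v for v in values if not is_outlier(v)]

# 2. Set outliers to the mean of the remaining values.
inlier_mean = mean(deleted)
mean_replaced = [inlier_mean if is_outlier(v) else v for v in values]

# 3. Cap or floor outliers at the boundary values.
capped = [min(max(v, low), high) for v in values]

print(deleted)
print(mean_replaced)
print(capped)
```

Deleting shrinks the data set, while mean replacement and capping keep its size but change the extreme value.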

10

Which of the following will yield a more balanced, albeit more biased, data set?

Oversampling of the least common label

Overfitting

Oversampling of the most common label

Undersampling of the least common label

Oversampling of the least common label
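
A minimal sketch of random oversampling, assuming a two-class data set of (features, label) pairs. The function name `oversample_minority` and the sample data are made up for illustration.

```python
import random
from collections import Counter

def oversample_minority(rows, label_of, seed=0):
    """Duplicate randomly chosen rows of the rarest label until it matches the most common one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    counts = Counter(label_of(r) for r in rows)
    minority = min(counts, key=counts.get)
    target = max(counts.values())
    pool = [r for r in rows if label_of(r) == minority]
    extra = [rng.choice(pool) for _ in range(target - counts[minority])]
    return rows + extra

data = [((i,), "ok") for i in range(8)] + [((100,), "fraud"), ((101,), "fraud")]
balanced = oversample_minority(data, label_of=lambda row: row[1])

print(Counter(label for _, label in balanced))
```

The label counts end up equal, but every added "fraud" row is a copy of one of the two originals, which is where the extra bias comes from.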