Combining and Shaping Data Flashcards

1
Q

Some features of transactional processing and analytical processing are described below. Which statement is true?

Due to the sheer volume, variety, and velocity of the data that big companies generate, big data processing requires setting up a distributed cluster of multiple machines.

Transactional processing involves analyzing large batches of data, whereas analytical processing involves analyzing individual entries in a data set.

Transactional processing is performed in a traditional relational database management system (RDBMS), whereas analytical processing is performed in a data warehouse.

Although the objectives of transactional processing and analytical processing are completely different, both of these objectives can be achieved by the same database system even with huge volumes of data.

A

Transactional processing is performed in a traditional relational database management system (RDBMS), whereas analytical processing is performed in a data warehouse.

2
Q

What is the practice of combining many disparate servers, each of limited capacity and running generic hardware, called?

Vertical scaling

Horizontal scaling

Online analytical processing (OLAP)

Data warehousing

A

Horizontal scaling

3
Q

You are using a stream processing system to operate on a stream of timestamped data, and you want to divide your input data into fixed-size windows based on overlapping time intervals. Which window will you use?

Tumbling window

Sliding window

Count window

Global window

A

Sliding window
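
A minimal sketch of how overlapping windows work, written in plain Python rather than any particular stream processing engine's API. It makes three assumptions: timestamps are non-negative integers in seconds, windows are half-open intervals [start, end), and windows are aligned to start at timestamp 0. The function name `sliding_windows` is hypothetical. Each window covers `size` seconds and a new window starts every `slide` seconds, so when `slide < size` a single event can belong to several windows.

```python
from typing import List, Tuple

# Hypothetical event shape: (timestamp in seconds, payload)
Event = Tuple[int, str]
Window = Tuple[int, int, List[str]]


def sliding_windows(events: List[Event], size: int, slide: int) -> List[Window]:
    """Group timestamped events into fixed-size windows [start, start + size)
    that begin every `slide` seconds, starting at timestamp 0 (an assumption).
    slide < size gives overlapping (sliding) windows; slide == size gives
    non-overlapping (tumbling) windows."""
    if size <= 0 or slide <= 0:
        raise ValueError("size and slide must be positive")
    if not events:
        return []
    last_ts = max(ts for ts, _ in events)
    windows: List[Window] = []
    start = 0
    while start <= last_ts:
        end = start + size
        # Membership is decided purely by comparing each event's timestamp
        # with the window bounds.
        members = [payload for ts, payload in events if start <= ts < end]
        windows.append((start, end, members))
        start += slide
    return windows


events = [(1, "a"), (4, "b"), (7, "c")]
print(sliding_windows(events, size=6, slide=3))  # overlapping: "b" and "c" each appear twice
print(sliding_windows(events, size=6, slide=6))  # tumbling: each event appears once
```

If you set `slide` equal to `size`, the same function produces tumbling windows, the non-overlapping alternative listed in the options. Because window membership comes from comparing timestamps, the sketch also shows why time-based windowing depends on timestamped data.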

4
Q

Window operations can only be performed on what kinds of data?

Any kind of data

Data associated with timestamps

Streaming data

Batch data

A

Data associated with timestamps

5
Q

Which of the following best describes the operation of a left outer join?

A. Each record in the right table will be present in the result, either with a matched record from the left table, or padded with null.

B. Each record in both the left and right tables will be present in the result, either with a matched record from the other table, or padded with null.

C. Each record in the left table will be present in the result, either with a matched record from the right table, or padded with null.

D. Each record in the left table will be present in the result, matched once with each record in the table on the right.

A

C. Each record in the left table will be present in the result, either with a matched record from the right table, or padded with null.
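
A minimal sketch in plain Python. It assumes tables are lists of dicts and that rows are matched on a single equality key. `left_outer_join`, `customers`, and `orders` are hypothetical names used only for illustration. Bob has no orders, but he still appears in the result, with `total` padded with `None`.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def left_outer_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Keep every row of `left`. Combine each left row with every right row that
    has the same `key` value; if none match, pad the right-hand columns with
    None (standing in for SQL NULL)."""
    right_columns = sorted({col for row in right for col in row if col != key})
    result: List[Row] = []
    for l_row in left:
        matches = [r_row for r_row in right if r_row[key] == l_row[key]]
        if matches:
            # On a column-name clash, the right row's value wins (a simplification).
            result.extend({**l_row, **r_row} for r_row in matches)
        else:
            result.append({**l_row, **{col: None for col in right_columns}})
    return result


customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [{"id": 1, "total": 30}, {"id": 1, "total": 15}]
for row in left_outer_join(customers, orders, key="id"):
    print(row)
```

A left row with several matches appears once per match. That is why Ann shows up twice.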

6
Q

What best describes the operation of an inner join?

Each record in the left table will be present in the result, either with a matched record from the right table, or padded with null.

Each record in both the left and right tables will be present in the result, either with a matched record from the other table, or padded with null.

A

Only records that match (join) a record in the other table will be present in the result.
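
Here is another hedged sketch with the same list-of-dicts assumption. `inner_join` and the sample data are hypothetical. Bob has no order and order 3 has no customer, so both are dropped, and only the matching pair survives.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Return one combined row for every (left, right) pair whose `key` values
    are equal; rows without a match on the other side are dropped."""
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in right
        if l_row[key] == r_row[key]
    ]


customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [{"id": 1, "total": 30}, {"id": 3, "total": 12}]
print(inner_join(customers, orders, key="id"))  # [{'id': 1, 'name': 'Ann', 'total': 30}]
```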

7
Q

The cross join of a table containing N rows with itself will contain how many rows?

N

N × N

2N

1

A

N × N
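
The count follows directly from the definition. A cross join pairs every row on the left with every row on the right, so joining N rows with N rows gives N × N = N² pairs. The small sketch below uses `itertools.product` from the Python standard library; `cross_join` is a hypothetical helper name.

```python
from itertools import product
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def cross_join(left: Sequence[T], right: Sequence[U]) -> List[Tuple[T, U]]:
    """Pair every row of `left` with every row of `right` (Cartesian product)."""
    return list(product(left, right))


rows = ["a", "b", "c"]  # N = 3
pairs = cross_join(rows, rows)
print(len(pairs))  # 9, i.e. N * N
print(pairs[:4])   # [('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'a')]
```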

8
Q

When would you choose to represent data in wide format?

When the schema for your data is not defined up front and may change

When the schema, once defined, does not change

When you have dense data with a strict predefined schema

When your data is very small

A

When the schema for your data is not defined up front and may change
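
To make the two layouts concrete, this small sketch pivots long-format records into wide format. In long format each row is one `(id, attribute, value)` triple; in wide format each id gets one row and each attribute gets its own column. The sensor readings and the `long_to_wide` helper are hypothetical. Attributes missing for an id are padded with `None`.

```python
from typing import Any, Dict, List, Tuple

# Long format: one (entity id, attribute, value) triple per row.
LongRow = Tuple[str, str, Any]


def long_to_wide(rows: List[LongRow]) -> Tuple[List[str], List[List[Any]]]:
    """Pivot long rows into a header plus one row per entity, with one column
    per distinct attribute (sorted by name). Missing values become None."""
    columns = sorted({attr for _, attr, _ in rows})
    by_entity: Dict[str, Dict[str, Any]] = {}
    for entity, attr, value in rows:
        by_entity.setdefault(entity, {})[attr] = value
    table = [
        [entity] + [attrs.get(col) for col in columns]
        for entity, attrs in by_entity.items()
    ]
    return ["id"] + columns, table


long_rows = [("s1", "temp", 21.5), ("s1", "humidity", 40), ("s2", "temp", 19.0)]
header, table = long_to_wide(long_rows)
print(header)  # ['id', 'humidity', 'temp']
print(table)   # [['s1', 40, 21.5], ['s2', None, 19.0]]
```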

9
Q

Which of the following are valid techniques you might use to cope with the presence of outliers in the data set?

Set to mean and cap or floor outliers

Cap or floor outliers alone

Delete outlier values and cap or floor outliers

Delete outlier values, set to mean, and cap or floor outliers

A

Delete outlier values, set to mean, and cap or floor outliers
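
A minimal sketch of the three techniques in plain Python. It assumes the outlier bounds `lo` and `hi` have already been chosen (for example, from domain knowledge). It also assumes that "set to mean" uses the mean of the in-range values only. Both choices, and the function names, are illustrative assumptions.

```python
from statistics import mean
from typing import List


def drop_outliers(values: List[float], lo: float, hi: float) -> List[float]:
    """Delete outlier values: keep only values inside [lo, hi]."""
    return [v for v in values if lo <= v <= hi]


def replace_with_mean(values: List[float], lo: float, hi: float) -> List[float]:
    """Set outliers to the mean of the in-range values (an assumed convention)."""
    inlier_mean = mean(drop_outliers(values, lo, hi))
    return [v if lo <= v <= hi else inlier_mean for v in values]


def cap_and_floor(values: List[float], lo: float, hi: float) -> List[float]:
    """Cap values above hi at hi and floor values below lo at lo."""
    return [min(max(v, lo), hi) for v in values]


data = [10, 12, 11, 95, -40]
print(drop_outliers(data, 0, 50))      # [10, 12, 11]
print(replace_with_mean(data, 0, 50))  # 95 and -40 replaced by the inlier mean, 11
print(cap_and_floor(data, 0, 50))      # [10, 12, 11, 50, 0]
```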

10
Q

Which of the following will yield a more balanced, albeit more biased, data set?

Oversampling of the least common label

Overfitting

Oversampling of the most common label

Undersampling of the least common label

A

Oversampling of the least common label
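
One common way to oversample is random oversampling with replacement. This minimal sketch assumes rows are `(feature, label)` tuples; the function name and data are hypothetical. The six extra "yes" rows are copies of the two originals, and that is where the added bias comes from: a model trained on the result sees the same few minority examples repeatedly.

```python
import random
from collections import Counter
from typing import List, Tuple

# Hypothetical labelled rows: (feature value, label)
Row = Tuple[float, str]


def oversample_minority(rows: List[Row], seed: int = 0) -> List[Row]:
    """Random oversampling: duplicate randomly chosen rows of each less common
    label (sampling with replacement) until every label matches the count of
    the most common one. All original rows are kept."""
    if not rows:
        return []
    rng = random.Random(seed)
    counts = Counter(label for _, label in rows)
    target = max(counts.values())
    balanced = list(rows)
    for label, count in counts.items():
        members = [row for row in rows if row[1] == label]
        balanced.extend(rng.choices(members, k=target - count))
    return balanced


rows = [(float(i), "no") for i in range(8)] + [(100.0, "yes"), (101.0, "yes")]
balanced = oversample_minority(rows)
print(Counter(label for _, label in balanced))  # Counter({'no': 8, 'yes': 8})
```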
