Chapter 7&8 Knowledge Testers Flashcards

1
Q

What is the difference between syntax and data models?

A

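Syntax - physical: how the data is written down as text (e.g. the characters of a CSV, JSON, or XML file)
Data Models - logical: the abstract structure that the syntax represents (e.g. tables or trees)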

2
Q

How can trees be used to model denormalized data?

A

JSON and XML documents can be represented logically as trees, which allows nesting that flat tables cannot accommodate.

3
Q

What syntaxes relate to data models?

A

CSV maps to tables; XML and JSON map to trees.

4
Q

Why do trees modeling XML have labels on the nodes, while trees modeling JSON have labels on edges?

A

A JSON value does not know which key it is associated with, so the key labels the edge from the parent object to the value. An XML element knows its own name, so the name labels the node itself.

5
Q

Name a few data models for XML.

A

Infoset (XML Information Set), PSVI (Post-Schema-Validation Infoset), XDM (XQuery and XPath Data Model)

6
Q

Can you sketch a tree representing an XML or JSON document?

A

For XML, it includes a document node, elements, attributes, text; for JSON, keys are on edges.
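
As a rough illustration of the two tree shapes, the sketch below prints an XML document with names on the nodes and a JSON document with keys on the edges. The helper names `xml_tree` and `json_tree` and the sample documents are made up for this card, and the output format is only one possible way to draw such a tree.

```python
import json
import xml.etree.ElementTree as ET


def xml_tree(element, depth=0):
    """Return lines for an XML tree: the element name sits on the node."""
    lines = ["  " * depth + f"element <{element.tag}>"]
    for name, value in element.attrib.items():
        lines.append("  " * (depth + 1) + f"attribute {name}={value!r}")
    if element.text and element.text.strip():
        lines.append("  " * (depth + 1) + f"text {element.text.strip()!r}")
    for child in element:
        lines.extend(xml_tree(child, depth + 1))
    return lines


def json_tree(value, depth=0, edge=None):
    """Return lines for a JSON tree: the key labels the edge to the value."""
    prefix = "  " * depth + (f"--{edge}--> " if edge is not None else "")
    if isinstance(value, dict):
        lines = [prefix + "object"]
        for key, child in value.items():
            lines.extend(json_tree(child, depth + 1, key))
    elif isinstance(value, list):
        lines = [prefix + "array"]
        for position, child in enumerate(value, start=1):
            lines.extend(json_tree(child, depth + 1, position))
    else:
        lines = [prefix + repr(value)]
    return lines


# Hypothetical sample documents, chosen only for illustration.
root = ET.fromstring('<book year="2020"><title>Big Data</title></book>')
xml_lines = ["document"] + xml_tree(root, depth=1)
json_lines = json_tree(json.loads('{"title": "Big Data", "tags": ["xml", "json"]}'))

print("\n".join(xml_lines))
print("\n".join(json_lines))
```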

7
Q

What is the difference between an atomic type and a structured type?

A

An atomic type has indivisible values (string, number, boolean); a structured type is a collection of elements, such as a row in a table or a JSON object.

8
Q

What is the lexical space of an atomic type?

A

The set of literals (textual representations) that denote values of the type. For a decimal, 4, 4.0, and 04 are different literals.

9
Q

What is the value space of an atomic type?

A

The set of values the type can take, that is, what the literals mean, independent of how they are written.
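
A small sketch of the distinction, using Python's `decimal.Decimal` as a stand-in for a decimal atomic type (the choice of `Decimal` and the sample literals are assumptions for illustration): several literals from the lexical space collapse to a single element of the value space.

```python
from decimal import Decimal

# Four different literals (lexical space) ...
literals = ["4", "4.0", "04", "4.00"]

# ... that all denote the same number (value space).
values = {Decimal(literal) for literal in literals}

print(len(set(literals)))  # number of distinct literals
print(len(values))         # number of distinct values
```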

10
Q

Give examples of type cardinalities and their associated symbols.

A
  • exactly once: no symbol (implicit)
  • zero or one (optional): ?
  • zero or more: *
  • one or more: +
11
Q

What is the difference between well-formedness and validity?

A

Well-formed means the document follows the rules of the syntax and can be parsed; valid means it additionally conforms to a schema.
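
The difference can be sketched with JSON. Well-formedness is checked by the parser alone, while validity needs a schema. The "schema" here is a hypothetical toy (a dict from field name to required Python type), not a real schema language such as JSON Schema.

```python
import json


def is_well_formed(text):
    """True if the text parses as JSON at all."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def is_valid(text, schema):
    """True if the text is well-formed AND matches the toy schema."""
    if not is_well_formed(text):
        return False
    document = json.loads(text)
    return isinstance(document, dict) and all(
        isinstance(document.get(field), expected)
        for field, expected in schema.items()
    )


# Toy schema (assumption): "name" must be a string, "age" an integer.
schema = {"name": str, "age": int}

good = '{"name": "Ada", "age": 36}'
wrong_type = '{"name": "Ada", "age": "36"}'   # parses, but age is a string
broken = '{"name": "Ada", "age": 36'          # missing closing brace

for text in (good, wrong_type, broken):
    print(is_well_formed(text), is_valid(text, schema))
```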

12
Q

Name further data modeling technologies and formats for tree-like data.

A

Parquet, Avro, Protocol Buffers

13
Q

Can JSON data be represented as a DataFrame?

A

Yes, if the schema has no open object types and no heterogeneity in field types.
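
A simplified check of these two conditions might look like the sketch below. The function name `fits_dataframe` is hypothetical, and the check assumes a non-empty collection of flat JSON objects, one per line. It treats the first record as the schema, rejects extra or missing fields (closed object type), and rejects fields whose type changes between records (heterogeneity).

```python
import json


def fits_dataframe(json_lines):
    """Rough test: same fields everywhere, one type per field."""
    records = [json.loads(line) for line in json_lines]
    # Use the first record as the reference schema: field -> type.
    schema = {field: type(value) for field, value in records[0].items()}
    for record in records:
        if record.keys() != schema.keys():
            return False  # extra or missing field
        if any(type(record[field]) is not expected
               for field, expected in schema.items()):
            return False  # heterogeneous field type
    return True


homogeneous = ['{"id": 1, "name": "a"}', '{"id": 2, "name": "b"}']
heterogeneous = ['{"id": 1, "name": "a"}', '{"id": "2", "name": "b"}']
extra_field = ['{"id": 1, "name": "a"}', '{"id": 2, "name": "b", "x": true}']

print(fits_dataframe(homogeneous))
print(fits_dataframe(heterogeneous))
print(fits_dataframe(extra_field))
```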

14
Q

Are DataFrames more efficient than JSON?

A

Yes, both time-wise and space-wise, because the schema is known in advance and does not have to be repeated in every record.

15
Q

Explain the map and shuffle patterns in large-scale data processing.

A
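  • Map: the input is split among workers, and each worker processes its share independently (e.g. a pile of Pokémon cards is divided among people, and each person counts the types in their own pile)
  • Shuffle: the intermediate results are regrouped by key, so that everything belonging to one key ends up with the same designated worker (e.g. each person is made responsible for certain types and receives all partial counts for them)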
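
The two patterns can be simulated on one machine. The sketch below is not Hadoop code; the names `map_function`, `shuffle`, and `reduce_function` and the sample splits are assumptions for illustration. It counts Pokémon types across two input splits.

```python
from collections import defaultdict


def map_function(key, line):
    """Emit one intermediate (type, 1) pair per word in the line."""
    for word in line.split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as the shuffle phase does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_function(key, values):
    """Aggregate all values that share one key."""
    yield key, sum(values)


# Two input splits; in a real job each would go to its own map task.
splits = [["fire water", "grass fire"], ["water fire"]]

intermediate = [
    pair
    for split in splits
    for offset, line in enumerate(split)
    for pair in map_function(offset, line)
]
grouped = shuffle(intermediate)
counts = dict(
    pair
    for key, values in grouped.items()
    for pair in reduce_function(key, values)
)
print(counts)
```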
16
Q

Describe the physical architecture of MapReduce.

A

Centralized architecture: one JobTracker (master) coordinates many TaskTrackers (workers).

17
Q

What is the difference between a map function and a map task?

A
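  • Function: the user-defined function that maps one input key-value pair to zero or more intermediate key-value pairs
  • Task: a unit of work assigned to one input split, in which the map function is called on every key-value pair of that split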
18
Q

What is a map slot in MapReduce?

A

The resources on which a map task runs: one slot is one CPU core together with some allocated memory.

19
Q

What is a reduce function in MapReduce?

A

A function that takes all intermediate key-value pairs sharing the same key and returns zero or more output key-value pairs.

20
Q

What is a combine function in MapReduce?

A

A function that takes one or more intermediate key-value pairs with the same key and returns 0, 1, or more intermediate key-value pairs. It runs on the map side, before the shuffle.

21
Q

How does combining improve MapReduce’s performance?

A

It pre-aggregates intermediate values on the map side, so less data has to be shuffled over the network and the reduce function has less work to do.

22
Q

What assumptions are behind reusing the reduce function as a combine function?

A

The reduce function is commutative and associative, and its output key-value pairs have the same types as its input (intermediate) key-value pairs.

23
Q

How can a combine function be designed to speed up a MapReduce job?

A

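By emitting partial results that can still be merged correctly. For an average, for example, the combine function outputs a partial sum together with a weight (the number of values it covers), and the reduce function divides the total sum by the total weight.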
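
A minimal single-machine sketch of this idea, with made-up data and hypothetical function names (`combine`, `reduce_average`): every intermediate value is a `(sum, count)` pair, the combine step merges pairs within one map task, and the reduce step merges across tasks and divides. Averaging the per-task averages directly would give a different, wrong answer.

```python
def combine(key, values):
    """Merge (sum, count) pairs from one map task into one pair."""
    total = sum(partial_sum for partial_sum, _ in values)
    count = sum(weight for _, weight in values)
    yield key, (total, count)


def reduce_average(key, values):
    """Merge (sum, count) pairs from all tasks and compute the average."""
    total = sum(partial_sum for partial_sum, _ in values)
    count = sum(weight for _, weight in values)
    yield key, total / count


# Intermediate output of two map tasks (illustrative data only).
map_outputs = [
    [("fire", (10, 1)), ("fire", (20, 1)), ("fire", (30, 1))],  # task 1
    [("fire", (100, 1))],                                       # task 2
]

# Combine runs once per map task, before the shuffle.
combined = []
for task_output in map_outputs:
    values = [value for key, value in task_output if key == "fire"]
    combined.extend(combine("fire", values))

# Reduce sees only the combined pairs.
result = dict(reduce_average("fire", [value for _, value in combined]))

# Naive alternative: average of per-task averages, ignoring the weights.
task_averages = [total / count for _, (total, count) in combined]
naive = sum(task_averages) / len(task_averages)

print(combined, result, naive)
```

Only two pairs cross the network instead of four, and the weighted result matches the true average of 10, 20, 30, and 100.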

24
Q

Why does MapReduce perform well on a distributed file system?

A

It brings the computation to the data: map tasks are scheduled on nodes that already store the input blocks, which reduces the need to transfer data over the network.

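25
Q

What bottleneck suggests the use of MapReduce?

A

Disk I/O: MapReduce helps when the bottleneck is the speed of reading data from and writing data to disk.

26
Q

How do MapReduce splits differ from HDFS blocks?

A

Splits are logical units that respect record (key-value pair) boundaries, while HDFS blocks are physical units of fixed size that may cut a record in two. A split usually corresponds closely to a block, so a map task may have to read a small amount of data from a neighboring block.

27
Q

What does the Java API of MapReduce look like at a high level?

A

The user defines a mapper class and a reducer class, which contain the map function and the reduce function respectively.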