Week 4 - Parallel Data Architecture Flashcards by Adam Cadiedux

Two types of parallelism in a parallel database system

1) Pipeline Parallelism

2) Partition Parallelism

What is Pipeline Parallelism?

Many machines, each doing one step in a multi-step process
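
As a rough illustration of pipeline parallelism, the sketch below gives each step of a multi-step process its own worker and connects the workers with queues. It is only a single-machine sketch: threads stand in for machines, and the names `stage`, `run_pipeline`, and the three example steps are hypothetical, not part of the course material.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stage(func, inbox, outbox):
    """Run one pipeline step: apply func to every item that arrives."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # pass the end-of-stream marker downstream
            return
        outbox.put(func(item))


def run_pipeline(items, funcs):
    """Push items through one worker per step; each worker does a different step."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    workers = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for w in workers:
        w.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for w in workers:
        w.join()
    return results


# Hypothetical three-step process: parse, double, format.
steps = [int, lambda x: x * 2, lambda x: f"value={x}"]
print(run_pipeline(["1", "2", "3"], steps))
# prints ['value=2', 'value=4', 'value=6']
```

While the last worker formats the first item, the earlier workers can already be handling later items, which is where the overlap comes from.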

What is Partition Parallelism?

Many machines doing the same thing to different pieces of data
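
A minimal sketch of partition parallelism, assuming threads stand in for machines: every worker runs the same function, each on its own piece of the data, and the partial results are combined at the end. The function names and the sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def count_long_words(partition):
    """The same operation every worker runs, on its own partition."""
    return sum(1 for word in partition if len(word) > 4)


def parallel_count(partitions):
    """Apply count_long_words to every partition concurrently, then combine."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partial_counts = list(pool.map(count_long_words, partitions))
    return sum(partial_counts)


partitions = [["apple", "fig", "banana"], ["kiwi", "cherry"], ["plum", "orange"]]
print(parallel_count(partitions))
# prints 4
```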

What is speed-up?

More resources mean proportionally less time for a given amount of data. On a plot of speed-up against resources, ideal (linear) speed-up is a 45-degree line.
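
One common way to write this down, with symbols that are assumptions of this note rather than the deck's: let T_1 be the elapsed time on one unit of resources and T_n the elapsed time on n units, for the same data.

```latex
\[
\text{speed-up}(n) = \frac{T_1}{T_n},
\qquad
\text{linear speed-up: } \text{speed-up}(n) = n
\]
% Worked example (made-up numbers): T_1 = 120 s on 1 machine, T_4 = 30 s on 4 machines
\[
\text{speed-up}(4) = \frac{120}{30} = 4
\]
```

In this made-up example four machines give a speed-up of exactly 4, so the point sits on the ideal line.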

What is scale-up?

If resources increase in proportion to the increase in data size, elapsed time stays constant (no diminishing returns)
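
A common way to express scale-up as a ratio, using notation that is an assumption of this note: T(n, D) is the elapsed time for n machines to process data of size D.

```latex
\[
\text{scale-up}(n) = \frac{T(1,\ D)}{T(n,\ n \cdot D)},
\qquad
\text{linear scale-up: } \text{scale-up}(n) = 1
\]
% Worked example (made-up numbers): 1 machine handles 100 GB in 60 s,
% 4 machines handle 400 GB in 60 s
\[
\text{scale-up}(4) = \frac{60}{60} = 1
\]
```

A ratio below 1 would mean the larger system takes longer on the proportionally larger data set, that is, returns are diminishing.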

When is scale-up used in parallel databases?

1) To implement parallelism in databases for faster processing.
2) To have the same performance levels when workloads increase.
3) To break the processing in a sequential manner.

2) To have the same performance levels when workloads increase.

Shared Memory (SMP) means

Multiple CPUs run things in parallel, but they all share the same memory space.

Shared Disk

In the shared-disk architecture, you have multiple CPUs, and each one has its own memory space, but they all share the same disk storage.

Shared Nothing

In the shared-nothing architecture, each CPU has its own memory space and also its own secondary storage.

How do machines communicate in the shared-nothing architecture?

The only way the machines communicate with each other is through the network.

Advantage of Shared Memory

Easy to program

2 Disadvantages of Shared Memory

1) Expensive to build

2) Difficult to scale

2 Advantages of Shared Nothing

1) Cheaper to build

2) Easier to scale up

Disadvantage of Shared Nothing

Harder to program

Intra-operator Parallelism

Get all machines working to compute a given operation (scan, sort, join)

Inter-operator Parallelism

Each operator may run concurrently on a different site; this exploits pipelining.

Inter-query Parallelism

different queries run on different sites

3 Types of data partitioning

1) Range
2) Hash
3) Round Robin

Range Partitioning means

Each tuple is assigned to a partition (machine) according to the range its partitioning attribute falls in, and the processing is done on that machine. The ranges follow the logical sort order of the data, for example by age.
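
A small sketch of range partitioning by age, assuming the range boundaries are already known. The function name, the boundaries, and the sample rows are hypothetical.

```python
from bisect import bisect_right


def range_partition(rows, key, boundaries):
    """Split rows into len(boundaries) + 1 partitions by value range.

    boundaries must be sorted; partition i holds the rows whose key is
    >= boundaries[i - 1] and < boundaries[i].
    """
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        partitions[bisect_right(boundaries, key(row))].append(row)
    return partitions


people = [("Ann", 17), ("Bob", 42), ("Cy", 30), ("Di", 65), ("Ed", 29)]
# Ages under 30, ages 30 to 59, ages 60 and over.
parts = range_partition(people, key=lambda p: p[1], boundaries=[30, 60])
print(parts)
# prints [[('Ann', 17), ('Ed', 29)], [('Bob', 42), ('Cy', 30)], [('Di', 65)]]
```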

Hash Partitioning means

Hash partitioning runs a hash function on the partitioning attribute,
and the hash function decides which tuple,
or row, in the table is assigned to which partition.
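
A minimal sketch of hash partitioning. It uses `zlib.crc32` as the hash function only because it is deterministic across runs; a real system would use its own hash function, and the function name and sample rows here are hypothetical.

```python
import zlib


def hash_partition(rows, key, num_partitions):
    """Assign each row to partition hash(key) mod num_partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        h = zlib.crc32(str(key(row)).encode("utf-8"))
        partitions[h % num_partitions].append(row)
    return partitions


rows = [("Ann", "NY"), ("Bob", "LA"), ("Cy", "NY"), ("Di", "SF")]
# Partition on the city column (index 1).
parts = hash_partition(rows, key=lambda r: r[1], num_partitions=3)
for number, part in enumerate(parts):
    print(number, part)
```

Rows with the same key value always land in the same partition, which is what makes hash partitioning useful for equality joins and grouping.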

Round Robin Partitioning means

Assign the first row in the table to the first partition, the second row to the second partition, and so on,
wrapping back to the first partition after the last one.
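
Round-robin assignment can be sketched in a few lines; the function name is hypothetical.

```python
def round_robin_partition(rows, num_partitions):
    """Deal rows out to partitions in turn, like dealing cards."""
    partitions = [[] for _ in range(num_partitions)]
    for position, row in enumerate(rows):
        partitions[position % num_partitions].append(row)
    return partitions


print(round_robin_partition(list("abcdefg"), 3))
# prints [['a', 'd', 'g'], ['b', 'e'], ['c', 'f']]
```

The partition sizes differ by at most one row, but the placement says nothing about a row's values, so a lookup by value has to check every partition.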

3 Items Parallel Sorting

1) Scan in parallel and range-partition on the sort attribute as you go
2) As tuples come in, begin “local” sorting on each machine
3) Resulting data is sorted and range-partitioned
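
The three steps can be sketched on one machine as follows. This is a simplification under stated assumptions: threads stand in for machines, the partition boundaries are given, and the local sorts start only after partitioning finishes instead of overlapping with the scan. The function name is hypothetical.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor


def parallel_sort(values, boundaries):
    """Range-partition values, sort each partition locally, return the partitions."""
    # Step 1: scan and range-partition on the sort attribute.
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for value in values:
        partitions[bisect_right(boundaries, value)].append(value)

    # Step 2: each worker sorts its own partition locally.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        sorted_parts = list(pool.map(sorted, partitions))

    # Step 3: the result is sorted within each partition and
    # range-partitioned across them, so reading the partitions in
    # order yields the globally sorted data.
    return sorted_parts


parts = parallel_sort([50, 3, 27, 91, 14, 68, 35], boundaries=[30, 60])
print(parts)
# prints [[3, 14, 27], [35, 50], [68, 91]]
```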

Parallel Sorting Problem

skew!

Some partitions will have more data than others, unbalanced load

Parallel Sorting Solution:

Sample the data at the start to determine partition points (estimate the data distribution so the data can be split evenly across partitions)
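
One way to pick partition points from a sample is to sort the sample and take evenly spaced values from it. The sketch below assumes the values fit in a list; the function name, the sample size, and the skewed test data are hypothetical.

```python
import random


def choose_boundaries(values, num_partitions, sample_size, seed=0):
    """Pick num_partitions - 1 split points from a random sample of values."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = sorted(rng.sample(values, min(sample_size, len(values))))
    step = len(sample) / num_partitions
    # Take the sample values at the 1/n, 2/n, ... positions as split points.
    return [sample[int(step * i)] for i in range(1, num_partitions)]


# Skewed data: most values are small, a few are very large.
values = [x * x for x in range(1000)]
boundaries = choose_boundaries(values, num_partitions=4, sample_size=100)
print(boundaries)
```

Evenly spaced boundaries over the value range (0 to 998001) would put half of these rows in the first partition; boundaries taken from the sample follow the actual distribution, so each partition receives roughly a quarter of the rows.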

2 types of Parallel join

1) Nested loop | 2) Sort Merge (plain merge join)

Nested loop 2 items

1) Each outer tuple must be compared with each inner tuple that might join | 2) Easy with range partitioning on the join columns, hard otherwise
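
A sketch of why range partitioning on the join columns makes this easy: if both tables are partitioned with the same boundaries, tuples that can join always sit in the same partition, so each machine only runs a local nested loop. The loop over partitions below is sequential for simplicity; in a parallel system each pair would go to its own machine. All names and data are hypothetical.

```python
from bisect import bisect_right


def partition_by_range(rows, key, boundaries):
    """Range-partition rows on key(row) using sorted boundaries."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect_right(boundaries, key(row))].append(row)
    return parts


def nested_loop_join(outer, inner, outer_key, inner_key):
    """Compare every outer tuple with every inner tuple."""
    return [(o, i) for o in outer for i in inner if outer_key(o) == inner_key(i)]


def partitioned_join(outer, inner, outer_key, inner_key, boundaries):
    """Join matching partitions only; no cross-partition comparisons are needed."""
    outer_parts = partition_by_range(outer, outer_key, boundaries)
    inner_parts = partition_by_range(inner, inner_key, boundaries)
    result = []
    for o_part, i_part in zip(outer_parts, inner_parts):
        result.extend(nested_loop_join(o_part, i_part, outer_key, inner_key))
    return result


orders = [(1, "pen"), (7, "ink"), (12, "pad")]      # (customer_id, item)
customers = [(1, "Ann"), (7, "Bob"), (9, "Cy")]     # (customer_id, name)
by_id = lambda row: row[0]
joined = partitioned_join(orders, customers, by_id, by_id, boundaries=[5, 10])
print(joined)
# prints [((1, 'pen'), (1, 'Ann')), ((7, 'ink'), (7, 'Bob'))]
```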

Sort Merge (plain merge join) 2 items

1) Sorting gives range partitioning | 2) Merging partitioned tables is local
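
The local merge step on one machine might look like the sketch below, assuming both inputs are already sorted on the join key (as they are after a parallel sort). The function name and sample rows are hypothetical.

```python
def merge_join(left, right, key):
    """Join two lists that are already sorted on key(row)."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = key(left[i]), key(right[j])
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == left_key:
                j_end += 1
            # Pair every left row with this key against the whole run.
            while i < len(left) and key(left[i]) == left_key:
                result.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return result


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
pairs = merge_join(left, right, key=lambda row: row[0])
print(len(pairs))
# prints 5
```

Because each machine holds one key range of both tables, it can run this merge on its own data without contacting the others.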

Complex Queries: Inter-Operator Parallelism 2 items

1) Pipeline between operators | 2) Bushy Trees

What is the high-level query processing language used by database management systems?

1) SQL
2) HTML
3) XML
4) PL

1) SQL

Which of the following cannot be a goal in query processing?

1) Maximizing solution space
2) Minimizing processing time
3) Maximizing throughput
4) Minimizing transfers among distributed sites

1) Maximizing solution space

Which of the following search algorithms takes the longest processing time?

1) Exhaustive search
2) Heuristic algorithm
3) Simulated annealing
4) Genetic algorithm

1) Exhaustive search

What is the correct order of tasks in typical distributed query processing?

1) Decomposition, Localization, Optimization
2) Decomposition, Optimization, Localization
3) Localization, Decomposition, Optimization
4) Optimization, Decomposition, Localization

1) Decomposition, Localization, Optimization

What is the correct order of tasks in the decomposition step of distributed query processing?

1) Normalization, Eliminating Redundancy, Algebraic Rewriting
2) Normalization, Algebraic Rewriting, Eliminating Redundancy
3) Eliminating Redundancy, Normalization, Algebraic Rewriting
4) Eliminating Redundancy, Algebraic Rewriting, Normalization

1) Normalization, Eliminating Redundancy, Algebraic Rewriting

(33 cards)