Data Partitioning Flashcards

Question 1

Q

What is data partitioning?

Answer

A

Dividing a large database into smaller, independent, manageable parts such as partitions or shards.

Question 2

Q

What problem does data partitioning solve?

Answer

A

A single node eventually cannot store or serve a very large dataset on its own. Partitioning the data into independently manageable parts lets large-scale operations be distributed across all of the available nodes.

Question 3

Q

What are the three major partitioning methods?

Answer

A

Horizontal partitioning, vertical partitioning, and hybrid partitioning.

Question 4

Q

What is horizontal partitioning? What is an example of it, and what is a potential downside?

Answer

A

Horizontal partitioning is the method of sharding the rows of a table and distributing those shards across multiple instances to support parallelism.

Example: sharding by geographic location improves performance through data locality, but it can lead to imbalanced servers if some regions hold far more data than others.
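A minimal sketch of the geographic example, assuming a hypothetical `rows` list with made-up `user_id` and `region` fields. Each shard keeps whole rows, and the size count shows how one busy region can leave the shards imbalanced.

```python
from collections import defaultdict

# Hypothetical rows; the field names and values are illustrative only.
rows = [
    {"user_id": 1, "region": "us"},
    {"user_id": 2, "region": "us"},
    {"user_id": 3, "region": "us"},
    {"user_id": 4, "region": "eu"},
    {"user_id": 5, "region": "apac"},
]


def shard_by_region(rows):
    """Group whole rows into one shard per region (a horizontal split)."""
    shards = defaultdict(list)
    for row in rows:
        shards[row["region"]].append(row)
    return dict(shards)


shards = shard_by_region(rows)

# Row count per shard; a large gap between shards signals imbalance.
shard_sizes = {region: len(members) for region, members in shards.items()}
```

Every row lands in exactly one shard with all of its columns intact, which is what distinguishes this from a column-based split.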

Question 5

Q

What is vertical partitioning? What’s an example of it?

Answer

A

Vertical partitioning is the method of sharding a database based on its table columns.

Consider an e-commerce platform that stores user columns on one instance, order history on another, and so on, so that each focused query only touches the instance holding the columns it needs.
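A minimal sketch of a column-based split, assuming a hypothetical wide record and two made-up column groups. The key column `user_id` is repeated in each group so the pieces can be matched up again later.

```python
# Hypothetical wide record; the column names are illustrative only.
user_row = {
    "user_id": 7,
    "name": "Ada",
    "email": "ada@example.com",
    "last_order_id": 991,
    "order_total": 42.5,
}

# Hypothetical column groups, one per instance. The key column appears
# in every group so the pieces can be rejoined by user_id.
COLUMN_GROUPS = {
    "profile": ["user_id", "name", "email"],
    "orders": ["user_id", "last_order_id", "order_total"],
}


def split_columns(row, groups):
    """Return one narrower record per column group (a vertical split)."""
    return {
        store: {col: row[col] for col in cols}
        for store, cols in groups.items()
    }


parts = split_columns(user_row, COLUMN_GROUPS)
```

A query that only needs profile data would read `parts["profile"]` and never touch the order columns.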

Question 6

Q

What is hybrid partitioning? What’s an example of it?

Answer

A

Hybrid partitioning applies both vertical and horizontal sharding to the same database.

An example would be an e-commerce site that horizontally shards based on geographic region, but still vertically shards families of data to optimize queries.

Question 7

Q

What are some common partitioning approaches? What does each do?

Answer

A

Key-hash partitioning, list partitioning, round-robin, and composite.

Key-hash partitioning hashes a chosen key and takes the result modulo the number of available shards.

List partitioning assigns each shard a list of values and routes data and queries to the shard whose list contains the value.

Round-robin partitioning assigns records to shards in rotation, which ensures a uniform distribution.

Composite partitioning combines two or more of the other schemes, such as list partitioning followed by hashing.
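The four approaches can be sketched as small routing functions. This is an illustration under assumptions, not a reference implementation: the shard count, the region lists, and all function names are hypothetical, and SHA-256 is used only because Python's built-in `hash()` for strings varies between processes.

```python
import hashlib
from itertools import count

NUM_SHARDS = 4  # assumed shard count for the example


def stable_hash(key: str) -> int:
    """Hash a string to an integer that is the same on every run."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def key_hash_shard(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Key-hash: hash the key, then take it modulo the shard count."""
    return stable_hash(key) % num_shards


# Hypothetical value lists, one per shard.
REGION_LISTS = {0: {"us", "ca"}, 1: {"de", "fr"}, 2: {"jp", "in"}}


def list_shard(region: str) -> int:
    """List: route to the shard whose list contains the value."""
    for shard, values in REGION_LISTS.items():
        if region in values:
            return shard
    raise KeyError(f"no shard lists region {region!r}")


_next_slot = count()  # shared counter that drives the rotation


def round_robin_shard(num_shards: int = NUM_SHARDS) -> int:
    """Round-robin: hand out shards in rotation, ignoring the data."""
    return next(_next_slot) % num_shards


def composite_shard(region: str, key: str, shards_per_group: int = 2):
    """Composite: pick a group by list, then a shard in it by hash."""
    return (list_shard(region), stable_hash(key) % shards_per_group)
```

Note that round-robin ignores the record entirely, so a later lookup cannot compute where a given key was placed; the key-hash and list functions can.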

Question 8

Q

What three common problems can data partitioning cause? How can they be addressed?

Answer

A

Joins, referential integrity, and rebalancing.

For joins, the data is spread across the shards, so joins that span shards are slow. Denormalization can help by consolidating the needed data into a single table.

For referential integrity, key-based constraints (such as foreign keys) are generally not supported at the shard level and need to be handled by the application. Sometimes scheduled processes can help “clean up” these types of issues.

For rebalancing, any change to a partitioning scheme may require rebalancing data across partitions. This generally causes downtime, which can be mitigated by a directory-based partitioner (a lookup service that maps keys to shards), but that has other side effects.
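A rough sketch of why rebalancing is costly under hash-modulo routing, and how a directory changes the picture. The key names, shard counts, and helper functions here are hypothetical. Going from 4 to 5 shards changes the computed shard for most keys, while a directory lets keys be moved one at a time.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Hash a string to an integer that is the same on every run."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for(key: str, num_shards: int) -> int:
    """Hash-modulo routing: the shard depends on the shard count."""
    return stable_hash(key) % num_shards


keys = [f"user-{i}" for i in range(1000)]  # hypothetical keys

# Keys whose computed shard changes when a fifth shard is added.
moved = [k for k in keys if shard_for(k, 4) != shard_for(k, 5)]
moved_fraction = len(moved) / len(keys)

# Directory approach: store the key-to-shard mapping explicitly, so a
# key only moves when its directory entry is updated.
directory = {k: shard_for(k, 4) for k in keys}


def lookup(key: str) -> int:
    """Find a key's shard by consulting the directory."""
    return directory[key]


def migrate(key: str, new_shard: int) -> None:
    """Repoint one key; every other entry is left untouched."""
    directory[key] = new_shard
```

The trade-off is that every request now depends on the directory, which is one of the side effects that approach brings.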

(8 cards)