Data Partitioning Flashcards

(8 cards)

1
Q

What is data partitioning?

A

Dividing a large database into smaller, independently manageable parts, such as partitions or shards.

2
Q

What problem does data partitioning solve?

A

A single node eventually cannot store or serve a very large dataset efficiently. Partitioning the data across independently manageable nodes allows large-scale operations to be distributed across all of the available nodes.

3
Q

What are the three major partitioning methods?

A

Horizontal partitioning, vertical partitioning, and hybrid partitioning.

4
Q

What is horizontal partitioning? What's an example of it, and what is a potential downside?

A

Horizontal partitioning is the method of sharding the rows of a table and distributing those shards across multiple instances to support parallelism.

An example is sharding by geographic location, which improves performance through data locality. The downside is that it can lead to imbalanced servers if some regions hold far more data or traffic than others.
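
A minimal sketch of the geographic example, assuming a hypothetical list of user rows with a `region` field (the row data and the `shard_by_region` helper are illustrative, not taken from any particular database). Each group of whole rows stands in for one shard, and the size count makes the imbalance visible:

```python
from collections import defaultdict

# Hypothetical user rows; the field names and values are illustrative only.
rows = [
    {"id": 1, "name": "Ana", "region": "EU"},
    {"id": 2, "name": "Ben", "region": "US"},
    {"id": 3, "name": "Chen", "region": "US"},
    {"id": 4, "name": "Dee", "region": "US"},
    {"id": 5, "name": "Eli", "region": "APAC"},
]


def shard_by_region(rows):
    """Group whole rows by region; each group stands in for one shard."""
    shards = defaultdict(list)
    for row in rows:
        shards[row["region"]].append(row)
    return dict(shards)


shards = shard_by_region(rows)
sizes = {region: len(members) for region, members in shards.items()}
print(sizes)  # {'EU': 1, 'US': 3, 'APAC': 1}
```

Every row keeps all of its columns; only the set of rows differs per shard. The US shard holding three of the five rows is the imbalance the card warns about.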

5
Q

What is vertical partitioning? What’s an example of it?

A

Vertical partitioning is the method of splitting a table by its columns, so that different groups of columns are stored on different instances.

Consider an e-commerce platform that stores user profile columns on one instance, order history on another, and so on, so that focused queries only touch the data they need.
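
A minimal sketch of the e-commerce example, assuming hypothetical column names and a `split_columns` helper invented for illustration. Two dictionaries stand in for the two instances, and the shared `user_id` key is what links them:

```python
# Hypothetical wide user table; column names are illustrative only.
users = [
    {"user_id": 1, "name": "Ana", "email": "ana@example.com",
     "last_order": "A-100", "order_total": 42.5},
    {"user_id": 2, "name": "Ben", "email": "ben@example.com",
     "last_order": "B-200", "order_total": 17.0},
]

PROFILE_COLUMNS = ("name", "email")
ORDER_COLUMNS = ("last_order", "order_total")


def split_columns(rows, key, columns):
    """Project each row onto the chosen columns, indexed by the shared key."""
    return {row[key]: {c: row[c] for c in columns} for row in rows}


profile_store = split_columns(users, "user_id", PROFILE_COLUMNS)
order_store = split_columns(users, "user_id", ORDER_COLUMNS)

# A profile lookup reads only the profile store.
print(profile_store[1])  # {'name': 'Ana', 'email': 'ana@example.com'}

# Rebuilding the full row means reading both stores and merging on the key.
full_row = {"user_id": 1, **profile_store[1], **order_store[1]}
```

The trade-off shows up in the last line: focused queries get cheaper, but any query that needs columns from both groups has to visit both stores.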

6
Q

What is hybrid partitioning? What's an example of it?

A

Hybrid partitioning applies both vertical and horizontal sharding to the same data.

An example would be an e-commerce site that horizontally shards by geographic region, and within each region vertically shards related groups of columns to optimize queries.
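
A minimal sketch of this combination, assuming hypothetical rows and two made-up column families (`profile` and `orders`). Each `(region, family)` pair stands in for one store:

```python
from collections import defaultdict

# Hypothetical rows; regions and column names are illustrative only.
rows = [
    {"user_id": 1, "region": "EU", "name": "Ana", "last_order": "A-100"},
    {"user_id": 2, "region": "US", "name": "Ben", "last_order": "B-200"},
    {"user_id": 3, "region": "US", "name": "Chen", "last_order": "C-300"},
]

# Assumed column families for the vertical split.
COLUMN_FAMILIES = {"profile": ("name",), "orders": ("last_order",)}


def hybrid_partition(rows):
    """Map (region, family) -> {user_id: subset of columns}."""
    stores = defaultdict(dict)
    for row in rows:
        for family, columns in COLUMN_FAMILIES.items():
            # Horizontal split by region, vertical split by column family.
            store = stores[(row["region"], family)]
            store[row["user_id"]] = {c: row[c] for c in columns}
    return dict(stores)


stores = hybrid_partition(rows)
print(stores[("US", "profile")])  # {2: {'name': 'Ben'}, 3: {'name': 'Chen'}}
```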

7
Q

What are some common partitioning approaches? What does each do?

A

Key-hash partitioning, list partitioning, round-robin partitioning, and composite partitioning.

Key-hash hashes a chosen key and takes the result modulo the number of available shards.

List assigns each shard an explicit list of values and routes data and queries to the shard whose list contains the value.

Round-robin assigns each new record to the next shard in turn, which ensures a uniform distribution.

Composite combines two or more of the other schemes, such as list partitioning followed by hash partitioning.
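
The four approaches can be sketched as small routing functions. This is an illustration under assumptions, not a reference implementation: the shard names, the country lists, and every function name here are hypothetical. SHA-256 is used only to get a hash that is stable across runs, since Python's built-in `hash()` is randomized per process for strings:

```python
import hashlib
from itertools import count


def hash_partition(key, num_shards):
    """Key-hash: stable hash of the key, modulo the shard count."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# List partitioning: each shard owns an explicit list of values.
# The shard names and country codes are assumptions for this sketch.
REGION_LISTS = {"shard_eu": ["DE", "FR"], "shard_us": ["US", "CA"]}


def list_partition(country):
    """List: route to the shard whose list contains the value."""
    for shard, values in REGION_LISTS.items():
        if country in values:
            return shard
    raise KeyError(f"no shard lists {country!r}")


def make_round_robin(num_shards):
    """Round-robin: hand out shard indexes 0, 1, ..., n-1, 0, 1, ... in turn."""
    counter = count()
    return lambda: next(counter) % num_shards


def composite_partition(country, user_id, sub_shards):
    """Composite: list partitioning first, then key-hash within that group."""
    return (list_partition(country), hash_partition(user_id, sub_shards))


next_shard = make_round_robin(3)
print([next_shard() for _ in range(6)])  # [0, 1, 2, 0, 1, 2]
print(list_partition("FR"))              # shard_eu
```

Note that round-robin spreads writes evenly but gives no way to find a record by key later, whereas key-hash and list partitioning can route a query straight to the right shard.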

8
Q

What three common problems can data partitioning cause? How can they be addressed?

A

Joins, referential integrity, and rebalancing.

For joins, the data is spread across shards, so a join may have to gather rows from several of them, which is slow. Denormalization can help by storing data that is frequently joined together in a single table, so the query can be answered from one shard.

For referential integrity, foreign-key constraints generally cannot be enforced across shards, so the application has to enforce them. Scheduled cleanup jobs can help find and repair dangling references.

For rebalancing, a change to the partitioning scheme, such as adding a shard, may require moving data between partitions. This would generally cause downtime, which can be mitigated by directory-based partitioning, where a lookup service maps each key to its partition. That approach has its own costs: the lookup service adds complexity and can become a single point of failure.
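
A minimal sketch of why rebalancing is costly under plain key-hash partitioning, and how a directory changes the picture. The key range, the shard counts, and the `lookup` and `move_key` helpers are assumptions for illustration; a real directory would be a separate service rather than an in-memory dictionary:

```python
import hashlib


def hash_partition(key, num_shards):
    """Stable key-hash: SHA-256 of the key, modulo the shard count."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


keys = range(1000)

# With hash-modulo routing, changing the shard count remaps most keys.
moved = sum(1 for k in keys if hash_partition(k, 4) != hash_partition(k, 5))
print(f"{moved} of {len(keys)} keys change shard when going from 4 to 5 shards")

# Directory-based alternative: routing is a lookup, so keys move only
# when the directory entry is changed explicitly.
directory = {k: hash_partition(k, 4) for k in keys}


def lookup(key):
    """Return the shard currently recorded for this key."""
    return directory[key]


def move_key(key, new_shard):
    """Reassign one key; every other key keeps its current shard."""
    directory[key] = new_shard


before = dict(directory)
move_key(0, 4)  # migrate a single key to a newly added shard 4
changed = [k for k in keys if directory[k] != before[k]]
print(changed)  # [0]
```

With the directory, data can be migrated one key (or one range) at a time while the rest keeps serving traffic, at the price of an extra lookup on every request.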
