Fowler Video Flashcards

(136 cards)

1
Q

What primary pain sparked NoSQL?

A

Need for horizontal scale beyond single-node RDBMS limits.

2
Q

In what year did the #nosql hashtag appear?

A

2009

3
Q

List five common traits of NoSQL databases according to Fowler.

A
  • Non-relational
  • Cluster-friendly
  • Open-source
  • Web-era culture
  • Schemaless
4
Q

What is the definition of an aggregate?

A

Collection of related data read/written as a single unit.
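A minimal sketch of the idea, assuming a hypothetical order aggregate and a plain dict standing in for a key-value or document store (none of these names come from the video). The whole order, lines and address included, is written and read under one key:

```python
import json

# Hypothetical aggregate: an order with its lines and shipping address.
order = {
    "id": "order-1001",
    "customer_id": "cust-7",
    "lines": [
        {"sku": "book-42", "qty": 1, "price": 30},
        {"sku": "pen-7", "qty": 2, "price": 3},
    ],
    "shipping_address": {"city": "Chicago"},
}

store = {}  # stand-in for a key-value or document store


def save(aggregate):
    # The whole aggregate is serialized and written as one unit.
    store[aggregate["id"]] = json.dumps(aggregate)


def load(aggregate_id):
    # One look-up returns everything needed to work with the order.
    return json.loads(store[aggregate_id])


save(order)
loaded = load("order-1001")
total = sum(line["qty"] * line["price"] for line in loaded["lines"])
print(total)  # 36
```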

5
Q

What is the transaction boundary in aggregate stores?

A

The aggregate itself.

6
Q

What does ACID stand for?

A
  • Atomicity
  • Consistency
  • Isolation
  • Durability
7
Q

What does BASE stand for?

A
  • Basically Available
  • Soft state
  • Eventual consistency
8
Q

What trade-off does the CAP theorem present during a partition?

A

Must give up either Consistency or Availability.

9
Q

Which CAP corner do most key-value stores pick?

A

AP

10
Q

What is the key–value use-case sweet-spot?

A

Ultra-fast look-ups by ID and caching.

11
Q

Give an example of a document store product.

A

MongoDB.

12
Q

Give an example of a wide column-family product.

A

Cassandra.

13
Q

Give an example of a graph database product.

A

Neo4j.

14
Q

What makes graph DB traversals fast?

A

Index-free adjacency.

15
Q

What does impedance mismatch refer to?

A

Object models versus relational tables.

16
Q

What is the definition of sharding?

A

Partitioning data across nodes by key.

17
Q

What is replication?

A

Maintaining copies of data on multiple nodes.

18
Q

What is a benefit of leader–follower replication?

A

Fewer update conflicts; predictable writes.

19
Q

What is a benefit of peer-to-peer replication?

A

No single write bottleneck.

20
Q

What does eventual consistency mean?

A

Replicas converge given enough time.

21
Q

What is read-your-writes consistency?

A

A client can immediately read its own write.

22
Q

What is the quorum read formula in Cassandra?

A

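R + W > N, where R is the number of replicas read, W the number of replicas that must acknowledge a write, and N the replication factor.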
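A small sketch of the arithmetic, assuming a hypothetical helper named `quorum_overlaps` (not a Cassandra API). It only checks whether every read set is forced to intersect every write set:

```python
def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """Return True when any R replicas read must include one of the W written."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


# With replication factor 3, quorum reads and writes each touch 2 replicas.
print(quorum_overlaps(n=3, r=2, w=2))  # True: 2 + 2 > 3
print(quorum_overlaps(n=3, r=1, w=1))  # False: a read may miss the latest write
```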

23
Q

What does the map phase in Map-Reduce do?

A

Emits key-value pairs from each record.

24
Q

What does the reduce phase do?

A

Aggregates values per key into summary.
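The two phases can be sketched in a single process, assuming hypothetical order records and function names; a real framework would run the map calls on many nodes and shuffle the pairs over the network:

```python
from collections import defaultdict

# Hypothetical input records.
orders = [
    {"customer": "ann", "total": 30},
    {"customer": "bob", "total": 20},
    {"customer": "ann", "total": 15},
]


def map_phase(record):
    # Emit one (key, value) pair per record.
    yield record["customer"], record["total"]


def reduce_phase(key, values):
    # Collapse all values for one key into a summary.
    return key, sum(values)


def run(records):
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            grouped[key].append(value)  # the "shuffle": group values by key
    return dict(reduce_phase(k, v) for k, v in grouped.items())


result = run(orders)
print(result)  # {'ann': 45, 'bob': 20}
```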

25
What is the purpose of a materialized view?
Store pre-computed query results to speed reads.
26
What is the use of a version stamp?
Detect concurrent updates.
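A minimal sketch of a version stamp used as a conditional write, assuming a hypothetical in-memory `VersionedStore` with an integer counter as the stamp. A write succeeds only if the version the client read is still current:

```python
class StaleVersionError(Exception):
    pass


class VersionedStore:
    """In-memory store that keeps a version stamp next to each value."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        return self._data.get(key, (0, None))

    def write(self, key, value, expected_version):
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise StaleVersionError(
                f"expected v{expected_version}, found v{current_version}"
            )
        self._data[key] = (current_version + 1, value)
        return current_version + 1


store = VersionedStore()
store.write("cart:1", ["book"], expected_version=0)
v_a, cart_a = store.read("cart:1")  # client A reads version 1
v_b, cart_b = store.read("cart:1")  # client B reads version 1
store.write("cart:1", cart_a + ["pen"], expected_version=v_a)  # A succeeds
try:
    store.write("cart:1", cart_b + ["mug"], expected_version=v_b)
    conflict_detected = False
except StaleVersionError:
    conflict_detected = True  # B's stamp is stale, so the update is rejected
print(conflict_detected)  # True
```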
27
What is the advantage of a vector clock?
Shows causality across multiple nodes.
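A sketch of how two vector clocks are compared, assuming clocks are plain `{node_id: counter}` dicts and a hypothetical `compare` helper. If neither clock dominates the other, the updates were concurrent:

```python
def compare(a: dict, b: dict) -> str:
    """Compare two vector clocks given as {node_id: counter} dicts."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"  # a happened before b
    if b_le_a:
        return "after"  # a happened after b
    return "concurrent"  # neither dominates: a conflict to resolve


print(compare({"n1": 1}, {"n1": 2, "n2": 1}))  # before
print(compare({"n1": 2, "n2": 0}, {"n1": 1, "n2": 1}))  # concurrent
```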
28
Why is 'schemaless' considered a myth?
Code assumes implicit schema when reading.
29
Why is schema migration in NoSQL still needed?
Code evolves; old documents must align.
30
What do aggregate-oriented DBs struggle with?
Cross-aggregate relationships.
31
When do graph DBs excel?
Queries hop many relationships.
32
What is polyglot persistence?
Using multiple data models in one system.
33
When should one stick with SQL according to Fowler?
Need for multi-row ACID & rich ad-hoc joins.
34
What are the two main drivers for NoSQL adoption?
  • Programmer productivity
  • Performance/scale
35
Give an example of an AP store with tunable consistency.
Cassandra.
36
Give an example of a CP store.
HBase.
37
What is the term for merged duplicate data in aggregates?
Denormalization.
38
What is the risk of denormalization?
Duplicated data must be updated in every copy; missing one leaves the copies inconsistent.
39
In what context is the 'unnatural act' quote used?
Distributed relational JOINs.
40
Why is vertical scaling considered costly?
Larger machines cost disproportionately more, and a single box has a hard upper limit.
41
Who were the earliest mainstream web pioneers of NoSQL?
  • Amazon (Dynamo)
  • Google (Bigtable)
42
What does it mean that key–value data is 'opaque'?
Store doesn’t understand value content.
43
What does it mean that document data is 'transparent'?
Store can index fields inside.
44
What is a 'row key' in a column-family store?
Primary identifier for grouped columns.
45
What is the graph term for table row equivalent?
Node.
46
What are the two distribution styles named by Fowler?
  • Sharding
  • Replication
47
What does soft-state refer to?
Data may change/expire without explicit update.
48
What typical business domain is okay with eventual consistency?
Social media 'likes' count.
49
What domain is unlikely to accept eventual consistency?
Banking ledger balances.
50
From which DDD concept does aggregate design come?
Aggregate root.
51
What does ORM stand for?
Object-Relational Mapping.
52
Why are ORMs less useful in NoSQL?
Schemas are flexible; queries are API-specific.
53
What is the benefit of a hash-based shard key?
Uniform distribution of writes.
54
What is the hot-spot risk with an ordered shard key?
Sequential keys pile writes on one node.
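The contrast between hashed and ordered shard keys can be sketched as follows, assuming four nodes, hypothetical `order-<n>` keys, and two illustrative placement functions (not any product's partitioner):

```python
import hashlib
from collections import Counter

NODES = 4  # assumed cluster size


def hash_shard(key: str) -> int:
    # Stable hash; Python's built-in hash() is salted per process for strings.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NODES


def range_shard(seq: int, keys_per_node: int = 1000) -> int:
    # Ordered placement: consecutive keys land on the same node.
    return min(seq // keys_per_node, NODES - 1)


recent = range(3000, 3100)  # the 100 newest sequential IDs
hashed = Counter(hash_shard(f"order-{i}") for i in recent)
ranged = Counter(range_shard(i) for i in recent)
print(sorted(hashed.items()))  # writes spread over several nodes
print(sorted(ranged.items()))  # [(3, 100)]: every write hits node 3
```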
55
What is the synergy between CQRS and NoSQL?
Separate write log and read-optimized view stores.
56
What does 'cluster-friendly' mean?
Designed to run on many cheap nodes.
57
How long was the impedance mismatch problem tolerated before web scale?
~20 years.
58
What does Fowler call a relational DB integration approach?
Integration database antipattern.
59
Why does polyglot persistence increase ops complexity?
More tech stacks to monitor & patch.
60
What is the BASE vs ACID latency trade-off?
BASE favors low latency over strict consistency.
61
What are some write conflict resolution strategies?
  • Last-write-wins
  • Merge-on-read
  • App logic
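Two of these strategies can be sketched side by side, assuming a hypothetical `Write` record with a writer-supplied timestamp and a shopping cart modelled as a set. Last-write-wins discards the older value, while an application-level merge keeps both replicas' additions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    timestamp: float  # assumed: supplied by the writing node's clock


def last_write_wins(a: Write, b: Write) -> Write:
    # Keep the newer write; the older one is silently discarded.
    return a if a.timestamp >= b.timestamp else b


def merge_carts(a: set, b: set) -> set:
    # Application-level merge: keep items added on either replica.
    return a | b


replica_1 = Write("email=ann@old.example", timestamp=100.0)
replica_2 = Write("email=ann@new.example", timestamp=105.0)
print(last_write_wins(replica_1, replica_2).value)  # email=ann@new.example
print(sorted(merge_carts({"book", "pen"}, {"book", "mug"})))  # ['book', 'mug', 'pen']
```

Last-write-wins depends on the nodes' clocks being comparable, which is one reason version stamps or vector clocks are often preferred for detecting the conflict in the first place.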
62
What is a consistency vs durability trade-off example?
Async fsync for faster writes but risk loss.
63
Why doesn’t eventual consistency mean inconsistent?
Replicas are guaranteed to converge once updates stop; the inconsistency window is temporary.
64
What is the advantage of Map-Reduce incremental update?
Avoids full recompute of materialized view.
65
What is the key–value TTL feature used for?
Automatic cache expiry.
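A minimal sketch of TTL-based expiry, assuming a hypothetical in-memory `TTLCache` with an injectable clock so the example does not need to sleep. Real stores usually also expire keys in the background; this version only expires lazily on read:

```python
import time


class TTLCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (expires_at, value)

    def set(self, key, value, ttl_seconds):
        self._items[key] = (self._clock() + ttl_seconds, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # lazy expiry on read
            return None
        return value


now = [0.0]  # fake clock, advanced by hand
cache = TTLCache(clock=lambda: now[0])
cache.set("session:42", "alice", ttl_seconds=30)
print(cache.get("session:42"))  # alice
now[0] = 31.0
print(cache.get("session:42"))  # None
```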
66
What is a caveat of document DB secondary index?
Each extra index slows writes.
67
Column-family stores are inspired by which Google paper?
Bigtable.
68
What open-source project was inspired by Dynamo?
Cassandra.
69
Give an example of a graph query language.
Cypher (Neo4j).
70
What does the 'P' in CAP mean?
Partition tolerance: the system keeps operating despite a loss of communication between nodes.
71
What is an advantage of 'schema-on-read'?
Cheaper writes; flexible ingest.
72
What is an advantage of 'schema-on-write'?
Guaranteed integrity at insert time.
73
What type of conflict does peer-to-peer replication encounter?
Divergent updates.
74
What are the risks of leader-follower replication?
  • Leader hotspot
  • Single-write failure point
75
What is the purpose of a two-phase commit?
Coordinate multi-node ACID transaction.
76
What is a drawback of 2PC?
Locks resources; risk of blocking on failure.
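The protocol and its blocking risk can be sketched with a toy coordinator, assuming hypothetical `Participant` objects that vote in phase 1 and wait for the decision in phase 2. A participant in the `prepared` state holds its locks until the coordinator answers, which is where the blocking comes from:

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote, and hold locks until the coordinator decides.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    if all(votes):
        for p in participants:
            p.commit()  # phase 2: everyone commits
        return True
    for p in participants:
        p.rollback()  # phase 2: everyone aborts
    return False


ok = two_phase_commit([Participant("orders"), Participant("inventory")])
failed = two_phase_commit(
    [Participant("orders"), Participant("inventory", can_commit=False)]
)
print(ok, failed)  # True False
```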
77
Why does Fowler dislike distributed SQL JOINs?
Require shipping large result sets across network.
78
What is the aggregate update frequency heuristic?
90% of operations should hit one aggregate.
79
When are multi-aggregate transactions acceptable in NoSQL?
Rare, high-value workflows.
80
What is a 'wide row' in column-family good for?
Time-series sensor readings.
81
What is the benefit of graph DB property 'index-free adjacency'?
O(1) edge traversals.
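A rough sketch of the idea, assuming a hypothetical `Node` class where each node holds direct references to its neighbours. Each hop follows a reference instead of looking the neighbour up in a global index:

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.follows = []  # direct references to neighbouring nodes


def friends_of_friends(start):
    # Two hops, each one a pointer dereference rather than an index look-up.
    seen = {start.name}
    result = []
    for friend in start.follows:
        for fof in friend.follows:
            if fof.name not in seen and fof not in start.follows:
                seen.add(fof.name)
                result.append(fof.name)
    return result


ann, bob, cara, dan = Node("ann"), Node("bob"), Node("cara"), Node("dan")
ann.follows.append(bob)
bob.follows.extend([cara, ann])
cara.follows.append(dan)
print(friends_of_friends(ann))  # ['cara']
```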
82
What is the downside of key–value 'opaque' data?
Server-side filtering impossible without loading value.
83
What format do document DBs often use internally?
BSON.
84
Why does being schemaless accelerate development?
Skip DDL cycles for each change.
85
Why can being schemaless hurt analytics?
Harder to guarantee column presence/types.
86
What must a Map-Reduce reducer be?
Associative & commutative for parallelism.
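A short illustration of why this matters, using made-up numbers. Addition can be applied to partial results in any order, while a naive "mean of means" gives the wrong answer because averaging is not associative:

```python
from functools import reduce
import operator

values = [3, 1, 4, 1, 5, 9, 2, 6]

# Sum: reduce each partition separately, then combine the partials in any order.
whole = reduce(operator.add, values)
left, right = values[:3], values[3:]
partials = [reduce(operator.add, right), reduce(operator.add, left)]
combined = reduce(operator.add, partials)
print(whole == combined)  # True


def mean(xs):
    return sum(xs) / len(xs)


# Mean: averaging the partition means does not equal the overall mean.
true_mean = mean(values)
naive_mean = mean([mean(left), mean(right)])
print(true_mean == naive_mean)  # False
```

A common workaround is to have the reducer carry a (sum, count) pair, which does combine associatively, and divide only at the end.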
87
What does BASE 'soft-state' mean?
State may change between read & write due to eventual propagation.
88
What is Fowler’s recommended first step toward NoSQL in a legacy app?
Add a key–value cache layer.
89
What is a polyglot persistence anti-pattern?
Choosing new DB type for novelty, not need.
90
What was the NoSQL ecosystem maturity status in 2012?
Rapidly evolving, tooling catching up.
91
What is an example of an aggregate-ignorant DB?
Relational.
92
What does Fowler call the future beyond NoSQL?
Polyglot persistence landscape.
93
How does sharding improve write throughput?
Parallelizing writes across nodes.
94
How does replication improve read throughput?
Serving reads from multiple copies.
95
What is the difference between logical consistency and replication consistency?
Correctness of data vs sync across replicas.
96
What does a vector clock size grow with?
Number of replicas.
97
What is the incremental update approach for materialized views?
Only process changed aggregates.
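A minimal sketch of an incrementally maintained view, assuming a hypothetical `RevenueView` that tracks revenue per customer. When an order changes, only that order's old contribution is backed out and its new one added:

```python
from collections import defaultdict


class RevenueView:
    """Pre-computed revenue per customer, kept current incrementally."""

    def __init__(self):
        self.totals = defaultdict(int)
        self._seen = {}  # order_id -> (customer, amount) last folded into the view

    def apply(self, order_id, customer, amount):
        # Back out the old contribution of this aggregate, then add the new one.
        if order_id in self._seen:
            old_customer, old_amount = self._seen[order_id]
            self.totals[old_customer] -= old_amount
        self.totals[customer] += amount
        self._seen[order_id] = (customer, amount)


view = RevenueView()
view.apply("o1", "ann", 30)
view.apply("o2", "bob", 20)
view.apply("o1", "ann", 50)  # o1 changed; only o1 is reprocessed
print(dict(view.totals))  # {'ann': 50, 'bob': 20}
```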
98
What is the key–value pattern for session storage?
Store session token → serialized user state.
99
What is a good fit example domain for a graph DB?
Fraud detection paths.
100
What is Fowler’s closing mantra?
"Use both wisely."
101
What are the ideal use cases for key-value stores like Redis, Dynamo, and Memcached?
Session state, leaderboards, write-through caching.
Footnote: Key-value stores excel due to constant-time hash look-ups and in-memory datasets.
102
What is a drawback of the 'opaque value' design in key-value stores?
Every filter or sort must happen in the application layer.
103
How does MongoDB's BSON differ from CouchDB's JSON?
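MongoDB stores documents as BSON, a binary encoding of JSON-like documents with extra types such as dates and binary data; CouchDB stores plain JSON.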
104
What advantages do flexible sub-document querying and multi-key compound indexes provide in document databases?
They allow developers to persist an entire object graph with one write and run server-side analytics.
105
What is a caution regarding secondary indexes in document databases?
Each secondary index slows writes.
106
What are sparse or partial indexes used for in document databases?
Indexing only the documents that contain a field or match a filter; essential for heterogeneous documents.
107
What architecture does Cassandra use, and what are its components?
Log-Structured Merge-Tree architecture, including SSTables, memtables, compaction, and bloom filters.
108
How does tunable consistency in Cassandra affect the CAP triangle?
It allows you to dial the CAP triangle on a per-query basis.
109
What is a common use case for time-series workloads in Cassandra?
Wide rows partitioned by device-ID + day.
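The partitioning scheme can be sketched without a database, assuming a hypothetical `partition_key` helper and a dict standing in for the table. Each device gets one wide row per UTC day, so no single partition grows without bound:

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(device_id: str, ts: datetime) -> tuple:
    # One partition (wide row) per device per UTC day.
    return (device_id, ts.astimezone(timezone.utc).strftime("%Y-%m-%d"))


table = defaultdict(list)  # partition key -> [(timestamp, reading), ...]


def insert(device_id, ts, reading):
    table[partition_key(device_id, ts)].append((ts, reading))


insert("dev-1", datetime(2024, 1, 1, 23, 59, tzinfo=timezone.utc), 20.5)
insert("dev-1", datetime(2024, 1, 2, 0, 1, tzinfo=timezone.utc), 20.7)
insert("dev-2", datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), 18.0)

for key in sorted(table):
    print(key, len(table[key]))
```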
110
What is the pattern-matching syntax used in Cypher for graph databases?
MATCH (u:User)-[:FOLLOWS]->(v:User)
111
What is a significant benefit of using graph databases for high-fan-out social queries?
They reduce complex n-way JOINs to near-linear graph hops.
112
How does Google’s Spanner achieve globally strong consistency?
By using the TrueTime API, which is backed by GPS receivers and atomic clocks.
113
What type of consistency do most business domains need?
Contextual consistency.
114
What are the two deployment topologies discussed for operational setups?
Shard-then-replicate and replicate-then-shard.
115
What is the write amplification issue associated with LSM trees?
Each logical write is rewritten to disk several times as compaction merges SSTables, so the bytes written to storage exceed the bytes written by the application.
116
What is the purpose of anti-entropy repair jobs in distributed databases?
To detect and reconcile replicas that have drifted apart, restoring consistency and data integrity across nodes.
117
What must backups in a distributed system be coordinated with to avoid logical corruption?
A snapshot timestamp on every node.
118
What strategy is used for schema evolution in aggregate stores?
Rolling forward-compatible changes by versioning a root property.
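One way this might look in practice, assuming a hypothetical `schema_version` root property and an invented v1-to-v2 change (splitting `name` into two fields). Documents are upgraded on read, so old and new versions can coexist during a rollout:

```python
def upgrade(doc: dict) -> dict:
    """Bring a customer document up to the latest schema version on read."""
    doc = dict(doc)  # do not mutate the caller's copy
    version = doc.get("schema_version", 1)  # unversioned documents count as v1
    if version == 1:
        # Hypothetical change: v1 stored a single "name"; v2 splits it.
        first, _, last = doc.pop("name", "").partition(" ")
        doc["first_name"], doc["last_name"] = first, last
        version = 2
    doc["schema_version"] = version
    return doc


old = {"name": "Ada Lovelace"}
print(upgrade(old))
# {'first_name': 'Ada', 'last_name': 'Lovelace', 'schema_version': 2}
```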
119
What is the suggested approach for testing in distributed systems?
Using Jepsen-style nemesis chaos suites.
120
What are some common misunderstandings that lead to NoSQL disasters?
Misunderstood defaults like Cassandra’s read_repair_chance and MongoDB’s old MMAPv1 write locks.
121
What security measures are important for distributed databases?
Per-tenant encryption, TLS-on-wire, row-level ACLs.
122
What is Fowler’s concluding mantra regarding database usage?
Use both wisely.
123
What should be the first step in selecting a database according to Fowler?
Start with the access patterns (latency, throughput, query flexibility, relationship depth).
124
What is the recommended strategy for BI workloads?
Extract aggregates into a columnar warehouse instead of live-querying the operational store.
125
What does Fowler suggest regarding observability in production systems?
Automate observability (histograms, percentile dashboards, anomaly alerts) long before production traffic surges.
126
What is Fowler’s golden rule regarding database selection?
Stay with SQL for multi-row ACID transactions and flexible, ad-hoc JOIN-heavy queries; use NoSQL for single aggregates that can be denormalized.
127
Which type of database is recommended for applications needing ledger accuracy and multi-row transactions?
Relational engine (e.g., PostgreSQL, SQL Server, MySQL, Oracle)
128
What are the benefits of using a relational database for certain applications?
Point-in-time consistency, sophisticated optimizers, and a data model that excels under JOINs.
129
When is it appropriate to use NoSQL databases?
When requests can be satisfied by grabbing single blobs and a few seconds of replica drift is tolerable.
130
List examples of aggregate-oriented NoSQL stores.
  • Redis
  • Dynamo-style key-value stores
  • MongoDB
  • Cosmos
  • Cassandra
  • Scylla
  • Neo4j (a graph database, which Fowler classes as NoSQL but not as aggregate-oriented)
131
What does vertical scaling refer to?
Handling growth by moving to a bigger single machine (more CPU, RAM, disk) instead of adding nodes or a new tech stack.
132
What is a potential downside of sharding in NoSQL?
It can lead to the 'unnatural act' of performing relational JOINs across nodes.
133
How do team skills influence database choice?
Familiarity with SQL and ORM reduces cognitive load, while rapid schema changes may push towards schemaless JSON.
134
What is Fowler’s pragmatic recipe for database architecture?
Begin with a relational core, monitor for bottlenecks, and carve out specific slices for NoSQL when necessary.
135
What should be kept in its own columnar warehouse according to Fowler?
Business Intelligence (BI)
136
Complete the sound-bite: 'Relational across nodes is an unnatural act; use aggregates and NoSQL only when they let you ______.'
dodge that cost, otherwise stick with SQL.