Data Intensive Ch1 - Reliable, Scalable and Maintainable Applications Flashcards

1
Q

Pillars of reliability

A

The system should continue to work CORRECTLY (performing the correct function at the desired level of performance) even in the face of ADVERSITY

Tolerating:
Hardware faults
Software faults
Human error

2
Q

Pillars of scalability

A

As the system GROWS (in data volume, traffic volume or complexity), there should be reasonable ways of dealing with that growth

Measuring load
Measuring performance
Latency percentiles
Throughput
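A minimal sketch of the two performance measurements, assuming response times have already been collected as a list of milliseconds over a known time window. It uses the nearest-rank method for percentiles (one of several common conventions). Throughput is taken to be the number of requests completed per second. The function names and sample data are hypothetical.

```python
import math

def percentile(samples_ms, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]

def throughput(num_requests, window_seconds):
    """Requests completed per second over the measurement window."""
    return num_requests / window_seconds

# Hypothetical response times (ms) collected over a 2-second window
times = [12, 15, 11, 14, 250, 13, 16, 12, 18, 900]
print("median:", percentile(times, 50))
print("p90:", percentile(times, 90))
print("p99:", percentile(times, 99))
print("throughput:", throughput(len(times), 2.0), "req/s")
```

With this sample the median is 14 ms, while p90 (250 ms) and p99 (900 ms) expose the slow outliers that the median hides. This is why latency percentiles are more informative than a single average.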

3
Q

Maintainability

A

Over time, many different people will work on the system (in both engineering and operations), and they should all be able to do so PRODUCTIVELY

Operability
Simplicity
Evolvability

4
Q

What is the difference between data-intensive and compute-intensive apps?

A
Data-intensive apps are rarely limited by CPU power
The challenges are:
Amount of data
Complexity of data
Speed at which data is changing
5
Q

How is a data-intensive app typically built?

A

From standard building blocks providing commonly needed functionality like:
Database for storing data
Caches for remembering the result of an expensive operation, to speed up reads
Search indexes to allow users to search or filter data in various ways
Stream processing to send a message to another process, to be handled asynchronously
Batch processing to periodically crunch large data

These elements are so obvious that nobody thinks about writing them from scratch
BUT each block comes in many variants with different characteristics, and different apps have different requirements.
Combining tools can be difficult when the requirement is something no single tool can do alone.
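To illustrate the "cache" building block, here is a minimal in-memory cache-aside sketch. Reads check the cache first and fall back to the slow loader only on a miss. `load_user_from_db` is a hypothetical stand-in for an expensive query. A real system would use a dedicated cache server and a proper invalidation strategy, both of which are beyond this sketch.

```python
import time

# Hypothetical stand-in for an expensive database query
def load_user_from_db(user_id):
    time.sleep(0.05)  # simulate a slow read
    return {"id": user_id, "name": f"user-{user_id}"}

class CacheAside:
    """Check the cache first; on a miss, load from the backing store and remember the result."""

    def __init__(self, loader):
        self.loader = loader
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        value = self.loader(key)
        self.store[key] = value
        return value

    def invalidate(self, key):
        # Must be called when the underlying data changes, otherwise reads go stale
        self.store.pop(key, None)

cache = CacheAside(load_user_from_db)
cache.get(42)  # miss: goes to the "database"
cache.get(42)  # hit: served from memory
print(cache.hits, cache.misses)  # 1 1
```

This is one reason combining tools is hard. Once a cache sits in front of a database, the application code becomes responsible for keeping the two in sync (the `invalidate` call).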

6
Q

Databases and message queues have a superficial similarity: both store data for some time. So what is different?

A

Different access patterns to data -> different performance characteristics -> very different underlying implementations.
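A toy contrast of the two access patterns using only Python's standard library. A dict stands in for a database: records are kept and can be read by key as often as needed. A `collections.deque` stands in for a message queue: messages are appended at the tail, consumed from the head, and removed once read. Real databases and brokers are far more complex. This sketch only shows why the access patterns differ.

```python
from collections import deque

# "Database": keyed storage, random reads by key, reads are repeatable
db = {}
db["order:1"] = {"status": "paid"}
db["order:2"] = {"status": "shipped"}
print(db["order:1"])  # read by key
print(db["order:1"])  # reading again returns the same record

# "Message queue": append at the tail, consume from the head, each message read once
queue = deque()
queue.append({"event": "order_paid", "order": 1})
queue.append({"event": "order_shipped", "order": 2})
first = queue.popleft()  # the consumer takes the oldest message
print(first)
print(len(queue))  # the consumed message is gone
```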

7
Q

Context map

A

Fig 1-1 p5

8
Q

Factors that influence the design of data systems

A
Skills & experience of the people involved
Legacy system dependencies
Time scale for delivery
Org's tolerance of different kinds of risk
Regulatory constraints
9
Q

What does "working correctly" mean in the context of reliability?

A

App performs the function the user expects
It tolerates the user making mistakes or using the software in unexpected ways
Performance is good enough for the use case, under expected load
System prevents any unauthorized access and abuse
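A small sketch of what "tolerating user mistakes" can look like in code. `parse_quantity` is a hypothetical input handler that returns a value or a readable error message instead of crashing on bad input. Returning a `(value, error)` pair is just one possible convention.

```python
def parse_quantity(raw):
    """Turn user input into a positive int, or explain why it was rejected (never crashes)."""
    try:
        value = int(raw.strip())
    except (ValueError, AttributeError):  # non-numeric text, or not a string at all
        return None, f"not a whole number: {raw!r}"
    if value <= 0:
        return None, "quantity must be positive"
    return value, None

for raw in [" 3 ", "three", "-1"]:
    print(parse_quantity(raw))
```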

10
Q

Things that can go wrong are called…

Systems that anticipate them and can cope with them are called…

A

Faults
fault-tolerant or resilient

Fault-tolerant does not mean the system can tolerate ANY fault; it only makes sense to talk about tolerating certain types of faults

11
Q

Fault vs failure

A

Fault - one component of the system deviates from its spec

Failure - the system as a whole stops providing the required service to the user
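A minimal sketch of the fault/failure distinction, assuming a hypothetical service backed by three replicas. A fault makes one replica raise an error, but the caller fails over to the next replica, so the system as a whole keeps serving. The fault only becomes a failure when every replica is faulty. All names here are illustrative.

```python
class ReplicaDown(Exception):
    """A fault: one component deviates from its spec."""

class ServiceFailure(Exception):
    """A failure: the system as a whole cannot serve the request."""

def make_replica(name, healthy):
    def handle(request):
        if not healthy:
            raise ReplicaDown(name)
        return f"{name} handled {request}"
    return handle

def serve(replicas, request):
    faults = []
    for replica in replicas:
        try:
            return replica(request)
        except ReplicaDown as fault:
            faults.append(str(fault))  # tolerate the fault and try the next replica
    raise ServiceFailure(f"all replicas faulty: {faults}")

replicas = [make_replica("a", healthy=False), make_replica("b", True), make_replica("c", True)]
print(serve(replicas, "GET /"))  # replica a is faulty, b serves: no failure
```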

12
Q

Hardware errors

A

Usually thought of as random and independent from each other
A disk failing on one machine usually does not imply a disk failing on another machine (but faults can be correlated, e.g. if the temperature in a server rack goes up)

Until recently, redundancy of hardware components (e.g. disks) was enough
Complete failure of a single machine was rare, so multi-machine redundancy was rarely needed

As data volumes grew, apps began using more machines, which increases the number of hardware faults
Cloud platforms commonly do not guarantee single-machine reliability
Hence the move towards systems that tolerate the loss of entire machines, using software fault-tolerance in addition to hardware redundancy

Examples:
Hard disks crash
RAM becomes faulty
Power grid blackout
Cable is unplugged
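To see why more machines means more hardware faults, assume (hypothetically) that each machine independently has probability p of a hardware fault on a given day. The chance that at least one of n machines has a fault is 1 - (1 - p)^n. The expected number of faulty machines is n * p. The value of p below is illustrative and not taken from the book.

```python
def prob_any_fault(p_per_machine, n_machines):
    """P(at least one of n independent machines has a fault) = 1 - (1 - p)^n."""
    return 1 - (1 - p_per_machine) ** n_machines

def expected_faults(p_per_machine, n_machines):
    """Expected number of faulty machines (linearity of expectation): n * p."""
    return n_machines * p_per_machine

p = 0.001  # hypothetical daily fault probability per machine
for n in (1, 10, 100, 1000, 10000):
    print(f"{n:>5} machines: P(any fault) = {prob_any_fault(p, n):.3f}, "
          f"expected faults = {expected_faults(p, n):.1f}")
```

Under these assumptions a single machine almost never faults on a given day, but a fleet of a thousand machines sees a fault on most days. At that scale, "a machine is down" becomes the normal state and has to be tolerated in software.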
13
Q

Software errors

A

A bug that causes every app instance to crash when given a particular bad input
A runaway process that uses up a shared resource like RAM or network bandwidth
An external service the app depends on slows down, crashes or returns corrupted responses (SAM JWKS hello hello!)
Cascading failures - a small fault in one component triggers a fault in another component, which in turn triggers further faults

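Such bugs often lie dormant for a long time until triggered by an unusual set of circumstances
They usually reveal that the software makes some assumption about its environment which is USUALLY true but, for some reason, stops being true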

Remedies: analysis, testing, process isolation, monitoring and alerts
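A minimal sketch of the "monitoring and alerts" remedy. It assumes the app records whether each recent request succeeded, keeps a sliding window of outcomes, and flags an alert when the error rate crosses a threshold. The `ErrorRateMonitor` name, window size and threshold are hypothetical choices. A production system would rely on a dedicated metrics and alerting stack instead.

```python
from collections import deque

class ErrorRateMonitor:
    """Tracks the outcome of the last `window` requests and flags a high error rate."""

    def __init__(self, window=100, threshold=0.05):
        self.outcomes = deque(maxlen=window)  # oldest outcomes drop off automatically
        self.threshold = threshold

    def record(self, success):
        self.outcomes.append(success)

    def error_rate(self):
        if not self.outcomes:
            return 0.0
        failures = sum(1 for ok in self.outcomes if not ok)
        return failures / len(self.outcomes)

    def should_alert(self):
        return self.error_rate() > self.threshold

monitor = ErrorRateMonitor(window=10, threshold=0.2)
for ok in [True] * 7 + [False] * 3:  # e.g. a dependency starts returning errors
    monitor.record(ok)
print(monitor.error_rate(), monitor.should_alert())  # 0.3 True
```

A sliding window forgets old outcomes, so the alert reflects the current behavior of the system rather than its whole history.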
