Overall Flashcards Preview

Distributed System > Overall > Flashcards

Flashcards in Overall Deck (24)

Data System Categories

Database (SQL)
Message Bus (Kafka)
Cache (Redis)
Logging (Logstash)

The interfaces are blurring, e.g. all data systems can provide SQL-style "query", or database-level reliability


What is a Distributed System?

The functionality of an application is distributed to different components within a system. It is the result of many patterns.

Distribution of Functionality (components)
Reuse/Share/Common Design/Language/Independent
Monolithic vs Micro-services
Divide and Conquer
Separation of Concern
Do one thing and do one thing well

Allows creating a new, special-purpose data system/application from smaller, general-purpose components (Redis, Kafka, MySQL, Elasticsearch)


Fault vs Failure

Faults (HW or SW) refer to issues in individual components. The system should still function if it is fault-tolerant.

Failure refers to the system as a whole failing (downtime)

Testing should purposely introduce faults to test the system's fault tolerance (Netflix Chaos Monkey)


Load Parameters

Numbers that describe system loads
1. CCR (concurrent requests to the service)

Define various load parameters of your system and how to handle them efficiently


Response Time =
Delays (Queues, Network) +
Latency (Waiting in idle) +
Service time

Response time varies even for the same request

We therefore need to think of response
time not as a single number, but as a distribution of values that you can measure (histogram)

Long-Tailed Distribution ... a small percentage of requests have a long response time

Small percentage * large customer base ==> a big number of customers having long response times!
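Measuring response time as a distribution means recording samples and reporting percentiles instead of an average. A minimal sketch in Python (the sample numbers are made up for illustration):

```python
# Summarize response times (ms) as percentiles instead of a mean.
def percentile(samples, p):
    """Nearest-rank percentile: p is in [0, 100]."""
    ordered = sorted(samples)
    # Index of the value at the p-th percent rank, clamped to valid range.
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

# 95 fast requests plus a long tail of slow ones (made-up data).
times = [10] * 95 + [50, 100, 200, 500, 1500]
print(percentile(times, 50))    # p50 (median) -> 10
print(percentile(times, 99))    # p99 -> 500
print(percentile(times, 99.9))  # p999 -> 1500
```

The median looks great (10 ms) while the p999 exposes the long tail, which is exactly why a single number misleads.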


Load Parameters

Numbers that describe system loads
1. CCR (concurrent requests to the service)
2. Read/Write ratio/Volume etc

Define various load parameters of your system and how to handle them efficiently

An architecture that scales well for a particular application is built around assumptions
of which operations will be common and which will be rare—the load parameters.


Long Tail Distribution
Tail Latency (Latency of P999)
Large Latency for small number of requests

Users making the most requests would be most affected by tail latency, and these heavy users are the most valuable users



Define the response time at p50 (median), p99 or p999


Long Tail Distribution
Tail Latency (Latency of P999)
Large Latency for small number of requests
Tail Latency Amplification

Users making the most requests would be most affected by tail latency, and these heavy users are the most valuable users
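Tail latency amplification: when one user request fans out to many backend calls, even a small per-call chance of hitting the tail compounds, because the request is slow whenever ANY call is slow. A quick worked sketch (the 1% figure is illustrative):

```python
# If each backend call lands in the slow tail 1% of the time, a request
# that fans out to n parallel calls is slow if ANY of them is slow.
def p_request_slow(n_calls, p_call_slow=0.01):
    return 1 - (1 - p_call_slow) ** n_calls

print(round(p_request_slow(1), 3))    # 0.01  -> 1% of requests slow
print(round(p_request_slow(100), 3))  # 0.634 -> ~63% of requests slow
```

With 100 backend calls per request, a 1% tail per call amplifies to roughly 63% of user requests being slow.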


Head of Line Blocking

A VM can process N tasks at the same time (limited by CPUs, threads etc).

A task at the head of the queue that takes a long time (due to its large dataset) contributes to the "latency" (queueing time) of the waiting tasks, even though the service time for the waiting tasks would be quick (small datasets).

Dedicate a certain number of threads to long-running tasks and the rest to small tasks?
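The dedicated-threads idea from the last line can be sketched with two thread pools, routing predictably long tasks away from short ones (the pool sizes and size threshold here are arbitrary assumptions):

```python
from concurrent.futures import ThreadPoolExecutor

# Separate pools so long tasks cannot head-of-line block short ones.
long_pool = ThreadPoolExecutor(max_workers=2)   # predictably long tasks
short_pool = ThreadPoolExecutor(max_workers=8)  # everything else

def submit(task, dataset_size, threshold=10_000):
    # Route by expected cost so short tasks never queue behind long ones.
    pool = long_pool if dataset_size > threshold else short_pool
    return pool.submit(task, dataset_size)

quick = submit(lambda n: n * 2, 5)       # routed to short_pool
heavy = submit(lambda n: n + 1, 20_000)  # routed to long_pool
```

The trade-off: dedicated pools waste capacity when there are no long tasks, but they bound the queueing latency of short tasks.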


Scale Cube

X: Horizontal scale / scale by duplication (cloning) / Scale Out
Y: Scale by functional decomposition / Microservices / Distribution
Z: Scale by data/network partitioning (sharding)


A big ball of mud --- overly complicated software (many tangled collaborators)
Low maintainability

Large state space
#ifdef or if(edge_case)
tangled dependencies (loops, meshes)
hacks/workarounds here and there
inconsistent naming/conventions
Hidden assumptions


Remove "accidental complexity" --- the complexity that comes not from the problem itself but from the implementation of the solution

Design patterns
Clean interface


Service mesh
--an infrastructure layer that addresses the cross-cutting concerns of multiple microservices

The more microservices an application is made of, the more the application needs a service mesh layer.

Without a service mesh,
... each microservice implements business logic and cross-cutting concerns (CCCs) such as logging, caching, security, and load balancing by itself.

With a service mesh,
... many CCCs like traffic metrics, routing, and encryption are moved out of the microservice and into a proxy. Business logic and business metrics stay in the microservices. Incoming and outgoing requests are transparently routed through the proxies.

In addition to the layer of proxies (data plane), a service mesh adds a so-called control plane. It distributes configuration updates to all proxies and receives the metrics collected by the proxies for further processing, e.g. by a monitoring infrastructure such as Prometheus.


WHAT: Observability vs (>) Monitoring

Observability: the ability to ask questions from the outside (e.g. Splunk queries) to understand the inside of the system.

3 Pillars:


[ORIGIN] As the Twitter engineering team said a few years ago on their blog, the 3 pillars of observability are:
Metrics (alerting)
Traces (across distributed systems and services)
Logs (aggregation/analytics)




HOW: Observability

One tool, one place: all logs in raw format sent to one central location (log aggregation)

App instrumentation (OpenCensus)

Multiple tools: Datadog for one service, Splunk for another service, etc.



Site Reliability Engineering:

Service Level Indicators / Objectives / Agreements (SLI / SLO / SLA)


(See more about Functional in Testing class)
Functional programming basically realizes complicated features by composing a set of stateless functional cores.

Examples of Functional architecture in embedded system?

***Math equations are an example of functional programming
Math.add()/minus()/divide()/multiply() are basic functional cores. They can be used to compose more complicated equations:

Math.multiply(Math.minus(Math.add(1, 2), 3), 4)

We can see functions are "chained" or "pipelined".
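The same chaining can be sketched with plain pure functions (the function names here are illustrative, not a real Math library):

```python
# Pure functional cores: no state, no side effects.
def add(a, b): return a + b
def minus(a, b): return a - b
def multiply(a, b): return a * b

# Composing cores into a more complicated "equation":
result = multiply(minus(add(1, 2), 3), 4)  # ((1 + 2) - 3) * 4
print(result)  # 0
```

Each core is trivially testable in isolation; complexity lives only in how they are chained.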

***GST or MPIPE pipeline is another example of functional programming

Each PLUGIN or CALCULATOR is a functional CORE
1) each plugin has NO internal state or hidden output.
2) Branching and other kinds of complexity may come into play at pipeline level, not at plugin/element level.
(E.g. use Tee element for branching)


Function Core == Pure Function

A pure function is deterministic (a given input always results in the same output) and has no side effects (such as mutation or I/O)

Functional Core is the decision-making core with no internal state. It purely makes decisions ("data transformation") on the input data and outputs the decisions ("transformed" data).


A pure function has no mutable internal state... but eventually it still needs to interact with external state (database, file I/O)

Functional programmers try to keep this kind of nondeterminism at the edges of the pipeline as much as possible

The "Edges" of pipeline is the "Source Element" or "Sink Element"

This is very similar to a GST pipeline where each element is functional, and the source/sink elements (the "edges" of the pipeline) are where the I/O happens
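The "keep I/O at the edges" idea is often called functional core, imperative shell. A minimal sketch (the discount rule and function names are made up for illustration):

```python
# Functional core: a pure decision. Same input -> same output, no I/O.
def apply_discount(price, loyalty_years):
    rate = 0.10 if loyalty_years >= 5 else 0.0
    return round(price * (1 - rate), 2)

# Imperative shell: the "edges" where nondeterminism and I/O live.
def checkout(order, load_customer, save_invoice):
    customer = load_customer(order["customer_id"])  # source edge (I/O)
    total = apply_discount(order["price"], customer["loyalty_years"])
    save_invoice(order["customer_id"], total)       # sink edge (I/O)
    return total
```

The core (apply_discount) is unit-testable with no mocks; only the thin shell ever touches a database or file.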


Entity vs Value object

Entities have their own intrinsic (meaningful) identity, value objects don’t.

The notion of identity equality refers to entities; the notion of structural equality refers to value objects; the notion of reference equality refers to both. (Entity equals on ID, Value equals on its member values)

Entities have a history; value objects have a zero lifespan.

A value object should always belong to one or several entities; it can't live on its own (i.e. it has no meaning on its own)

Value objects should be immutable; entities are almost always mutable.

To recognize a value object in your domain model, mentally replace it with an integer.

Value objects shouldn’t have their own tables in the database.

Always prefer value objects (Immutable) over entities (mutable) in your domain model.

Entity has a lifespan, usually a record/row in the database
Value Object does not have life span itself. It must be owned by Entity. For example it is a value in a record.

Entity has an unique Entity ID
Value does not have an Entity ID

Value is immutable. Entity is mutable

Person {
    Address { ... }
}

University {
    Address { ... }
}
Person is an entity, it has unique ID, and it has life span:
its "age" and "address" changes over time for a person.

Address is a value. An address itself has no point in having an ID (which would make it an entity), as an address
1) does not change
2) has no meaning on its own (whose address?)

A University is an entity


DTO Shape

The "structure" of a Data Transfer Object.

Here are two shapes of the same Person object:

Flat:
Person {
    name,
    street,
    state,
    zip
}

Nested:
Person {
    name,
    Address {
        street,
        state,
        zip
    }
}

If your Person table has one of these layouts, which "shape" will the DTO returned to your client (webpage) have?

If you have two clients (two websites), one may prefer one shape and the other may prefer the other shape; which one do you return? You can only return one shape while serving two clients, so one client may not be able to use the DTO directly.


In MVC + DTO + DTO Adapter:
Controller is considered "application logic"
Model is considered "domain logic"

In hexagonal architecture, the application service layer is considered the controller, and the domain layer is considered the domain logic.

The controller/application logic should be the one working with "out of process" dependencies/collaborators (database, message bus), while the model/domain logic should work on business decisions only (pure function, functional core)

The "view" is the client/webpage
DTO is the data transferred between View/client and Controller

Controller is the entity that bridges the client and the database. The DTO bridges the data shape between the client and the database.

DTO is the data contract (vs API contract) between client and controller.

Without a DTO, the view/client would access data of the same shape as in the database, thus coupling the front end to the back end (i.e. a DB schema change may result in a front-end change)

DTO Adapter adapts data from DB shape into the needed DTO shape
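A DTO adapter is just a mapping from the DB/entity shape to the shape a client wants. A sketch using the flat and nested Person shapes from the earlier card (field values are made up):

```python
# Entity/DB shape: flat, mirroring a Person table row.
person_row = {"name": "Ann", "street": "1 Main St",
              "state": "CA", "zip": "94000"}

# Adapter: produce the nested DTO shape that one client prefers.
def to_nested_dto(row):
    return {
        "name": row["name"],
        "address": {k: row[k] for k in ("street", "state", "zip")},
    }

dto = to_nested_dto(person_row)
# A later DB schema change only forces a change in this adapter,
# not in the client consuming the DTO.
```

A second client preferring the flat shape would simply get its own adapter, keeping both clients decoupled from the schema.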

In summary, the only argument against using DTOs is the additional work required to write and manage the number of resulting DTO classes and DTO adapters. It is not, however, a simple matter of a programmer's laziness. In large projects, decoupling presentation from the service layer with DTO costs you hundreds of new classes


Client <-> DTO <-> Entity Object <-> DB

To use DTO or Entity Directly?

DTO downside: may incur a lot of new DTO classes.
Trivial, but a pain to maintain.

Use the Entity directly as the contract between client and controller/service layer: simple, but coupled

Whether to use DTOs or not is not a point easy to generalize. To be effective, the final decision should always be made looking at the particulars of the project. In the end, a mixed approach is probably what you'll be doing most of the time. Personally, I tend to use entities as much as I can. This happens not because I'm against purity and clean design, but for a simpler matter of pragmatism. With an entity model that accounts for only 10 entities and a few use cases, using DTOs all the way through doesn't pose any significant problem. And you get neat design and low coupling. However, with hundreds of entities and use cases, the real number of classes to write, maintain, and test ominously approaches the order of thousands. Any possible reduction of complexity that fulfills requirements is more than welcome.