Highload Application Flashcards
(42 cards)
What is Kafka?
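Kafka is an open-source distributed event streaming platform for publishing, storing, reading, and analysing streams of data.
It is used somewhat like a Redis message queue, but messages are written to a replicated, append-only log on disk, which gives database-level durability.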
What is Memcached?
Memcached is an open source, high-performance, distributed memory caching system intended to speed up dynamic web applications by reducing the database load. It is a key-value dictionary of strings, objects, etc., stored in the memory, resulting from database calls, API calls, or page rendering.
(A caching tool.)
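A minimal sketch of how a cache reduces database load, using the cache-aside pattern. A plain dict stands in for a Memcached client here, and `CacheAside`, `slow_db_lookup`, and the key names are hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict


class CacheAside:
    """Cache-aside lookups; a plain dict stands in for a Memcached client."""

    def __init__(self, load_from_db: Callable[[str], str]) -> None:
        self._cache: Dict[str, str] = {}    # stand-in for the in-memory cache
        self._load_from_db = load_from_db   # the slow call we want to avoid
        self.db_calls = 0                   # counts how often the database is hit

    def get(self, key: str) -> str:
        if key in self._cache:              # cache hit: no database work
            return self._cache[key]
        self.db_calls += 1                  # cache miss: fall back to the database
        value = self._load_from_db(key)
        self._cache[key] = value            # keep the result for the next reader
        return value

    def invalidate(self, key: str) -> None:
        # Drop the cached value after the underlying row changes.
        self._cache.pop(key, None)


def slow_db_lookup(key: str) -> str:
    # Hypothetical database call, used only for illustration.
    return f"row-for-{key}"


cache = CacheAside(slow_db_lookup)
first = cache.get("user:1")    # miss: goes to the database
second = cache.get("user:1")   # hit: served from memory
```

Only the first read reaches the database; repeated reads of the same key are served from memory until the key is invalidated.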
What is Elasticsearch?
Elasticsearch is a distributed, open-source, near-real-time full-text search and analytics engine built on Apache Lucene.
What is Solr?
Solr is a scalable, ready-to-deploy search and storage engine optimized for searching large volumes of text-centric data.
What is Reliability?
The system should continue to work correctly (performing the correct function at the desired level of performance) even in the face of adversity (hardware or software faults, and even human error).
What is Maintainability?
Over time, many different people will work on the system (engineering and operations, both maintaining current behavior and adapting the system to new use cases), and they should all be able to work on it productively.
What kinds of errors can break reliability?
- Hardware faults (a failed disk or database server, a power outage, etc.)
- Software errors (infinite recursion, cascading failures)
- Human factor (accidentally deleting something important)
What examples of workload parameters for describing scalability do you know?
- Number of requests to the web server per second
- Number of database reads/writes per second
- Number of simultaneously active users in a chat
What is Hadoop?
Hadoop is an open-source software framework for storing and quickly processing huge amounts of data of any kind across clusters of machines.
What is MapReduce?
MapReduce is a programming model and processing module in the Apache Hadoop open-source ecosystem. It is used to write scalable applications that process large amounts of data in parallel on a large cluster of commodity servers.
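A minimal single-process sketch of the MapReduce idea, counting words. It does not use Hadoop's API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are assumptions made for illustration. On a real cluster the map and reduce calls would run on different machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input split.
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all values that share the same key.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: collapse the values for one key into a single result.
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    mapped: List[Tuple[str, int]] = []
    for document in documents:   # each document could be mapped on a different node
        mapped.extend(map_phase(document))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big cluster", "data on a cluster"])
```

Because each map call sees only its own document and each reduce call sees only one key, both phases can be spread over many servers without coordination.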
What is a rolling upgrade?
A rolling upgrade is a software version upgrade performed without noticeable downtime or other disruption of service. (The servers sit behind a load balancer and are upgraded one at a time.)
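A rough sketch of the one-server-at-a-time loop. The `Server` class, the `is_healthy` check, and the version strings are hypothetical; a real deployment would call the load balancer and deploy tooling instead of flipping fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    name: str
    version: str
    in_rotation: bool = True   # True while the load balancer routes traffic to it


def is_healthy(server: Server, expected_version: str) -> bool:
    # Hypothetical health check: here it only verifies the running version.
    return server.version == expected_version


def rolling_upgrade(servers: List[Server], new_version: str) -> int:
    """Upgrade servers one by one; return the fewest servers that were serving at any time."""
    min_serving = len(servers)
    for server in servers:
        server.in_rotation = False      # drain: stop routing traffic to this server
        min_serving = min(min_serving, sum(s.in_rotation for s in servers))
        server.version = new_version    # stand-in for the real deploy step
        if not is_healthy(server, new_version):
            raise RuntimeError(f"{server.name} failed its health check; upgrade halted")
        server.in_rotation = True       # put the server back behind the load balancer
    return min_serving


fleet = [Server("web-1", "1.0"), Server("web-2", "1.0"), Server("web-3", "1.0")]
min_serving = rolling_upgrade(fleet, "1.1")
```

With three servers, two keep serving traffic at every moment, which is why users see no downtime.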
What is Shared-nothing architecture?
Shared-nothing architecture (SNA) is a distributed computing architecture made up of multiple separate nodes that don’t share resources. The nodes are independent and self-sufficient: each has its own memory, disk storage, and input/output interfaces. In such a system, the data set or workload is split into smaller parts, and each part is assigned to one node.
What is replication?
Replication is the continuous copying of data changes from one database (publisher) to another database (subscriber).
What is database table partitioning (sharding)?
Partitioning is the process of dividing very large tables into multiple smaller parts. By splitting a large table into smaller, individual tables, queries that access only a fraction of the data can run faster because there is less data to scan.
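A minimal sketch of one common way to split a table: hash partitioning by key. The `PartitionedTable` class and `partition_for` helper are hypothetical, and real databases also offer range and list partitioning, which are not shown.

```python
import hashlib
from typing import Dict, List, Optional


def partition_for(key: str, num_partitions: int) -> int:
    # Use a stable hash so the same key always maps to the same partition.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


class PartitionedTable:
    def __init__(self, num_partitions: int) -> None:
        # Each dict plays the role of one smaller table.
        self.partitions: List[Dict[str, dict]] = [{} for _ in range(num_partitions)]

    def insert(self, key: str, row: dict) -> None:
        index = partition_for(key, len(self.partitions))
        self.partitions[index][key] = row

    def get(self, key: str) -> Optional[dict]:
        # Only one partition is consulted, so there is less data to scan.
        index = partition_for(key, len(self.partitions))
        return self.partitions[index].get(key)


table = PartitionedTable(num_partitions=4)
for user_id in range(100):
    table.insert(f"user:{user_id}", {"id": user_id})
row = table.get("user:42")
```

A lookup touches a single partition instead of the whole data set, which is the source of the speed-up described above.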
What replication strategies do you know?
- single-leader (one main node sends changes to the others)
- multi-leader (several main nodes send changes to the others)
- leaderless (the client sends each write to several nodes directly)
What are the differences between synchronous, asynchronous and semi-synchronous replication?
- synchronous replication: the leader waits until all child nodes confirm they have received the update, and only then reports success
- asynchronous replication: the leader reports success without waiting for the child nodes
- semi-synchronous replication: synchronous with one child node and asynchronous with the others
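The three modes differ only in how many child nodes the leader waits for before reporting success. A simplified in-memory sketch, where `Leader`, `Replica`, and `confirmed_before_success` are hypothetical names and "waiting" is modelled as applying the change immediately:

```python
from typing import List, Tuple


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.log: List[str] = []

    def apply(self, change: str) -> None:
        self.log.append(change)


class Leader:
    MODES = ("sync", "semi-sync", "async")

    def __init__(self, replicas: List[Replica], mode: str) -> None:
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.replicas = replicas
        self.mode = mode
        self.log: List[str] = []
        self.pending: List[Tuple[Replica, str]] = []   # changes still in flight

    def _replicas_to_wait_for(self) -> int:
        if self.mode == "sync":
            return len(self.replicas)          # wait for every replica
        if self.mode == "semi-sync":
            return min(1, len(self.replicas))  # wait for exactly one replica
        return 0                               # async: wait for nobody

    def write(self, change: str) -> int:
        """Record the change and return how many replicas confirmed before success."""
        self.log.append(change)
        waited = self._replicas_to_wait_for()
        for replica in self.replicas[:waited]:
            replica.apply(change)                   # the leader blocks on these
        for replica in self.replicas[waited:]:
            self.pending.append((replica, change))  # delivered in the background
        return waited

    def deliver_pending(self) -> None:
        # Simulates the background stream catching up.
        for replica, change in self.pending:
            replica.apply(change)
        self.pending.clear()


def confirmed_before_success(mode: str) -> List[int]:
    replicas = [Replica("r1"), Replica("r2"), Replica("r3")]
    Leader(replicas, mode).write("SET x = 1")
    return [len(r.log) for r in replicas]
```

Calling `confirmed_before_success` with each mode shows which replicas already hold the write at the moment the client is told it succeeded.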
How to add one more child node without downtime or data loss?
- Take a snapshot of the leader database, noting its position in the leader's change log
- Copy the snapshot to the child database
- Request all changes made on the leader since the snapshot was taken
- Apply these changes to the child database
What replication data sending strategies do you know?
- Statement-based replication (SBR)
- Write-Ahead Logging (WAL)/ Streaming Replication
- Logical replication
- Trigger replication
What is Statement-based replication (SBR)? What are the pros/cons?
The binary log stores the SQL statements that changed the database on the master server. The slave reads this log and re-executes the statements to produce a copy of the master database.
Problems
- Nondeterministic functions such as RAND() or NOW() inside a statement return a different value on each node
- Auto-incremented columns can get different values unless the statements run in exactly the same order on each node
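A small sketch of the first problem. `set_random_discount` is a hypothetical stand-in for a statement like `UPDATE prices SET discount = RAND()`, and the two `random.Random` instances model two nodes with different random state.

```python
import random
from typing import Dict


def set_random_discount(table: Dict[str, float], rng: random.Random) -> None:
    # Stand-in for: UPDATE prices SET discount = RAND()
    table["discount"] = rng.random()


leader_table: Dict[str, float] = {}
replica_table: Dict[str, float] = {}
leader_rng = random.Random(1)
replica_rng = random.Random(2)   # a different node has different random state

# Statement-based: the replica re-executes the same statement itself.
set_random_discount(leader_table, leader_rng)
set_random_discount(replica_table, replica_rng)
diverged = leader_table != replica_table

# Shipping the resulting value instead of the statement keeps the copies equal.
value_replica_table: Dict[str, float] = {"discount": leader_table["discount"]}
converged = value_replica_table == leader_table
```

Re-running the statement gives each node its own random value, while copying the value the leader actually wrote keeps the replica identical.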
What is Write-Ahead Logging (WAL) replication / Streaming Replication?
WAL stands for Write-Ahead Logging. It is the standard technique for ensuring that every change to the database is recorded in a log, in order of occurrence, before it is applied to the data files. (The leader streams these low-level log records to the replica, which replays them to rebuild the same data.)
What is Logical replication?
Logical replication is a method of replicating data objects and their changes, based upon their replication identity (usually a primary key). We use the term logical in contrast to physical replication, which uses exact block addresses and byte-by-byte replication.
What is trigger replication?
Trigger-based replication uses database triggers to capture changes and hand them to application-side code, which then replicates them. It’s useful when the replica is a different kind of database or when you need custom logic.
What is replication lag?
Replication lag is the delay between a transaction or operation being executed on the primary/master node and the same change being applied on the standby/slave node. (During that time the main and child nodes hold different data.)
What is read-after-write consistency?
Read-after-write consistency is the ability to view changes (read data) right after making those changes (write data). For example, if you have a user profile and you change your bio on the profile, you should see the updated bio if you refresh the page. There should be no delay during which the old bio shows up.
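One possible way to get this behaviour when replicas lag, sketched under assumptions: reads by a user who wrote recently go to the leader, and everyone else reads from a replica. `ReadRouter`, `window_seconds`, and the explicit `now` timestamps are hypothetical, and the approach assumes replication lag stays shorter than the window.

```python
from typing import Dict, Optional


class ReadRouter:
    """Send a user's reads to the leader for a short window after that user writes."""

    def __init__(self, leader: Dict[str, str], replica: Dict[str, str],
                 window_seconds: float) -> None:
        self.leader = leader
        self.replica = replica
        self.window_seconds = window_seconds
        self.last_write_at: Dict[str, float] = {}   # user -> time of last write

    def write(self, user: str, key: str, value: str, now: float) -> None:
        self.leader[key] = value           # writes always go to the leader
        self.last_write_at[user] = now

    def read(self, user: str, key: str, now: float) -> Optional[str]:
        last = self.last_write_at.get(user)
        if last is not None and now - last < self.window_seconds:
            return self.leader.get(key)    # recent writer: read own writes
        return self.replica.get(key)       # everyone else: possibly stale replica


leader_db = {"bio": "old bio"}
replica_db = {"bio": "old bio"}   # has not received the update yet
router = ReadRouter(leader_db, replica_db, window_seconds=60)

router.write("alice", "bio", "new bio", now=100.0)
own_read = router.read("alice", "bio", now=101.0)    # served by the leader
other_read = router.read("bob", "bio", now=101.0)    # served by the lagging replica
```

The user who changed the bio sees the new value immediately, while other users may briefly see the old one, which read-after-write consistency allows.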