622 final Flashcards

Question

Baseline MongoDB classified as PA/EC

Answer 1

Without partition, system guarantees reads and writes to be consistent (EC), but MongoDB’s data replication makes it PA: If the master node fails or is partitioned from the rest of the system, “it stores all writes that have been sent to the master node but not yet replicated in a local rollback directory.” “Meanwhile, the rest of the system elects a new master to remain available for reads and writes.” “Therefore, the state of the old master and the state of the new master become inconsistent until the system repairs the failure and uses the rollback directory to reconcile the states, which is a manual process today.”

Answer 2

if a partition occurs, prefer consistency (giving up availability), else (normal case) prefer lower latency (giving up consistency)

Answer 3

set of mechanisms and policiesincorporated in a system for purposes of mitigating the risks from threats

Answer 4

potential event that could compromise a security requirement

Answer 5

realization of a threat

Answer 6

a characteristic or flaw in system design or implementation, or in the security procedures, that, if exploited, could result in a security compromise

Answer 7

expected losses in case of attack, times chances of attack - acquisition - engagement

Answer 8

development & maintenance complex software infrastructure product licensing bigger &/or dedicatedmachines (e.g., firewalls) degraded performance encryption, authentication, authorization…

Answer 9

secure channels authentication -communicating parties are who they say they are -confidentiality and integrity: third parties cannot interfere in the communication access control authorization (authorization != authentication) -which entities can access which assets and operations accountability -always knowing who did what non-repudiation -entities cannot get away with denying actions

Answer 10

“Set of all security mechanisms in a (distributed) computer system that are needed to enforce a security policy, and thus need to be trusted” [3rd edition pg 507] Smaller the TCB, the better Less that must be trusted & less to review)

Answer 11

Trusted != trustworthy Prefer a trustworthy one

Answer 12

aka shared secret key M = DK (EK (M)) M is the data, D is decrypt, E is encrypt, and k is the key computationally efficient

Answer 13

aka public key/private key M=DK- (EK+ (M))=DK+ (EK- (M)) K+ is public key, and k- is private key computationally expensive

Answer 14

S = H(M), or S = MACK (M) S, aka digest, is a unique representation of data such that change to the data will change the representation fixed size and independent of size of M collision resistant (both weak and strong) computationally efficient

Answer 15

cannot be separated

Answer 16

Subject -> (request for operation) -> Reference Monitor -> (authorized request) -> Object

Answer 17

Each object maintains a list of the access rights of all subjects & their rights Distributes access control matrix column-wise There may be a “default” set of rights

Answer 18

Each subject maintains a list of the objects it controls & the rights over them Distributes access control matrix row-wise

Answer 19

in a system with N nodes, each node needs tokeep track of N-1 pair-wise symmetric keysthe number of distinct keys in the system is ~N 2 with asymmetric encryption each nodehas one public key that everyone else usesthe number of distinct keys in the system is ~N

Answer 20

a CA stores and makes available pairs

Answer 21

Each user and program should operate using fewest privileges possible Limits the damage from an accident, error, or attack Reduces number of potential interactions among privileged programs Unintentional, unwanted, or improper uses of privilege less likely Extend to the internals of a program: only smallest portion of program which needs those privileges should have them

Answer 22

Protection system's design should be simple and small as possible “techniques such as line-by-line inspection of software and physical examination of hardware that implements protection mechanisms are necessary. For such techniques to be successful, a small and simple design is essential.‘” Aka “KISS” principle (``keep it simple, stupid'')

Answer 23

The protection mechanism must not depend on attacker ignorance Instead, the mechanism should be public Depend on secrecy of relatively few (and easily changeable) items like passwords or private keys An open design makes extensive public scrutiny possible Makes it possible for users to convince themselves system is adequate Not realistic to maintain secrecy for distributed system

Answer 24

Every access attempt must be checked; position the mechanism so it cannot be subverted.

Answer 25

The default should be denial of service, and the protection scheme should then identify conditions under which access is permitted

Answer 26

Ideally, access to objects should depend on more than one condition, so that defeating one protection system won't enable complete access

Answer 27

Minimize the amount and use of shared mechanisms (e.g. use of the /tmp or /var/tmp directories) Shared objects provide potentially dangerous channels for information flow and unintended interactions

Answer 28

The human interface must be designed for ease of use so users will routinely and automatically use the protection mechanisms correctly Mistakes will be reduced if the security mechanisms closely match the user's mental image of his or her protection goals

Answer 29

Key application of least privilege – no interface, no problem Do users need direct access to database? Maybe not

Answer 30

narrowly identifies what is allowed; all other things forbidden – do this

Answer 31

list of what is forbidden (all other things permitted) – do not use blacklists (in general)

Answer 32

use prepared statements

Answer 33

select languages / libraries to eliminate/reduce risk

Answer 34

encode responses

Answer 35

Use salted hashes with key-stretching

Answer 36

Prevention best, but attackers resourceful Also work to detect / contain /respond to reduce damage of successful attack

Answer 37

Grounds for justified confidence that a claim has been/will be achieved Claim(s): Top-level claim(s) for a property of a system or product Arguments: Systematic argumentation justifying this claim Evidence/assumptions: evidence & explicit assumptions underlying argument

Answer 38

Organized into layers, call down to lower layers Protocol stacks, application layering (stacks)

Answer 39

Objects have methods that are called

Answer 40

Huge collection of resources (e.g., REST)

Answer 41

Processes can public & subscribe to notifications

Answer 42

Client-server: Can assign different parts to client vs. server Multitiered: Server can be client of another server

Answer 43

Structured peer-to-peer: nodes organized into a specific topology (“overlay network”) Unstructured peer-to-peer: Each node maintains ad hoc list of neighbors Hierarchically organized peer-to-peer – special nodes maintain an index (for searching) or broker

Answer 44

Edge servers placed “at the edge” of network Clients access Internet through edge servers Edge servers serve (some) content directly – optimize access E.g., Content Distribution Networks (CDNs)

Answer 45

Decentralized collaboration (often after client/server startup)

Answer 46

Term “mobility” has two meanings Physical mobility - things move in physical space, so physical location changes (Logical) code migration, aka “code mobility” or “mobile code” – code to be executed moves from one compute element to another Both can happen together From here on, we focus on (logical) code migration

Answer 47

local and remote interaction have different semantics in terms of latency, access to memory, partial failure and concurrency

Answer 48

distributed systems can dynamically change and move the execution units (software components)

Answer 49

Weak mobility Send some code to be executed remotely, start at “beginning” Link and execute code downloaded from a remote location E.g., Java Applets, client-side Javascript. Easier to implement! Strong mobility Execution units move among locations carrying their execution state with them (not just “start at the beginning”) E.g., software agents, virtual machine migration

Answer 50

Migration initiated by target machine

Answer 51

Migration initiated by machine where code currently resides or is being executed

Answer 52

Deployment and upgrade of distributed applications REV and MA: “maintenance agents” go across the network to detect the current configuration/settings, and remove/install components COD: lazy and automated propagation of changes Customization of services Service interfaces are usually defined statically and a priori A server can be customized according to the specification of the code sent to it E.g., sending a Postscript to a printer server Support for disconnected operations Code mobility to decrease network traffic, or in anticipation of network disconnection

Answer 53

Dynamic Reconfiguration Adaptivity Asynchronous Interaction Context-Awareness Lightweight Middleware

Answer 54

sophisticated algorithms over large dictionaries high quality translation efficient algorithms over most common instances fast translation

Answer 55

MAPE-K loop: Monitoring, Analysis, Planning, and Execution - Knowledge

Answer 56

Every user has a private key and a public key (numbers) Everyone knows user’s public key Private key is the user’s secret, never shared with anyone Public key is uniquely determined by the private key Virtually impossible to compute private key from public key Can be used for encryption and digital signatures

Answer 57

User wants to send a message and prove that he wrote it Takes message and private key and performs a computation to create a signature Recipient compares the signature against the message and the user’s known public key Only the user who possesses the private key can sign messages, does not need to share the private key

Answer 58

Takes arbitrary data and transforms it to a 256-bit number The hash of random data is essentially a random number

Answer 59

A transaction structure for specifying and changing ownership A p2p network for propagating, verifying and storing transaction data A proof-of-work system (hashing, “mining”) for: Synchronizing transactions Determining initial distribution of coins

Answer 60

The fundamental building block of Bitcoin is a “coin” A coin is characterized by: Unique ID Quantity (denomination) – arbitrary number with 8 decimal places Owner Coins can be split and merged If Alice wants to send bitcoins to Bob, she will merge some of her coins and split the result between her and Bob

Answer 61

The owner of a coin is identified by an “address” Each address is associated with a private key To use a coin, the owner must provide a digital signature with the associated private key (ECDSA) The process where coins are merged and split is called a “transaction”

Answer 62

A transaction can have any number of inputs and outputs An output specifies a receiving address and amount An input references a previous unspent output The total value of all inputs must be at least the total value of all outputs The transaction is identified by a hash of its data The hash must be signed by the private key corresponding to every input address An address is a hash of an ECDSA public key More generally, an output specifies a script with the conditions to allow spending it

Answer 63

Using the same output (“coin”) to pay 2 different recipients No agreement on who is the “true” recipient One recipient will be out of his coins (presumably after providing some product) Some way to determine order of transactions is needed Traditional solution: Central authority Naïve decentralized solutions have vulnerabilities The first working decentralized solution is the blockchain

Answer 64

Transactions are grouped into blocks Blocks are confirmed with proof of work A transaction is considered final if it is included in a confirmed block Each block references a previous block to form a chain In case of conflict, the transaction with more compute power spent on confirmation wins Attacks require having more compute power than the rest of the network

Answer 65

Transactions are organized in a Merkle tree with a resulting root hash The block header consists of the Merkle root, the hash of the previous block, other metadata, and a nonce The block is identified by the SHA-256 hash of its header A block is valid only if its hash is lower than the target

Answer 66

A block with given data and nonce has a very low probability of being valid Miners try different nonces and compute the resulting hash (billions of tries per second) until they match the target, and release the resulting block The existence of a block which includes a transaction proves that computational work has been done by a node which considers this transaction valid Each block references the previous one. Each transaction gets increasingly more powerful proof of work In case of competing branches, the one with the most proof of work is selected A transaction “buried” under several blocks is very hard to revert mistakenly or maliciously Reverting a transaction requires catching up with the computation of the honest network, which is unlikely without greater hashrate Any change to a transaction invalidates all proof of work Hash target is adjusted every 2016 blocks (roughly 2 weeks) so that on average one block is found every 10 minutes

Answer 67

Every block is allowed one special “generation transaction” A generation transaction has a single special input, and any number of outputs Value of input: New coins + tx fees New coins: 50∙2^(−⌊𝐻/210000⌋ ) (starts at 50 BTC per block and halves roughly once every 4 years) Incentivizes securing the network by hashing Robust way to determine initial distribution

Answer 68

nodes make up the bitcoin network anybody can connect to other nodes and download the blockchain nodes listen to transactions and check if they are valid valid transactions are forwarded and stored, invalid ones rejected

Answer 69

take the hash of the previous block and all transactions. guess a random number, hash it. published with the block to other nodes the miner is allowed to send themselves 25 bitcoins from nowhere miners mine at the margin and need to pay electricity, creating a liquid market for bitcoins

Answer 70

only 21 million will ever be created. divisible into 100 million satoshis cannot be easily seized or censored ignorant to borders or location transaction clears within 10-60 minutes programmable and permissionless

Answer 71

can only handle 7 transactiosn per second fungibility cannot be guaranteed unexpected behavior of bitcoin software information security in a horrible state minig consumes vast amounts of energy attractive for criminals strong fluctuations in value

Answer 72

Although the accepted chain can be considered a list, the block chain is best represented with a tree. The longest path represents the accepted chain. A participant choosing to extend an existing path in the block chain indicates a vote towards consensus on that path. The longer the path, the more computation was expended building it.

Answer 73

Contracts are the main building blocks of Ethereum. A contract is a computer program that lives inside the distributed Ethereum network and has its own ether balance, memory and code. Every time you send a transaction to a contract, it executes its code, which can store data, send transactions and interact with other contracts. Contracts are maintained by the network, without any central ownership or control. Contracts are written in languages instantly familiar to any programmer and powered by Ether, Ethereum's cryptofuel.

Answer 74

“a portable, extensible open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation.”

Answer 75

A container image is “a lightweight, stand-alone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries, settings”* A container (e.g., as implemented by Docker) is a (mostly) isolated process running a container image isolates “software from its surroundings”* Advantages of containers: “containerized software will always run the same, regardless of the environment.”* Faster start & lower resource vs. VMs (shared kernel)

Answer 76

Kubernetes Docker Swarm Apache Mesos

Answer 77

Just declare “what you want” K8s notices & fixes differences of reality vs. request Automatically handles computer crashes, etc.

Answer 78

Pets: Servers… treated as indispensable or unique systems that can never be down. Typically they are manually built & managed” “Cattle: Arrays of >2 servers, that are built using automated tools, and are designed for failure, where [no 1..3] are irreplaceable. Typically, during failure events no human intervention is required as the array exhibits attributes of “routing around failures” by restarting failed servers or replicating data…”

Answer 79

Worker node, aka node: a VM or physical machine that has the services necessary to run pods and is managed by the K8s master components

Answer 80

group of 1+ containers with shared storage/network & a spec for how to run the containers Always co-located and co-scheduled, and run in a shared context Defined by a “Podspec” An application-specific “logical host” - contains one or more application containers which are relatively tightly coupled Each pod had its own IP address & runs on a worker node Routinely destroyed/created (not necessarily same IP address)

Answer 81

primary “node agent” that runs on each node takes a set of PodSpecs and ensures that the containers described in those PodSpecs are running and healthy

Answer 82

Services REST operations and provides the frontend to the cluster’s shared state through which all other components interact validates and configures data for the api objects which include pods, services, replicationcontrollers, and others

Answer 83

a command line interface for running commands against Kubernetes clusters

Answer 84

Kubernetes controller manager: “a daemon that embeds the core control loops shipped with Kubernetes” Control loop = non-terminating loop that regulates the state of the system

Answer 85

schedules pods onto nodes If a pod hasn’t been assigned a node, assign it Does not run pods – schedules them to be run

Answer 86

“Consistent and highly-available [distributed] key-value store used as Kubernetes’ backing store for all cluster data” Globally available Implemented using the Raft algorithm (with some tweaks, discussed later)

Answer 87

A deployment = declarative update for Pods and ReplicaSets Simplified: declare the number of replicas & the template for the pods Makes it easy to deploy replicas, change # of replicas

Answer 88

Kubernetes Service: an abstraction which defines a logical set of Pods and a policy by which to access them sometimes called a micro-service Set of Pods targeted by a Service is (usually) determined by a Label Selector

Answer 89

kube-proxy: Kubernetes network proxy runs on each node Reflects services as defined in the Kubernetes API on each node and can do simple TCP and UDP stream forwarding or round robin TCP and UDP forwarding across a set of backends

Answer 90

Kubernetes is often used with Prometheus Prometheus = an open-source systems monitoring and alerting toolkit

Answer 91

Docker: Popular container system There are others (e.g., rkt) Open Container Initiative (OCI) established by Dockers & others defined 2 specs: Runtime Specification (runtime-spec): outlines how to run a “filesystem bundle” that is unpacked on disk. Image Specification (image-spec) An OCI implementation: downloads an OCI Image unpacks it into an OCI Runtime filesystem bundle… that is run by an OCI Runtime

Answer 92

FM spec language can reduce requirement ambiguity FM can prove “always” or “never”

Answer 93

Many backoff approaches exist to limit costs. One grouping is:* Level 0: Formal specification created, then program informally developed from it. “Formal methods lite” Level 1: Level 0, + prove some select properties or formal refinement from specification towards program Level 2: Fully prove claims, mechanically checked Any of the above can apply to a subset of components or properties

Answer 94

define expressions with just: Boolean variables Operators: and (∧); or (∨); not (¬) Parentheses allowed Predicate logic too limited for many problems Useful in some cases, e.g., dependency analysis Can be used to implement more sophisticated systems Provides the basis for FOL & other logic systems

Answer 95

FOL widely used mathematical logic Aka “first-order predicate logic” or “first-order predicate calculus” Basis of most FM software languages for specs In traditional FOL, every expression is either a: Term (an object / “non-boolean”): a variable, a constant, or a function f(term1, term2, …) Typically there’s a “domain of discourse” (aka “universe”), the set of entities over which variables may range. E.G., “integers” or “real numbers” Formula (a truth value / “boolean”): see next slide, includes predicates (“functions” that return truth value) Can have variables & constants (must distinguish) Prolog convention: Variable if begin with uppercase, else constant Math convention: Variables begin with w, x, y, z In FOL, functions & predicates can’t be variables

Answer 96

strongest math notation (+ functions can vary/be objects) Again, can add “theories” about integers, etc. to HOL

Answer 97

provide more capability But tend to be harder to automatically analyze Often want “weakest” language that meets needs

Answer 98

FOL formula notation

Answer 99

∀ X expression = “for all” “Loops” over everything X could be, is true iff the expression is always true for all allowed values of X ∃ X expression = “there exists” True iff there is some allowed value of X where the expression is true ∀ X (man(X) → mortal(X)) “For all X, if X is a man, then X is mortal”

Answer 100

Surprisingly, these can BOTH be true at the SAME time: ∀ X (martian(X) → green(X)) “All Martians are green” ∀ X (martian(X) → ¬green(X)) “All Martians are not green” Why? If there are no Martians, both expressions are true martian(X)→… => ¬martian(X) ∨ … => T ∨ … => T Without Martians they all simplify to ∀ X T, which is true

Answer 101

Extension of Hoare logic developed by John C. Reynolds, Peter O'Hearn, Samin Ishtiaq and Hongseok Yang Describes “states” consisting of a: store (“stack-oriented variables”) and a heap (“dynamically-allocated objects”) Defines a set of operations about them “Frame rule” enables local reasoning A program that executes safely in a small state can also execute in any bigger state and that its execution will not affect the additional part of the state when certain conditions proved E.G., Coq “Ynot” library implements separation logic Can apply separation logic concepts using traditional FOL

Answer 102

Verification tools Major verification approaches include: Theorem-provers: Automated & interactive Satisfiability (SAT) solvers: Boolean-only or modulo theories Model-checkers Abstract interpretation / symbolic execution (for programs)

Answer 103

Accepts assumptions (“givens”) & goal in some notation Tries to produce proof of goal, starting from assumptions… Using only a sequence of allowed inference rules & theorems

Answer 104

Every step fully justified by accepted rule “Proof checker” can verify proof - easy to build, enabling separate third-party verification. Supports “Proof carrying code”

Answer 105

Automated (automatic, but typically only FOL & can fail) Interactive (“proof assistant”) (requires far more expertise)

Answer 106

automated tools that: Given predicate logic expressions (boolean variables, and, or, not)… Find variable assignments to make true OR report unsatisfiable

Answer 107

automated tools Given expressions in richer notation beyond predicate logic Typically FOL + “theories” (variables may be integers, reals, etc.) E.g., (x+y ≥ 0) ∧ (y > 0) is satisfiable with integers x=1,y=2 Reports satisfiable (“sat”) (maybe with satisfying variables) or “unsat” or “unknown” (e.g., ran out of time/memory) To determine if “X is always true”, supply “not X”… returns unsat Some can also provide proof (if can’t, how verify results?)

Answer 108

Given a system model, exhaustively check if meets a given requirement Requirement is often narrow property, often temporal requirement System is represented as a finite-state machine (FSM) Exhaustively explore state (conceptually) Clever approaches  orders-of-magnitude faster vs. brute force. E.g.: Symbolically represent FSM, e.g., using binary decision diagrams (BDDs) Abstraction (simplify system for this specific property) Bounded model checking - unroll FSM for fixed number of steps (build on SAT) Only shows true/false for that many steps!! Counterexample guided abstraction refinement (CEGAR)

Answer 109

“general-purpose formal specification language that is particularly useful for describing concurrent and distributed systems. The TLA+ proof language is declarative, hierarchical, and scalable to large system specifications. It provides a consistent abstraction over the various “backend” verifiers.”

Answer 110

mechanically checks TLA+ proofs Copyright Inria and Microsoft Corporation BSD-style OSS license By default it tries to prove it using an SMT solver (by default CVC3), Zenon (tableax-based FOL prover), and then Isabelle’s “auto” tactic Has been used to prove designs of practical systems

Answer 111

): A distributed file system that provides high-throughput access Many other data storage approaches also in use E.G., Apache Cassandra, Apache Hbase, Apache Accumulo (NSA-contributed)

Answer 112

A framework for job scheduling and cluster resource management

Answer 113

A YARN-based system for parallel processing of large data sets Other execution engines increasingly in use, e.g., Spark

Answer 114

Stores large files (typically gigabytes-terabytes) across multiple machines, replicating across multiple hosts File system intentionally not fully POSIX-compliant Write-once-read-many access model for files. A file once created, written, and closed cannot be changed. This assumption simplifies data coherency issues and enables high throughput data access Intend to add support for appending-writes in the future Can rename & remove files

Answer 115

a programming model for processing and generating large data sets with a parallel, distributed algorithm on a cluster Programmer defines two functions, map & reduce Map(k1,v1) → list(k2,v2). Takes a series of key/value pairs, processes each, generates zero or more output key/value pairs Reduce(k2, list (v2)) → list(v3). Executed once for each unique key k2 in the sorted order; iterate through the values associated with that key and produce zero or more outputs

Answer 116

If defined, runs after Mapper & before Reducer on every node that has run a map task Combiner receives as input all data emitted by the Mapper instances on a given node Combiner output sent to the Reducers, instead of the output from the Mappers Is a "mini-reduce" process which operates only on data generated by one machine

Answer 117

Many problems aren’t easily described as map-reduce Persistence to disk typically slower than in-memory work

Answer 118

Processing engine; instead of just “map” and “reduce”, defines a large set of operations (transformations & actions) Operations can be arbitrarily combined in any order Open source software Supports Java, Scala and Python Original key construct: Resilient Distributed Dataset (RDD) Original construct, so we’ll focus on that first More recently added: DataFrames & DataSets Different APIs for aggregate data

Answer 119

RDDs represent data or transformations on data RDDs can be created from Hadoop InputFormats (such as HDFS files), “parallelize()” datasets, or by transforming other RDDs (you can stack RDDs) Actions can be applied to RDDs; actions force calculations and return values Lazy evaluation: Nothing computed until an action requires it RDDs are best suited for applications that apply the same operation to all elements of a dataset

Answer 120

: Return a new distributed dataset formed by passing each element of the source through a function func.

Answer 121

Return a new dataset formed by selecting those elements of the source on which func returns true union(otherDataset): Return a new dataset that contains the union of the elements in the source dataset and the argument.

Answer 122

Return a new RDD that contains the intersection of elements in the source dataset and the argument.

Answer 123

Return a new dataset that contains the distinct elements of the source dataset

Answer 124

When called on datasets of type (K, V) and (K, W), returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key. Outer joins are supported through leftOuterJoin, rightOuterJoin, and fullOuterJoin.

Answer 125

Aggregate the elements of the dataset using a function func (which takes two arguments and returns one). The function should be commutative and associative so that it can be computed correctly in parallel.

Answer 126

Return all the elements of the dataset as an array at the driver program. This is usually useful after a filter or other operation that returns a sufficiently small subset of the data.

Answer 127

Return the number of elements in the dataset

Answer 128

You can persist (cache) an RDD When you persist an RDD, each node stores any partitions of it that it computes in memory and reuses them in other actions on that dataset (or datasets derived from it) Allows future actions to be much faster (often >10x). Cache is fault-tolerant – Can manually call unpersist()

Answer 129

Unlike an RDD, data organized into named columns, e.g. a table in a relational database. Imposes a structure onto a distributed collection of data, allowing higher-level abstraction

Answer 130

Extension of DataFrame API which provides type-safe, object-oriented programming interface (compile-time error detection)

Answer 131

Python & R don’t have compile-time type safety checks, so only support DataFrame Error detection only at runtime Java & Scala support compile-time type safety checks, so support both DataSet and DataFrame

Answer 132

Performance: Spark normally faster but with caveats Spark can process data in-memory; Hadoop MapReduce persists back to the disk after a map or reduce action Spark generally outperforms MapReduce, but it often needs lots of memory to do well; if there are other resource-demanding services or can’t fit in memory, Spark degrades MapReduce easily runs alongside other services with minor performance differences, & works well with the 1-pass jobs it was designed for Ease of use: Spark is easier to program Data processing: Spark more general Maturity: Spark maturing, Hadoop MapReduce mature

622 final Flashcards

(156 cards)