Synchronization Flashcards

1
Q

Cache Organization

A

caches are organized in fixed-size chunks called cache lines
* on most CPUs a cache line is 64 bytes (see the sketch below)
* data accesses go through the cache, which is transparently managed by the CPU
* caches implement a replacement strategy to evict cache lines
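
A minimal C++ sketch (illustrative, not from the card): `alignas` pads a structure to a full 64-byte line, and `std::hardware_destructive_interference_size` (C++17, where the standard library provides it) reports a portable estimate of the line size.

```cpp
#include <iostream>
#include <new>

// pad the structure to one full cache line (assuming 64-byte lines)
struct alignas(64) PaddedCounter {
    long value;
};

int main() {
#ifdef __cpp_lib_hardware_interference_size
    std::cout << "destructive interference size: "
              << std::hardware_destructive_interference_size << " bytes\n";
#endif
    std::cout << "sizeof(PaddedCounter): " << sizeof(PaddedCounter) << " bytes\n";  // 64
}
```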

2
Q

Cache Coherency Protocol

A

CPU manages caches and provides the illusion of a single main memory
using a cache coherency protocol

3
Q

Performance Implications of Cache Coherency

A
  • reads result in copies of data in local caches
  • writes invalidate cache lines on other cores (see the false-sharing sketch below)
  • communication between cores is fairly expensive (happens through shared
    L3 or main memory)
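
A hedged false-sharing sketch (assuming 64-byte lines): the two counters below share one cache line, so concurrent writes from two cores keep invalidating each other's copy; padding each member with alignas(64) would separate them.

```cpp
#include <atomic>
#include <thread>

// Both counters live on the same cache line, so every write on one core
// invalidates the line in the other core's cache (false sharing).
struct Shared {
    std::atomic<long> a{0};   // adding alignas(64) to each member would
    std::atomic<long> b{0};   // place them on separate lines
};

int main() {
    Shared s;
    std::thread t1([&] { for (int i = 0; i < 10'000'000; i++) s.a++; });
    std::thread t2([&] { for (int i = 0; i < 10'000'000; i++) s.b++; });
    t1.join();
    t2.join();
}
```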
4
Q

x86-64 Memory Model

A

x86-64 implements Total Store Order (TSO), which is a strong memory model
* this means that x86 mostly executes the machine code as given
* loads are not reordered with respect to other loads, writes are not reordered
with respect to other writes
* however, writes are buffered (in order to hide the L1 write latency), and reads
are allowed to bypass buffered writes (see the litmus test below)
* a fence or a locked write operation will flush the write buffer (but will not
flush the cache)
* important benefit from TSO: sequentially consistent loads do not require
fences
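
The classic store-buffering litmus test makes the "reads may bypass buffered writes" point concrete (a sketch, not from the card): under TSO both stores can still sit in the write buffers when the loads execute, so r1 == 0 && r2 == 0 is a legal outcome; seq_cst accesses would insert the fences that forbid it.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1, r2;

int main() {
    std::thread t1([] {
        x.store(1, std::memory_order_release);   // store may linger in the write buffer
        r1 = y.load(std::memory_order_acquire);  // load is allowed to bypass it
    });
    std::thread t2([] {
        y.store(1, std::memory_order_release);
        r2 = x.load(std::memory_order_acquire);
    });
    t1.join();
    t2.join();
    std::printf("r1=%d r2=%d\n", r1, r2);  // r1 == 0 && r2 == 0 is possible
}
```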

5
Q

Concurrent List-Based Set

A
  • keys are stored in a (singly-)linked list sorted by key
  • head and tail sentinels are always present (with keys −∞ and +∞; see the
    layout sketch below)
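
A layout sketch under illustrative names (Node and locate are not from the card): the ±∞ sentinels guarantee that every key search ends with a valid (pred, curr) window.

```cpp
#include <climits>

struct Node {
    long  key;
    Node* next;
};

Node tail{LONG_MAX, nullptr};   // +inf sentinel, never removed
Node head{LONG_MIN, &tail};     // -inf sentinel, never removed

// returns (pred, curr) with pred->key < key <= curr->key
struct Window { Node* pred; Node* curr; };
Window locate(long key) {
    Node* pred = &head;
    Node* curr = head.next;
    while (curr->key < key) { pred = curr; curr = curr->next; }
    return {pred, curr};
}
```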
6
Q

Coarse-Grained Locking

A
  • use a single lock to protect the entire data structure (see the sketch below)
    + very easy to implement
    − does not scale at all
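
A minimal sketch (CoarseSet is an illustrative name): one mutex serializes every operation, which is why correctness is trivial and scalability is nonexistent.

```cpp
#include <mutex>
#include <set>

class CoarseSet {
    std::mutex m_;          // the single lock protecting everything
    std::set<long> data_;
public:
    bool insert(long k)   { std::lock_guard<std::mutex> g(m_); return data_.insert(k).second; }
    bool remove(long k)   { std::lock_guard<std::mutex> g(m_); return data_.erase(k) > 0; }
    bool contains(long k) { std::lock_guard<std::mutex> g(m_); return data_.count(k) > 0; }
};
```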
7
Q

Lock Coupling

A

hold at most two locks at a time (see the hand-over-hand sketch below)
* interleave lock acquisitions/releases pair-wise
* read/write locks allow for concurrent readers
+ easy to implement
+ no restarts
− does not scale
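
A hand-over-hand traversal sketch (illustrative; assumes the ±∞ sentinels from the list-based set, so curr is never null): the predecessor's lock is released only after the current node's lock is held.

```cpp
#include <mutex>

struct Node {
    long       key;
    Node*      next;
    std::mutex lock;
};

// returns with pred->lock and curr->lock held (caller unlocks both);
// at most two locks are held at any point during the traversal
void locate(Node* head, long key, Node*& pred, Node*& curr) {
    pred = head;
    pred->lock.lock();
    curr = pred->next;
    curr->lock.lock();
    while (curr->key < key) {
        pred->lock.unlock();   // release the trailing lock...
        pred = curr;           // ...then slide the pair forward
        curr = curr->next;
        curr->lock.lock();
    }
}
```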

8
Q

Optimistic Lock Coupling

A
  • associate lock with update counter (see the sketch below)
  • write:
      • acquire lock (excludes other writers)
      • increment counter when unlocking
      • do not acquire locks for nodes that are not modified (traverse like a reader)
  • read:
      • do not acquire locks, proceed optimistically
      • detect concurrent modifications through counters (and restart if necessary)
      • can track changes across multiple nodes (lock coupling)
    + easy to implement
    + scalable
    − restarts
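
A sketch of the version-counter protocol in the style of the OLC papers (method names are illustrative): the low bit of the version doubles as the writer lock, and readers validate that the version has not changed.

```cpp
#include <atomic>
#include <cstdint>

struct OptLock {
    std::atomic<uint64_t> version{0};   // even = unlocked, odd = write-locked

    // reader: snapshot the version; restart if a writer is active
    uint64_t readLockOrRestart(bool& needRestart) const {
        uint64_t v = version.load(std::memory_order_acquire);
        needRestart = (v & 1);
        return v;
    }
    // reader: after reading node contents, validate the snapshot
    void checkOrRestart(uint64_t v, bool& needRestart) const {
        needRestart = (version.load(std::memory_order_acquire) != v);
    }
    // writer: set the lock bit; unlocking increments past it
    void writeLock() {
        uint64_t v;
        do { v = version.load(std::memory_order_relaxed) & ~uint64_t(1); }
        while (!version.compare_exchange_weak(v, v | 1, std::memory_order_acquire));
    }
    void writeUnlock() {   // odd -> next even value, i.e. counter incremented
        version.fetch_add(1, std::memory_order_release);
    }
};
```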
9
Q

Terminology: Non-Blocking Algorithms

A
  • killing a thread at any point in time should not affect the consistency of the
    data structure (this precludes locks)
  • non-blocking data structures may be beneficial for (soft) real-time
    applications
  • classification (from strong to weak):
      • wait-free: every operation is guaranteed to succeed (in a constant number
        of steps)
      • lock-free: overall progress is guaranteed (some operations succeed, while
        others may not finish)
      • obstruction-free: progress is only guaranteed if there is no interference
        from other threads
10
Q

Lock-Free List

A
  • insert and remove are lock-free, contains is wait-free
  • remove: marks node for deletion, but does not physically remove it
  • marker is stored within the next pointer (by stealing a bit of the pointer; see
    the sketch below)
  • contains traverses marked nodes (but needs same check as Lazy variant)
    + contains always succeeds
    + scalable
    − insert/remove may restart
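
A pointer-marking sketch (names illustrative): because nodes are aligned, the lowest pointer bit is free to serve as the deletion mark, and a single CAS updates pointer and mark together.

```cpp
#include <atomic>
#include <cstdint>

struct Node {
    long                   key;
    std::atomic<uintptr_t> next;   // next pointer with the mark in bit 0
};

Node* ptrOf(uintptr_t w)    { return reinterpret_cast<Node*>(w & ~uintptr_t(1)); }
bool  isMarked(uintptr_t w) { return (w & 1) != 0; }

// logical removal: set the mark bit on curr's next pointer via CAS
bool tryMark(Node* curr) {
    uintptr_t w = curr->next.load(std::memory_order_acquire);
    if (isMarked(w)) return false;   // already logically deleted
    return curr->next.compare_exchange_strong(w, w | 1);
}
```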
11
Q

Lazy List

A
  • contains is wait-free
  • insert/remove traverse list only once (as long as there is no contention)
  • add a marker to each node for logically deleting a key (see the validation
    sketch below)

+ usually only one traversal necessary
+ scalable
− insert/remove have to lock
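
A validation sketch (illustrative): with the marked flag, insert/remove lock pred and curr, then re-check that the window is still intact instead of re-traversing.

```cpp
#include <mutex>

struct Node {
    long       key;
    Node*      next;
    bool       marked = false;   // logically deleted?
    std::mutex lock;
};

// called with pred->lock and curr->lock held; if validation fails,
// the operation unlocks both nodes and restarts the traversal
bool validate(const Node* pred, const Node* curr) {
    return !pred->marked && !curr->marked && pred->next == curr;
}
```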

12
Q

Locks

A
  • locks are usually implemented using atomics
  • waiting and suspending threads is an interesting problem
  • tradeoff between speed on the 'fast path' and resource usage on the 'slow path'
13
Q

Memory Reclamation for Lock-Free Data Structures

A

after deleting a node in a lock-free data structure, lock-free readers might
still be accessing that node
* freeing/reusing that node immediately can cause correctness problems
and/or crashes

14
Q

Reference Counting

A
  • associate every node with an atomic counter (see the sketch below)
  • effectively results in similar behavior as locks
    + easy to use
    − does not scale
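
A minimal sketch of the idea, and of why it behaves like a lock: every access bumps a shared per-node counter, so all readers of a hot node contend on the same cache line. (A real lock-free scheme would also have to solve the race of incrementing the counter of a node that is concurrently being freed.)

```cpp
#include <atomic>

struct Node {
    long key;
    std::atomic<long> refCount{1};   // one reference held by the structure itself
};

void acquire(Node* n) { n->refCount.fetch_add(1, std::memory_order_relaxed); }
void release(Node* n) {
    // last reference dropped -> safe to free
    if (n->refCount.fetch_sub(1, std::memory_order_acq_rel) == 1)
        delete n;
}
```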
15
Q

Epoch-Based Memory Reclamation

A
  • global epoch counter (like timestamp, but incremented infrequently)
  • per-thread, local epoch counters
  • before every operation: enter the epoch by setting the local epoch to the
    global epoch
  • after every operation: set the local epoch to ∞, indicating that this thread is
    not accessing any node (see the sketch below)

+ fairly low overhead, scales well
+ no need to change data structure code (simply wrap operations)
− a single slow/blocking thread may prevent any memory reclamation
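
A simplified sketch with a fixed thread count (all names illustrative): a node retired in epoch e can be freed once no thread's local epoch is ≤ e. A thread that never calls exitEpoch pins minActiveEpoch, which is exactly the drawback noted above.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <limits>

constexpr int      kThreads = 8;
constexpr uint64_t kIdle = std::numeric_limits<uint64_t>::max();  // "infinity"

std::atomic<uint64_t> globalEpoch{0};
std::atomic<uint64_t> localEpoch[kThreads];  // set each slot to kIdle at startup

void enterEpoch(int tid) {   // wrap the start of every operation
    localEpoch[tid].store(globalEpoch.load(), std::memory_order_seq_cst);
}
void exitEpoch(int tid) {    // wrap the end of every operation
    localEpoch[tid].store(kIdle, std::memory_order_release);
}
uint64_t minActiveEpoch() {  // nodes retired in epochs < this are reclaimable
    uint64_t m = kIdle;
    for (int i = 0; i < kThreads; i++)
        m = std::min(m, localEpoch[i].load(std::memory_order_acquire));
    return m;
}
```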

16
Q

Traditional Hazard Pointers

A

hazard pointers: thread-local pointers that protect nodes during lock-free operations (see the protect-loop sketch below)

  • observation: most lock-free operations only require a bounded number of
    node pointers at any point in time: hazard pointers
  • during traversal, one can store these into a limited number of thread-local
    locations
  • hazard pointers effectively announce to other threads “I am currently
    accessing this node, do not reclaim it”
  • after storing the hazard pointer, one may need to check reachability again
  • before physically reclaiming a node, check if any thread has a hazard pointer
    referencing that node
  • normally the hazard pointer stores have to be sequentially consistent
    + non-blocking
    − high overhead: stores to HP must be sequentially-consistent
    − invasive: requires some data structure changes
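
A sketch of the announce-and-recheck step (protect is an illustrative name, not a specific library API): the seq_cst store orders the announcement before any reclaimer's scan, and the re-read confirms the node was still reachable afterwards.

```cpp
#include <atomic>

template <typename T>
T* protect(std::atomic<T*>& src, std::atomic<T*>& hazardSlot) {
    T* p = src.load(std::memory_order_acquire);
    for (;;) {
        hazardSlot.store(p, std::memory_order_seq_cst);  // announce "I use p"
        T* q = src.load(std::memory_order_acquire);      // re-check reachability
        if (q == p) return p;   // safe: reclaimers scanning now will see the HP
        p = q;                  // source moved on; retry with the new pointer
    }
}
```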
17
Q

Implementing Shared Concurrent Counters

A
  • alternative 1: shared global atomic counter
  • alternative 2: per-thread counters, add up on demand (see the sketch below)
  • alternative 3: approximate counter storing the logarithm of the counter value
    (incremented probabilistically)
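
A sketch of alternative 2 (names illustrative): per-thread, cache-line-padded slots make increments contention-free; reads sum the slots and may be slightly stale.

```cpp
#include <atomic>

constexpr int kThreads = 8;

struct alignas(64) Slot { std::atomic<long> v{0}; };  // padding avoids false sharing
Slot slots[kThreads];

void increment(int tid) {
    slots[tid].v.fetch_add(1, std::memory_order_relaxed);  // uncontended
}
long read() {   // O(kThreads); a snapshot, not an exact instantaneous value
    long sum = 0;
    for (int i = 0; i < kThreads; i++)
        sum += slots[i].v.load(std::memory_order_relaxed);
    return sum;
}
```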
18
Q

ART: lock coupling

A
  • easy to apply to ART
  • modifications only change 1 node and (potentially) its parent
  • additional lock for root node changes
  • can use read/write locks to allow for more concurrency
19
Q

ART: optimistic lock coupling

A
  • fairly straightforward: add lock and version to each node
  • how to detect that root node changed?
    1. extra optimistic lock for root pointer (outside of any node)
    2. always keep the same Node256 as root node (similar to static head element in
    list-based set)
    3. after reading the version of the current root, validate that the root pointer still
    points to this node
20
Q

ART with ROWEX

A

+ wait-free lookups
− much more complicated than OLC, requires invasive changes

21
Q

B-tree: lock coupling

A

must detect root node changes (e.g., via an additional lock)
* structure modification operations may propagate up multiple levels; two solutions:
* eagerly split full inner nodes during traversal (good solution for fixed-size keys)
* restart the operation from the root, holding all locks along the path

22
Q

B-tree: optimistic lock coupling

A
  • writers usually only lock one leaf node (important for scalability)
  • additional issues due to optimism:
      • infinite loops: one has to ensure that the intra-node (binary) search
        terminates in the presence of concurrent modifications
      • segmentation faults/invalid behavior: a pointer read from a node may be
        invalid (additional check needed)
  • OLC is a good technique for B-trees
23
Q

B-link tree

A
  • B-tree synchronization with only a single lock at a time (no coupling)
  • observation: as long as there are only splits (no merges), the key that is
    being searched may have moved to a right neighbor
  • solution: add next pointers to inner and leaf nodes, operations may have to
    check neighbor(s)
  • fence keys may help determine whether it is necessary to traverse to a
    neighbor
  • the B-link tree idea was very important when data was stored on disk, but it
    is also sometimes used for in-memory B-tree synchronization
24
Q

B-tree Contention Split

A

for skewed workloads, contention on some node's lock can become a problem
* contention may be caused by page granularity rather than by one very hot entry
* contention split optimization: if the lock cannot be acquired on the first try,
split the node with a certain probability even if it is not full

25
Q

Bw-tree

A

Latch-free B+Tree index
→ Threads never need to set latches or block.
Key Idea #1: Deltas
→ No updates in place
→ Reduces cache invalidation.
Key Idea #2: Mapping Table
→ Allows for CAS of physical locations of pages.

26
Q

Bw-tree: structure modifications

A

Split Delta Record
→ Mark that a subset of the base page’s key range is now
located at another page.
→ Use a logical pointer to the new page.
Separator Delta Record
→ Provide a shortcut in the modified page’s parent on what
ranges to find the new page.

27
Q

Lock-Free Hash Table: Split-Ordered List

A

all entries are stored in a single lock-free list (see lock-free list-based set)
sorted by the hash value of the key

28
Q

Recursive Split Ordering

A

key idea: store keys ordered by the bit-wise reversed hash (see the sketch below)
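
A sketch of the ordering trick in the spirit of Shalev and Shavit's split-ordered list (helper names illustrative): reversing the hash bits keeps a bucket's items adjacent when the table doubles, and setting the lowest bit distinguishes data nodes from bucket sentinels.

```cpp
#include <cstdint>

// reverse the 32 bits of x (classic bit-twiddling ladder)
uint32_t reverseBits(uint32_t x) {
    x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);
    x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);
    x = ((x & 0x0F0F0F0Fu) << 4) | ((x >> 4) & 0x0F0F0F0Fu);
    x = ((x & 0x00FF00FFu) << 8) | ((x >> 8) & 0x00FF00FFu);
    return (x << 16) | (x >> 16);
}

// sentinel (bucket) keys keep the lowest bit 0, data keys set it to 1,
// so a bucket's sentinel sorts before all items that hash into it
uint32_t sentinelKey(uint32_t bucket) { return reverseBits(bucket); }
uint32_t dataKey(uint32_t hash)       { return reverseBits(hash) | 1u; }
```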

29
Q

Split-Ordered List: Deletion

A
  • problem: deleting a node that a bucket points to using a single CAS does not
    work (because the node is also pointed to from within the list)
  • solution: for every split bucket, insert a special sentinel node into the list