Memory consistency Flashcards

1
Q

What is memory consistency?

A

A problem that occurs between different programs when you have multiple cores active.

A contract between the programmer and the hardware.

A set of rules that dictates the order in which observable memory operations may be performed. In other words, it defines how the memory operations written in a program translate to the order in which they actually execute in memory.

The programmer must be aware of this set of rules when creating programs.

Hardware designers must also follow these rules when designing out-of-order cores.

Consistency is NOT optional, and is not an optimisation feature.

Consistency models are NOT the same between different ISAs. The model is an innate part of the ISA and dictates how a program's memory operations can be reordered during execution.

2
Q

What is the difference between cache coherence and memory consistency?

A

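Coherence deals with a single memory location: it makes sure every core sees the correct (most recently written) value for that location. Memory consistency is a different multi-core problem: it defines the order in which memory operations, including those to different locations, may become visible to other cores.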

Consistency is NOT optional, and is not an optimisation feature.

3
Q

Who must follow the memory consistency rules?

A

Hardware designers when designing out-of-order (OoO) cores

Programmers when creating programs

4
Q

Why is memory consistency needed?

A

Ensures that programs executing in parallel can synchronise. When programs execute across cores, we do not know exactly when things happen. Memory consistency helps avoid problems such as race conditions.

Ensures ability to execute atomic operations.
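
To make the race-condition hazard concrete, here is a minimal Python sketch. It is a model, not real hardware: it enumerates every interleaving of two threads that each increment a shared counter with an unsynchronised load followed by a store. The names `interleavings` and `run` are illustrative only.

```python
def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's internal order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one interleaving of (thread, op) steps and return the final counter."""
    counter = 0
    tmp = {}  # per-thread private register
    for thread, op in schedule:
        if op == "load":
            tmp[thread] = counter
        else:  # "store": write back the privately incremented value
            counter = tmp[thread] + 1
    return counter


t0 = [("T0", "load"), ("T0", "store")]
t1 = [("T1", "load"), ("T1", "store")]
finals = {run(s) for s in interleavings(t0, t1)}
print(sorted(finals))  # [1, 2]: some interleavings lose an update
```

Only the interleavings in which one thread finishes its load and store before the other starts give 2. That is the guarantee an atomic operation provides.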

5
Q

What would happen to programs if we did not have memory consistency?

A

Programs would suffer from more race conditions and from ambiguous results.

6
Q

What is observability in memory systems?

A

When executing a program on a core in a multicore system, some memory operations will be visible to other threads.

7
Q

What memory operations are visible?

A

Shared cache line:
- reads from the cache line are visible

All writes are visible, as they change memory values, even if the cache line is not shared.

8
Q

What is the difference between the goals of a SW designer and a HW designer?

A

A SW designer wants to write programs that are correct and optimised. A typical optimisation is introducing parallelism, which comes with new problem areas.

A HW designer wants to execute memory operations as soon as they are ready, not necessarily in order (for example, to hide long latencies). This means that memory operations might not execute in program order.

9
Q

What is memory order?

A

The order in which observable memory operations are actually performed by the core.

10
Q

What is program order?

A

Order of instructions as they appear in the machine code.

Defines the programmer's intention for how the program should execute. This means that if a programmer puts a load before a store, they expect the load to happen before the store.

11
Q

What limits the reordering of memory operations?

A

Consistency rules and ready instructions.

An instruction will execute when it is ready, if it is legal according to the consistency rules.

12
Q

What is the Sequential Consistency model (SC)?

A

The baseline

Memory operations are performed in program order.

Within a program (process), all operations are executed in program order. Operations from different processes are ordered with respect to each other in some arbitrary order.

Memory operations appear to happen one at a time: no two processes perform memory operations at the same time.

When considering a single program/thread, there is only one valid ordering (translation from program order to execution order), and that is the program order.

13
Q

What is sequential consistency at a local level?

A

Every operation executes as ordered in the program code

14
Q

What is sequential consistency at a global level?

A

There is some ordering of the processes' operations with respect to each other

15
Q

How can you quickly define a memory consistency model?

A

The translation from program order to execution order.

16
Q

What are ordering rules?

A

When talking about consistency models, we have the 4 following rules:

Read followed by read
Read followed by write
Write followed by read
Write followed by write

Notation:
R0, R1, W0, W1: R is a read, W is a write; the index numbers operations of the same type (0 - first, 1 - second)
<p: Before in program order
<m: Before in memory order
A -> B: if A holds, B must hold

17
Q

What are ordering rules for sequential consistency?

A

R0 <p R1 -> R0 <m R1
R0 <p W0 -> R0 <m W0
W0 <p R0 -> W0 <m R0
W0 <p W1 -> W0 <m W1

No reordering of execution is allowed. Each rule says that observable operations must happen in memory order exactly as they appear in program order.

18
Q

Why is talking about SC important?

A

It is trivial to implement
- all mem-ops are in program order
- all updates become fully visible before a value is read

Very intuitive, it does what the programmer “expects” will happen

Very low performance - mem-ops can only execute when the ordering rules allow it, not simply when they are ready.

19
Q

When talking about consistency models, what are valid outcomes?

A

The possible outcomes of a program, based on the consistency model.
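
As an illustration, the valid outcomes of a small two-thread program under sequential consistency can be found by brute force. The sketch below assumes the classic "store buffering" test (each thread writes one variable, then reads the other) and models SC as "any global order that respects each thread's program order". The names `P0`, `P1`, and `sc_outcomes` are made up for this example.

```python
from itertools import permutations

# Thread 0: x = 1; r1 = y        Thread 1: y = 1; r2 = x
P0 = [("st", "x", 1), ("ld", "y", "r1")]
P1 = [("st", "y", 1), ("ld", "x", "r2")]


def sc_outcomes(programs):
    """Return every (r1, r2) result reachable under sequential consistency."""
    ops = [(t, i) for t, prog in enumerate(programs) for i in range(len(prog))]
    outcomes = set()
    for order in permutations(ops):
        # Keep only global orders that respect each thread's program order.
        if any(order.index((t, i)) > order.index((t, i + 1))
               for t, prog in enumerate(programs) for i in range(len(prog) - 1)):
            continue
        mem, regs = {}, {}
        for t, i in order:
            kind, addr, arg = programs[t][i]
            if kind == "st":
                mem[addr] = arg
            else:
                regs[arg] = mem.get(addr, 0)  # memory starts out as zero
        outcomes.add((regs["r1"], regs["r2"]))
    return outcomes


print(sorted(sc_outcomes([P0, P1])))  # [(0, 1), (1, 0), (1, 1)]
```

The outcome (0, 0) is not valid under SC: at least one of the writes must come before both reads in any global order.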

20
Q

What happens if we relax one or more of the rules, as they are defined in the SC model?

A

Reduces intuition of going from program to memory order. Can cause wrong outputs or race conditions.

Potentially improves performance. Introduces the ability to potentially issue operations earlier on (when ready).

21
Q

What is a Total Store Order (TSO) consistency model?

A

Relaxes the Write-Read rule. Maintains the store order.

W0 <p R0 -> W0 <m R0 # This no longer needs to hold

Allows later reads to issue before preceding writes - assuming no dependence (different addresses).
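
A common way to picture TSO is a FIFO store buffer per core: a store first goes into the local buffer and reaches memory later, while a later load may read memory right away. The sketch below is a toy model under that assumption, not a description of any specific CPU. It explores every schedule of the store-buffering test, and `tso_outcomes` is a hypothetical helper name.

```python
def tso_outcomes(programs, addrs=("x", "y")):
    """Explore every schedule of a toy TSO machine with one FIFO store buffer per core."""
    n = len(programs)
    start = (tuple(0 for _ in programs),    # program counter per core
             tuple(() for _ in programs),   # FIFO store buffer per core
             tuple((a, 0) for a in addrs),  # shared memory, initially zero
             ())                            # (register, value) pairs loaded so far
    stack, seen, outcomes = [start], set(), set()
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, regs = state
        finished = True
        for c in range(n):
            if bufs[c]:
                # Drain the oldest buffered store of core c into memory.
                finished = False
                addr, val = bufs[c][0]
                new_mem = tuple((a, val if a == addr else v) for a, v in mem)
                new_bufs = bufs[:c] + (bufs[c][1:],) + bufs[c + 1:]
                stack.append((pcs, new_bufs, new_mem, regs))
            if pcs[c] < len(programs[c]):
                finished = False
                kind, addr, arg = programs[c][pcs[c]]
                new_pcs = pcs[:c] + (pcs[c] + 1,) + pcs[c + 1:]
                if kind == "st":
                    # A store retires into the local buffer, not into memory.
                    new_bufs = bufs[:c] + (bufs[c] + ((addr, arg),),) + bufs[c + 1:]
                    stack.append((new_pcs, new_bufs, mem, regs))
                else:
                    # A load forwards from the youngest matching buffered store,
                    # otherwise it reads shared memory.
                    pending = [v for a, v in bufs[c] if a == addr]
                    val = pending[-1] if pending else dict(mem)[addr]
                    stack.append((new_pcs, bufs, mem, regs + ((arg, val),)))
        if finished:
            outcomes.add(tuple(v for _, v in sorted(regs)))
    return outcomes


# Thread 0: x = 1; r1 = y        Thread 1: y = 1; r2 = x
P0 = [("st", "x", 1), ("ld", "y", "r1")]
P1 = [("st", "y", 1), ("ld", "x", "r2")]
print(sorted(tso_outcomes([P0, P1])))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The extra outcome (0, 0) appears because both reads can execute while both writes are still sitting in the store buffers.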

22
Q

What is a Partial Store Order (PSO) consistency model?

A

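Relaxes both the Write-Read and the Write-Write rules. There is no longer a guarantee that writes happen in program order.

W0 <p R0 -> W0 <m R0 # No longer needs to hold
W0 <p W1 -> W0 <m W1 # No longer needs to hold

Race conditions can occur here, for example when a flag write becomes visible before the data write it is meant to publish.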
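
One way to see the problem is a "message passing" test: a writer stores `data` and then sets `flag`, while a reader checks `flag` and then reads `data`. The sketch below is a simplified model that works directly on memory order. It assumes the reader's loads stay in order and lets the writer's two stores become visible in either order when `allow_store_reorder` is true. All names are illustrative.

```python
from itertools import combinations, permutations

WRITER = [("st", "data", 1), ("st", "flag", 1)]
READER = [("ld", "flag", "r1"), ("ld", "data", "r2")]


def mp_outcomes(allow_store_reorder):
    """Return all (r1, r2) results; stores may swap only if allow_store_reorder is set."""
    # Store order kept (as in TSO) or relaxed (as in PSO).
    store_orders = set(permutations(WRITER)) if allow_store_reorder else {tuple(WRITER)}
    total = len(WRITER) + len(READER)
    results = set()
    for stores in store_orders:
        # Choose which slots of the global memory order the stores occupy.
        for slots in combinations(range(total), len(stores)):
            stores_it, loads_it = iter(stores), iter(READER)
            mem, regs = {"data": 0, "flag": 0}, {}
            for pos in range(total):
                kind, addr, arg = next(stores_it) if pos in slots else next(loads_it)
                if kind == "st":
                    mem[addr] = arg
                else:
                    regs[arg] = mem[addr]
            results.add((regs["r1"], regs["r2"]))
    return results


print(sorted(mp_outcomes(False)))   # [(0, 0), (0, 1), (1, 1)]
print((1, 0) in mp_outcomes(True))  # True
```

With the store order relaxed, the reader can see `flag == 1` and still read the old `data == 0`.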

23
Q

What is a Weak Ordering (WO) / Release Consistency (RC) consistency model?

A

Relaxes all of the rules. Gives no guarantees about the execution order of memory operations (to different addresses).

Any ordering is legal, unless explicitly synchronised

There are multiple variations of WO; this is just the generic name
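
The four models can be summarised by which of the four ordering rules each one keeps. The table below is a compact sketch of that summary. `PRESERVED` and `may_reorder` are hypothetical names, and the same-address check is a simplification standing in for "assuming no dependence".

```python
# Program-order pairs (first, second) that each model still enforces in memory order.
PRESERVED = {
    "SC":  {("R", "R"), ("R", "W"), ("W", "R"), ("W", "W")},
    "TSO": {("R", "R"), ("R", "W"), ("W", "W")},
    "PSO": {("R", "R"), ("R", "W")},
    "WO":  set(),
}


def may_reorder(model, first, second, same_address=False):
    """Return True if `second` may pass `first` in memory order under `model`."""
    if same_address:
        return False  # simplification: accesses to one address keep their order
    return (first, second) not in PRESERVED[model]


for model in PRESERVED:
    relaxed = [f"{a}->{b}" for a in "RW" for b in "RW" if may_reorder(model, a, b)]
    print(model, relaxed)
# SC []
# TSO ['W->R']
# PSO ['W->R', 'W->W']
# WO ['R->R', 'R->W', 'W->R', 'W->W']
```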

24
Q

What consistency model is most wide-spread in personal computers?

A

TSO, which is the model used by the x86 ISA

25
Q

Why would we want to relax the Write-Read ordering?

A

Remove the W0 <p R0 -> W0 <m R0 rule

Allows reads to be issued, while writes to different addresses have not yet completed.

Effective way to hide write latency, as writes are not on the critical path. Reads are, however, because they fetch new data into the system, whereas writes handle old data.

Note: If W0 <p W1 is still enforced, not all write latency can be hidden.

26
Q

Why would we want to relax Write-Write ordering?

A

Can hide more write latency as writes can be issued while writes to different addresses have not yet completed.

27
Q

What is the advantage of PSO?

A

Both write-read and write-write are relaxed

Can now empty stores from buffer in any order

Must still generate all older addresses to prevent issues with aliasing

28
Q

Why would we want to relax read-write and read-read?

A

Allows for even more MLP

These are most effective when other rules are also abolished

29
Q

What happens if we also relax read-write and read-read?

A

Explicit synchronisation is always necessary for coordinated program execution

30
Q

Why is a weaker consistency model better for hardware design?

A

Can have more MLP, but makes it difficult for the programmer

31
Q

What are memory barriers?

A

Special operations that enforce an ordering within the CPU

Serialises all memory instructions before and after the barrier

All mem-ops before the barrier complete before the barrier instruction completes. No mem-ops after the barrier are initiated before the barrier has completed
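
The effect of a barrier on the set of legal memory orders can be counted with a small model. The sketch below assumes a fully relaxed model and a single thread whose accesses all go to different addresses, so the barrier (written `FENCE` here, an illustrative name) is the only ordering constraint.

```python
from itertools import permutations


def legal_orders(program):
    """Return all memory orders of one thread's ops in which nothing crosses a FENCE."""
    fences = [i for i, op in enumerate(program) if op == "FENCE"]
    legal = []
    for order in permutations(range(len(program))):
        pos = {op_index: slot for slot, op_index in enumerate(order)}
        # Every op must stay on the same side of each fence as in program order.
        ok = all((pos[i] < pos[f]) == (i < f)
                 for f in fences for i in range(len(program)) if i != f)
        if ok:
            legal.append([program[i] for i in order])
    return legal


no_fence = ["W x", "W y", "R z", "R w"]
with_fence = ["W x", "W y", "FENCE", "R z", "R w"]
print(len(legal_orders(no_fence)), len(legal_orders(with_fence)))  # 24 4
```

Operations may still be reordered among themselves on each side of the barrier, but the barrier cuts the number of legal orders from 24 to 4, which is the lost reordering freedom (MLP).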

32
Q

What are some downsides with introducing barriers?

A

Force synchronisation - the ordering is no longer as liberal as before, so there are fewer possible performance benefits (MLP).

They typically force FULL serialisation, which, if used between all memory operations, effectively results in the SC model

33
Q

What is Release Consistency (RC)?

A

Part of the group of Weak Orderings (WOs)

Has more fine-grained primitives. Can synchronise around specific operations - the acquire and release operations.

These operations are barriers that enforce specific ordering.

With release and acquire, we get a new, bigger set of rules.

34
Q

In RC what does Acquire do?

A

Getting permission from other processors for subsequent memory accesses

Previous memory accesses can be overlapped, but the next memory access has to follow a release from somewhere else.

35
Q

In RC what does Release do?

A

Giving permission to other processors for previous memory accesses.

Subsequent memory accesses can be overlapped
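
Acquire and release can be modelled as one-way barriers: nothing after an acquire may move above it, and nothing before a release may move below it. The sketch below counts the legal memory orders of one thread under that reading. It assumes a fully relaxed model with all plain accesses to different addresses, and the operation names (`ACQ`, `REL`, `W a`, ...) are illustrative.

```python
from itertools import permutations


def legal_orders(program):
    """Memory orders of one thread under one-way acquire/release barriers."""
    n = len(program)
    legal = []
    for order in permutations(range(n)):
        pos = {op_index: slot for slot, op_index in enumerate(order)}
        ok = True
        for i in range(n):
            for j in range(i + 1, n):  # op i precedes op j in program order
                keep = (program[i] == "ACQ"      # nothing later moves above an acquire
                        or program[j] == "REL")  # nothing earlier moves below a release
                if keep and pos[i] > pos[j]:
                    ok = False
        if ok:
            legal.append([program[i] for i in order])
    return legal


program = ["W a", "ACQ", "R x", "W y", "REL", "R b"]
orders = legal_orders(program)
print(len(orders))  # 38
print(["ACQ", "W a", "R x", "R b", "W y", "REL"] in orders)  # True
```

If `ACQ` and `REL` were full barriers instead, only 2 orders would remain (`R x` and `W y` swapped or not). The one-way rules let `W a` sink into the critical section and `R b` rise into it.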

36
Q

What is a trend in programming models, in regards to correctness?

A

The programming model guarantees correct execution for data-race-free programs

However, if a program contains race conditions, there are no guarantees for execution. Meaning the programmer must implement explicit synchronisation themselves to achieve correctness.

37
Q

Why do we need synchronisation?

A

It is central in all kinds of parallelism
- synchronise access to resources
- order events from cooperating processes correctly
- used for shared memory programming

38
Q

How is synchronisation implemented in smaller multiprocessor systems?

A

Using uninterrupted instruction(s) atomically accessing a value

This requires special hardware support

Simplifies construction of OS / parallel applications

39
Q

What are the pros and cons of sequential consistency?

A

Pros:
- ensures program order and write atomicity
- intuitive and easy to use

Cons:
- No optimisations and bad performance

40
Q

What are the pros and cons of relaxed consistency?

A

Pros:
- Enables more optimisations and better performance
- Wide variety of models offers maximum flexibility

Cons:
- Does not ensure program order
- Added complexity for programmers and compilers

41
Q

What are atomic operations?

A

Special instructions that are used to guarantee execution semantics. This means that they differ from normal instructions by having this added guarantee about what is happening in the system.

Guarantee execution without interference from other programs.

42
Q

What are two examples of atomic operations?

A

Load-link / store-conditional

Atomic swap

43
Q

What is Load-link / Store-conditional?

A

Used in a sequence: first LL then SC

If the memory location accessed by LL is written to, SC fails

If a context switch occurs between LL and SC, SC fails

44
Q

How is LL / SC implemented?

A

Using a special link register.

This register contains the address used in LL.

This is reset (to zero) if the matching cache block is invalidated or if we get an interrupt.

SC checks if the link register contains the same address. If so, we have atomic execution of LL & SC

The store is conditional on the linked location not having been written to since the load
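
The link-register mechanism can be sketched as a small software model. The classes `Memory` and `Core` and the helper `atomic_increment` are hypothetical, and the model uses `None` rather than zero as the "no link" value for clarity.

```python
class Memory:
    """Toy shared memory that clears matching link registers on every write."""

    def __init__(self):
        self.data = {}
        self.cores = []

    def write(self, addr, value, writer=None):
        self.data[addr] = value
        for core in self.cores:
            # Models the invalidation of the linked cache block in other cores.
            if core is not writer and core.link == addr:
                core.link = None


class Core:
    def __init__(self, mem):
        self.mem = mem
        self.link = None  # link register: address of the last LL, or None
        mem.cores.append(self)

    def load_linked(self, addr):
        self.link = addr
        return self.mem.data.get(addr, 0)

    def store_conditional(self, addr, value):
        if self.link != addr:
            return False  # link was broken: the store is not performed
        self.mem.write(addr, value, writer=self)
        self.link = None
        return True

    def interrupt(self):
        self.link = None  # an interrupt or context switch also clears the link


def atomic_increment(core, addr):
    """Retry LL/SC until the pair executes without interference."""
    while True:
        old = core.load_linked(addr)
        if core.store_conditional(addr, old + 1):
            return old + 1


mem = Memory()
c0, c1 = Core(mem), Core(mem)
v = c0.load_linked("lock")                  # c0 links the address "lock"
mem.write("lock", 5, writer=c1)             # c1 writes it in between
print(c0.store_conditional("lock", v + 1))  # False
print(atomic_increment(c0, "lock"))         # 6
```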

45
Q

What is atomic exchange (swap)?

A

Swaps value in register for value in memory

If mem = 0, not locked
If mem = 1, locked

Set the register to 1, meaning the processor wants the lock.
Then perform an exchange operation:
Exchange(register, mem)

If register ends up being 0 we have a success. Mem was 0 and is now 1.

If register ends up being 1 we failed. Mem was 1 and was locked. Still locked because mem is still 1.

The exchange must always be atomic.
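
The locking scheme above can be sketched in Python as a spin lock. This is an emulation, not hardware: a `threading.Lock` inside the hypothetical `Word` class stands in for the atomicity the hardware would guarantee. Unlocking by storing 0 is an assumption that follows from "mem = 0, not locked".

```python
import threading


class Word:
    """Toy memory word with an atomic exchange, emulated with a Python lock."""

    def __init__(self, value=0):
        self.value = value
        self._guard = threading.Lock()  # stands in for the hardware atomicity guarantee

    def exchange(self, register):
        with self._guard:
            old, self.value = self.value, register
        return old


def acquire(lock_word):
    while lock_word.exchange(1) == 1:  # got 1 back: someone else holds the lock
        pass                           # spin and try again


def release(lock_word):
    lock_word.value = 0  # a plain store of 0 unlocks


demo = Word(0)
print(demo.exchange(1))  # 0: the lock was free and is now held
print(demo.exchange(1))  # 1: the lock was already held, still held

lock_word = Word(0)
shared = {"counter": 0}


def worker():
    for _ in range(500):
        acquire(lock_word)
        shared["counter"] += 1  # critical section
        release(lock_word)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared["counter"])  # 2000
```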