Collected Concepts Flashcards

1
Q

Flynn’s Taxonomy

A
  • Early attempt to classify different types of parallel machines
  • Expressed in terms of Instruction Stream and/or Data Stream
2
Q

Single Instruction Single Data (SISD)

A
  • A single stream of instructions operates on a single stream of data, one element at a time
3
Q

Multiple Instruction Multiple Data (MIMD)

A
  • A collection of cores (CPUs), each executing its own sequence of instructions and each operating on a separate stream of data
  • Most modern processors are in this category
4
Q

Single Instruction Multiple Data (SIMD)

A
  • A collection of cores (CPUs) performing the same sequence of instructions on multiple streams of data in ‘lockstep’
  • Currently used in graphics applications:
    • GPUs
    • Single general-purpose CPUs working on packed data
5
Q

Single Program Multiple Data (SPMD)

A
  • More general version of SIMD
  • All processing elements execute the same program, but on different data
  • Doesn’t need to be in ‘lockstep’
6
Q

A Grid

A
  • 2D interconnection of processors
  • Processors can communicate with neighbours
  • Memory is usually private to each core
  • Communication beyond immediate neighbours requires staged (multi-hop) communication
7
Q

Torus

A
  • Same as Grid, but with ‘Wraparound’
  • More symmetrical structure, which enhances communication
8
Q

Bus Interconnection

A
  • Simple to build
  • Has to be shared using ‘time-slicing’
  • Memory can be local to each CPU/core
  • Can add shared memory to bus
  • All communications have equal access time (as long as there is no contention)
9
Q

Shared Memory

A

Memory is accessible from every part of the computation

10
Q

Distributed Memory

A
  • Memory is distributed among parts of the computation and (usually) only accessible from one part of the computation
  • Easier to build.
  • Can have higher total communication bandwidth.
  • If the problem is suited to Message Passing, it runs better on Distributed Memory hardware
11
Q

Shared Memory [Hardware View]

A
  • Memory is physically organised so that it can be accessed from any processor/core
  • Difficult to build
  • Better supports Data Sharing programming models
    • Which are easier for general-purpose computing
  • Most general-purpose hardware, in particular chip multiprocessors, is Shared Memory
12
Q

Distributed Memory [Hardware View]

A

The memory is physically connected to only one CPU/core and only that core can directly access it.

13
Q

Shared Memory [Software View]

A
  • Usually called Data Sharing
  • A program has global memory which is accessible from any part of the program
  • Communication between parallel parts can take place by writes to and reads from that global memory.
  • Easier than Message Passing parallel programming.
14
Q

Distributed Memory [Software View]

A
  • Called Message Passing
  • Separate parts of the program have local memory on which they can operate
  • Communication between parts can only take place by sending messages between the parts.
  • Programming becomes harder the less regular the problem becomes and the more dynamic its structure is.
15
Q

Programming Model

A

Software view available to the programmer

16
Q

Data Sharing on Distributed Memory

A
  • Underneath, code running on one core can only communicate with another using messages
  • This allows shared memory to be simulated on top
  • Will be slow if there are lots of remote memory accesses
17
Q

Message Passing on Shared Memory

A
  • Any core can write to a shared memory location which can be read by another core
  • An area of memory can be used as a communication buffer between separate threads of computation
  • A simple implementation of a message passing parallel programming model is possible
  • If there is only local communication, Distributed Memory would be better due to higher total bandwidth
18
Q

Threads

A
  • A thread is a flow of control executing a program
  • A process can consist of one or more threads
  • All threads in the same process share the same memory address space
  • If shared memory is not present, threads have to communicate via messages.
19
Q

Data Parallelism

A
  • Divide computation into equal-sized chunks, each working on part of the whole
  • Easiest form of parallelism
  • Works best when there are no data dependencies between these chunks
20
Q

Explicit Parallelism

A

Programmer spells out what should be done in parallel and what should be done in sequence

21
Q

Implicit parallelism

A

The system is supposed to work the parallelism out for itself

22
Q

Cache

A
  • Fast, small, local memory that holds recently used data and instructions
  • Used because main memory is too slow (50-100x slower) to keep up with the processor
  • If the program’s ‘working set’ can be kept in the caches most of the time, the processor can run at full speed
  • New data/instructions needed by the program have to be fetched from main memory (on each cache miss)
  • Newly written data in the cache must eventually be written back to main memory
23
Q

Cache Coherence Problem

A
  • When there are multiple processors, problems arise:
    • They may share variables, resulting in data inconsistency
    • Two processors writing to the same location results in two different cache copies
  • A write-through policy can be used
    • Every write is updated in memory
    • Results in delay, as every write/read across processors could involve two main memory accesses
  • The write-back policy
    • Used nowadays as it’s more efficient
24
Q

Cache States

A
  • Each cache line has 2 control bits
    • Invalid - there may be an address match, but the data is not valid (need to go to memory and fetch it)
    • Dirty - cache line is valid and has been written to, but the latest values have not been updated in memory
  • A third, implicit state exists - Valid
    • A valid cache entry exists and the line has the same values as main memory
25
Q

Bus Snooping

A
  • A snooping unit attached to every cache on the bus
  • Observes all the transactions on the bus and is able to modify the cache independently of the core
  • Takes action on seeing pertinent transfers between another core’s cache and memory
  • Able to send messages to other caches
26
Q

Cache State Transitions

A
  1. Accesses to be made to main memory
  2. Messages to be sent between caches
  3. Changes between the valid states
27
Q

Types of Messages Between Cores

A
  • 2 types of messages:
    • Read requests for a cache line
    • Invalidates of other caches’ lines
  • Can be extended beyond 2 cores
    • Any core with a valid value can respond to a read request
    • There can only be one Dirty copy; all other copies must be Invalid
28
Q

Problems with Bus Snooping

A
  • Cache coherence implies that all cores see the invalidate at the same time (in the same bus cycle)
  • The more cores connected to the bus, the more difficult this gets
  • With more cores:
    • The bus gets physically longer - signals take longer
    • The bus has more capacitance, slowing it down
  • If the bus cycle is slowed, performance suffers
  • The coherence protocol is a major limitation on the number of cores which can be supported
29
Q

MSI Protocol

A
  • Derived from a consideration of the uniprocessor cache states, which include Valid
  • The result is the MSI protocol:
    • **M**odified - the only valid copy in any cache, and it differs from memory
    • **S**hared - valid, but other caches might have it, and its value is the same as memory
    • **I**nvalid - out of date and can’t be used
30
Q

MESI Protocol

A
  • Extension of the MSI protocol; more widely used because of its significant effect on bus usage
  • The Shared (Valid) state is split into two states:
    • **M**odified
    • **E**xclusive (unshared) - there are no copies in other caches
    • **S**hared
    • **I**nvalid
  • Results in less use of bus bandwidth
31
Q

MOESI Protocol

A
  • Further optimisation of MESI
  • The Modified state is split:
    • **M**odified
    • **O**wned - the cache contains a copy which differs from memory, and there are copies in other caches which are tagged as Shared
  • Allows the latest value to be shared without having to write it back to memory
32
Q

Directory Based Coherence

A
  • Coherence scheme without a bus (for example, on a grid)
  • A centralised directory holds information about every value (every cache line) in the memory
  • Drawbacks:
    • Takes a significant number of CPU cycles due to the lack of bus communication
    • Due to longer delays, ‘handshakes’ are usually needed to work correctly
    • Unlikely solution for heavily used shared memory
33
Q

Directory Structure

A
  • Each directory entry contains:
    • Information on which core has a copy
    • Whether or not the copy is dirty
  • A core wishing to make a memory access may need to query the directory about the state of the line to be accessed
34
Q

Barrier

A
  • A place where a number of threads “meet up”
  • When all threads reach it, they can all proceed
  • But threads need to wait until the last one arrives
  • Very good for data parallelism
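A minimal sketch of a barrier in use, assuming POSIX threads (pthread_barrier_t is not part of the card material, just an illustration):

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

pthread_barrier_t barrier;

void *worker(void *arg) {
    long id = (long)arg;
    printf("thread %ld: phase 1 done\n", id);
    pthread_barrier_wait(&barrier);   /* wait until the last thread arrives */
    printf("thread %ld: phase 2 starts\n", id);
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}
```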
35
Q

Locks

A
  • Used to achieve “correctness”
  • If two pieces of code take a lock on the same object, one should start and complete before the other starts
  • This preserves the normal (sequential) meaning of the chunks of code
  • Only one thread can lock any particular object at a time
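A minimal sketch, assuming POSIX threads (the names lock and counter are illustrative):

```c
#include <pthread.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
long counter = 0;                     /* shared object protected by the lock */

void *increment(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);    /* only one thread can hold the lock   */
        counter++;                    /* this chunk completes before another
                                         thread's locked chunk starts        */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}
```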
36
Q

Sequential Consistency

A
  • We have sequential consistency if both:
    • Method calls appear to happen in a one-at-a-time sequential order, and
    • Method calls appear to take effect in program order
37
Q

Problems with Locks

A
  • Deadlock - two or more competing actions are each waiting for the other to finish, and thus neither ever does
  • Mistaken use of conditions (the “Lost WakeUp” problem)
38
Q

Coarse-grained Granularity

A
  • The code which depends on obtaining a lock is too large
  • Results in limited parallelism
39
Q

Fine-grained Granularity

A
  • The code which depends on obtaining a lock is too small
  • Results in a lot of work obtaining and releasing locks, hence the program is harder to write
40
Q

Binary Semaphore

A
  • Hardware-supported construct
  • A single shared boolean variable S ‘protects’ a shared resource
    • S == 1 means the resource is free
    • S == 0 means the resource is in use
  • Operations (atomic):
    • wait(S): wait until S != 0, then set S = 0
    • signal(S): set S = 1
  • S may be cached, hence ‘indivisible’ behaviour might require coherence operations in the cache
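A minimal busy-waiting sketch of the two operations, assuming C11 atomics stand in for the hardware support (the function names are illustrative):

```c
#include <stdatomic.h>

atomic_int S = 1;                  /* 1 = resource free, 0 = in use */

void sem_wait(atomic_int *s) {
    int expected = 1;
    /* atomically change 1 -> 0; keep retrying while the resource is in use */
    while (!atomic_compare_exchange_weak(s, &expected, 0))
        expected = 1;              /* CAS overwrote expected on failure; reset */
}

void sem_signal(atomic_int *s) {
    atomic_store(s, 1);            /* mark the resource free again */
}
```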
41
Q

Test_and_set Instruction (tas)

A
  • Simple solution on older processors
  • tas R2:
    • If the memory location addressed by R2 is 0, set it to 1 and set the processor ‘zero’ flag
    • Else, clear the zero flag
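Expressed in C, the behaviour looks roughly like this; a sketch only, since in hardware the whole body executes as one indivisible step, which plain C cannot express:

```c
/* What tas does, written out sequentially; the hardware performs
   the read-test-write below as a single atomic operation. */
int test_and_set(volatile int *addr) {
    int old = *addr;      /* read the location                           */
    if (old == 0)
        *addr = 1;        /* claim it (hardware also sets the zero flag) */
    return old;           /* old == 0 means the caller acquired it       */
}
```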
42
Q

Fetch_and_Add

A
  • Returns the value of a memory location and increments it
  • Atomic instruction
  • Read-Modify-Write instruction (RMW)
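C11 exposes this RMW operation directly; a small sketch (the ticket-counter use is illustrative):

```c
#include <stdatomic.h>

atomic_int next_ticket = 0;

/* Each caller gets a unique ticket: the read, the add and the
   write back happen as one atomic read-modify-write operation. */
int take_ticket(void) {
    return atomic_fetch_add(&next_ticket, 1);
}
```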
43
Q

Compare_and_Swap

A
  • Compares a memory location to a value (in a register) and swaps in another value if they are equal
  • Atomic instruction
  • Read-Modify-Write instruction (RMW)
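The C11 equivalent; a minimal sketch (the wrapper name is illustrative):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Swap new_val into *loc only if *loc still equals expected;
   returns true on success, false if another thread changed it first. */
bool cas_int(atomic_int *loc, int expected, int new_val) {
    return atomic_compare_exchange_strong(loc, &expected, new_val);
}
```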
44
Q

Load_Linked/Store_Conditional Overview

A
  • Synchronisation mechanism used in modern RISC processors
  • Uses separate load and store instructions (similar to ordinary load and store)
  • However, they have additional effects on processor state which allow them, as a pair, to act atomically
  • Knowing that anything between ldl and stc has executed atomically is very powerful for certain things
45
Q

Load_Linked (ldl)

A
  • ldl R1, R2
    • Loads R1 with the value addressed in memory by R2
    • Sets a ‘load linked flag’ on the core that executes it
    • Records the address in a ‘locked address register’ on the core that executes it
46
Q

Store_Conditional (stc)

A
  • stc R1, R2
    • Stores the value in R1 into the location addressed in memory by R2
    • Only succeeds if the ‘load linked flag’ is set
    • The value of the ‘load linked flag’ is returned in R1
    • The ‘load linked flag’ is cleared
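A sketch of how the pair is used, e.g. for an atomic increment; ll() and sc() are hypothetical C wrappers for the instructions (there is no portable C API for them):

```c
extern int ll(int *addr);            /* hypothetical wrapper for ldl       */
extern int sc(int *addr, int val);   /* hypothetical wrapper for stc,
                                        returns 1 if the store succeeded   */

void atomic_increment(int *counter) {
    int old, ok;
    do {
        old = ll(counter);           /* linked load sets the flag          */
        ok  = sc(counter, old + 1);  /* fails if anyone wrote in between   */
    } while (!ok);                   /* retry until the pair ran atomically */
}
```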
47
Q

Load Linked Flag

A
  • The flag is cleared on a core which has it set if a write (from another core) occurs to the locked address and, hence, an invalidate message is sent
  • Detected by comparison with the ‘locked address register’ - the processor must monitor (snoop) the memory address bus to detect this
  • This will occur because a write has occurred to the shared variable and, hence, another core has probably got the semaphore
48
Q

Spin Lock

A
  • Waiting for a lock by sitting in a loop
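A minimal spin lock sketch using the C11 atomic_flag type (which behaves like test_and_set):

```c
#include <stdatomic.h>

atomic_flag lock = ATOMIC_FLAG_INIT;

void spin_lock(atomic_flag *l) {
    /* atomic_flag_test_and_set returns the previous value,
       so we keep spinning while another thread holds the lock */
    while (atomic_flag_test_and_set(l))
        ;                            /* sit in the loop ('spin') */
}

void spin_unlock(atomic_flag *l) {
    atomic_flag_clear(l);
}
```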
49
Q

OpenMP

A
  • Used to create parallel programs for shared memory computers
  • Goals:
    • Parallel performance
    • Sequential equivalence
    • Incremental parallelism
50
Q

Parallel Regions

A
  • Having identified a part of the sequential program which would benefit, annotate it so that multiple threads execute this parallel region
  • Join up again at the end
  • Variables declared before the parallel region are _shared_, whereas those declared within it are _private_ (local to a thread)
  • The code block following the directive must be a structured block (single point of entry at the top, single point of exit at the bottom)
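A minimal sketch of a parallel region in C (compiled with -fopenmp; the printed messages are illustrative):

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    int nthreads = 0;                    /* declared before the region: shared */
    #pragma omp parallel
    {                                    /* structured block: one entry, one exit */
        int id = omp_get_thread_num();   /* declared inside the region: private */
        #pragma omp single
        nthreads = omp_get_num_threads();
        printf("hello from thread %d\n", id);
    }                                    /* threads join up again here */
    printf("ran with %d threads\n", nthreads);
    return 0;
}
```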
51
Q

Loop-splitting

A
  • A way of sharing work between threads
  • Process:
    • Find a time-consuming loop
    • Restructure it to make iterations independent of each other
    • Split parts of the ‘iteration space’ across different threads
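A sketch of the result, assuming a loop whose iterations are already independent (vector_add is an illustrative name):

```c
/* Each iteration is independent, so the iteration space can be
   split between threads with a single OpenMP directive. */
void vector_add(int n, const double *a, const double *b, double *c) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];          /* each thread handles a chunk of i */
}
```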
52
Synchronization Constructs [in C]
* flush - used to ensure memory consistency between threads * critical - indicates a critical section (can only be executed by one thread) * barrier - can be added explicitly * All done using pragmas
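A sketch combining critical and barrier (do_work is a stand-in for real per-thread computation):

```c
#include <stdio.h>

double do_work(void) { return 1.0; }   /* stand-in for per-thread work */

int main(void) {
    double total = 0.0;
    #pragma omp parallel
    {
        double local = do_work();
        #pragma omp critical
        total += local;                /* one thread at a time updates total */

        #pragma omp barrier            /* explicit barrier: all threads wait */
        /* past the barrier, every thread sees the completed total */
    }
    printf("total = %f\n", total);
    return 0;
}
```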
53
Q

Loop Scheduling

A
  • Users can affect the scheduling of loop iterations to threads (which iterations to give to which thread), and whether to do it statically or dynamically
  • #pragma omp parallel for schedule(dynamic,10)
    • Says the iteration space should be broken into chunks of 10 iterations; each thread takes one chunk at a time and comes back for more when done
  • A static allocation would divide the chunks evenly between the threads at the start
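In context, assuming iterations whose cost varies (expensive is an illustrative stand-in):

```c
/* Stand-in for work whose cost varies per iteration. */
static double expensive(int i) {
    double x = 0;
    for (int j = 0; j < i; j++) x += j;
    return x;
}

double results[1000];

void compute(void) {
    /* chunks of 10 iterations: each thread grabs a chunk,
       then comes back for another when it finishes */
    #pragma omp parallel for schedule(dynamic, 10)
    for (int i = 0; i < 1000; i++)
        results[i] = expensive(i);
}
```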
54
Q

Message Passing Interface (MPI)

A
  • Most widely used API for parallel programming
  • Library for C & Fortran
  • Used to program a wide range of architectures:
    • Distributed memory machines
    • MPPs
    • Shared memory machines
55
Q

MPI Structure

A
  • Messages are identified by:
    • Sending process
    • Receiving process
    • An integer tag
  • To avoid confusion, these are defined within a communication context
  • Processes are grouped - initially there is a single group, but it can be split later
  • Communication context + process group forms a communicator (e.g. MPI_COMM_WORLD)
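A minimal sketch of two processes exchanging a message inside MPI_COMM_WORLD:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id in the group */

    if (rank == 0) {
        value = 42;
        /* identified by receiver (rank 1), tag (0) and communicator */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}
```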
56
Q

Speculation

A
  • If there are idle resources, we might as well use them speculatively (to do work which may turn out to be useful)
  • Incorrect speculation uses resources that would otherwise be idle (so nothing is wasted, except power)
57
Q

Thread Level Speculation (TLS)

A
  • A technique for running a fully serial program in parallel
  • Principle:
    • Divide the single-threaded code into separate threads
    • Run the threads in parallel
    • Detect any problems and handle them
58
Q

Loop-based TLS

A
  • Each data item in each thread has a tag:
    • Not accessed (N)
    • Modified (M)
    • Speculatively loaded (S)
    • Speculatively loaded and later modified (SM)
  • In practice this means more work on each memory access
59
Q

Procedure-based TLS

A
  • Execute the serial code and, on encountering a procedure call (method/function), split the execution into two threads
  • Hardware support:
    • Every core has a cache which can be used as the speculative data buffer
    • Snooping protocols can be used to perform the reading and conflict detection operations
    • But cache capacity is limited
  • Process:
    • The procedure body is executed in the main thread
    • The code beyond the call is executed speculatively
    • The threads re-join when the call finishes, and validation is done
      • Success - the speculative thread continues as the main thread
      • Failure - the code after the call is re-executed in the old main thread
60
Q

Transactional Memory (TM)

A
  • Provides an implementation of transactions within a parallel data sharing context
  • Promises to provide a simplified programming model when parallel threads share updateable memory (i.e. to replace locks and barriers)
  • Performance is not good for programs with a significant amount of sharing
61
Q

ACID

A
  • Atomicity (applicable to TM)
  • Consistency
  • Isolation (applicable to TM)
  • Durability
62
Q

Transactions

A
  • Indivisible (atomic): executes either as a whole or not at all
  • While executing, none of its changes can be observed from outside
  • When it completes, any changes become apparent outside, and they all become apparent at the same time
    • All the changes are simultaneous
63
Q

TM Locks

A
  • Writing correct code with locks is tricky; transactions should be simpler
  • Locks are not composable
    • Users need to know about the locks to avoid deadlocks
    • Transactions shouldn’t suffer from this
64
Q

TM Commit and Abort

A
  • Transaction algorithm:
    • Starts
    • Reads some shared variables
    • Possibly writes some shared variables
    • Finally attempts to commit
      • Succeeds if there is nothing wrong
      • Fails (aborts) if it cannot commit - and then the entire transaction must be repeated
65
Q

TM Abort Reasons

A
  • A variable it has read has since been written by another transaction (R-W clash)
  • A variable it has written to has also been written to by another transaction (W-W clash)
66
Q

Readset and Writeset

A
  • A transaction keeps track of all the shared variables it has read - its **readset**
  • It keeps track of all the shared variables it has written to - its **writeset**
  • It uses both to determine clashes with other transactions
67
Q

TM Versioning

A
  • If the transaction writes to a variable and later reads it, it needs to see the value it wrote, not the original value
  • If the transaction aborts, the original value needs to still be there
  • Two ways to implement this:
    • Eager Versioning / Direct Update
    • Lazy Versioning / Deferred Update
68
Q

Eager Versioning (Direct Update)

A
  • Change the shared variable, but keep a private log of the original version
  • On commit, throw away the log
  • On abort, restore the original value from the log
  • Trusts clash detection to preserve isolation
  • Efficient if aborts are rare
69
Q

Lazy Versioning (Deferred Update)

A
  • Keep a private version of everything shared that a transaction changes
  • Reads should refer to these private versions
  • On **commit**, copy the private version values into main memory
  • On **abort**, throw away the private versions
  • Efficient if aborts are common (not rare!)
70
Q

Validation (Conflict Detection)

A
  • The process of checking that transactions do not clash
  • Two ways:
    • Lazy validation
    • Eager validation
71
Q

Lazy Validation

A
  • Do the validation when the transaction tries to commit
  • At this stage, all the reads and writes to shared variables are known
  • All conflict detection needs to do is work out whether it would be OK to commit this transaction’s result
72
Q

Eager Validation

A
  • Every write to a shared variable by a transaction can trigger a check for clashing transactions amongst those currently running
  • After identifying which transactions clash:
    • Choose which one to abort
    • Or delay some instead
  • Probably more work - but aborting transactions which would eventually fail anyway saves wasted effort
73
Q

Strong Isolation

A
  • Shared variables can only be accessed within transactions
74
Q

Weak Isolation

A
  • Shared variables can be used outside transactions as well as inside
75
Q

Basis of TM Operation

A
  • Declare sections of the code to be ‘atomic’ (a transaction)
  • When in a transaction:
    • Keep note of all addresses loaded from (readset)
    • Buffer all writes, keeping local values; do not write them to main shared memory (writeset)
  • If it is detected that another thread has written to something in the readset (a conflict), abort the transaction and restart
  • If the end of the transaction is reached successfully, commit the writeset to main memory
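A sketch of the programmer-visible side, assuming GCC’s experimental transactional memory extension (compile with -fgnu-tm); the account transfer is illustrative:

```c
long balance_a = 100, balance_b = 0;   /* shared variables */

void transfer(long amount) {
    __transaction_atomic {             /* declared atomic: a transaction    */
        balance_a -= amount;           /* writes are buffered (writeset)    */
        balance_b += amount;           /* reads are tracked (readset)       */
    }                                  /* commit both writes together, or
                                          abort and re-execute on conflict  */
}
```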
76
Q

Cache-based TM

A
  • The cache can store both the writeset and the readset
    • Needs extra tags on cache entries
  • Modified writeback protocol - writes of transactional data are only made from cache to main memory if the transaction commits
  • The snooping protocol is modified:
    • When writing a transactional variable, an invalidate is broadcast as usual
    • Any core seeing an invalidate which matches one of its readset entries must invalidate all its transactional data, abort its transaction and restart it
    • When a transaction terminates, it can commit (flush) its writeset to main memory
  • Issues:
    • Limited cache size - limits the readset/writeset
    • Risk of livelock
77
Q

Transactional Coherence and Consistency (TCC)

A
  • A large number of optimisations
  • As yet no practical implementations, but simulated with encouraging results
78
Q

TCC Principles

A
  • Each core still has a local cache in which it stores its readset during a transaction, with tags to differentiate it from other cached data
  • Writes are stored in a separate buffer (can be a RAM-based queue)
  • Writes to shared data only occur in transactions - they are buffered locally, so there is no need to broadcast invalidations
  • A transaction reaching its end successfully must commit its data to main memory:
    • Before doing this, it broadcasts all the addresses in its write buffer to all other cores
    • Cores executing a transaction must compare all the addresses in the packet with the readset in their cache
    • If there are any matches, the comparing transaction must abort and restart
  • No two transactions must be trying to commit conflicting data at the same time
    • A central ‘permission to commit’ resource could be a bottleneck
  • Lazy validation is in place
  • Has no livelock
  • Less synchronisation, due to inter-core communication only at transaction commit
79
Q

Cache Coherence

A
  • The consistency of shared resource data that ends up stored in multiple local caches