Parallel Processing Flashcards

Week 2.9 (151 cards)

1
Q

what is a symmetric multiprocessor (SMP)

A

enhances processing power by using multiple processors under a unified OS

2
Q

5 key characteristics of an SMP

A
  1. 2 or more processors of similar capability
  2. Shared main memory and I/O facilities
  3. Equal memory access time for all processors
  4. All processors can perform the same functions
  5. A single OS manages processors for seamless interaction
3
Q

4 advantages of SMP

A
  1. performance
  2. availability
  3. incremental growth
  4. scaling
4
Q

explain how performance is an advantage of
SMP

A

if tasks can be executed in parallel, an SMP system delivers higher performance than a single-processor system

5
Q

explain how availability is an advantage of SMP

A

failure of a single processor does not halt the system; it continues running at reduced performance

6
Q

explain how incremental growth is an advantage of SMP

A

performance can be enhanced by adding more processors

7
Q

explain how scaling is an advantage of SMP

A

vendors can offer systems with different performance levels by varying the number of processors

8
Q

describe the organisation of SMP

A
  • multiple processors, each with its own CU, ALU, registers & cache
  • shared main memory & I/O devices accessible via interconnection mechanism
  • processors communicate through shared memory or direct signals
  • some configurations include private memory & I/O channels for each processor
9
Q

what is the time-shared bus in SMP

A
  • a simple & common multiprocessor interconnection mechanism
  • shared control, address & data lines facilitate communication
  • supports DMA transfers
  • multiple processors & I/O modules compete for shared memory access
10
Q

what are the 3 key features that support DMA transfers with the time-shared bus

A
  1. addressing - identifies data sources & destinations
  2. arbitration - resolves competing bus access requests
  3. time-sharing - one module controls the bus at a time
11
Q

advantages of bus organisation

A
  1. simplicity - same logic as a single-processor system
  2. flexibility - easy to expand
12
Q

what is the main challenge of bus organisation

A

performance bottleneck
- all memory references go through a shared bus, limiting speed
- caches (L1, L2, L3) reduce bus traffic BUT introduce the cache coherence problem

13
Q

5 design considerations of SMP

A
  1. concurrent execution
  2. scheduling
  3. synchronisation
  4. memory management
  5. fault tolerance
14
Q

why is concurrent execution a design consideration of SMP

A

OS must handle multiple processors running the same routines simultaneously

15
Q

why is scheduling a design consideration of SMP

A

assigning ready processes to available processors without conflicts

16
Q

why is synchronisation a design consideration of SMP

A

ensuring orderly access to shared memory & I/O resources

17
Q

why is memory management a design consideration

A

coordinating paging across processors & utilising multiported memory

18
Q

why is fault tolerance a design consideration of SMP

A

handling processor failures gracefully to maintain system stability

19
Q

what is the problem with cache coherency in multiprocessor systems

A

inconsistent data across caches

20
Q

what is a write-back policy

A

updates data only in the cache; main memory is updated later, when the line is replaced

21
Q

what is a write-through policy

A

updates both cache & main memory
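The two policies on cards 20-21 can be contrasted with a toy one-line cache. This is only a sketch; the `Cache` class and its method names are invented for illustration, and real caches track many lines with dirty bits.

```python
# Toy one-line cache contrasting write-through and write-back.
# (Illustrative sketch; class and method names are invented.)
class Cache:
    def __init__(self, memory, write_through):
        self.memory = memory              # shared main memory (a dict)
        self.write_through = write_through
        self.line = None                  # the single cached (addr, value)

    def write(self, addr, value):
        self.line = (addr, value)         # both policies update the cache
        if self.write_through:
            self.memory[addr] = value     # write-through: memory updated too

    def evict(self):
        if self.line is not None and not self.write_through:
            addr, value = self.line
            self.memory[addr] = value     # write-back: memory updated on eviction
        self.line = None

mem_wt, mem_wb = {}, {}
Cache(mem_wt, write_through=True).write(0x10, 42)
wb = Cache(mem_wb, write_through=False)
wb.write(0x10, 42)
print(mem_wt, mem_wb)   # {16: 42} {} - write-back memory is stale...
wb.evict()
print(mem_wb)           # {16: 42} - ...until the line is evicted
```

The stale window between the write and the eviction is exactly what creates the inconsistent-data problem on card 19.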

22
Q

what is the solution to inconsistent data across caches

A

cache coherency protocols

23
Q

what are the 2 types of cache coherence protocols

A
  1. software-based
  2. hardware-based
24
Q

what are the 2 types of software-based cache coherence protocols

A
  1. compiler analysis
  2. OS & hardware enforcement
25
what is compiler analysis
identifies shared data that could cause inefficiencies
26
what is OS & hardware enforcement
prevents unsafe data from being cached
27
what is the basic approach to software-based cache coherence
disable caching for all shared variables
28
what is the optimised approach to software-based cache coherency
allow caching except during critical periods
29
what is the software-based cache coherency protocols trade-off
reduces hardware complexity but may lead to inefficient cache utilisation
30
what is hardware-based cache coherence
detect & resolve inconsistencies dynamically at run time
31
what are the 3 advantages of hardware-based cache coherence
  1. more efficient cache usage
  2. transparent to programmers & compilers
  3. improves system performance by handling inconsistencies only when necessary
32
what are the 2 types of hardware-based cache coherency protocols
  1. directory protocols
  2. snoopy protocols
33
define directory protocols
centralised tracking of cache states
34
what are snoopy protocols
each cache monitors memory transactions
35
what are the 4 aspects of directory-based cache coherence
  1. tracking cache state
  2. exclusive access request
  3. invalidation process
  4. handling read requests
36
how does directory-based cache coherence track cache states
a centralised controller in main memory maintains a directory with cache state information
37
how is an exclusive access request handled in directory-based cache coherence
before modifying a cache line, a processor must request exclusive access from the controller
38
what is the invalidation process in directory-based cache coherence
the controller ensures coherence by invalidating copies in other caches before granting access
39
how does directory-based cache coherence handle read requests
when another processor requests a line that is exclusively held, the controller triggers a write-back to memory
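The four steps above can be sketched as a toy directory. This is an assumed data structure for illustration only; a real directory also tracks modified state so it can trigger write-backs.

```python
# Toy centralised directory: tracks which caches hold each line and
# invalidates the other copies before granting exclusive access.
# (Sketch only; a real directory also tracks dirty/modified state.)
class Directory:
    def __init__(self):
        self.holders = {}                 # line -> set of cache ids

    def read(self, cache, line):
        # tracking cache state: record that this cache now holds the line
        self.holders.setdefault(line, set()).add(cache)

    def request_exclusive(self, cache, line):
        # invalidation process: every other holder must drop its copy
        others = self.holders.get(line, set()) - {cache}
        self.holders[line] = {cache}      # grant exclusive access
        return sorted(others)             # caches that must invalidate

d = Directory()
d.read("P0", "line42")
d.read("P1", "line42")
print(d.request_exclusive("P0", "line42"))   # ['P1']
print(d.holders["line42"])                   # {'P0'}
```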
40
when to use directory-based cache coherence
- efficient for large systems
- large-scale multiprocessor systems with complex interconnection schemes, to prevent inconsistencies
41
what are the 4 main aspects of snoopy based cache coherence
  1. distributed coherence management
  2. broadcast mechanism
  3. write-invalidate protocol
  4. write-update protocol
42
what is distributed coherence management
each cache controller monitors memory transactions to maintain consistency
43
what is the broadcast mechanism in snoopy cache coherence
updates to a shared cache line are announced to all caches via a broadcast
44
what is a write-invalidate protocol
- multiple caches can read a shared line
- a write operation invalidates copies in other caches before proceeding
- ensures only one writer at a time
45
what is a write-update protocol
- allows multiple readers & writers
- updates are broadcast so all caches holding the copy can update it
46
what is MESI an example of
- write-invalidate protocol
- snoopy-based cache coherence
47
what does MESI stand for
M - modified
E - exclusive
S - shared
I - invalid
48
what states can a valid cache line be in under MESI
modified, exclusive or shared
49
what happens in a read miss with MESI
- processor requests a cache line that is not in the local cache
- bus snooping = other caches check if they have the requested line
50
what are the 4 possible outcomes from a read miss using MESI
  1. exclusive -> shared - another cache has a clean copy & downgrades it to shared
  2. shared -> shared - multiple caches have a clean copy
  3. modified -> shared - a modified cache provides the line
  4. invalid -> exclusive - if no cache has the line, it is fetched from memory
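The four outcomes can be tabulated in a small sketch. It is a simplification that assumes at most one other cache holds the line.

```python
# MESI read-miss outcomes: given the line's state in another cache,
# return (that cache's next state, requesting cache's next state).
# Simplified sketch: assumes at most one other cache holds the line.
READ_MISS = {
    "E": ("S", "S"),   # exclusive -> shared: clean copy downgraded
    "S": ("S", "S"),   # shared -> shared: several clean copies exist
    "M": ("S", "S"),   # modified -> shared: owning cache supplies the line
    "I": ("I", "E"),   # invalid -> exclusive: line fetched from memory
}

print(READ_MISS["M"])   # ('S', 'S')
print(READ_MISS["I"])   # ('I', 'E')
```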
51
describe what happens when a read hit occurs with MESI
  1. read hit occurs: processor finds the requested data in the local cache
  2. direct access: data is read immediately without accessing the bus
  3. no state change: the cache line remains in its current state
52
describe what happens when a write miss occurs using MESI
- processor issues a read-with-intent-to-modify request
- cache line is fetched from memory and marked as modified
53
what are the 2 possible scenarios when a write miss occurs using MESI
  1. another cache has a modified copy
  2. no modified copy exists
54
what happens when another cache has a modified copy during a write miss using MESI
- that cache writes the modified line back to memory
- it invalidates its copy, allowing the initiating processor to fetch and modify the line
55
what happens when no modified copy exists during a write miss using MESI
the processor reads the line from memory, modifies it, and invalidates shared/exclusive copies in other caches
56
what are the 3 states a cache line can be in when a write hit occurs using MESI
  1. shared
  2. exclusive
  3. modified
57
what happens when a cache line is in the shared state during a write hit using MESI
- processor requests exclusive ownership
- other caches invalidate their shared copies
- the line transitions to modified and the update occurs
58
what happens when a cache line is in the exclusive state during a write hit using MESI
- the processor already owns the line exclusively
- it transitions from exclusive to modified and updates the data
59
what happens when a cache line is in the modified state during a write hit using MESI
- the processor already has exclusive ownership and has modified the line
- it simply updates the data without any state change
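Cards 57-59 amount to a small transition function. A sketch (a write hit on an Invalid line cannot occur, since that would be a write miss):

```python
# MESI write-hit transitions from cards 57-59.
def write_hit(state):
    if state == "S":
        return "M"   # other caches invalidate their shared copies first
    if state == "E":
        return "M"   # silent upgrade: already the exclusive owner
    if state == "M":
        return "M"   # no state change, just update the data
    raise ValueError("a write to an Invalid line is a write miss, not a hit")

print([write_hit(s) for s in "SEM"])   # ['M', 'M', 'M']
```

Whatever valid state the line starts in, a write hit leaves it Modified; the protocols differ only in what must happen to the other caches first.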
60
what is the challenge with L1-L2 cache consistency
L1 caches do not connect directly to the bus, so they cannot use snoopy protocols like the L2 cache can
61
what is the solution to L1-L2 cache consistency
extend the coherence protocol to L1 caches
- L1 cache lines include state bits to track L2 cache states
- use a write-through policy:
  - writes to L1 are forwarded to L2
  - ensures L2 caches remain updated and visible to the coherence protocol
62
how is processor performance measured in multithreading/chip multiprocessors
MIPS rate
63
MIPS rate formula
f x IPC
where:
- f = clock frequency
- IPC = instructions per cycle
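Plugging illustrative numbers into the formula (the 3 GHz clock and IPC of 2 are assumed values, not from the cards):

```python
# MIPS rate = f x IPC, with illustrative (assumed) numbers.
f = 3_000_000_000            # clock frequency: 3 GHz, in cycles/second
ipc = 2                      # average instructions per cycle
mips = f * ipc / 1_000_000   # instructions/second, in millions
print(mips)   # 6000.0
```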
64
how to increase processor performance
  1. ^ f
  2. ^ IPC
65
why does increasing frequency increase processor performance
faster cycles per second = more instructions executed
66
4 ways to increase IPC
  1. instruction pipelining - overlap execution stages
  2. employ superscalar architectures with multiple pipelines
  3. optimise instruction scheduling to execute instructions out of order
  4. predict and pre-execute instructions to avoid stalls
67
describe multithreading
- divides the instruction stream into multiple threads
- threads executed in parallel for ^ efficiency
- enables ^ instruction-level parallelism
- low added hardware & power cost
68
what is a process
- a running instance of a program
- owns resources: memory, files, I/O devices
- has an execution state
69
what is a thread
- a lightweight unit of execution within a process
- shares process resources
- has its own execution context - PC, stack
70
define process switch
involves saving/restoring all process data
71
define thread switch
only updates thread-specific data
72
what is the hierachy of execution in multithreading
  1. process
  2. thread - sequence of instructions
  3. instruction
73
apply an example to the hierarchy of execution in multithreading
  1. process - Microsoft Word
  2. thread - spell checking, auto-saving
  3. instruction - store 'A' at a memory location
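The process/thread split on cards 68-73 can be seen in a few lines of Python: the threads below share the process's memory (the list), while each has its own stack and program counter. The task names are illustrative stand-ins.

```python
import threading

# The list is a process-owned resource, visible to every thread.
shared = []
lock = threading.Lock()

def worker(task_name):
    # each thread has its own execution context, but writes
    # into the shared address space of the enclosing process
    with lock:
        shared.append(task_name)

tasks = ["spell-check", "auto-save", "render"]
threads = [threading.Thread(target=worker, args=(t,)) for t in tasks]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(shared))   # ['auto-save', 'render', 'spell-check']
```

Switching between these threads only swaps the small per-thread context, which is why a thread switch is cheaper than a process switch (cards 70-71).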
74
what are the 4 multithreading approaches
  1. interleaved multithreading (fine-grained)
  2. blocked multithreading (coarse-grained)
  3. simultaneous multithreading (SMT)
  4. chip multiprocessing (multicore)
75
describe interleaved multithreading
- multiple thread contexts handled simultaneously
- switches to a different thread at the end of every clock cycle
- hides latency by keeping the processor busy
- if a thread is blocked due to data dependencies or memory delays -> another ready thread is selected
76
use of interleaved multithreading
in-order processors, to ^ efficiency
77
how does thread switching work in interleaved multithreading
- in interleaved multithreading, thread switches occur with zero cycles of delay
- this is possible because there are no control or data dependencies between threads
- interleaved multithreading improves processor utilisation, but a single thread takes longer to complete
- multiple threads compete for cache resources, increasing the probability of cache misses
78
describe blocked multithreading
- executes instructions from a single thread continuously
- thread runs until a long-latency event occurs, then the processor switches to another thread
- v processor idle time due to stalls
- v thread switching
79
use of blocked multithreading
in-order processors that would otherwise stall
80
what happens to thread switching in blocked multithreading
in blocked multithreading, a thread switch may require one cycle, as a fetched instruction could trigger the switch & need to be discarded
81
what is SMT
- allows multiple threads to issue instructions in the same cycle
- each cycle, instructions from different threads are scheduled in parallel
- maximises CPU resource utilisation
- threads execute simultaneously
82
use of SMT
modern CPUs
83
describe chip multiprocessing
- multiple cores integrated on a single chip
- each core executes its own thread simultaneously
- threads assigned to separate cores
- ^ performance without ^ pipeline complexity
- efficient use of chip logic area for parallel execution
84
use of chip multiprocessing
modern CPUs - Intel Core, AMD Ryzen, ARM processors
85
what are the 3 possible pipeline architectures that involve multithreading
  1. single-threaded scalar
  2. interleaved multithreading scalar
  3. blocked multithreading scalar
86
describe single-threaded scalar architecture
- superscalar processors can issue multiple instructions per cycle
- instructions from a single thread are issued in a cycle
- not all execution slots are utilised = inefficient
- horizontal loss: fewer than the maximum number of instructions are issued in a cycle
- vertical loss: no instructions are issued in a cycle due to stalls or dependencies
87
describe interleaved multithreading scalar
- a different thread issues instructions each clock cycle
- vertical losses are reduced, since stalls in one thread are filled by instructions from another
88
describe blocked multithreading scalar
- very long instruction word (VLIW) architecture places multiple instructions in a single instruction word
- the compiler decides which instructions can execute in parallel at compile time
- VLIW CPUs do not schedule instructions dynamically
- NO-OPs are used to fill instruction words
- v hardware complexity BUT wastes execution slots
89
chip multiprocessing vs SMT
- SMT has a > level of instruction-level parallelism
- in CMP, each core is limited to executing instructions from 1 thread
- CMP has > performance with the same instruction issue capability
- combination = ^ efficiency
90
define clusters
- alternative to SMP for high performance and availability
- a cluster consists of multiple interconnected computers (nodes) working together
- the system appears as a single machine to users & applications
91
use of clusters
server applications & large-scale computing
92
4 benefits of clusters
  1. absolute scalability
  2. incremental scalability
  3. ^ availability
  4. superior price/performance
93
how is absolute scalability a benefit of clusters
clusters can scale to hundreds or thousands of machines, surpassing standalone systems
94
how is incremental scalability an advantage of clusters
nodes can be added gradually, allowing seamless expansion without major upgrades
95
how is ^ availability an advantage of clusters
if a node fails, the system continues running, ensuring fault tolerance and reliability
96
how is the superior price/performance an advantage of clusters
clusters use commodity hardware to achieve high performance at lower cost than a single large machine
97
what are the 2 types of configuration in clusters
  1. no shared disk
  2. shared disk
98
describe a cluster configuration with no shared disk
- interconnection is a high-speed link used for message exchange
- link can be a shared LAN or a dedicated interconnection facility
- some cluster nodes may also connect to external networks for client communication
- each computer can be a multiprocessor, improving both performance & availability
99
describe a cluster configuration using a shared disk
- allows multiple nodes to access a common disk subsystem
- nodes are still interconnected via a message link for coordination
- shared disk = RAID system to prevent single points of failure
- RAID or similar redundant storage ensures high availability in case of disk failures
100
what are the 5 clustering methods
  1. passive standby
  2. active secondary
  3. separate servers
  4. servers connected to disks
  5. servers share disks
101
describe passive standby cluster method
secondary server takes over on failure
102
describe active secondary cluster method
secondary server also handles processing
103
describe separate servers cluster method
each server has its own disks; data is copied from primary to secondary
104
describe servers connected to disks cluster method
servers are cabled to the same disks, with ownership transfer on failure
105
describe servers share disks cluster method
multiple servers access the same disk
106
what are the 3 OS design issues
  1. failure management
  2. load balancing
  3. parallelising computation
107
what are the 2 types of clusters that help negate failure management
  1. highly available clusters
  2. fault-tolerant clusters
108
describe highly available clusters
ensures a high probability that resources remain in service
- if a system or disk fails, queries in progress are lost
- lost queries can be retried on another node
- no guarantee on the state of partially executed transactions
109
describe fault-tolerant clusters
ensure all resources are always available
- use redundant shared disks
- implement mechanisms to roll back uncommitted transactions and commit completed ones
110
how is load balancing dealt with in clusters
- distributes workloads evenly and prevents bottlenecks
- clusters should be scalable, integrating new nodes seamlessly
- new nodes must be automatically included in scheduling
- middleware support is required
111
what should middleware support for load balancing
- migration of services between nodes
- execution of services across multiple nodes
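As a sketch of the load-balancing idea: jobs go to whichever node is currently least loaded, so no single node becomes a bottleneck. The least-loaded policy is one assumed strategy, not the only one middleware might use.

```python
# Assign each job to the currently least-loaded node.
# (Sketch: "least-loaded" is just one possible middleware policy.)
def assign(jobs, nodes):
    load = {node: 0 for node in nodes}
    placement = {}
    for job, cost in jobs:
        target = min(load, key=load.get)   # pick the least-loaded node
        placement[job] = target
        load[target] += cost
    return placement, load

placement, load = assign([("q1", 3), ("q2", 1), ("q3", 2)], ["nodeA", "nodeB"])
print(placement)   # {'q1': 'nodeA', 'q2': 'nodeB', 'q3': 'nodeB'}
print(load)        # {'nodeA': 3, 'nodeB': 3}
```

Adding a node here is just a new entry in `nodes`, which mirrors the card's point that new nodes must be automatically included in scheduling.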
112
what are the 3 ways to parallelise computation
  1. parallelising compiler
  2. parallelised application
  3. parametric computing
113
describe parallelising the compiler
- determines parallel sections at compile time and distributes them across nodes
- performance depends on problem complexity and compiler efficiency
- difficult to develop
114
describe parallelised application
- programmer writes code explicitly for a cluster, using message passing for data exchange
- provides fine control but increases development complexity
115
describe parametric computing
- runs the same algorithm multiple times with different parameters
- requires parametric processing tools for job management
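A minimal parametric-computing sketch: the same algorithm run once per parameter, with the runs distributed across worker processes. The `simulate` function is a stand-in for whatever the real algorithm is.

```python
from multiprocessing import Pool

def simulate(param):
    # stand-in for the real algorithm; any pure function
    # of the parameter is distributed the same way
    return param * param

if __name__ == "__main__":
    params = range(6)
    with Pool(processes=3) as pool:           # 3 worker processes
        results = pool.map(simulate, params)  # one run per parameter value
    print(results)   # [0, 1, 4, 9, 16, 25]
```

In a real cluster the parametric processing tool would farm each run out to a node rather than a local process, but the job-management pattern is the same.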
116
describe cluster architecture
- individual components connected via a high-speed LAN or switch
- each node can operate independently but works together as a unified system
- middleware layer enables cluster functionality
- provides a single-system image to users
- ensures high availability through load balancing and failure response
117
what are blade servers
an example of cluster implementation
- houses multiple server modules in a single chassis
- used in data centres to save space & ^ system management
- chassis provides power, blade = CPU, memory & storage
- v physical footprint & optimised network efficiency in large-scale environments
118
advantages of clusters over SMP
  1. superior scalability
  2. higher availability with redundancy
  3. more adaptable for future high-performance needs
119
define uniform memory access (UMA)
all processors access memory with equal latency
120
use of UMA
SMP
121
define nonuniform memory access (NUMA)
- memory access time depends on which region is accessed
- different processors experience different memory latencies
122
define cache-coherent NUMA (CC-NUMA)
- a NUMA system that maintains cache coherence among processors
- distinct from SMP & clusters
123
what are the 3 SMP scalability challenges
  1. limits due to ^ bus traffic
  2. ^ processors = ^ cache-coherence & memory traffic
  3. shared bus becomes a bottleneck
124
what is the solution to SMP scalability issues
clusters & NUMA
125
describe clusters as a solution to SMP scalability issues
- use private memory per node, avoiding shared-memory bottlenecks
- software-based coherence enables scalability
- fault-tolerant and cost-effective for large-scale computing
- scales well but lacks a global memory view
126
describe NUMA as a hybrid approach to solve SMP scalability issues
- provides a global memory view but with variable access latencies
- allows scalable multiprocessor systems without a shared-bus bottleneck
127
describe CC-NUMA organisation
- composed of multiple SMP-based nodes, each with its own processors and memory
- nodes are interconnected via a high-speed interconnect
- processors see a single system-wide memory address space
128
how are memory accesses handled in CC-NUMA
automatic & transparent
- local memory accesses - via the node's memory bus
- remote memory accesses - via the interconnect
129
how is cache coherency guaranteed in CC-NUMA
- each node contains a directory tracking memory locations & cache states
- the directory ensures consistent data access
130
3 advantages of CC-NUMA
  1. > parallelism than SMP
  2. v bus traffic per node
  3. performance issues can be mitigated by: efficient caching, spatial locality & page migration
131
2 disadvantages of CC-NUMA
  1. not fully transparent
  2. availability concerns
132
5 characteristics of cloud computing
  1. broad network access
  2. rapid elasticity
  3. measured service
  4. on-demand self-service
  5. resource pooling
133
what does broad network access in cloud computing mean
accessible over a network via diverse devices
134
what does rapid elasticity mean in cloud computing
resources scale up or down dynamically based on demand
135
what does measured service mean in cloud computing
usage is metered, monitored and reported for transparency
136
what does on-demand self-service mean in cloud computing
users can provision resources automatically without human intervention
137
what does resource pooling mean in cloud computing
multi-tenant model where resources are dynamically allocated based on demand
138
what are the 4 cloud computing deployment models
  1. public
  2. private
  3. community
  4. hybrid
139
describe public cloud
open to the public or industry groups, owned by a cloud provider
140
pros of public cloud
- cost-effective
- minimal management overhead
141
con of public cloud
security concerns
142
describe private cloud
hosted within an organisation's internal IT environment
143
pros of private cloud
- greater security
- control over data
144
cons of private cloud
- higher cost
- increased management responsibility
145
define community cloud
shared by multiple organisations with common requirements
146
pros of community cloud
- controlled data exchange
- regulatory compliance
147
cons of community cloud
- higher cost than public cloud
- limited scalability
148
define hybrid cloud
combination of two or more cloud models, allowing data portability
149
pros of hybrid cloud
- security & cost benefits of private cloud
150
cons of hybrid cloud
complex integration & management
151