Parallel Processing Flashcards

Week 2.9 (151 cards)

1
Q

what is a symmetric multiprocessor (SMP)

A

enhances processing power by using multiple processors under a unified OS

2
Q

5 key characteristics of an SMP

A
  1. 2 or more processors of similar capability
  2. Shared main memory and I/O facilities
  3. Equal memory access time for all processors
  4. All processors can perform the same functions
  5. A single OS manages processors for seamless interaction
3
Q

4 advantages of SMP

A
  1. performance
  2. availability
  3. incremental growth
  4. scaling
4
Q

explain how performance is an advantage of
SMP

A

if tasks can be executed in parallel, an SMP system delivers higher performance than a single-processor system

5
Q

explain how availability is an advantage of SMP

A

failure of a single processor does not halt the system; it continues running at reduced performance

6
Q

explain how incremental growth is an advantage of SMP

A

performance can be enhanced by adding more processors

7
Q

explain how scaling is an advantage of SMP

A

vendors can offer systems with different performance levels by varying the number of processors

8
Q

describe the organisation of SMP

A
  • multiple processors, each with its own CU, ALU, registers & cache
  • shared main memory & I/O devices accessible via interconnection mechanism
  • processors communicate through shared memory or direct signals
  • some configurations include private memory & I/O channels for each processor
9
Q

what is the time-shared bus in SMP

A
  • a simple & common multiprocessor interconnection mechanism
  • shared control, address & data lines facilitate communication
  • supports DMA transfers
  • multiple processors & I/O modules compete for shared memory access
10
Q

what are the 3 key features that support DMA transfers with the time-shared bus

A
  1. addressing - identifies data sources & destinations
  2. arbitration - resolves competing bus access requests
  3. time-sharing - one module controls the bus at a time
11
Q

advantages of bus organisation

A
  1. simplicity - same logic as a single-processor system
  2. flexibility - easy to expand
12
Q

what is the main challenge of bus organisation

A

performance bottleneck
- all memory references go through a shared bus, limiting speed
- caches (L1, L2, L3) reduce bus traffic BUT introduce the cache coherence problem

13
Q

5 design considerations of SMP

A
  1. concurrent execution
  2. scheduling
  3. synchronisation
  4. memory management
  5. fault tolerance
14
Q

why is concurrent execution a design consideration of SMP

A

OS must handle multiple processors running the same routines simultaneously

15
Q

why is scheduling a design consideration of SMP

A

assigning ready processes to available processors without conflicts

16
Q

why is synchronisation a design consideration of SMP

A

ensuring orderly access to shared memory & I/O resources

17
Q

why is memory management a design consideration

A

coordinating paging across processors & utilising multiported memory

18
Q

why is fault tolerance a design consideration of SMP

A

handling processor failures gracefully to maintain system stability

19
Q

what is the problem with cache coherency in multiprocessor systems

A

inconsistent data across caches

20
Q

what is a write-back policy

A

updates data only in the cache; main memory is updated later, when the line is replaced

21
Q

what is a write-through policy

A

updates both cache & main memory
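The two policies on cards 20-21 can be contrasted with a toy one-line cache. This is only a sketch; the `Cache` class and its method names are invented for illustration, and real caches track many lines with dirty bits.

```python
# Toy one-line cache contrasting write-through and write-back.
# (Illustrative sketch; class and method names are invented.)
class Cache:
    def __init__(self, memory, write_through):
        self.memory = memory              # shared main memory (a dict)
        self.write_through = write_through
        self.line = None                  # the single cached (addr, value)

    def write(self, addr, value):
        self.line = (addr, value)         # both policies update the cache
        if self.write_through:
            self.memory[addr] = value     # write-through: memory updated too

    def evict(self):
        if self.line is not None and not self.write_through:
            addr, value = self.line
            self.memory[addr] = value     # write-back: memory updated on eviction
        self.line = None

mem_wt, mem_wb = {}, {}
Cache(mem_wt, write_through=True).write(0x10, 42)
wb = Cache(mem_wb, write_through=False)
wb.write(0x10, 42)
print(mem_wt, mem_wb)   # {16: 42} {} - write-back memory is stale...
wb.evict()
print(mem_wb)           # {16: 42} - ...until the line is evicted
```

The stale window between the write and the eviction is exactly what creates the inconsistent-data problem on card 19.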

22
Q

what is the solution to inconsistent data across caches

A

cache coherency protocols

23
Q

what are the 2 types of cache coherence protocols

A
  1. software-based
  2. hardware-based
24
Q

what are the 2 types of software-based cache coherence protocols

A
  1. compiler analysis
  2. OS & hardware enforcement
25
what is compiler analysis
identifies shared data that could cause inefficiencies
26
what is OS & hardware enforcement
prevents unsafe data from being cached
27
what is the basic approach to software-based cache coherence
disable caching for all shared variables
28
what is the optimised approach to software-based cache coherency
allow caching except during critical periods
29
what is the software-based cache coherency protocols trade-off
reduces hardware complexity but may lead to inefficient cache utilisation
30
what is hardware-based cache coherence
detect & resolve inconsistencies dynamically at run time
31
what are the 3 advantages of hardware-based cache coherence
  1. more efficient cache usage
  2. transparent to programmers & compilers
  3. improves system performance by handling inconsistencies only when necessary
32
what are the 2 types of hardware-based cache coherency protocols
  1. directory protocols
  2. snoopy protocols
33
define directory protocols
centralised tracking of cache states
34
what are snoopy protocols
each cache monitors memory transactions
35
what are the 4 aspects of directory-based cache coherence
  1. tracking cache state
  2. exclusive access request
  3. invalidation process
  4. handling read requests
36
how does directory-based cache coherence track cache states
a centralised controller in main memory maintains a directory with cache state information
37
how is an exclusive access request handled in directory-based cache coherence
before modifying a cache line, a processor must request exclusive access from the controller
38
what is the invalidation process in directory-based cache coherence
the controller ensures coherence by invalidating copies in other caches before granting access
39
how does directory-based cache coherence handle read requests
when another processor requests a line that is exclusively held, the controller triggers a write-back to memory
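The four steps above can be sketched as a toy directory. This is an assumed data structure for illustration only; a real directory also tracks modified state so it can trigger write-backs.

```python
# Toy centralised directory: tracks which caches hold each line and
# invalidates the other copies before granting exclusive access.
# (Sketch only; a real directory also tracks dirty/modified state.)
class Directory:
    def __init__(self):
        self.holders = {}                 # line -> set of cache ids

    def read(self, cache, line):
        # tracking cache state: record that this cache now holds the line
        self.holders.setdefault(line, set()).add(cache)

    def request_exclusive(self, cache, line):
        # invalidation process: every other holder must drop its copy
        others = self.holders.get(line, set()) - {cache}
        self.holders[line] = {cache}      # grant exclusive access
        return sorted(others)             # caches that must invalidate

d = Directory()
d.read("P0", "line42")
d.read("P1", "line42")
print(d.request_exclusive("P0", "line42"))   # ['P1']
print(d.holders["line42"])                   # {'P0'}
```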
40
when to use directory-based cache coherence
- efficient for large systems
- large-scale multiprocessor systems with complex interconnection schemes, to prevent inconsistencies
41
what are the 4 main aspects of snoopy based cache coherence
  1. distributed coherence management
  2. broadcast mechanism
  3. write-invalidate protocol
  4. write-update protocol
42
what is distributed coherence management
each cache controller monitors memory transactions to maintain consistency
43
what is the broadcast mechanism in snoopy cache coherence
updates to a shared cache line are announced to all caches via a broadcast
44
what is a write-invalidate protocol
- multiple caches can read a shared line
- a write operation invalidates copies in other caches before proceeding
- ensures only one writer at a time
45
what is a write-update protocol
- allows multiple readers & writers
- updates are broadcast so all caches holding the copy can update it
46
what is MESI an example of
- write-invalidate protocol
- snoopy-based cache coherence
47
what does MESI stand for
M - modified
E - exclusive
S - shared
I - invalid
48
what states can a valid cache line be in under MESI
modified, exclusive or shared
49
what happens in a read miss with MESI
- processor requests a cache line that is not in the local cache
- bus snooping = other caches check if they have the requested line
50
what are the 4 possible outcomes from a read miss using MESI
  1. exclusive -> shared - another cache has a clean copy & downgrades it to shared
  2. shared -> shared - multiple caches have a clean copy
  3. modified -> shared - a modified cache provides the line
  4. invalid -> exclusive - if no cache has the line, it is fetched from memory
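The four outcomes can be tabulated in a small sketch. It is a simplification that assumes at most one other cache holds the line.

```python
# MESI read-miss outcomes: given the line's state in another cache,
# return (that cache's next state, requesting cache's next state).
# Simplified sketch: assumes at most one other cache holds the line.
READ_MISS = {
    "E": ("S", "S"),   # exclusive -> shared: clean copy downgraded
    "S": ("S", "S"),   # shared -> shared: several clean copies exist
    "M": ("S", "S"),   # modified -> shared: owning cache supplies the line
    "I": ("I", "E"),   # invalid -> exclusive: line fetched from memory
}

print(READ_MISS["M"])   # ('S', 'S')
print(READ_MISS["I"])   # ('I', 'E')
```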
51
describe what happens when a read hit occurs with MESI
  1. read hit occurs: processor finds the requested data in the local cache
  2. direct access: data is read immediately without accessing the bus
  3. no state change: the cache line remains in its current state
52
describe what happens when a write miss occurs using MESI
- processor issues a read-with-intent-to-modify request
- cache line is fetched from memory and marked as modified
53
what are the 2 possible scenarios when a write miss occurs using MESI
  1. another cache has a modified copy
  2. no modified copy exists
54
what happens when another cache has a modified copy during a write miss using MESI
- that cache writes the modified line back to memory
- it invalidates its copy, allowing the initiating processor to fetch and modify the line
55
what happens when no modified copy exists during a write miss using MESI
the processor reads the line from memory, modifies it, and invalidates shared/exclusive copies in other caches
56
what are the 3 states a cache line can be in when a write hit occurs using MESI
  1. shared
  2. exclusive
  3. modified
57
what happens when a cache line is in the shared state during a write hit using MESI
- processor requests exclusive ownership
- other caches invalidate their shared copies
- the line transitions to modified and the update occurs
58
what happens when a cache line is in the exclusive state during a write hit using MESI
- the processor already owns the line exclusively
- it transitions from exclusive to modified and updates the data
59
what happens when a cache line is in the modified state during a write hit using MESI
- the processor already has exclusive ownership and has modified the line
- it simply updates the data without any state change
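Cards 57-59 amount to a small transition function. A sketch (a write hit on an Invalid line cannot occur, since that would be a write miss):

```python
# MESI write-hit transitions from cards 57-59.
def write_hit(state):
    if state == "S":
        return "M"   # other caches invalidate their shared copies first
    if state == "E":
        return "M"   # silent upgrade: already the exclusive owner
    if state == "M":
        return "M"   # no state change, just update the data
    raise ValueError("a write to an Invalid line is a write miss, not a hit")

print([write_hit(s) for s in "SEM"])   # ['M', 'M', 'M']
```

Whatever valid state the line starts in, a write hit leaves it Modified; the protocols differ only in what must happen to the other caches first.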
60
what is the challenge with L1-L2 cache consistency
L1 caches do not connect directly to the bus, so they cannot use snoopy protocols like the L2 cache can
61
what is the solution to L1-L2 cache consistency
extend the coherence protocol to L1 caches
- L1 cache lines include state bits to track L2 cache states
- use a write-through policy:
  - writes to L1 are forwarded to L2
  - ensures L2 caches remain updated and visible to the coherence protocol
62
how is processor performance measured in multithreading/chip multiprocessors
MIPS rate
63
MIPS rate formula
f x IPC
where:
- f = clock frequency
- IPC = instructions per cycle
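Plugging illustrative numbers into the formula (the 3 GHz clock and IPC of 2 are assumed values, not from the cards):

```python
# MIPS rate = f x IPC, with illustrative (assumed) numbers.
f = 3_000_000_000            # clock frequency: 3 GHz, in cycles/second
ipc = 2                      # average instructions per cycle
mips = f * ipc / 1_000_000   # instructions/second, in millions
print(mips)   # 6000.0
```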
64
how to increase processor performance
  1. ^ f
  2. ^ IPC
65
why does increasing frequency increase processor performance
faster cycles per second = more instructions executed
66
4 ways to increase IPC
  1. instruction pipelining - overlap execution stages
  2. employ superscalar architectures with multiple pipelines
  3. optimise instruction scheduling to execute instructions out of order
  4. predict and pre-execute instructions to avoid stalls
67
describe multithreading
- divides the instruction stream into multiple threads
- threads executed in parallel for ^ efficiency
- enables ^ instruction-level parallelism
- low added hardware & power cost
68
what is a process
- a running instance of a program
- owns resources: memory, files, I/O devices
- has an execution state
69
what is a thread
- a lightweight unit of execution within a process
- shares process resources
- has its own execution context - PC, stack
70
define process switch
involves saving/restoring all process data
71
define thread switch
only updates thread-specific data
72
what is the hierachy of execution in multithreading
  1. process
  2. thread - sequence of instructions
  3. instruction
73
apply an example to the hierarchy of execution in multithreading
  1. process - Microsoft Word
  2. thread - spell checking, auto-saving
  3. instruction - store 'A' at a memory location
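The process/thread split on cards 68-73 can be seen in a few lines of Python: the threads below share the process's memory (the list), while each has its own stack and program counter. The task names are illustrative stand-ins.

```python
import threading

# The list is a process-owned resource, visible to every thread.
shared = []
lock = threading.Lock()

def worker(task_name):
    # each thread has its own execution context, but writes
    # into the shared address space of the enclosing process
    with lock:
        shared.append(task_name)

tasks = ["spell-check", "auto-save", "render"]
threads = [threading.Thread(target=worker, args=(t,)) for t in tasks]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(shared))   # ['auto-save', 'render', 'spell-check']
```

Switching between these threads only swaps the small per-thread context, which is why a thread switch is cheaper than a process switch (cards 70-71).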
74
what are the 4 multithreading approaches
  1. interleaved multithreading (fine-grained)
  2. blocked multithreading (coarse-grained)
  3. simultaneous multithreading (SMT)
  4. chip multiprocessing (multicore)
75
describe interleaved multithreading
- multiple thread contexts handled simultaneously
- switches to a different thread at the end of every clock cycle
- hides latency by keeping the processor busy
- if a thread is blocked due to data dependencies or memory delays -> another ready thread is selected
76
use of interleaved multithreading
in-order processors, to ^ efficiency
77
how does thread switching work in interleaved multithreading
- in interleaved multithreading, thread switches occur with zero cycles of delay
- this is possible because there are no control or data dependencies between threads
- interleaved multithreading improves processor utilisation, but a single thread takes longer to complete
- multiple threads compete for cache resources, increasing the probability of cache misses
78
describe blocked multithreading
- executes instructions from a single thread continuously
- thread runs until a long-latency event occurs, then the processor switches to another thread
- v processor idle time due to stalls
- v thread switching
79
use of blocked multithreading
in-order processors that would otherwise stall
80
what happens to thread switching in blocked multithreading
in blocked multithreading, a thread switch may require one cycle, as a fetched instruction could trigger the switch & need to be discarded
81
what is SMT
- allows multiple threads to issue instructions in the same cycle
- each cycle, instructions from different threads are scheduled in parallel
- maximises CPU resource utilisation
- threads execute simultaneously
82
use of SMT
modern CPUs
83
describe chip multiprocessing
- multiple cores integrated on a single chip
- each core executes its own thread simultaneously
- threads assigned to separate cores
- ^ performance without ^ pipeline complexity
- efficient use of chip logic area for parallel execution
84
use of chip multiprocessing
modern CPUs - Intel Core, AMD Ryzen, ARM processors
85
what are the 3 possible pipeline architectures that involve multithreading
  1. single-threaded scalar
  2. interleaved multithreading scalar
  3. blocked multithreading scalar
86
describe single-threaded scalar architecture
- superscalar processors can issue multiple instructions per cycle
- instructions from a single thread are issued in a cycle
- not all execution slots are utilised = inefficient
- horizontal loss: fewer than the maximum number of instructions are issued in a cycle
- vertical loss: no instructions are issued in a cycle due to stalls or dependencies
87
describe interleaved multithreading scalar
- a different thread issues instructions each clock cycle
- vertical losses are reduced, since stalls in one thread are filled by instructions from another
88
describe blocked multithreading scalar
- very long instruction word (VLIW) architecture places multiple instructions in a single instruction word
- the compiler decides which instructions can execute in parallel at compile time
- VLIW CPUs do not schedule instructions dynamically
- NO-OPs are used to fill instruction words
- v hardware complexity BUT wastes execution slots
89
chip multiprocessing vs SMT
- SMT has a > level of instruction-level parallelism
- in CMP, each core is limited to executing instructions from 1 thread
- CMP has > performance with the same instruction issue capability
- combination = ^ efficiency
90
define clusters
- alternative to SMP for high performance and availability
- a cluster consists of multiple interconnected computers (nodes) working together
- the system appears as a single machine to users & applications
91
use of clusters
server applications & large-scale computing
92
4 benefits of clusters
  1. absolute scalability
  2. incremental scalability
  3. ^ availability
  4. superior price/performance
93
how is absolute scalability a benefit of clusters
clusters can scale to hundreds or thousands of machines, surpassing standalone systems
94
how is incremental scalability an advantage of clusters
nodes can be added gradually, allowing seamless expansion without major upgrades
95
how is ^ availability an advantage of clusters
if a node fails, the system continues running, ensuring fault tolerance and reliability
96
how is the superior price/performance an advantage of clusters
clusters use commodity hardware to achieve high performance at lower cost than a single large machine
97
what are the 2 types of configuration in clusters
  1. no shared disk
  2. shared disk
98
describe a cluster configuration with no shared disk
- interconnection is a high-speed link used for message exchange
- link can be a shared LAN or a dedicated interconnection facility
- some cluster nodes may also connect to external networks for client communication
- each computer can be a multiprocessor, improving both performance & availability
99
describe a cluster configuration using a shared disk
- allows multiple nodes to access a common disk subsystem
- nodes are still interconnected via a message link for coordination
- shared disk = RAID system to prevent single points of failure
- RAID or similar redundant storage ensures high availability in case of disk failures
100
what are the 5 clustering methods
  1. passive standby
  2. active secondary
  3. separate servers
  4. servers connected to disks
  5. servers share disks
101
describe passive standby cluster method
secondary server takes over on failure
102
describe active secondary cluster method
secondary server also handles processing
103
describe separate servers cluster method
each server has its own disks; data is copied from primary to secondary
104
describe servers connected to disks cluster method
servers are cabled to the same disks, with ownership transfer on failure
105
describe servers share disks cluster method
multiple servers access the same disk
106
what are the 3 OS design issues
  1. failure management
  2. load balancing
  3. parallelising computation
107
what are the 2 types of clusters that help negate failure management
  1. highly available clusters
  2. fault-tolerant clusters
108
describe highly available clusters
ensures a high probability that resources remain in service
- if a system or disk fails, queries in progress are lost
- lost queries can be retried on another node
- no guarantee on the state of partially executed transactions
109
describe fault-tolerant clusters
ensure all resources are always available
- use redundant shared disks
- implement mechanisms to roll back uncommitted transactions and commit completed ones
110
how is load balancing dealt with in clusters
- distributes workloads evenly and prevents bottlenecks
- clusters should be scalable, integrating new nodes seamlessly
- new nodes must be automatically included in scheduling
- middleware support is required
111
what should middleware support for load balancing
- migration of services between nodes
- execution of services across multiple nodes
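As a sketch of the load-balancing idea: jobs go to whichever node is currently least loaded, so no single node becomes a bottleneck. The least-loaded policy is one assumed strategy, not the only one middleware might use.

```python
# Assign each job to the currently least-loaded node.
# (Sketch: "least-loaded" is just one possible middleware policy.)
def assign(jobs, nodes):
    load = {node: 0 for node in nodes}
    placement = {}
    for job, cost in jobs:
        target = min(load, key=load.get)   # pick the least-loaded node
        placement[job] = target
        load[target] += cost
    return placement, load

placement, load = assign([("q1", 3), ("q2", 1), ("q3", 2)], ["nodeA", "nodeB"])
print(placement)   # {'q1': 'nodeA', 'q2': 'nodeB', 'q3': 'nodeB'}
print(load)        # {'nodeA': 3, 'nodeB': 3}
```

Adding a node here is just a new entry in `nodes`, which mirrors the card's point that new nodes must be automatically included in scheduling.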
112
what are the 3 ways to parallelise computation
  1. parallelising compiler
  2. parallelised application
  3. parametric computing
113
describe parallelising the compiler
- determines parallel sections at compile time and distributes them across nodes
- performance depends on problem complexity and compiler efficiency
- difficult to develop
114
describe parallelised application
- programmer writes code explicitly for a cluster, using message passing for data exchange
- provides fine control but increases development complexity
115
describe parametric computing
- runs the same algorithm multiple times with different parameters
- requires parametric processing tools for job management
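A minimal parametric-computing sketch: the same algorithm run once per parameter, with the runs distributed across worker processes. The `simulate` function is a stand-in for whatever the real algorithm is.

```python
from multiprocessing import Pool

def simulate(param):
    # stand-in for the real algorithm; any pure function
    # of the parameter is distributed the same way
    return param * param

if __name__ == "__main__":
    params = range(6)
    with Pool(processes=3) as pool:           # 3 worker processes
        results = pool.map(simulate, params)  # one run per parameter value
    print(results)   # [0, 1, 4, 9, 16, 25]
```

In a real cluster the parametric processing tool would farm each run out to a node rather than a local process, but the job-management pattern is the same.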
116
describe cluster architecture
- individual components connected via a high-speed LAN or switch
- each node can operate independently but works together as a unified system
- middleware layer enables cluster functionality
- provides a single-system image to users
- ensures high availability through load balancing and failure response
117
what are blade servers
an example of cluster implementation
- houses multiple server modules in a single chassis
- used in data centres to save space & ^ system management
- chassis provides power, blade = CPU, memory & storage
- v physical footprint & optimised network efficiency in large-scale environments
118
advantages of clusters over SMP
  1. superior scalability
  2. higher availability with redundancy
  3. more adaptable for future high-performance needs
119
define uniform memory access (UMA)
all processors access memory with equal latency
120
use of UMA
SMP
121
define nonuniform memory access (NUMA)
- memory access time depends on which region is accessed
- different processors experience different memory latencies
122
define cache-coherent NUMA (CC-NUMA)
- a NUMA system that maintains cache coherence among processors
- distinct from SMP & clusters
123
what are the 3 SMP scalability challenges
  1. limits due to ^ bus traffic
  2. ^ processors = ^ cache-coherence & memory traffic
  3. shared bus becomes a bottleneck
124
what is the solution to SMP scalability issues
clusters & NUMA
125
describe clusters as a solution to SMP scalability issues
- use private memory per node, avoiding shared-memory bottlenecks
- software-based coherence enables scalability
- fault-tolerant and cost-effective for large-scale computing
- scales well but lacks a global memory view
126
describe NUMA as a hybrid approach to solve SMP scalability issues
- provides a global memory view but with variable access latencies
- allows scalable multiprocessor systems without a shared-bus bottleneck
127
describe CC-NUMA organisation
- composed of multiple SMP-based nodes, each with its own processors and memory
- nodes are interconnected via a high-speed interconnect
- processors see a single system-wide memory address space
128
how are memory accesses handled in CC-NUMA
automatic & transparent
- local memory accesses - via the node's memory bus
- remote memory accesses - via the interconnect
129
how is cache coherency guaranteed in CC-NUMA
- each node contains a directory tracking memory locations & cache states
- the directory ensures consistent data access
130
3 advantages of CC-NUMA
  1. > parallelism than SMP
  2. v bus traffic per node
  3. performance issues can be mitigated by: efficient caching, spatial locality & page migration
131
2 disadvantages of CC-NUMA
  1. not fully transparent
  2. availability concerns
132
5 characteristics of cloud computing
  1. broad network access
  2. rapid elasticity
  3. measured service
  4. on-demand self-service
  5. resource pooling
133
what does broad network access in cloud computing mean
accessible over a network via diverse devices
134
what does rapid elasticity mean in cloud computing
resources scale up or down dynamically based on demand
135
what does measured service mean in cloud computing
usage is metered, monitored and reported for transparency
136
what does on-demand self-service mean in cloud computing
users can provision resources automatically without human intervention
137
what does resource pooling mean in cloud computing
multi-tenant model where resources are dynamically allocated based on demand
138
what are the 4 cloud computing deployment models
  1. public
  2. private
  3. community
  4. hybrid
139
describe public cloud
open to the public or industry groups, owned by a cloud provider
140
pros of public cloud
- cost-effective
- minimal management overhead
141
con of public cloud
security concerns
142
describe private cloud
hosted within an organisation's internal IT environment
143
pros of private cloud
- greater security
- control over data
144
cons of private cloud
- higher cost
- increased management responsibility
145
define community cloud
shared by multiple organisations with common requirements
146
pros of community cloud
- controlled data exchange
- regulatory compliance
147
cons of community cloud
- higher cost than public cloud
- limited scalability
148
define hybrid cloud
combination of two or more cloud models, allowing data portability
149
pros of hybrid cloud
- security & cost benefits of private cloud
150
cons of hybrid cloud
complex integration & management
151