Introduction, Motivation, Problems, Models Flashcards
Processor performance
Number of (some kind of) instructions that can be carried out per unit of time
Examples:
• FLOPS: FLoating-Point OPerations per Second (IEEE 64-bit)
• IPS: Instructions per Second
Peak rate = Number of cores x Number of Instructions per Clock x Clock Frequency
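A hypothetical worked example (made-up figures, not from the slides): a processor with 4 cores, each completing 8 floating-point operations per clock at 3 GHz, has a peak of 4 x 8 x 3x10^9 = 96x10^9 FLOPS = 96 GFLOPS.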
FLOPS
FLOPS: FLoating-Point OPerations per Second (IEEE 64-bit)
Historically, floating-point operations were expensive, and they account for most of the work in numerical applications.
IPS
IPS: Instructions per Second
Moore’s law
Moore’s “law” (popular version):
Sequential processor performance doubles every 18 months.
Moore’s “Law” is (only) an empirical observation/extrapolation.
ILP
Instruction-level parallelism (ILP) is a measure of how many of the instructions in a computer program can be executed simultaneously.
ILP must not be confused with concurrency: the former is about parallel execution of the instructions belonging to one specific thread of execution, whereas the latter is about multiple threads of execution making progress (possibly) at the same time.
[wikipedia]
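A minimal C illustration (made-up values): the first two additions are independent of each other, so a superscalar/pipelined core can execute them at the same time, while the multiplication depends on both results and must wait.

#include <stdio.h>

int main(void) {
    double a = 1.0, b = 2.0, c = 3.0, d = 4.0;
    double e = a + b;   /* independent of the next statement */
    double f = c + d;   /* can execute in the same cycle as e = a + b */
    double g = e * f;   /* depends on e and f: must wait for both results */
    printf("%f\n", g);
    return 0;
}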
SMT/Hyperthreading
Hyper-threading (officially called Hyper-Threading Technology or HT Technology and abbreviated as HTT or HT) is Intel’s proprietary simultaneous multithreading (SMT) implementation used to improve parallelization of computations (doing multiple tasks at once) performed on x86 microprocessors.
[wikipedia]
FPGA, ASIC
A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturing – hence the term “field-programmable”. The FPGA configuration is generally specified using a hardware description language (HDL), similar to that used for an application-specific integrated circuit (ASIC).
An application-specific integrated circuit (ASIC /ˈeɪsɪk/) is an integrated circuit (IC) chip customized for a particular use, rather than intended for general-purpose use.
[wikipedia]
Core
Core: The smallest unit capable of executing instructions according
to a program (used to be called “processor”, “processing unit”)
Processor
Processor: Collection of one or more cores on a single chip (as in
“multi-core processor”)
Multi-core vs Many-core
Multi-core: handful of (standard CPU) cores (2-32)
Many-core: a lot of cores, often in special-purpose processors (GPU)
Bridging models
Model costs translate reasonably into (i.e., predict) performance on a wide range of actual hardware
Minimum requirement for a good bridging model:
If Algorithm A performs better than Algorithm B in the model, then the implementation of Algorithm A performs better than the implementation of Algorithm B on applicable hardware
Good model:
• abstracts unnecessary detail,
• accounts for the essentials,
• leads to interesting results, algorithms and lower bounds,
• and “bridging” works
MPI
Message Passing Interface (MPI)
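MPI is a library specification for message passing between processes, not a language. A minimal sketch of the style (the value 42 and tag 0 are arbitrary; assumes at least two processes):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I? */
    if (rank == 0) {
        value = 42;                          /* process 0 sends a value ... */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {                  /* ... and process 1 receives it */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}

Typically compiled with mpicc and started with, e.g., mpirun -np 2.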
OpenMP
OpenMP (Open Multi-Processing)
C/C++ compiler directives like #pragma omp parallel for
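A minimal sketch of the directive in use (array names and size are made up): the loop iterations are independent, so the OpenMP runtime may split them across the available threads.

#include <stdio.h>

#define N 1000000

double a[N], b[N], c[N];

int main(void) {
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

    /* iterations are independent; the pragma distributes the
       iteration range over the threads of the parallel region */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[10] = %f\n", c[10]);
    return 0;
}

Needs an OpenMP-enabled compiler (e.g. gcc -fopenmp); without OpenMP the pragma is ignored and the loop simply runs sequentially.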
Cilk
Cilk, Cilk++ and Cilk Plus are general-purpose programming languages designed for multithreaded parallel computing. They are based on the C and C++ programming languages, which they extend with constructs to express parallel loops and the fork–join idiom.
Example keywords: spawn, sync, thread
[wikipedia]
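The canonical fork–join example, sketched here in MIT Cilk-5-style syntax (Cilk Plus writes the keywords as cilk_spawn and cilk_sync):

cilk int fib(int n) {
    if (n < 2) return n;
    else {
        int x, y;
        x = spawn fib(n - 1);   /* fork: the child call may run in parallel with the parent */
        y = spawn fib(n - 2);
        sync;                   /* join: wait until both spawned calls have returned */
        return x + y;
    }
}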
MapReduce
[wikipedia]
A MapReduce framework (or system) is usually composed of three operations (or steps):
• Map: each worker node applies the map function to the local data, and writes the output to a temporary storage. A master node ensures that only one copy of the redundant input data is processed.
• Shuffle: worker nodes redistribute data based on the output keys (produced by the map function), such that all data belonging to one key is located on the same worker node.
• Reduce: worker nodes now process each group of output data, per key, in parallel.
The canonical MapReduce example counts the appearance of each word in a set of documents:
function map(documentName, String document):
    for each word w in document:
        emit (w, 1)

function reduce(word, partialCounts):
    sum = 0
    for each pc in partialCounts:
        sum += pc
    emit (word, sum)
Parallel computing
Parallel computing
The discipline of efficiently utilizing dedicated parallel resources (processors, memories, …) to solve individual, given computational problems.
Typical concerns:
Solving (difficult) problems faster!
Specifically:
Parallel resources with significant inter-communication capabilities,
for problems with non-trivial communication and computational
demands
GPGPU
General-purpose computing on graphics processing units
HPC
High-Performance Computing
Distributed computing
Distributed computing
The discipline of making independent, non-dedicated resources available and having them cooperate toward solving specified problem complexes.
Typical concerns:
Correctness, availability, progress, security, integrity, privacy, robustness, fault tolerance, …
Specifically/typically:
No centralized control
Concurrent computing
Concurrent computing:
The discipline of managing and reasoning about interacting processes that may (or may not) progress simultaneously
Typical concerns:
Correctness (often formal), e.g. deadlock-freedom, starvation-freedom, mutual exclusion, fairness, …
IPC
Interprocess communication (IPC) refers to the mechanisms an operating system provides to allow the processes to manage shared data.
[wikipedia]
Process calculi
Process calculi such as
• Ambient calculus
• Calculus of communicating systems (CCS)
• Communicating sequential processes (CSP)
• Join-calculus
• π-calculus
Parallel vs. Concurrent computing
Concurrent computing: Focus on coordination of access to/usage of shared resources (to solve given, computational problem)
Parallel computing: Focus on dividing given problem (specification,
algorithm, data) into subproblems that can be solved by dedicated
processors (in coordination)
Aspects of parallelization
•Algorithmic: Dividing computation into independent parts that can be executed in parallel. What kinds of shared resources are necessary? Which kinds of coordination? How can overheads be minimized (redundancy, coordination, synchronization)?
•Scheduling/Mapping: Assigning parts of computation to
processors in good order
- Load balancing: (Re)assigning independent parts of computation to processors such that all resources are utilized to the same extent (evenly and efficiently)
- Communication: When must processors communicate? How?
- Synchronization: When must processors agree/wait?
•Linguistic: How are the algorithmic aspects expressed? Concepts (programming model) and concrete expression (programming language, interface, library)
•Pragmatic/practical: How does the actual, parallel machine look?
What is a reasonable, abstract (“bridging”) model?
•Architectural: Which kinds of parallel machines can be realized?
How do they look?