Multicore Flashcards
(37 cards)
Diagram: Single-Core CPU Chip
Overview of
Multicore Architectures
- Replicate multiple processor cores on a single die
- The cores fit into a single processor socket
- Also called a Chip Multi-Processor (CMP)
- Cores run in parallel
- Within each core, threads are time-sliced (just like on a uniprocessor)
- OS perceives each core as a separate processor
- Scheduler maps threads/processes to different cores
Multicore:
Interaction with the
Operating System
- OS perceives each core as a separate processor
- OS Scheduler maps threads/processes to different cores
- OS is likely multi-threaded itself, scheduling its own use of the cores
- Most major OSs support multi-cores today:
- Windows, Linux, Mac OS X, …
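One way to see that the OS exposes each core as a separate processor is to ask it how many processors it reports. The following is a minimal sketch using Python's standard library; the variable names are illustrative, and on SMT hardware the reported count typically includes hardware threads, not only physical cores.

```python
import os

# Number of logical processors the OS reports.
# os.cpu_count() may return None if the count cannot be determined.
logical_cpus = os.cpu_count()

# os.sched_getaffinity is only available on some platforms (e.g. Linux),
# so fall back to the plain count elsewhere.
if hasattr(os, "sched_getaffinity"):
    usable_cpus = len(os.sched_getaffinity(0))
else:
    usable_cpus = logical_cpus

print(f"logical processors reported by the OS: {logical_cpus}")
print(f"processors this process may be scheduled on: {usable_cpus}")
```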
Motivation for using
Multicore Processors
- It is difficult to make single-core clock frequencies even higher
- Deeply pipelined circuits:
- heat problems
- Interconnect delays dominate
- difficult design and verification
- large design teams necessary
- Many new applications are multithreaded
- General trend in computer architecture
- Shift towards more parallelism
Instruction Level
Parallelism
- Parallelism at the machine-instruction level
- The processor can
- re-order instructions,
- pipeline instructions
- split instructions into microinstructions
- do aggressive branch prediction
- etc
- Instruction-Level parallelism enabled rapid increases in processor speeds over the last 15 years
Instruction Level
Improvements
- Architectural improvements have become small and incremental:
- Additional circuitry contributes little to application performance
- More likely, the additional interconnect delays will lengthen the processor’s cycle time, reducing performance for all applications
Thread-Level Parallelism (TLP)
- Parallelism on a more coarse scale
- Server can serve each client in a separate thread
- A computer game can do AI, graphics and physics on three separate threads
- Single-Core superscalar processor cannot fully exploit TLP
- Multi-core architectures are the next step in processor evolution: explicitly exploiting TLP
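The computer game example can be sketched as three threads, one per subsystem. This is only an illustration of the structure: the subsystem names and workloads are made up, and in CPython the global interpreter lock means CPU-bound threads like these are interleaved rather than run truly in parallel on separate cores.

```python
import threading

results = {}
results_lock = threading.Lock()

def run_subsystem(name, steps):
    # Stand-in for real work (AI, graphics, or physics for one frame).
    total = sum(i * i for i in range(steps))
    with results_lock:
        results[name] = total

# Hypothetical workloads, one per game subsystem.
subsystems = {"ai": 10_000, "graphics": 20_000, "physics": 15_000}

threads = [
    threading.Thread(target=run_subsystem, args=(name, steps))
    for name, steps in subsystems.items()
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every subsystem has finished

print(sorted(results))
```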
Multiprocessors:
Definition
Any computer with several processors
Multiprocessors:
Types
Single Instruction Multiple Data (SIMD)
- ex: Modern Graphics Cards
Multiple Instructions, Multiple Data (MIMD)
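The difference between the two types can be sketched in Python. This is a conceptual model only: `simd_add` and `mimd_run` are hypothetical helper names, and real SIMD hardware would perform the element-wise operation in lockstep rather than in a Python loop.

```python
import threading

def simd_add(a, b):
    # SIMD idea: ONE instruction stream (an add) applied to many data elements.
    return [x + y for x, y in zip(a, b)]

def mimd_run(tasks):
    # MIMD idea: several independent instruction streams, each on its own data.
    out = [None] * len(tasks)

    def worker(i, fn, data):
        out[i] = fn(data)

    threads = [
        threading.Thread(target=worker, args=(i, fn, data))
        for i, (fn, data) in enumerate(tasks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out

simd_result = simd_add([1, 2, 3, 4], [10, 20, 30, 40])
mimd_result = mimd_run([(sum, [1, 2, 3]), (max, [4, 9, 2]), (sorted, [3, 1, 2])])
print(simd_result)  # [11, 22, 33, 44]
print(mimd_result)  # [6, 9, [1, 2, 3]]
```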
Multiprocessors:
Memory Types
Shared Memory
In this model, there is one (large) common shared memory for all processors.
Distributed Memory
In this model, each processor has its own (small) local memory.
Its content is not replicated anywhere else.
Processors have some other communication mechanism.
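The two memory models can be contrasted with a small sketch that sums the numbers 1 to 100 both ways. Threads stand in for processors here, and a `queue.Queue` stands in for the "other communication mechanism"; both choices are assumptions made for illustration.

```python
import queue
import threading

data = list(range(1, 101))
chunks = [data[i::4] for i in range(4)]  # one chunk per worker

# Shared-memory style: every worker updates one common total.
shared = {"total": 0}
lock = threading.Lock()

def shared_worker(chunk):
    for x in chunk:
        with lock:  # protect the shared location
            shared["total"] += x

# Distributed-memory style: each worker keeps private state and
# sends a message with its partial result.
mailbox = queue.Queue()

def distributed_worker(chunk):
    local_total = 0  # private to this worker, not replicated elsewhere
    for x in chunk:
        local_total += x
    mailbox.put(local_total)

def run(worker):
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

run(shared_worker)
run(distributed_worker)
distributed_total = sum(mailbox.get() for _ in chunks)

print(shared["total"], distributed_total)  # 5050 5050
```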
What is a “Multi-Core” Processor?
- A special kind of multiprocessor
- All processors are on the same chip
- Multicore processors are MIMD:
- Different cores execute different threads (Multiple Instructions)
- operating on different parts of memory (Multiple Data)
- Multi-core is a shared memory multiprocessor:
- All cores share the same memory
Types of Applications
that benefit from
Multi-Core Architecture
- Database Servers
- Web Servers
- Compilers
- Multimedia applications
- Scientific applications, CAD/CAM
- In general, applications with Thread-Level parallelism
Simultaneous Multithreading (SMT)
- A technique complementary to multi-core
- Addresses the problem of the processor pipeline getting stalled
- Permits multiple independent threads to execute simultaneously on the same core
- Weaving together multiple threads on the same core
- Without SMT, only a single thread can run at any given time
- Limitation: even with SMT, two threads cannot use the same functional unit at the same time
Processor Pipeline Stall:
Two Causes
- Waiting for the result of a long floating point or integer operation
- Waiting for data to arrive from memory
- Other execution units wait unused if no SMT
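The effect of stalls can be illustrated with a toy cycle-count model. It is a deliberately simplified sketch, not a model of any real pipeline: the core issues at most one instruction per cycle, each instruction is represented only by its latency in cycles, and a thread cannot issue again until its previous instruction completes. The function name and the example program are hypothetical.

```python
def cycles_needed(threads, smt):
    """Count cycles to finish all threads on one core with one issue slot."""
    pcs = [0] * len(threads)       # next instruction index per thread
    ready_at = [0] * len(threads)  # cycle at which each thread may issue again
    cycle = 0
    while any(pc < len(t) for pc, t in zip(pcs, threads)):
        unfinished = [i for i, t in enumerate(threads) if pcs[i] < len(t)]
        # Without SMT, one thread owns the core until it finishes
        # (time slicing is ignored in this toy model).
        candidates = unfinished if smt else unfinished[:1]
        for i in candidates:
            if ready_at[i] <= cycle:
                ready_at[i] = cycle + threads[i][pcs[i]]
                pcs[i] += 1
                break  # only one instruction issues per cycle
        cycle += 1
    return max(cycle, max(ready_at))

# Each thread: a 1-cycle op, a 4-cycle memory load, then a 1-cycle op.
program = [[1, 4, 1], [1, 4, 1]]

without_smt = cycles_needed(program, smt=False)
with_smt = cycles_needed(program, smt=True)
print(without_smt, with_smt)  # 12 8
```

In this model the second thread fills the cycles the first thread spends waiting on its load, which is why the SMT run finishes sooner.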
Why SMT is not a “true” Parallel Processor
- Improves threaded performance (e.g., by up to 30%)
- OS and applications perceive each simultaneous thread as a separate “virtual processor”
- The chip has only a single copy of each resource
- Compare to multicore:
- Each core has its own copy of resources
Combining
Multi-Core and
SMT
- Cores can be SMT-enabled (or not)
- Number of SMT threads:
- 2, 4, or sometimes 8 simultaneous threads
- Intel calls them “hyperthreads”
Different Combinations:
- Single-Core, non-SMT (standard uniprocessor)
- Single-Core, SMT
- Multi-Core, non-SMT
- Multi-Core, SMT
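Because the OS sees each simultaneous thread as a separate virtual processor, the number of processors it perceives in each combination is the core count times the SMT threads per core. A small sketch, assuming 2 cores and 2-way SMT purely for illustration:

```python
def logical_processors(cores, smt_threads_per_core=1):
    # Each SMT thread appears to the OS as its own "virtual processor".
    return cores * smt_threads_per_core

# Illustrative configurations: (cores, SMT threads per core).
configs = {
    "single-core, non-SMT": (1, 1),
    "single-core, SMT": (1, 2),
    "multi-core, non-SMT": (2, 1),
    "multi-core, SMT": (2, 2),
}

counts = {name: logical_processors(c, t) for name, (c, t) in configs.items()}
for name, count in counts.items():
    print(f"{name}: {count} logical processor(s)")
```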
Comparison:
Multi-Core vs SMT
Multicore:
- Several cores, each is smaller and not as powerful
- Easier to design and manufacture
- Great with thread-level parallelism
SMT:
- Can have one large and fast superscalar core
- Great performance on a single thread
- Mostly still only exploits instruction-level parallelism
Memory Hierarchy:
- SMT
- Multi-Core Chips
Simultaneous Multithreading Only:
All caches are shared
Multicore Chips:
- L1 caches are private
- L2 caches private in some architectures, shared in others
*Memory is always shared
What are “Fish” Machines?
- Dual-core Intel Xeon processors
- Each core is hyper-threaded
- Private L1 caches
- Shared L2 caches
Advantages of
Private Caches
- Closer to core, so faster access
- Reduces contention
Advantages of
Shared Caches
- Threads on different cores can share the same cache data
- More cache space available if a single (or a few) high-performance thread runs on the system
Cache Coherence Problem
Since multicore has private caches,
how to keep data consistent across caches?
- Each core should perceive memory as a shared, monolithic array
- One core copies something into its cache, makes changes, and writes back to memory
- But a second core reads the stale copy from memory before the first core writes its changes back
- This is a general problem with multiprocessors, not just multicore
- There are many solution algorithms and coherence protocols designed to deal with this
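The stale-read scenario can be reproduced with a toy model of two cores with private write-back caches and no coherence protocol. The `Core` class and the address `"x"` are hypothetical; real caches work on cache lines and hardware state, not Python dictionaries.

```python
memory = {"x": 0}  # shared main memory

class Core:
    def __init__(self, name):
        self.name = name
        self.cache = {}  # private cache, no coherence

    def read(self, addr):
        if addr not in self.cache:  # miss: fetch from memory
            self.cache[addr] = memory[addr]
        return self.cache[addr]

    def write(self, addr, value):
        self.cache[addr] = value  # write-back cache: memory not updated yet

    def write_back(self, addr):
        memory[addr] = self.cache[addr]

core1, core2 = Core("core1"), Core("core2")

core1.read("x")                # core1 copies x into its cache
core1.write("x", 42)           # and changes it locally
stale = core2.read("x")        # core2 reads memory before the write-back
core1.write_back("x")          # memory is now up to date
still_stale = core2.read("x")  # but core2 keeps hitting its own old copy

print(stale, still_stale, memory["x"])  # 0 0 42
```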
Cache Coherence:
Simple Solution
Invalidation-based protocol
with snooping
Alternatively: Update protocol
Cache Coherence:
What is “snooping”?
All cores continuously “snoop”, or monitor,
the bus connecting the cores
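A minimal sketch of an invalidation-based protocol with snooping, under simplifying assumptions: the `Bus` and `Core` classes are hypothetical, writes go straight through to memory (write-through) to keep the model short, and the "bus" is just a list of cores that get notified on every write.

```python
class Bus:
    def __init__(self):
        self.memory = {"x": 0}
        self.cores = []

    def broadcast_invalidate(self, sender, addr):
        # Every other core "snoops" this message on the bus.
        for core in self.cores:
            if core is not sender:
                core.snoop_invalidate(addr)

class Core:
    def __init__(self, bus):
        self.bus = bus
        self.cache = {}
        bus.cores.append(self)

    def read(self, addr):
        if addr not in self.cache:  # miss: fetch a fresh copy
            self.cache[addr] = self.bus.memory[addr]
        return self.cache[addr]

    def write(self, addr, value):
        self.bus.broadcast_invalidate(self, addr)  # invalidate other copies
        self.cache[addr] = value
        self.bus.memory[addr] = value  # write-through, a simplifying assumption

    def snoop_invalidate(self, addr):
        self.cache.pop(addr, None)  # drop the now-stale copy, if any

bus = Bus()
c1, c2 = Core(bus), Core(bus)

first = c2.read("x")   # c2 caches the old value
c1.write("x", 42)      # c2's copy is invalidated via the bus
second = c2.read("x")  # miss, so c2 fetches the new value

print(first, second)  # 0 42
```

An update protocol would differ in one step: instead of dropping the copy in `snoop_invalidate`, the other caches would overwrite it with the new value.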