10 - Architecture and Parallelism Flashcards

1
Q

How can you create an ALU?

A

By integrating a full adder, a 2's complementer, a shifter, and a comparator.

There’s a one-bit logic unit for each bit position, each with its own carry in, carry out, decoder, and logical unit.
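
A minimal Python sketch of one such design: a one-bit ALU slice whose decoder selects among AND, OR, NOT, and full-adder outputs, chained by carries into an n-bit ALU. The op encoding and names are illustrative, not from a specific chip:

# One-bit ALU slice: a decoder picks the function, carries chain the slices.
def full_adder(a, b, carry_in):
    # Sum and carry-out expressed as XOR/AND/OR gate logic.
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out

def alu_slice(a, b, carry_in, op):
    # op: 0=AND, 1=OR, 2=NOT a, 3=ADD (illustrative encoding).
    if op == 0:
        return a & b, 0
    if op == 1:
        return a | b, 0
    if op == 2:
        return 1 - a, 0
    return full_adder(a, b, carry_in)

def alu(a_bits, b_bits, op):
    # n-bit ALU: carry-out of each slice feeds carry-in of the next.
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):      # least-significant bit first
        r, carry = alu_slice(a, b, carry, op)
        out.append(r)
    return out, carry

print(alu([1, 1, 0, 0], [1, 0, 1, 0], 3))   # 3 + 5 = 8 -> ([0, 0, 0, 1], 0)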

2
Q

What does the internal bus do?

A

It allows the control unit, ALU, registers, addressing unit, etc. to exchange data and control signals with one another.

The speed of the entire system depends on bus width (the number of bits that can be transferred simultaneously) and bus length (the motivation for miniaturizing computers).

Bus arbitration - the problem is that only one set of signals can be sent per clock cycle (e.g. a register needs to transfer a value to the ALU in the same cycle that data must be transferred to a general register). The bus arbitration system decides which transfer gets to go first.

Also, there may be multiple buses for transfers to travel on.
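
A toy Python sketch of one possible policy, fixed-priority arbitration with one grant per cycle (the device names and priority order are illustrative assumptions):

# Fixed-priority bus arbiter: exactly one device is granted the bus per cycle.
def arbitrate(requests, priority=("control unit", "ALU", "registers", "I/O")):
    for device in priority:
        if device in requests:
            return device
    return None

# Two units request the bus in the same cycle; the arbiter serializes them.
pending = {"ALU", "registers"}
while pending:
    granted = arbitrate(pending)
    print(f"bus granted to {granted}")
    pending.discard(granted)
# bus granted to ALU
# bus granted to registers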

3
Q

How is memory made up of gates?

A

Gates combine to make switches, which combine to make memory cells, which are then combined and integrated to make memory chips.

4
Q

What are the two major types of memory?

A
  1. RAM (random-access memory) - programs can access and manipulate memory cells while the computer is running.
    - Addressed by machine instructions through the memory address register, manipulated through the data register, etc.
  2. ROM (read-only memory) - cannot be changed while the computer is running.
    - Ordinarily burned into a single configuration (e.g. bootup code).
5
Q

How quick is a clock cycle?

A

The computer transitions to a new “state” at every tick of the system clock, with signals propagating at near light speed.

Clock cycle length determines CPU speed (mostly). However, the minimum cycle length depends on the distance between components, since a signal must be able to cross the circuit within one cycle.
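
A back-of-envelope sketch (Python) of why distance matters; the 2.4 GHz figure is an illustrative value, and the speed of light is an upper bound on signal speed:

# Cycle time at a given clock rate, and how far a signal can travel per tick.
freq_hz = 2.4e9                  # example: a 2.4 GHz clock
cycle_s = 1 / freq_hz            # seconds per cycle
c = 3.0e8                        # speed of light in m/s (upper bound for signals)

print(f"cycle time: {cycle_s * 1e9:.3f} ns")                    # ~0.417 ns
print(f"max distance per cycle: {c * cycle_s * 100:.1f} cm")    # ~12.5 cm

# A signal can cross at most ~12.5 cm of circuit per tick, which is why
# shrinking the distance between components allows faster clocks.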

6
Q

How has CISC architecture been improved?

A
    • more efficient microprograms
    • more powerful ISA level instructions
    • cache memory
    • more registers
    • wider buses
    • miniaturization (shorter distances between components)
    • more processors
    • floating point instructions
7
Q

What are the limitations of improving CISC?

A

Improving a specific architecture requires new instructions to remain backward compatible.

However, the improvements you can make come at the expense of backward compatibility (some companies have built in both the old and the new -> not an improvement, just a transition).

8
Q

What is RISC?

A

Reduced Instruction Set Computer. RISC instructions are like CISC micro-instructions.

There’s a much smaller set of instructions at the ISA level, and they execute directly in hardware with no microprogram decoding step.

For example, smartphones use RISC. Even though the programs look much longer, they execute faster (the simple instructions can be overlapped, with several in progress at once). RISC architecture is generally used in embedded systems for this reason.

9
Q

What are the major RISC design principles?

A
  1. Instructions are executed directly by the hardware (no microprograms).
  2. Maximize the rate at which instructions are fetched (instruction cache, often a separate fetch unit with its own cache).
  3. Instructions should be easy to decode.
  4. Only two instructions reference memory (LOAD and STORE).
  5. Provide plenty of registers.
10
Q

How is speed generally improving now?

A
  1. Try to minimize memory and I/O accesses
    - Cache
    - Separate I/O unit (buffers/processing)
    - Separate network communication unit (NIC)
  2. Parallel processing
11
Q

What are the two types of parallelism?

A
  1. Instruction-level parallelism
    - pipelining
    - caching
  2. Processor-level parallelism
    - multiprocessor (multiple CPUs, common memory)
    - multicomputer (multiple CPUs, each with its own memory)
12
Q

What is pipelining?

A

The hardware provides separate units, each responsible for one part of instruction processing. As soon as a unit finishes its part for one instruction, it moves on to the next instruction while the later units continue working on the earlier ones.

U-1 - instruction fetch
U-2 - instruction decode
U-3 - operand fetch
U-4 - instruction execute
U-5 - operand store
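
A small Python sketch of the idealized flow: five instructions enter the five units above, one stage per clock cycle, each unit handing its instruction to the next:

# Idealized 5-stage pipeline: at cycle t, instruction i is in stage t - i.
STAGES = ["fetch", "decode", "operand fetch", "execute", "operand store"]
instructions = ["I1", "I2", "I3", "I4", "I5"]

n_cycles = len(instructions) + len(STAGES) - 1   # 9 cycles in total
for t in range(n_cycles):
    active = []
    for i, instr in enumerate(instructions):
        stage = t - i                # stage occupied by instr at cycle t
        if 0 <= stage < len(STAGES):
            active.append(f"{instr}:{STAGES[stage]}")
    print(f"cycle {t + 1}: " + ", ".join(active))

# Unpipelined, 5 instructions x 5 stages would take 25 cycles; pipelined, 9.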
13
Q

What is instruction caching?

A

The hardware provides area for multiple instructions in the CPU.

    • reduces number of memory accesses
    • instructions available for immediate execution
    • might cause problems with decision, repetition, and procedure structures in programs
14
Q

What is multiprocessor parallelism?

A

Multiple processors all accessing the same shared memory (the jobs can be split among all the processors).

One way to manage them is to have a master processor direct the others. Another way is to have them communicate with each other as peers.

15
Q

What is multicomputer parallelism?

A

Each processor has its own memory and communicates with the others through an interconnection network (the job is split up and the pieces are assigned to processors, each working out of its own memory).

16
Q

How do multiprocessors and multicomputers compare?

A

Multiprocessors are difficult to build, but relatively easy to program.

Multicomputers are easy to build (given networking technology), but extremely difficult to program.

Hybrid systems (like cloud computing) have integrated aspects of both.

17
Q

How do multiprocessors and multicomputers coordinate?

A

A multiprocessor system communicates through circuits/shared memory.

A multicomputer system communicates through networking technology: packets (data plus source/destination information, etc.) sent over links, switches, interfaces, etc.

18
Q

What is software parallelism?

A

Even though there have been advances in hardware, it has been difficult for software to take advantage of those resources. It’s a major research area.

One factor is the parallelizability of algorithms, which depends on the number of processors, the trade-offs and efficiency of synchronization, and separating the sequential and parallel parts.

19
Q

What is Amdahl’s law?

A

When there’s a mixture of parallel parts and sequential parts, there’s a potential for some speedup, as represented by the formula:

speedup = n / (1 + (n-1)f)

n = number of processors
f = fraction of code that is sequential
T = time to process entire algorithm sequentially (one processor)

Note: the total execution time is fT + ((1-f)T)/n, because the parallelizable part is shared equally among the n processors.
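
A short Python sketch of the formula (variable names follow the card):

# Amdahl's law: speedup from n processors when fraction f is sequential.
def amdahl_speedup(n, f):
    return n / (1 + (n - 1) * f)

# Total execution time: sequential part fT plus parallel part (1-f)T over n.
def parallel_time(T, n, f):
    return f * T + (1 - f) * T / n

print(amdahl_speedup(16, 0.4))    # 16/7 ~= 2.286 (the example in the next card)
print(parallel_time(10, 16, 0.4)) # 4.375 seconds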

20
Q

An algorithm takes 10 seconds to execute on a single 2.4 GHz processor. 40% of the algorithm is sequential. Assuming zero latency and perfect parallelism in the remaining code, how long should the algorithm take on a parallel machine with 16 of the same 2.4 GHz processors?

A
f = .4
1-f = .6
T = 10s
n = 16

speedup = n / (1+(n-1)f)
= 16 / (1 + 15(.4)) = 16/7 ≈ 2.29.

The expected time is T/speedup, so 10s / (16/7) = 4.375 seconds.

Alternatively, (.4)(10) + ((.6)(10))/16 = 4 + 0.375 = 4.375 seconds: the sequential time plus the parallel time divided by the number of processors.

21
Q

Assuming perfect scalability, what are the implications of Amdahl’s Law when n approaches infinity?

A

As you add more and more processors, the speedup factor approaches 1/f.

Therefore, if f = .4, parallelism can never make it run more than 2.5 times as fast.
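
A quick numeric check of the limit in Python:

# Speedup n / (1 + (n-1)f) plateaus at 1/f no matter how large n grows.
f = 0.4
for n in (16, 256, 4096, 1_000_000):
    print(n, round(n / (1 + (n - 1) * f), 4))
# 16 -> 2.2857, 256 -> 2.4855, 4096 -> 2.4991, 1000000 -> 2.5 (= 1/f)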

22
Q

Why is parallel computing such a big research area?

A

It depends on so many aspects of hardware and software (although software has not kept up with hardware advances).

Hardware: CPU speed of individual processors, I/O speed of individual processors, interconnection network, and scalability.

Software: Parallelizability of algorithms, application programming languages, operating systems, and parallel system libraries.

23
Q

What additional factors does hardware parallelism have to consider (beyond those of single-processor machines, like CPU and I/O speed) for performance?

A

Interconnection network:

  1. Latency (wait time), including distance and collisions/collision resolution.
  2. Bandwidth (bits per second), including bus limitations and CPU/I/O limitations (multicomputer systems might not even have the same type of processors).

Scalability
– adding more processors affects both latency and bandwidth

24
Q

What are some of the enhancements available for software parallelism?

A

Parallel system libraries:

  1. Precompiled functions designed for multiprocessing (e.g. matrix transformations)
  2. Functions for control of communication (e.g. background printing)

Application programming languages

  1. Built-in functions for creating child processes, threads, parallel looping, etc.
  2. Mostly imperative (e.g. C)

Operating systems that can take a parallelizable program and assign its tasks to processors.
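
As a concrete instance, a minimal sketch of a parallel loop using Python’s standard-library process pool (the work function is a stand-in for any independent task):

# The runtime/OS splits the loop across child processes, one worker per core.
from concurrent.futures import ProcessPoolExecutor

def work(x):
    return x * x        # stand-in for an independent, parallelizable task

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(work, range(10)))
    print(results)      # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]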

25
Q

What are some applications of parallelism?

A

Multi-user networks (e.g. even a local LAN).
– Internet servers -> manage multiple requests simultaneously.

Speed up single processes

    • chess example (IBM’s Deep Blue)
    • expert systems
    • other AI applications