10 - Architecture and Parallelism Flashcards

1
Q

How can you create an ALU?

A

By integrating a full adder, a 2's complementer, a shifter, and a comparator.

There’s a one-bit logic unit for each bit position, each with its own carry in, carry out, decoder, and logical unit.
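
A minimal Python sketch of one such design: a one-bit ALU slice whose decoder selects among AND, OR, NOT, and full-adder outputs, chained by carries into an n-bit ALU. The op encoding and names are illustrative, not from a specific chip:

# One-bit ALU slice: a decoder picks the function, carries chain the slices.
def full_adder(a, b, carry_in):
    # Sum and carry-out expressed as XOR/AND/OR gate logic.
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out

def alu_slice(a, b, carry_in, op):
    # op: 0=AND, 1=OR, 2=NOT a, 3=ADD (illustrative encoding).
    if op == 0:
        return a & b, 0
    if op == 1:
        return a | b, 0
    if op == 2:
        return 1 - a, 0
    return full_adder(a, b, carry_in)

def alu(a_bits, b_bits, op):
    # n-bit ALU: carry-out of each slice feeds carry-in of the next.
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):      # least-significant bit first
        r, carry = alu_slice(a, b, carry, op)
        out.append(r)
    return out, carry

print(alu([1, 1, 0, 0], [1, 0, 1, 0], 3))   # 3 + 5 = 8 -> ([0, 0, 0, 1], 0)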

2
Q

What does the internal bus do?

A

It allows the control unit, ALU, registers, addressing unit, etc. to exchange data and control signals with one another.

The speed of the entire system depends on bus width (the number of bits that can be transferred simultaneously) and bus length (the motivation for miniaturizing computers).

Bus arbitration - the problem is that only one set of signals can be sent per clock cycle (e.g. a register needs to transfer a value to the ALU in the same cycle that data must be transferred to a general register). The bus arbitration system decides which transfer gets to go first.

Also, there may be multiple buses for transfers to travel on.
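
A toy Python sketch of one possible policy, fixed-priority arbitration with one grant per cycle (the device names and priority order are illustrative assumptions):

# Fixed-priority bus arbiter: exactly one device is granted the bus per cycle.
def arbitrate(requests, priority=("control unit", "ALU", "registers", "I/O")):
    for device in priority:
        if device in requests:
            return device
    return None

# Two units request the bus in the same cycle; the arbiter serializes them.
pending = {"ALU", "registers"}
while pending:
    granted = arbitrate(pending)
    print(f"bus granted to {granted}")
    pending.discard(granted)
# bus granted to ALU
# bus granted to registers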

3
Q

How is memory made up of gates?

A

Gates combine to make switches, which combine to make memory cells, which are then combined and integrated to make memory chips.

4
Q

What are the two major types of memory?

A
  1. RAM (random-access memory) - programs can access and manipulate memory cells while the computer is running.
    - Addressed by machine instructions through the memory address register, manipulated through the data register, etc.
  2. ROM (read-only memory) - cannot be changed while the computer is running.
    - Ordinarily burned into a single configuration (e.g. bootup code).
5
Q

How quick is a clock cycle?

A

The computer transitions to a new “state” at every tick of the system clock, with signals propagating at near light speed.

Clock cycle length determines CPU speed (mostly). However, the minimum cycle length depends on the distance between components, since a signal must be able to cross the circuit within one cycle.
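
A back-of-envelope sketch (Python) of why distance matters; the 2.4 GHz figure is an illustrative value, and the speed of light is an upper bound on signal speed:

# Cycle time at a given clock rate, and how far a signal can travel per tick.
freq_hz = 2.4e9                  # example: a 2.4 GHz clock
cycle_s = 1 / freq_hz            # seconds per cycle
c = 3.0e8                        # speed of light in m/s (upper bound for signals)

print(f"cycle time: {cycle_s * 1e9:.3f} ns")                    # ~0.417 ns
print(f"max distance per cycle: {c * cycle_s * 100:.1f} cm")    # ~12.5 cm

# A signal can cross at most ~12.5 cm of circuit per tick, which is why
# shrinking the distance between components allows faster clocks.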

6
Q

How has CISC architecture been improved?

A
    • more efficient microprograms
    • more powerful ISA level instructions
    • cache memory
    • more registers
    • wider buses
    • miniaturization (shorter distances between components)
    • more processors
    • floating point instructions
7
Q

What are the limitations of improving CISC?

A

Improving a specific architecture requires new instructions to remain backward compatible.

However, the improvements you can make come at the expense of backward compatibility (some companies have built in both the old and the new -> not an improvement, just a transition).

8
Q

What is RISC?

A

Reduced Instruction Set Computer. RISC instructions are like CISC micro-instructions.

There’s a much smaller set of instructions at the ISA level, and they execute directly in hardware with no microprogram decoding step.

For example, smartphones use RISC. Even though the programs look much longer, they execute faster (the simple instructions can be overlapped, with several in progress at once). RISC architecture is generally used in embedded systems for this reason.

9
Q

What are the major RISC design principles?

A
  1. Instructions are executed directly by the hardware (no microprograms).
  2. Maximize the rate at which instructions are fetched (instruction cache, often a separate fetch unit with its own cache).
  3. Instructions should be easy to decode.
  4. Only two instructions reference memory (LOAD and STORE).
  5. Provide plenty of registers.
10
Q

How is speed generally improving now?

A
  1. Try to minimize memory and I/O accesses
    - Cache
    - Separate I/O unit (buffers/processing)
    - Separate network communication unit (NIC)
  2. Parallel processing
11
Q

What are the two types of parallelism?

A
  1. Instruction-level parallelism
    - pipelining
    - caching
  2. Processor-level parallelism
    - multiprocessor (multiple CPUs, common memory)
    - multicomputer (multiple CPUs, each with its own memory)
12
Q

What is pipelining?

A

The hardware provides separate units, each responsible for one part of instruction processing. As soon as a unit finishes its part for one instruction, it moves on to the next instruction while the later units continue working on the earlier ones.

U-1 - instruction fetch
U-2 - instruction decode
U-3 - operand fetch
U-4 - instruction execute
U-5 - operand store
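
A small Python sketch of the idealized flow: five instructions enter the five units above, one stage per clock cycle, each unit handing its instruction to the next:

# Idealized 5-stage pipeline: at cycle t, instruction i is in stage t - i.
STAGES = ["fetch", "decode", "operand fetch", "execute", "operand store"]
instructions = ["I1", "I2", "I3", "I4", "I5"]

n_cycles = len(instructions) + len(STAGES) - 1   # 9 cycles in total
for t in range(n_cycles):
    active = []
    for i, instr in enumerate(instructions):
        stage = t - i                # stage occupied by instr at cycle t
        if 0 <= stage < len(STAGES):
            active.append(f"{instr}:{STAGES[stage]}")
    print(f"cycle {t + 1}: " + ", ".join(active))

# Unpipelined, 5 instructions x 5 stages would take 25 cycles; pipelined, 9.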
13
Q

What is instruction caching?

A

The hardware provides area for multiple instructions in the CPU.

    • reduces number of memory accesses
    • instructions available for immediate execution
    • might cause problems with decision, repetition, and procedure structures in programs
14
Q

What is multiprocessor parallelism?

A

Multiple processors all accessing the same shared memory (the jobs can be split among all the processors).

One way to manage them is to have a master processor direct the others. Another way is to have them communicate with each other as peers.

15
Q

What is multicomputer parallelism?

A

Each processor has its own memory and communicates with the others through an interconnection network (the job is split up and the pieces are assigned to processors, each working out of its own memory).

16
Q

How do multiprocessors and multicomputers compare?

A

Multiprocessors are difficult to build, but relatively easy to program.

Multicomputers are easy to build (given networking technology), but extremely difficult to program.

Hybrid systems (like cloud computing) have integrated aspects of both.

17
Q

How do multiprocessors and multicomputers coordinate?

A

A multiprocessor system communicates through circuits/shared memory.

A multicomputer system communicates through networking technology: packets (data plus source/destination information, etc.) sent over links, switches, interfaces, etc.

18
Q

What is software parallelism?

A

Even though there have been advances in hardware, it has been difficult for software to take advantage of those resources. It’s a major research area.

One factor is the parallelizability of algorithms, which depends on the number of processors, the trade-offs and efficiency of synchronization, and separating the sequential and parallel parts.

19
Q

What is Amdahl’s law?

A

When there’s a mixture of parallel parts and sequential parts, there’s a potential for some speedup, as represented by the formula:

speedup = n / (1 + (n-1)f)

n = number of processors
f = fraction of code that is sequential
T = time to process entire algorithm sequentially (one processor)

Note: the total execution time is fT + ((1-f)T)/n, because the parallelizable part is shared equally among the n processors.
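
A short Python sketch of the formula (variable names follow the card):

# Amdahl's law: speedup from n processors when fraction f is sequential.
def amdahl_speedup(n, f):
    return n / (1 + (n - 1) * f)

# Total execution time: sequential part fT plus parallel part (1-f)T over n.
def parallel_time(T, n, f):
    return f * T + (1 - f) * T / n

print(amdahl_speedup(16, 0.4))    # 16/7 ~= 2.286 (the example in the next card)
print(parallel_time(10, 16, 0.4)) # 4.375 seconds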

20
Q

An algorithm takes 10 seconds to execute on a single 2.4 GHz processor. 40% of the algorithm is sequential. Assuming zero latency and perfect parallelism in the remaining code, how long should the algorithm take on a parallel machine with 16 of the same 2.4 GHz processors?

A
f = .4
1-f = .6
T = 10s
n = 16

speedup = n / (1+(n-1)f)
= 16 / (1 + 15(.4)) = 16/7 ≈ 2.29.

The expected time is T/speedup, so 10s / (16/7) = 4.375 seconds.

Alternatively, (.4)(10) + ((.6)(10))/16 = 4 + 0.375 = 4.375 seconds: the sequential time plus the parallel time divided by the number of processors.

21
Q

Assuming perfect scalability, what are the implications of Amdahl’s Law when n approaches infinity?

A

As you add more and more processors, the speedup factor approaches 1/f.

Therefore, if f = .4, parallelism can never make it run more than 2.5 times as fast.
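
A quick numeric check of the limit in Python:

# Speedup n / (1 + (n-1)f) plateaus at 1/f no matter how large n grows.
f = 0.4
for n in (16, 256, 4096, 1_000_000):
    print(n, round(n / (1 + (n - 1) * f), 4))
# 16 -> 2.2857, 256 -> 2.4855, 4096 -> 2.4991, 1000000 -> 2.5 (= 1/f)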

22
Q

Why is parallel computing such a big research area?

A

It depends on so many aspects of hardware and software (although software has not kept up with hardware advances).

Hardware: CPU speed of individual processors, I/O speed of individual processors, interconnection network, and scalability.

Software: Parallelizability of algorithms, application programming languages, operating systems, and parallel system libraries.

23
Q

What additional factors does hardware parallelism have to consider (beyond those of single-processor machines, like CPU and I/O speed) for performance?

A

Interconnection network:

  1. Latency (wait time), including distance and collisions/collision resolution.
  2. Bandwidth (bits per second), including bus limitations and CPU/I/O limitations (multicomputer systems might not even have the same type of processors).

Scalability
– adding more processors affects both latency and bandwidth

24
Q

What are some of the enhancements available for software parallelism?

A

Parallel system libraries:

  1. Precompiled functions designed for multiprocessing (e.g. matrix transformations)
  2. Functions for control of communication (e.g. background printing)

Application programming languages

  1. Built-in functions for creating child processes, threads, parallel looping, etc.
  2. Mostly imperative (e.g. C)

Operating systems that can take a parallelizable program and assign its tasks to processors.
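
As a concrete instance, a minimal sketch of a parallel loop using Python’s standard-library process pool (the work function is a stand-in for any independent task):

# The runtime/OS splits the loop across child processes, one worker per core.
from concurrent.futures import ProcessPoolExecutor

def work(x):
    return x * x        # stand-in for an independent, parallelizable task

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(work, range(10)))
    print(results)      # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]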

25
Q

What are some applications of parallelism?

A

Multi-user networks (e.g. even a local LAN).
– Internet servers -> manage multiple requests simultaneously.

Speed up single processes

    • chess example (IBM’s Deep Blue)
    • expert systems
    • other AI applications