Algorithms and computations for big data Flashcards

Flashcards in Algorithms and computations for big data Deck (51)
1

The four parallel paradigms

  1. Multithreading
  2. Message passing interface (MPI)
  3. Map-Reduce
  4. Spark

2

Learning outcomes

  • Knowledge and understanding
    • discuss important technological aspects when designing and implementing analysis solutions for large-scale data,
    • describe data models and software standards for sharing data on the web.
  • Skills and abilities
    • use Python to implement applications for transforming and analyzing large-scale data with appropriate software frameworks,
    • provide access to and utilize structured data over the web with appropriate data models and software tools.
  • Judgement and approach
    • suggest appropriate computational infrastructures for analysis tasks and discuss their advantages and drawbacks,
    • discuss advantages and drawbacks of different strategies for dissemination of data,
    • discuss large-scale data processing from an ethical point of view.

3

Levels of parallelism

  • Multi-core CPUs
  • Several CPUs per system
  • Clusters of multiple systems

4

Speedup

Given two variants of a program solving the same problem, a baseline and an optimized implementation (a faster algorithm or a parallel version), with running times t (baseline) and t' (optimized), the speedup is

S = t/t'
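For example, with hypothetical timings t = 10 s (baseline) and t' = 2 s (optimized), S = 10/2 = 5.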

5

Amdahl's law

  • f = proportion of the code that is parallelizable; s = speedup of the parallelized part

S(f,s) = 1/((1-f) + f/s)

As s goes to infinity, S(f,s) approaches 1/(1-f).

  • Only some programs benefit from parallelization, and their maximal acceleration is bounded by the serial fraction.
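A minimal Python sketch of the formula, as a sanity check (function name and example values are illustrative, not from the course):

def amdahl_speedup(f: float, s: float) -> float:
    """Amdahl's law: overall speedup when a fraction f of the
    program is accelerated by a factor s."""
    return 1.0 / ((1.0 - f) + f / s)

# With 90% parallelizable code, even 1000-fold parallelism stays below 10x:
print(amdahl_speedup(0.9, 1000))  # ~9.91
print(1 / (1 - 0.9))              # upper bound as s -> infinity: 10.0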

6

Are multicore CPUs a technical necessity?

Yes. Cooling is a bottleneck when increasing the clock frequency of a CPU.

7

Flynn's taxonomy

  • SIMD (single instruction, multiple data): GPUs
  • MIMD (multiple instruction, multiple data): multi-core processors

8

Memory hierarchy

9

cache memory

Small, high-speed memory attached to the processor core

10

Symmetric Multiprocessor (SMP)

  • Multiple CPUs (typically 2-8; each can have multiple cores) share the same main memory
  • One address space

11

High performance computing (HPC)

12

Is a classical HPC compute cluster an appropriate computer architecture for Monte Carlo simulations like the parallel Pi example? Assume that the parallelization across nodes is not a problem.

Yes, HPC is a good computer architecture for Monte Carlo simulations: the samples are independent, so little communication is needed.
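A minimal sketch of such an embarrassingly parallel Pi estimate, here using Python's multiprocessing on one machine rather than a cluster scheduler (worker count and sample sizes are illustrative):

import random
from multiprocessing import Pool

def count_hits(n_samples: int) -> int:
    """Count random points in the unit square that land inside the
    quarter circle x^2 + y^2 <= 1."""
    hits = 0
    for _ in range(n_samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

if __name__ == "__main__":
    n_workers, n_per_worker = 4, 1_000_000
    with Pool(n_workers) as pool:
        # Workers need no communication until the final sum:
        hits = sum(pool.map(count_hits, [n_per_worker] * n_workers))
    print(4.0 * hits / (n_workers * n_per_worker))  # ~3.14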

13

Difference between HPC and commodity hardware

14

Distributed compute cluster (commodity hardware)

15

Workload comparison between HPC and data science

16

Data-intensive Compute Cluster

17

Latency vs computation

Computation is cheap; data movement is very expensive

18

Multithreading

  • Threads communicate via variables in shared memory (see the sketch below)
  • Simultaneous read access to data
  • Write access to the same data requires a lock
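A minimal Python threading sketch of communication via shared memory; each thread writes only its own slot of a shared list, so these writes need no lock (names and sizes are illustrative):

import threading

n_threads = 4
results = [0] * n_threads  # shared memory: one slot per thread

def worker(i: int) -> None:
    # Simultaneous reads are safe; each thread writes a distinct slot,
    # so no lock is needed here.
    results[i] = sum(range(i * 1000, (i + 1) * 1000))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(results))  # combine the per-thread results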

19

In multi-threaded programming the time needed to communicate between two threads is typically on the order of

200 ns

20

In multithreaded programming, all threads can simultaneously...

...read and write, but not to the same data

21

Threads writing to memory incorrectly

22

Threads writing to memory correctly

23

Locking

  • Protects against errors due to parallel writes
  • A lock
    • is acquired before reading/writing data
    • is released when done
    • ensures serial access to shared memory (see the sketch below)
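A minimal sketch of the acquire/release pattern with Python's threading.Lock (the shared counter is illustrative):

import threading

counter = 0
lock = threading.Lock()

def worker(n: int) -> None:
    global counter
    for _ in range(n):
        with lock:        # acquired before the write, released when done
            counter += 1  # serial access to the shared variable

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # always 40000; without the lock, updates could be lost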

24

Deadlocks

  • Execution stops because two or more threads wait for each other
  • Two threads need to write variables a & b
    • Thread 1 locks a and waits for b
    • Thread 2 locks b and waits for a (see the sketch below)
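A minimal Python sketch reproducing that scenario (the sleep only makes the bad interleaving likely; running this will typically hang):

import threading
import time

lock_a, lock_b = threading.Lock(), threading.Lock()

def thread1() -> None:
    with lock_a:          # thread 1 locks a ...
        time.sleep(0.1)
        with lock_b:      # ... then waits for b
            pass

def thread2() -> None:
    with lock_b:          # thread 2 locks b ...
        time.sleep(0.1)
        with lock_a:      # ... then waits for a
            pass

t1 = threading.Thread(target=thread1)
t2 = threading.Thread(target=thread2)
t1.start(); t2.start()
t1.join(); t2.join()      # typically never returns: a deadlock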

25

Questions to ask when parallelizing

• Which sections can be parallelized?
• What needs to be serial?
• When is communication necessary between the threads?
• How much data needs to be communicated?

26

Load balancing

Distributing the workload equally amongst the threads
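A minimal sketch of static load balancing: splitting n work items as evenly as possible over k threads (the helper name is illustrative):

def partition(items: list, k: int) -> list:
    """Split items into k chunks whose sizes differ by at most one."""
    q, r = divmod(len(items), k)
    chunks, start = [], 0
    for i in range(k):
        end = start + q + (1 if i < r else 0)  # first r chunks get one extra
        chunks.append(items[start:end])
        start = end
    return chunks

print([len(c) for c in partition(list(range(10)), 3)])  # [4, 3, 3]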

27

Message passing interface (MPI)

  • The message passing interface (MPI) is a standardized means of exchanging messages between multiple computers running a parallel program across distributed memory.
  • Used for high performance computing
  • Usually runs on supercomputers
  • Substantial latencies for thousands of cores
  • Lower throughput for sharing large amounts of data
  • Communication, incl. exchange of data, via high-speed network (5000 ns + 2x RAM access); see the sketch below
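A minimal point-to-point sketch using the mpi4py package (assuming mpi4py and an MPI runtime are installed; run with e.g. mpiexec -n 2 python demo.py):

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()  # each process gets a unique rank

if rank == 0:
    comm.send({"msg": "hello"}, dest=1, tag=0)  # rank 0 sends ...
elif rank == 1:
    data = comm.recv(source=0, tag=0)           # ... rank 1 receives
    print(data)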

28

Passing a message over InfiniBand takes about 5000 ns. This is how many times slower than a memory access?

11-50 times slower
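For example, assuming a main-memory access takes on the order of 100-450 ns, 5000 ns is roughly 11-50 times slower (5000/450 ≈ 11; 5000/100 = 50).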

29

Word count and character count Map-Reduce

30

Combiners

  • A combiner is a "semi-reduce" that pre-aggregates map output locally on each node
  • Word count
    • Each combiner adds up n/k values
    • Each reducer gets k values to add up
    • n = number of inputs
    • k = number of nodes
  • Use combiners to utilize more cores (see the sketch below)
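A minimal pure-Python sketch of word count with a combiner stage (no framework; the input lines stand in for per-node map output):

from collections import Counter

lines = ["big data is big", "data is data"]

# Map: each "node" emits (word, 1) pairs for its line.
mapped = [[(w, 1) for w in line.split()] for line in lines]

# Combine: each node pre-sums its local pairs, so the reducer receives
# one partial count per word per node instead of n raw pairs.
combined = [Counter(w for w, _ in pairs) for pairs in mapped]

# Reduce: sum the k partial counts into the final word count.
total = sum(combined, Counter())
print(total)  # Counter({'data': 3, 'big': 2, 'is': 2})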