Qs from Prof Flashcards

1
Q
  • Provide a concise definition of a distributed system.
A

“A distributed system is a program that consists of multiple parts running on more than one computer interconnected via a network”

2
Q
  • Describe three benefits that may be offered by using a distributed system
    rather than a centralized one.
A
  1. Improved/broader access to the system
  2. Enhanced sharing
    * resources can be easily shared by many users
  3. Cost-effectiveness (because of sharing)
  4. Less systems admin effort
  5. Enhanced availability (multiple copies of data, replicated servers)
  6. Better performance
    * content can be closer geographically
3
Q
  • Describe three possible problems that may arise when using a distributed
    system rather than a centralized one.
A
  1. More complex (harder to build and maintain)
  2. Higher operational costs (OS upgrades/patches)
  3. Security and trust issues
  4. Decreased availability (what if part of the system goes down)
4
Q
  • Briefly explain the difference between connection-oriented communication
    and connectionless communication.
A

Connection-oriented communication (like TCP) creates a connection, like a phone call; it stays active until one side hangs up.

Connectionless communication (like UDP) is like sending a letter. Each message is treated as a single letter, addressed and sent to the other party.

With connectionless communication we must specify the communication partner (IP address and port) in every send/receive operation.

5
Q
  • Explain why a connection-oriented protocol (like TCP/stream sockets) tends
    to have higher overhead than a connectionless protocol (like UDP/datagram
    sockets).
A

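UDP is faster to set up and send: it doesn’t create and hold open a pipe for the entirety of the communication. A connection-oriented protocol has to establish the connection before any data is sent, keep state for it at both ends, and tear it down afterwards; TCP also acknowledges and retransmits data to provide reliable, ordered delivery.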

6
Q
  • What is a network port number and why is it necessary?
A

A port number lets us address a particular process on a machine, not just the machine.

To identify a communication endpoint we need a unique machine ID (the IP address) and a unique process ID on that machine (the port number).

“A port number identifies a service provided by a process running on a given machine”

7
Q
  • What does DNS stand for? What function does DNS perform? In general terms,
    what type of distributed application is DNS?
A

Domain Name System

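DNS lets us use a name that is easier to remember than an IP address.

DNS helps to locate a resource: it maps a domain name to the IP address of the machine that provides it.

In general terms, DNS is a name (directory/lookup) service, itself implemented as a distributed, replicated database.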

8
Q
  • Briefly explain when *broadcasting* can be useful in locating some service.
A

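Broadcasting can be useful on a LAN when we don’t know which machine provides a service: the request is sent to all machines on the network, asking where the resource is, and the machine that provides it answers.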

9
Q
  • What are extended failure modes? Give two examples of such failure modes.
A

“ways of failing that don’t occur in centralized systems”

  • Concurrency: Having more than one thing happening at the same time
  • Communication

Examples:

  • One process fails but another still runs
  • Communication fails between communicating processes
  • Communication is garbled between communicating processes
10
Q
  • How are extended failure modes and the choice between connectionless and
    connection-oriented communications related?

A

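They both rely on communication over a network, which can fail: messages may be lost, garbled, or arrive out of order.

When communicating, order and reliability are both factors. A connection-oriented protocol handles them for us (at the cost of more overhead); with a connectionless protocol the application has to detect and deal with these failures itself.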

11
Q

What is a timeout? Give an example of where a timeout might be useful in a distributed system.

A

A timeout is a fixed amount of time to wait for a response after a message is sent. It helps us detect a failure.

This is useful because we may be waiting on one machine to respond, and that can cause delays. The other side may even be down, in which case the response will never come.

A timeout allows for a retry with the same or a different server.
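
A minimal Java sketch of a timeout with retry, assuming a datagram (UDP) request/reply exchange. The names TimeoutDemo, requestWithRetry and demoAgainstSilentServer, the 200 ms timeout and the three attempts are illustrative assumptions, not part of the course material. The "server" here is only a bound socket that never answers, standing in for a machine that is down.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

class TimeoutDemo {
    /**
     * Sends request to host:port and waits up to timeoutMs for a reply,
     * resending up to maxAttempts times. Returns the reply text, or null
     * if every attempt timed out.
     */
    static String requestWithRetry(InetAddress host, int port, String request,
                                   int timeoutMs, int maxAttempts) {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setSoTimeout(timeoutMs); // receive() gives up after timeoutMs
            byte[] out = request.getBytes(StandardCharsets.UTF_8);
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                socket.send(new DatagramPacket(out, out.length, host, port));
                DatagramPacket reply = new DatagramPacket(new byte[1000], 1000);
                try {
                    socket.receive(reply);
                    return new String(reply.getData(), 0, reply.getLength(),
                                      StandardCharsets.UTF_8);
                } catch (SocketTimeoutException e) {
                    // No reply in time: the server may be down or the message lost.
                    // Loop round and retry (a real client might pick another server).
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Talks to a bound socket that never answers, to simulate a dead server. */
    static String demoAgainstSilentServer() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket silent = new DatagramSocket(0, loopback)) {
            return requestWithRetry(loopback, silent.getLocalPort(), "ping", 200, 3);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String reply = demoAgainstSilentServer();
        System.out.println(reply == null ? "no reply after 3 attempts" : reply);
    }
}
```

Without the setSoTimeout call, receive() would block forever on the first attempt and the client could never detect the failure.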

12
Q
  • What is meant by the term scalability?
A

Scalability is simply defined as the ability of a system to grow (in scale) to larger sizes without making changes to the system design.

  • We don’t have to change the techniques used as the system becomes 10/100/1000/… times bigger
13
Q
  • What is meant by the term consistency?

How does this relate to using replicated servers to enhance reliability?

A

Keeping the contents of replica servers identical.

If we replicate servers to increase reliability, we need to ensure that the content on every replica is identical, so that we don’t hand out incorrect/stale information to different web clients.

14
Q
  • Give an example of how design choices related to resource naming can affect
    distributed system scalability.

A

When designing we need to consider:

  • Information required to do discovery
  • Cost of discovery
  • Reliability of discovery
  • The scale of the discovery mechanism
15
Q
  • What is the primary difference between a process and a thread?
A

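A process can be thought of as a program in execution. It includes:

  • code that is executing
  • data being operated on
  • execution state
    • register contents, PC, call stack

Each process runs on a single machine and has its own memory space.

A thread is a cheaper alternative to a process:

  • a unit of activity alone (its own execution state, but not its own code and data)
  • there can be more than one thread in a memory space

A thread is a light-weight process, with multiple threads running on a single machine. The primary difference is that the threads of a process share that process’s memory, while separate processes do not.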

16
Q
  • Concisely describe what a *nix fork() call does.
A

A clone of the currently running process is made.

fork() returns either:

  • the process ID (pid) of the created child process (in the parent)
  • or 0 (to indicate that we are the child process)

The return value can then be used to decide whether to continue on with the same code or have the clone exec(…) another program.

17
Q
  • Why must each thread have its own stack?
A

For threads to run concurrently, each one needs to be able to call procedures and keep its own local variables and return addresses independently of the others, which couldn’t happen with a shared stack.

18
Q
  • What does the start() method on a Java Thread object do?
A

After a call to start(), the original thread (the one that called start()) and a new thread will both be running.

The new thread, once it is started, will be running its run() method. Every newly created thread starts by running this code.

So the Thread object is created first, but the new thread doesn’t actually begin executing until start() is called.
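
A small Java sketch of the behaviour described above. The class names StartDemo and Worker and the thread name "worker" are made up for illustration. It checks that the thread is not alive before start(), and that run() executes on the new thread rather than on the caller.

```java
class StartDemo {
    /** A thread whose run() records the name of the thread executing it. */
    static class Worker extends Thread {
        volatile String ranOn;

        Worker() {
            super("worker");
        }

        @Override
        public void run() {
            ranOn = Thread.currentThread().getName();
        }
    }

    static String demo() {
        Worker worker = new Worker();                 // the Thread object exists...
        boolean aliveBeforeStart = worker.isAlive();  // ...but nothing is running yet
        worker.start();                               // new thread begins executing run()
        try {
            worker.join();                            // original thread waits for it to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return aliveBeforeStart + " " + worker.ranOn;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "false worker"
    }
}
```

Calling worker.run() directly instead of worker.start() would execute the same code, but on the calling thread, so no concurrency would result.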

19
Q
  • How do we normally define a correct *concurrent* execution?
A

Correctness is defined to be “The same as any sequential execution of the concurrent programs”

In practice this concerns accessing and modifying shared data: the result must be as if the accesses happened one at a time rather than at the same time.

It doesn’t matter who goes first, as long as one finishes before the other begins.

20
Q
  • What is a critical section?
A

A section of code that accesses shared data.

To identify critical sections, ask:

  1. What are the shared variables?
  2. Which critical sections are related?
    * Which access the same shared variables?
21
Q

What is RPC? Why does the lack of shared memory between the caller and the callee make RPC harder to implement than LPC?

A

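RPC (Remote Procedure Call) was originally designed to eliminate the need for programmers to send messages explicitly (which was unfamiliar). It abstracts/hides the messaging.

An RPC system translates procedure calls into appropriate message passing for the programmer.

Instead of running a procedure call on the local machine (LPC), we are making a procedure call on a different machine than the caller.

Without shared memory, the callee cannot simply read the caller’s data: arguments and results have to be copied into messages, and pointers/references into the caller’s memory mean nothing on the other machine. In short, the difficulty is how to package and pass data of different types between machines.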

22
Q
  • What is marshalling?
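A

Marshalling is the packing of a call’s arguments (and, on the way back, its results) into a message in the wire/network format for transmission; unmarshalling is the reverse step at the receiver.
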
23
Q
  • What is meant by the term “wire/network format”? How does this relate to RPC?
A

Wire/network format

Everything is translated to a pre-agreed, machine-architecture-neutral representation for transmission, and then from that format upon reception.

This handles the problem of dealing with multiple data types and system architectures, so that RPC parameters and results can be passed successfully between different machines.
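
A rough Java sketch of translating values to and from an architecture-neutral byte format. It uses the standard DataOutputStream/DataInputStream encoding (big-endian integers, length-prefixed strings) as a stand-in wire format. This is an assumption for illustration, not the format of any particular RPC system, and the names WireFormatDemo, marshal and unmarshal and the accountId/owner parameters are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WireFormatDemo {
    /** Packs the two "arguments" into bytes in a fixed, machine-independent layout. */
    static byte[] marshal(int accountId, String owner) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(accountId); // always 4 bytes, most significant byte first
            out.writeUTF(owner);     // 2-byte length, then the characters (modified UTF-8)
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the values back in the same order they were written. */
    static String unmarshal(byte[] wire) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
            int accountId = in.readInt();
            String owner = in.readUTF();
            return accountId + ":" + owner;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = marshal(258, "ann");
        System.out.println(wire.length + " bytes -> " + unmarshal(wire)); // prints "9 bytes -> 258:ann"
    }
}
```

Because both sides agree on the byte order and layout in advance, the receiver gets 258 back regardless of how its own hardware stores integers.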

24
Q
  • What is RMI? Why is an RMIregistry required in Java?
A

RMI is a more modern (object oriented) version of RPC.

RMI = Remote Method Invocation

Complex arguments to, and results from, a remote method invocation are passed by deep copy rather than by reference.

The RMIregistry is required to get a reference to the remote object you want to invoke a method on.

It is an RMI lookup server: the server registers its remote object there under a name, and the client looks that name up to get the reference.

25
Q

What is meant by synchronization? When must threads synchronize their activities?

A

Synchronization is coordinating the activities of threads; it is used to control access to shared variables where uncontrolled access could negatively affect correctness.

In most cases, synchronization only involves ensuring that threads do not access shared data concurrently.

Threads must synchronize when they access shared variables that at least one of them alters.

26
Q

What is a mutex? How is it used to guard access to shared data and thereby synchronize the threads accessing it?

A

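Mutex stands for mutual exclusion.

A mutex is a common approach to controlling access to shared data: a lock (the mutex) is associated with the data, and a thread must hold the lock while it accesses that data. It has two operations:

  • Lock (sometimes called acquire): called before entering the critical section; if another thread holds the lock, the caller waits
  • Unlock (sometimes called free): called on leaving the critical section, letting a waiting thread proceed

Since only one thread at a time can hold the lock, accesses to the shared data happen one at a time.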
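
A minimal Java sketch of lock/unlock around a critical section, using java.util.concurrent.locks.ReentrantLock as the mutex. The shared "balance" variable and the names MutexDemo, deposit and demo are invented for illustration.

```java
import java.util.concurrent.locks.ReentrantLock;

class MutexDemo {
    private static final ReentrantLock lock = new ReentrantLock();
    private static int balance = 0; // shared data, only touched while holding the lock

    static void deposit(int amount) {
        lock.lock();           // acquire: other threads wait here until we unlock
        try {
            balance += amount; // critical section
        } finally {
            lock.unlock();     // release, even if the critical section throws
        }
    }

    /** Runs several threads that all deposit into the shared balance. */
    static int demo(int threads, int depositsPerThread) {
        lock.lock();
        try {
            balance = 0;
        } finally {
            lock.unlock();
        }
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < depositsPerThread; j++) {
                    deposit(1);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 10000)); // prints 40000
    }
}
```

Without the lock, two threads could both read the same old balance and one update would be lost, so the total could come out below 40000.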

27
Q
  • What, generally speaking, is a socket?
A

A socket is an abstraction that mimics a physical wall socket.

Sockets are a message-passing API: they allow for message-passing communication between processes.

A socket is an endpoint for communication.

28
Q

What is meant by the term “well-known address”?

A

A well-known address is the address of a server that is known/hardcoded in advance.

We already know where and what server we are connecting to specifically.

It is a pre-agreed upon server name and port.

29
Q

Briefly explain what an accept() call on a Java ServerSocket object does.

A

accept() waits for, and then accepts, a connection from a requesting client.

This method creates the connection and returns a normal Socket, which is then used by the server for messaging with that client.

30
Q

Why are servers normally concurrent? What advantage does this provide?

A

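Servers must serve multiple requests, often from many clients at once.

Threads then handle the processing for each individual client concurrently.

There are usually delays in the processing of information on either side, or in messaging. In a non-concurrent server these delays would create a queue and slow every other client down; with a concurrent server, other clients are served while one is waiting.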

31
Q

Briefly sketch the *high-level* operation of a *multi-threaded* server based on stream sockets.

A

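Draw it: the main thread creates a ServerSocket and loops on accept(); for each connection accepted it starts a new thread, hands it the returned Socket, and goes back to accept(); each thread then serves its own client over that Socket until the client is done.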
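
As a complement to the drawing, here is a hedged Java sketch of a thread-per-connection server on stream sockets. The "service" (echoing each line back in upper case), the class name EchoServerDemo and the helper names handle, serve and demo are assumptions made for this example; the demo runs server and client in one program over the loopback address.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class EchoServerDemo {
    /** Serves one client: echoes each line back in upper case until the client closes. */
    static void handle(Socket client) {
        try (Socket c = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(c.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(c.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(line.toUpperCase(Locale.ROOT));
            }
        } catch (IOException e) {
            // The client disconnected abruptly; nothing more to do for it.
        }
    }

    /** Accept loop: this thread only accepts, and hands each connection to a new thread. */
    static void serve(ServerSocket listener) {
        while (true) {
            try {
                Socket client = listener.accept();        // blocks until a client connects
                new Thread(() -> handle(client)).start(); // one thread per connection
            } catch (IOException e) {
                return; // the listener was closed: stop accepting
            }
        }
    }

    /** Starts a server on a free loopback port, sends one line, returns the reply. */
    static String demo(String message) {
        try (ServerSocket listener = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            new Thread(() -> serve(listener)).start();
            try (Socket s = new Socket(listener.getInetAddress(), listener.getLocalPort());
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8))) {
                out.println(message);
                return in.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("hello")); // prints "HELLO"
    }
}
```

The accept loop never does any client work itself, so a slow client only delays its own thread.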

32
Q

Scripting languages are good for implementing “glue logic”. Briefly explain why this makes them useful in developing distributed applications.

A

Scripting languages have special facilities for things such as sequencing commands, pattern matching, etc. They are:

Interpreted - easy to use and distribute

Weakly typed - can work on any data

Powerful - from special features

Flexible and composable

This helps us to easily connect existing programs together and to handle connections or processing of the data passed between them.

33
Q

Give three examples of scripting language features that make them useful in creating distributed systems.

A
34
Q

What is JDBC? Why is it relevant to distributed systems development?

Sketch how this works generally.

A

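JDBC (Java Database Connectivity) allows for remote access to database systems, so a Java program on one machine can use a database server on another.

It follows the idea of a standard providing a means of accessing “any” database safely and remotely.

JDBC provides this same ability of DB access, but specific to Java.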

35
Q

Give an example of some processing that naturally belongs on a client.

A

Display of results from server to the client.

As an example, a web-client.

Anything that would be hampered by a delay with the server.

36
Q

Give an example of some processing that naturally belongs on a server.

A

NFS file server and sending/receiving files

37
Q

Give an example of some processing that might belong on either a client or the server.

A

Some sort of processing that needs to be done where there are resources available on either side.

38
Q
  • What was meant by the “Thin Client vs. Thick Client” argument?
A

Thin means not much is processed on the client side logic wise.

Thick means that there is a lot of processing happening on the client side.

39
Q

What three factors should you consider when deciding how to distribute application functionality between the server and the group of clients?

A
  1. Location of the resources being accessed
  2. Communication costs
  3. Workload balance/distribution

You should always structure the application in a way that minimizes communication.

All things being equal, choose the distribution that best balances the workload across all available machines.

  • Consider the capabilities of each machine
40
Q

Briefly explain why a stateless server provides better fault tolerance.

A

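A stateful server is one that maintains information about its clients.

  • Which clients are connected
  • What are they doing?

What happens if the server fails? How is the state information reconstructed after it recovers?

Reconstructing this information is the difficult part of fault tolerance. A stateless server keeps no such information, so after a failure there is nothing to reconstruct: it (or a replacement) can simply carry on serving requests.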

41
Q

How does a *datagram* server know which client it is communicating with so that it can return a result to that client?

A

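It gets the information from the datagram itself. Every datagram (packet) the server receives carries the sender’s IP address and port, and the server addresses its reply to them.

For example, a handler created for a client can be passed the address and port from that client’s first packet:

bankThread(Hashtable<String, Integer> accounts, // key/value types are an assumption
           DatagramSocket clientSocket, InetAddress address,
           String firstMessage, int port) {
    this.accounts = accounts;
    this.mySocket = clientSocket;

    buf = new byte[1000];               // buffer for incoming datagrams

    this.address = address;             // where to send replies: the client's IP address
    this.firstMessage = firstMessage;

    PORT = port;                        // where to send replies: the client's port
}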
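
A self-contained Java sketch of the same idea: the server reads the sender's address and port out of the received DatagramPacket and uses them to address the reply. The "ack:" reply text and the names DatagramReplyDemo, serveOnce and demo are assumptions for illustration and are unrelated to the bank example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

class DatagramReplyDemo {
    /** Receives one datagram and replies to whoever sent it. */
    static void serveOnce(DatagramSocket server) throws IOException {
        DatagramPacket request = new DatagramPacket(new byte[1000], 1000);
        server.receive(request);
        InetAddress clientAddress = request.getAddress(); // filled in from the packet
        int clientPort = request.getPort();               // filled in from the packet
        String text = new String(request.getData(), 0, request.getLength(),
                                 StandardCharsets.UTF_8);
        byte[] reply = ("ack:" + text).getBytes(StandardCharsets.UTF_8);
        server.send(new DatagramPacket(reply, reply.length, clientAddress, clientPort));
    }

    /** Runs a one-shot server and a client over loopback; returns the client's reply. */
    static String demo(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket server = new DatagramSocket(0, loopback);
             DatagramSocket client = new DatagramSocket(0, loopback)) {
            client.setSoTimeout(2000); // don't wait forever if the reply is lost
            new Thread(() -> {
                try {
                    serveOnce(server);
                } catch (IOException e) {
                    // The server socket was closed before a request arrived.
                }
            }).start();
            byte[] out = message.getBytes(StandardCharsets.UTF_8);
            client.send(new DatagramPacket(out, out.length, loopback, server.getLocalPort()));
            DatagramPacket reply = new DatagramPacket(new byte[1000], 1000);
            client.receive(reply);
            return new String(reply.getData(), 0, reply.getLength(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("balance")); // prints "ack:balance"
    }
}
```

The server socket is never connected to the client; the only link between request and reply is the address and port carried in the packet.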

42
Q

Why is a stateless server commonly preferred over a stateful server?

A
43
Q

If a server is to be stateless, where must the state be stored?

A

The state should be stored in the client.

If stateful, then the state needs to be stored in the main server (not the individual threads).

44
Q

What is a server Process Pool? What advantage does it offer?

A

The cost of creating and maintaining multiple processes is high.

  • This limits the performance improvements that can be achieved.

Rather than create a new process for each client as it connects, a server pre-creates a “pool” of processes to which it can assign client requests as they arrive.

  • This is cheaper during service provision, as the cost of process creation is paid up front, before any client connects.

This improves performance.
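
Java has no direct equivalent of a pre-forked process pool, so the sketch below swaps in a pool of threads to show the same idea: workers are created up front and requests are handed to whichever worker is free. The pool size, the request count and the names PoolDemo and demo are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolDemo {
    /** Returns {workers ready before any request, distinct workers that handled requests}. */
    static int[] demo(int workers, int requests) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        pool.prestartAllCoreThreads();                // pay the creation cost up front
        int readyBeforeRequests = pool.getPoolSize();

        Set<String> used = ConcurrentHashMap.newKeySet();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < requests; i++) {
                // Each "request" just records which pooled worker handled it.
                pending.add(pool.submit(() -> {
                    used.add(Thread.currentThread().getName());
                }));
            }
            for (Future<?> f : pending) {
                f.get();                              // wait for every request to finish
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return new int[] { readyBeforeRequests, used.size() };
    }

    public static void main(String[] args) {
        int[] result = demo(4, 100);
        System.out.println(result[0] + " workers ready before any request; "
                + result[1] + " distinct workers handled 100 requests");
    }
}
```

However many requests arrive, no new workers are created while serving them; requests queue until one of the existing workers is free.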

45
Q

Give an example of an application where having the threads at a server able to share data would be a significant advantage.

A

Think about a multi-player game: the threads serving the different players all need to see and update the same game state.

46
Q

What is the difference between thread per request and thread per connection?

A

Thread per request:

Incoming client requests come into a central location and threads handle those individual requests

Thread per connection:

Once a connection is accepted it receives a dedicated thread, which handles all interactions with that client.

47
Q

What is a server farm? What two primary benefits are offered by a server farm?

A
48
Q

What is consistency maintenance? Why is it a challenge with replicated servers?

A

With server farms, each individual server machine must maintain a copy of the “resources” to be served. Consistency maintenance is keeping those copies identical.

When we want to change the data, we have to change it in all the copies.

We want to avoid providing clients with stale data.

49
Q

What is Round Robin (RR) DNS? How is it useful with server farms? When might its benefit be limited? (Hint: What does DNS do to ensure lookup efficiency and how does this impact RR-DNS?)

A

DNS provides a feature known as “round robin” DNS, where a single domain name is mapped to multiple IP addresses (e.g. aviary.cs.umanitoba.ca).

  • The address returned is selected in turn, in round-robin fashion

This is useful for server farms because successive lookups of the one name are directed to different servers in the farm, which spreads the clients across them without needing a machine out front.

This gives simple load balancing and some handling of failed servers.

Its benefit is limited by caching. To make lookups efficient, the result of a DNS lookup is cached so that the name doesn’t need to be looked up again. If RR-DNS hands out a specific server, the computer will cache that result and keep connecting to that server directly, even if it is overloaded or has failed.
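
A toy Java simulation of the effect, not real DNS code. RoundRobinNameServer and CachingResolver are hypothetical classes, the three addresses come from the 192.0.2.0/24 documentation range, and the cache never expires (a real DNS cache entry does, after its time-to-live).

```java
import java.util.ArrayList;
import java.util.List;

class RoundRobinDemo {
    /** Toy name server: hands out the farm's addresses in turn. */
    static class RoundRobinNameServer {
        private final List<String> addresses;
        private int next = 0;

        RoundRobinNameServer(List<String> addresses) {
            this.addresses = new ArrayList<>(addresses);
        }

        synchronized String resolve() {
            String address = addresses.get(next);
            next = (next + 1) % addresses.size(); // move on to the next server
            return address;
        }
    }

    /** Toy client-side resolver that remembers the first answer it gets. */
    static class CachingResolver {
        private final RoundRobinNameServer upstream;
        private String cached;

        CachingResolver(RoundRobinNameServer upstream) {
            this.upstream = upstream;
        }

        String resolve() {
            if (cached == null) {
                cached = upstream.resolve(); // only the first lookup reaches the name server
            }
            return cached;
        }
    }

    /** Performs count lookups, either straight at the name server or through the cache. */
    static List<String> lookups(boolean useCache, int count) {
        RoundRobinNameServer dns = new RoundRobinNameServer(
                List.of("192.0.2.1", "192.0.2.2", "192.0.2.3"));
        CachingResolver resolver = new CachingResolver(dns);
        List<String> results = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            results.add(useCache ? resolver.resolve() : dns.resolve());
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(lookups(false, 4)); // [192.0.2.1, 192.0.2.2, 192.0.2.3, 192.0.2.1]
        System.out.println(lookups(true, 4));  // [192.0.2.1, 192.0.2.1, 192.0.2.1, 192.0.2.1]
    }
}
```

With the cache in the way, the rotation at the name server never reaches this client, which is the limitation the hint points at.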

50
Q

What is a redirect host? What potential benefits does it offer over RR-DNS?

A

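A redirect host is one central server that receives each request and then redirects the client to another server, e.g. one that is faster to access or closer to the client geographically.

Because every client first goes to this one central host, the choice of server doesn’t depend on your DNS cache as much: what your cache holds is the address of the central host, which makes a fresh choice for each request.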

51
Q

What does CDN stand for? What service does a CDN provide?

A

Content Distribution/Delivery Network

A CDN provides for efficient delivery of content across a large geographic area via wide-area server replication.

Akamai is an example of this.

It allows for synchronization of content across multiple edge servers.

52
Q

How, in general, can a CDN client’s web site be transparently redirected to an appropriate CDN edge server?

A

The domain name from the client is mapped to the IP address of an appropriate edge server by a “mapping” function (and DNS).

53
Q

Using a collection of simple diagrams, differentiate between the organizational structure of centralized, single-tier, two-tier, and 3-tier Client/Server designs

A
54
Q

What is middleware? What three common advantages are offered by the use of middleware in designing distributed systems?

A

“Middleware is any software that is used solely to connect application components together”

Allows for components to communicate.

Advantages:

  1. Complexity management
    - hide difficult code behind a simpler middleware
  2. Speed of development
    - re-use code so you don’t have to rewrite it
  3. Enhanced reliability