Blepp Flashcards

(148 cards)

1
Q

Ada's mechanisms (protected objects with functions, procedures, and entries with guards) are not well suited to solving this memory allocation problem. Why?

A

Guards cannot test on the parameters of the entry call, which leads to complicated code: double interaction patterns, entry families, or use of requeue.

2
Q

Asynchronous notification (interrupting threads in what they are doing in order to communicate something) is discussed in relation to atomic actions. Why?

A

When one participant detects an error, *something* must be done. For those errors that may already have spread to more participants (which, by common assumption, is all of them; cf. merging of failure modes), the other threads must be made aware of the error. The better (faster) alternative to polling, or to waiting for the prepare-to-commit, is to notify immediately: asynchronously.

3
Q

Java's wait and notify (and POSIX condition variables) work in many ways similarly to suspend and resume. How can these be seen as better, more high-level synchronization mechanisms?

A

Because they are assumed to be called from inside a monitor (making them uninterruptible and making testing on conditions safe), and they release the monitor when blocking.
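The pattern can be sketched with Python's threading.Condition (a minimal illustration; the bounded buffer and its names are not from the course material): the condition tests run while holding the monitor lock, and wait() releases that lock while blocked.

```python
import threading

class BoundedBuffer:
    """Monitor-style bounded buffer: condition tests happen while
    holding the lock, and wait() releases the lock while blocking."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        self.cond = threading.Condition()  # lock + condition variable

    def put(self, item):
        with self.cond:                    # enter the monitor
            while len(self.items) >= self.capacity:
                self.cond.wait()           # releases the lock while blocked
            self.items.append(item)
            self.cond.notify_all()         # wake waiters to re-test

    def get(self):
        with self.cond:
            while not self.items:
                self.cond.wait()
            item = self.items.pop(0)
            self.cond.notify_all()
            return item
```

Note the `while` (not `if`) around each wait(): the condition must be re-tested after waking, which is exactly the safe condition testing the answer refers to.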

4
Q

Designing parallel systems with message sending is an alternative to basing the design on synchronization. Give, briefly and in bullet points, benefits and drawbacks of these two applied to real-time systems design.

A

Message-based systems:

  • more maintainable,
  • scale better,
  • make better-encapsulated modules,
  • a better abstraction for most situations,
  • but: not as well understood in an RT context,
  • difficult to argue schedulability,
  • require more infrastructure and put demands on the thread model,
  • we leave the one-thread-per-RT-demand design.

Synchronization-based systems:

  • intuitive, low-level primitives,
  • but: scale badly,
  • difficult to get right.

5
Q

How would you rate the names of channels:

  • c, ch
  • incomingMessagesChan
  • udpSendChan
  • notAlive
  • floorChannel
  • primalCh
  • sendMsgCh
  • lightEventChan
  • channel_listen
  • channel_write
  • to_main, from_main
  • messageChan
A

Sverre's answer:

  • 'c' and 'ch' are great if the scope is small (less than 2-3 lines and no nesting, says Code Complete; Sverre is a bit more flexible…).
  • 'udpSendChan' is the winner in Sverre's book: all messages on this channel are sent on UDP. I could add 'incomingMessagesChan' to this category, understanding that incoming messages come from another lift/node. 'sendMsgChannel' and 'channel_listen' are even weaker, and at the bottom here is channel_write.
  • sendMsgCh, channel_listen, channel_write: not specific enough, or too dependent on context? Of course it depends on the context and on which metaphors are already established in the system.
  • messageChan: no, seriously: all channels convey messages…
  • notAlive, floorChannel, primalCh: no, these do not give enough, or require massive amounts of context to be understood.
  • lightEventChan: probably good.
  • to_main, from_main: OK if the 'main' functionality is well established.
6
Q

What is a deadlock ? Give an example of a deadlock with semaphores.

A

A deadlock: parts of the system wait for each other in a circular wait, locking the system in a state it cannot leave. The classic example:

T1: wait(A); wait(B); dowork; signal(B); signal(A)
T2: wait(B); wait(A); dowork; signal(A); signal(B)
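The standard cure for this example is a global lock order. A minimal runnable sketch in Python (the semaphore and worker names are illustrative, not from the card):

```python
import threading

A = threading.Semaphore(1)
B = threading.Semaphore(1)

def worker(name, results):
    # Both threads acquire in the same global order (A before B),
    # so the circular wait of the classic example cannot occur.
    with A:
        with B:
            results.append(name)       # "dowork"

results = []
threads = [threading.Thread(target=worker, args=(n, results))
           for n in ("T1", "T2")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Had T2 instead acquired B before A, the two threads could each hold one semaphore while waiting for the other, which is exactly the deadlock above.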

7
Q
A
8
Q

I have suggested that systems based on shared-variable synchronization scale badly. Can you argue this more generally?

A

At minimum, my comments could be repeated: nested monitor calls, the inheritance anomaly.

Anyway, reasoning is expected to start at maintainability:

Synchronization mechanisms are operating-system mechanisms; global entities, like global variables.

Semaphores are global, and the Java object lock can be reserved from the outside by a "synchronized (o)" block.

Analysing the system for deadlocks (and other race conditions) is a global analysis.

9
Q

Can deadlocks and race conditions happen in a message passing system?

A

Yes.

10
Q

for a message passing system:

How do you assure the absence of deadlocks?

A

To avoid circular "dependencies" would, I would say, be the correct answer here. Look at the communication arrows; use the client-server pattern; turn communication arrows around with buffered "I need to communicate" signals, as in Øyvind Teig's guest lecture.

Going for buffered communication is a common solution and a reasonable answer (though one that we do not like so much…)

11
Q

There are a number of assumptions/conditions that must be true for these tests to be usable ("the simple task model"). Which? Comment (shortly) on how realistic they are.

A
  • Fixed set of tasks (No sporadic tasks… Not optimal but fair deal)
  • Periodic tasks, known periods (Realistic)
  • The threads must be independent. (Not realistic at all in an embedded system)
  • Overheads, switching times can be ignored (Sometimes yes, sometimes no)
  • Deadline == Period (Not optimal but Fair deal)
  • Fixed Worst Case Execution Time. (Not realistic to know a tight estimate here.)
  • and in addition: Rate-Monotonic Priority ordering. (our choice, so ok)
12
Q

Operations for locking resources are always assumed to be atomic. Why is this so important?

A

Locking is often an integral part of the infrastructure that allows error handling (as in an AA). We would like to avoid the lock manager getting involved in error handling together with the action participants (this would increase the complexity of the error handling, and possibly demand knowledge of the action in the lock manager).

13
Q

We can prove that the deadlines in the system will be satisfied by response time analysis and by utilization-based schedulability tests. Explain briefly how these two work.

A

Utilization: we have a formula which guarantees schedulability for N threads if the threads have rate-monotonic priorities and the sum of utilizations is less than a given bound (dependent only on N).

Response time: for each thread we can calculate its worst-case response time from its maximum execution time and the number of times it can be interrupted by higher-priority threads.
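Both tests fit in a few lines of Python (a sketch for illustration; the task sets below are made up, and the simple-task-model assumptions are taken as given):

```python
import math

def utilization_ok(tasks):
    """Liu & Layland utilization test for rate-monotonic priorities.
    tasks: list of (C, T) pairs, C = worst-case execution time, T = period."""
    n = len(tasks)
    u = sum(c / t for c, t in tasks)
    return u <= n * (2 ** (1 / n) - 1)

def response_times(tasks):
    """Response-time analysis; tasks sorted by priority, highest first.
    Fixed point of R_i = C_i + sum over higher-priority j of
    ceil(R_i / T_j) * C_j."""
    result = []
    for i, (c, t) in enumerate(tasks):
        r = c
        while True:
            r_next = c + sum(math.ceil(r / tj) * cj for cj, tj in tasks[:i])
            if r_next == r:
                break
            r = r_next
        result.append(r)
    return result
```

For the made-up task set [(1, 4), (2, 6), (3, 12)], the utilization is about 0.83, above the three-task bound of about 0.78, so the (sufficient but not necessary) utilization test fails; response-time analysis nevertheless gives R = [1, 3, 10], all within the periods, illustrating why the exact test is worth the extra work.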

14
Q

Both forward and backward error recovery can be generalized from single-thread to multi-thread systems. Backward error recovery may, when generalized to multi-thread systems, give the domino effect. Explain. How can we avoid the domino effect?

A

Everything that looks like the “domino effect figure” is great for the first part of the question. Coordinating recovery points is the solution to the second.

15
Q

How can we get «termination mode» asynchronous transfer of control in POSIX/C? What about Java(/RT Java)? And ADA?

A

C: cancelling of threads, or the setjmp/longjmp trick…
RT Java: AsynchronouslyInterruptedException (plain Java: cancelling of threads).
Ada: select-then-abort.

16
Q

The Inheritance Anomaly is a potential problem when we imagine inheriting from a class (in an object-oriented setting like Java or C++) where any methods synchronize. Explain where the problem lies.

A
With object-oriented inheritance we can add to or override any features of the base class. It is not given (and is in fact false) that extending synchronization by the same mechanisms will work at all. The interaction between synchronization code in parent and child classes becomes complex, and in some cases the base-class synchronization code is even impossible to tweak into meaningful child-class synchronization…
Any example will of course also do.
17
Q

List the techniques you would use together with short explanations of how they contribute to making the system fault tolerant.

System 1 is a multi threaded system to be written in C, using semaphores to protect shared resources. The size of the system and the interactions between the threads are such that you see no way of guaranteeing the complete absence of deadlocks. The system should be made tolerant to deadlocks.

A

The task here is to handle the fact that deadlocks happen - that is; detect deadlocks and then loosen the knot in some way.

  • Detection: Watchdog is a simple way. Introducing a lock manager that detects deadlocks is another.
  • Handling: We need to introduce "preemption of resources" in some manner, aborting/restarting either threads or tasks. The problem here is to make this preemption without leaving the system in an inconsistent state. I would say structuring the system's functionality into atomic actions/transactions is the feasible way.
18
Q

Explain the terms “backward error recovery” and “recovery points”.

A

Backward error recovery: if an error is detected we go back to a previous, known-to-be-consistent state. Recovery point: one of these known-to-be-consistent states; typically a complete snapshot of the program's state.

19
Q

We imagine a classical real-time system with threads that synchronize using semaphores: How, typically, is it decided which thread gets to run (to achieve the goals in the previous question)?

A

The threads are classified by priorities; the runnable thread with the highest priority gets to run.

20
Q

How can the domino effect be avoided?

A

Coordinate the making of the recovery points, typically when entering an atomic action.

21
Q

count++;
if (count == 3) {    // All here, signal the others, reset, continue
    signal(A);
    signal(A);
    count = 0;
} else {             // Not everyone has arrived, must wait
    wait(A);
}

This code has at least one problem: Which?

A

If one thread gets interrupted after the count increment, the last thread arriving might reset count before the interrupted thread gets to test it. There is also a race condition around the if that can lead to more threads executing the true branch rather than the false branch. These two are the most important bugs to find.

There is also a potential problem with the count variable itself if count++ is not atomic. Reusability of the barrier is an issue as well: is the barrier properly prepared for the next time it is used? What if a thread tries to enter again before the others have left?
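A correct, reusable barrier needs the count protected by a mutex and some guard against threads re-entering too early. A Python sketch (the class, its names, and the generation-counter technique are illustrative, not taken from the card):

```python
import threading

class Barrier:
    """Reusable barrier: a lock protects count, and a generation counter
    prevents a fast thread from re-entering before the others have left."""
    def __init__(self, n):
        self.n = n
        self.count = 0
        self.generation = 0
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:                  # count is only touched under the lock
            gen = self.generation
            self.count += 1
            if self.count == self.n:     # last arrival: release this generation
                self.count = 0
                self.generation += 1
                self.cond.notify_all()
            else:
                while gen == self.generation:
                    self.cond.wait()
```

The generation counter is what makes the barrier reusable: a thread that races ahead and calls wait() again blocks on the new generation instead of stealing a wakeup from the previous one.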

22
Q

Give some (3-4?) principles guiding variable naming.

A

Sverre's favourites are:

  • "Does the name fully and accurately describe what the variable represents?"
  • "Is the name long enough that you don't have to puzzle it out?"
  • "Does the convention distinguish among local, class, and global data?" and "Does the convention distinguish among type names, named constants, enumerated types, and variables?". Sverre likes this; a name can easily signal a lot of its 'context'.
  • "Are all words abbreviated consistently?". Sverre: more generally (not only abbreviations): consistency on all levels –> guessability!
23
Q

What is “Priority Inversion” ?

A

That one task ends up waiting for a lower-priority task. This may happen when they share a resource that the low-priority task holds.

It is ok if the student also explains unbounded priority inversion, but these two should not be confused.

24
Q

Comment on the danger for starvation regarding the memory allocation task over. Suggest a strategy for memory allocation to waiting threads that does not have this problem.

A

We have the choice here between either blocking threads "unnecessarily" (they are asking for little enough memory that the request could have been fulfilled) or starving the big requests.

First come, first served is a strategy that avoids this starvation (the price is blocking smaller requests unnecessarily…).

25
Gray's third alternative way to make reliable calculations assumes that all data (the whole program state) is safely stored in a database, that all calculations are formulated as transactions, and that we have a system for running these transactions which also automatically handles any errors. In addition to the answer in the previous question: why do we not make all (embedded, real-time) systems like this?
Large, heavy, and less flexible than we need in an embedded setting… and it does not exist, generally.
26
How can the “unbounded priority inversion” problem be solved by scheduling strategies?
Priority inheritance or ceiling protocols: the low-priority thread gets temporarily increased priority (above the intermediate threads) in these situations.
27
28
Here we have timing demands on the system, like in 1-6, but we explore the possibility of making a system that guarantees all deadlines. Explain shortly how you would go about making such a system.
Divide the system into one thread per timing demand, choose a predictable scheduler, make safe estimates of WCET, and perform a schedulability proof.
29
Discussing shared variable synchronization in Java, the ’inheritance anomaly’ comes up. What is the inheritance anomaly?
Integrating the 'Object' of object-oriented programming with the in-many-senses similar construct, the 'Monitor', sounds like a good idea. However, when it comes to classes and inheritance, synchronization really does not make much sense. We cannot 'inherit' synchronization behaviour and achieve any kind of 'hiding the details of the base class'.
30
Our goal is to make a barrier/rendezvous between three threads by using semaphores. That is, all threads should block at the synchronization point until all three are ready to proceed.

if (getWaiting(A) == 2) {    // processes waiting for the semaphore
    // The two others are here already - continue
    signal(A);
    signal(A);
} else {
    // wait for threads
    wait(A);
}

This code has a race condition. What is a race condition? What is the problem with the code?
A race condition is a bug that surfaces under unfortunate timing or ordering of events. Here: if a thread is interrupted just after the call to getWaiting has returned a number less than 2, but before the thread waits, the program will deadlock when the other threads arrive.
31
Writing to log is an alternative to creating recovery points. How does this work in the context of (single-thread) backward error recovery?
The introduction of the log was motivated by getting back up to a consistent state after a restart. Here, however, we ask for the other usage of the log: to also write before-state into log records, and UNDO log records. A recovery point then becomes just a position in the log. It is reasonable (but not expected) to also mix AA/transactions into this answer, since that infrastructure adds much to the log functionality.
32
"Standard" error handling is to test for error situations and then write code that handles the detected errors. The fault-tolerance part of this course is, however, motivated by the fact that this is not good enough: 1. A real-time or embedded system often has higher demands on reliability; we must also handle unexpected errors (the errors you did not think of when making the program, and the bugs you are unaware of). 2. We also often have more cooperating threads in a real-time software system. Sometimes these threads must cooperate on error handling as well. How can we detect these unexpected errors? (Explain shortly.)
Acceptance tests: state demands on the correct state/result rather than testing for error situations. Static redundancy: given intermittent errors or independent systems, this catches errors that would be impossible to catch in other ways.
33
Shared variable synchronization may be criticized for poor scalability. Give, shortly, arguments for this.
The synchronization mechanisms have the character of global variables. Checking whether it works becomes a global analysis: we cannot tell from a code snippet whether it works without knowing all use of, and interaction with, synchronization in the whole program. Other good arguments must also be rewarded.
34
To increase the abstraction level and give flexibility compared to semaphores, POSIX, Java and Ada have landed on different variants of monitors: POSIX combines mutexes and condition variables, Java has synchronized methods and wait/notify/notifyAll, and Ada has guarded entries in protected objects. Describe shortly how these three works
POSIX: In the wait call on a condition variable you specify a mutex that is temporarily released. There is no coupling to any module concept beyond how you choose to use the mutexes. Java: Any object can protect methods against simultaneous access from more threads by marking the methods as synchronized. From inside a synchronized method you can call wait to suspend yourself and temporarily release the object lock. notify wakes one such suspended process for the object, while notifyAll wakes all of them. Ada: Here we have functions, procedures and entries that access a protected object. Blocking from the inside of the object is illegal (which cleans up the composition issue nicely: it simply becomes impossible). Functions can only read internal variables, and can therefore run many at a time. Procedures and entries demand exclusive access, and the entries can be guarded; the guards can only test on the object's private variables. Perhaps also mention requeue?
35
Which downsides exist to using exception handling in a program/real-time system?
* Partly difficult and non-intuitive semantics. * Invisible program-flow paths. * A goto without knowing where you came from or where you are going. * Large and unpredictable overhead when an error occurs.
36
The Suspend() and Resume(Thread) calls have been described as unusable for programming error-free synchronization between threads. Explain why.
Basically, code will end up with race conditions depending on what happens first: one thread suspending itself, or the other one resuming it. Getting around this is very difficult, if not impossible. Testing on a condition before suspending does not work, since we may be interrupted between the execution of the test and the suspend call.
37
Give a short description of how the synchronization primitives in Java (synchronized methods, wait, notify and notifyAll) works.
Java: Any method in a Java object can be marked "synchronized", which means that calls to this method happen under mutual exclusion with other synchronized methods. wait(): a call to wait() suspends the current thread; it will be resumed by a call to one of the notify() calls (or by somebody calling interrupt() on the thread). notify(): wakes an (arbitrary?) thread blocked on this object's lock. notifyAll(): wakes all threads blocked on this object's lock.
38
An Atomic Action has start, side and end boundaries. What is the purpose of these boundaries?
To place clear limits on what consequences an (unexpected) error can have, so that error handling becomes possible. Start: to establish which participants may be affected, and to set a safe, consistent starting point (if an error has occurred, it must have happened after the start point). Side: limiting communication (restricting messages, locking variables…) to members, to hinder errors spreading out of the action. End: ensuring a consistent system before leaving the action, so that any errors do not spread or have consequences after the end of the action.
39
“Resource control” is more than mutual exclusion... What more can we expect from a system for resource control?
Open answer here, but the five applications of Bloom's criteria should be included (different criteria for who gets the resource when more are waiting): 1. The type of the request (e.g. r/w locks) 2. The order of requests (e.g. FIFO/LIFO) 3. Server state (e.g. history, state, mode) 4. Parameters of the request (e.g. size, amount, importance) 5. Priority of the client
40
Occasionally this can happen even inside of words: ... SveOla Normann Osrre Hendseth Trondheimsveien 34 loveien 23 ... What is the problem here? What do we call such faults that occur "occasionally"?
The library that printf comes from is not reentrant. Race Conditions
41
The priority ceiling protocol, in addition to solving the unbounded priority inversion problem, also has the property that it avoids deadlocks in the system. Explain how.
The trick is that since we know beforehand which resources a given thread uses, and that the priority of this resource is set to max+1 of all the using threads, it is impossible for any thread owning a given resource to be interrupted by any other thread also (potentially) wanting the same resource. As soon as T1 has allocated resource A, T2 will not even get to run (so that it can allocate resource B), since it has lower priority than T1 now has.
42
“Resumption model” is used to describe possible implementations of both signals, asynchronous notification in general and exceptions. In most cases it is seen as less useful than “termination model”. What is the difference? Why is resumption model less useful?
The difference is where the execution continues after the signal/exception/notification. If the execution continues where it left off, nothing is gained: the program must still poll to check whether it has been interrupted.
43
Define the terms deadlock and race condition.
Deadlock: more parts of the system wait for each other in a circular wait, locking the system in a state it cannot get out of. Race condition: a bug that surfaces under unfortunate timing or ordering of events.
44
Why can Adas guards only test on the protected object’s private variables?
If it were not for this restriction we would not be able to know when to re-evaluate the guards and wake up sleeping processes.
45
List the techniques you would use, together with short explanations of how they contribute to making the system fault tolerant. System 3 is a system where timing behaviour is critical, but where making a system that guarantees that all deadlines are met is seen as infeasible/too conservative. Your system should be tolerant to timing errors.
This has not been discussed deeply in the lectures; the students should still be able to reason about the problem. Any mature answer should be credited. Also here, detection and handling is a way to go: deadline misses and/or overload situations can be detected by comparing the times of the side effects with their deadlines, or by having timers interrupt us when a task is out of time. Handling must be application-dependent, and if the student's answer hints at understanding this, I think it is very good.
46
Some times the program prints strange errors in the address list (Sverre Hendseth and Ola Normanns entries are mixed up): ... SverreOla Normann Osloveien 23 Hendseth Trondheimsveien 34 ... What is the problem?
One thread gets interrupted while printing a person, leaving another thread to print its person before the first one gets to finish.
47
What do we achieve (in the domain of error handling) by using static redundancy?
Error masking (of all errors as long as the replicas are independent!) and detection. Given modules with failure probabilities, we can throw replicas at the problem until the probabilities are acceptable.
48
If we imagine a larger system of threads that communicate by this two-semaphoresper-channel pattern, the number of semaphores in the system may become very large. Give an example showing how deadlocks can occur in such a message passing system.
Thread1 tries to send something to Thread2 and vice versa; nobody listens.
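With synchronous (unbuffered) channels this send/send cycle blocks forever; buffering is one way out. A minimal Python sketch, with queue.Queue standing in for a buffered channel (all names are illustrative):

```python
import queue
import threading

# One buffered "channel" per direction. With capacity > 0 a send
# completes immediately, so two threads that both send before they
# receive do not deadlock (unlike the synchronous-channel version).
to_t2 = queue.Queue(maxsize=1)
to_t1 = queue.Queue(maxsize=1)

def t1(out):
    to_t2.put("hello from T1")   # would block forever on a synchronous channel
    out.append(to_t1.get())

def t2(out):
    to_t1.put("hello from T2")
    out.append(to_t2.get())

received = []
threads = [threading.Thread(target=t1, args=(received,)),
           threading.Thread(target=t2, args=(received,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

This is also why buffered communication is the "common solution we do not like so much": it hides the cycle rather than removing it, and the deadlock returns the moment a buffer fills up.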
49
Give a short description of how the synchronization primitives in ADA (Protected Objects, functions, procedures and entries with guards) works.
Ada: A **protected object** is a module: a collection of functions, procedures and entries, along with a set of variables. **Functions** are read-only, and can therefore be called concurrently by many tasks, but not concurrently with procedures and entries. **Procedures** may make changes to the state of the object, and therefore run under mutual exclusion with other tasks. **Entries** are, importantly, protected by guards (boolean tests): if the test fails, the entry cannot be called; the caller blocks waiting for the guard to become true. These tests can only be formulated using the object's private variables.
50
A simple and straightforward alternative to checkpoints is to write to the "log". How does this work, and what do we achieve compared to checkpoints?
Rather than writing the complete state to storage, only the changed part of the state is written: as each part of the state (i.e. each variable) is changed, the new value of the variable is written. When doing recovery, the latest state is reconstructed by executing all the log records. This lets us avoid writing the complete state every time, which may be very big. (Many more things to mention here…)
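Recovery from such a log takes only a few lines; a toy Python sketch (the (name, value) redo-record format is an assumption for illustration):

```python
def replay(checkpoint, log):
    """Reconstruct the latest state from a (possibly empty) checkpoint
    plus redo log records of the form (variable_name, new_value)."""
    state = dict(checkpoint)
    for name, value in log:
        state[name] = value      # each record stores only what changed
    return state
```

For example, replay({}, [("x", 1), ("y", 2), ("x", 5)]) reconstructs the state {"x": 5, "y": 2} without the full state ever having been written at once.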
51
A way to achieve reliable calculations is process pairs. Give a short explanation of how this works. What do we gain here compared to checkpoint/restart?
We get availability here; checkpoint/restart is dependent on the restart and any repair. Two programs, the primary and the backup, run at the same time. The primary does the side effects and sends the checkpoints to the backup, along with IAmAlive messages. The backup broadcasts IAmMaster (both to the clients and to any other "servers") when enough IAmAlive messages have been missed, and continues from the last checkpoint, probably starting with the last prepared side effect/reply.
52
When doing forward error recovery in a multi thread setting, the need arises for the different threads to get to know about errors that happen in other threads. List mechanisms that can be used to convey such information between threads.
Avstemmingen i etterkant av en AA. En kan polle feilstatus variable, eller hvis systemet er meldingsbasert kan en sende feilmeldinger. Ellers er asynchronous transfer of control det som er mest behandlet i boken: select then abort i Ada, AsynchrouneslyIterruptedExceptions i Java, og setjump/longjump -trikset eller pthread\_cancel i C/POSIX.
53
54
Gray had the thought that we could build fault tolerant computer systems if we only had reliable data storage, reliable communication and reliable calculations... Refer shortly how we correspondingly can achieve reliable communication.
Redundancy is the keyword here too, in the form of ack/timeout/resending. Some more details should also be included: sessions, checksums, sequence numbers, …
55
What do we achieve by merging error modes?
We achieve a simpler system. Even though it would be possible to handle all the different errors differently / do forward error recovery, it is not always feasible, since errors happen seldom anyway. Some errors might get more drastic consequences than strictly necessary, but again: it happens seldom.
56
Wrong use of semaphores may lead to deadlocks: What is a deadlock? Give an example on how this can happen.
"The system locked in circular waiting", or "a state that the program cannot leave". We expect the standard wait(A);wait(B) / wait(B);wait(A) example here, but other examples must also be accepted.
57
Error detection — to decide whether something is wrong – can be done in more ways than checking for error returns. List principles that can be used to detect errors.
• Replication checks • Timing checks • Reversal checks • Coding checks • Reasonableness checks • Structural checks • Dynamic reasonableness checks
58
Why is FPS more commonly in use than EDF?
This is surprising, since EDF comes out better in more schedulability tests… The book lists: * Easier to implement. * Easier to incorporate tasks without natural or hard deadlines. * Period (as used by FPS/RMS for setting priorities) is not really a good measure of importance, but it is far better than absolute deadlines; FPS priorities can be tweaked… * The overload behaviour of FPS may be preferred (see the previous point). * The FPS/RMS schedulability test is unnecessarily pessimistic. (SH: seriously? Are we making systems that may or may not work?)
59
Criticize the following procedure names shortly (picked from the 2016 projects). Feel free to group them if relevant.
  • void elev_master_running()
  • void start_elevator()
  • int get_q(int floor, int button_type);
  • int elev_get_floor_sensor_signal(void);
  • void* orders_ctrl_main(void* arg);
  • void* comm_ctrl_loop(void* arg);
  • void* manage_slave(void* slave_id_void_ptr);
  • Describe everything the routine does.
  • Avoid meaningless, vague, or wishy-washy verbs.
  • Don't differentiate routine names solely by number.
  • To name a function, use a description of the return value.
  • To name a procedure, use a strong verb followed by an object.
  • Use opposites precisely.
  • Establish conventions for common operations.
60
In the same system: How, typically, is it decided which of more threads waiting on a semaphore, gets to run when the semaphore is signaled?
Unfortunately, priorities are the wrong answer here; first come, first served is the normal behaviour. This is to avoid starvation…
61
You have already made a module for keeping track of names of people. It has, among other parts of the interface, these functions: getFirstName, getLastName, setFirstName and getName. The last one calls the two first ones before it returns the complete name. The module worked perfectly until a multithreaded version of the program was made... What kind of problems with such a module can surface when it is used in a multithreaded program?
The module is not reentrant: any data structures (shared resources) used by the module may be accessed by more threads at the same time, or, even worse, references to internal data structures may be returned to the caller for concurrent access elsewhere in the program.
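The classic fix is to make the composite operations atomic with a lock. A Python sketch of such a module (the class shape and method names are illustrative guesses, not the actual module from the card):

```python
import threading

class Person:
    """getName-style composition: reading first and last name must be
    atomic with respect to updates, or a concurrent setter can
    interleave and produce a mixed-up name."""
    def __init__(self, first, last):
        self._first = first
        self._last = last
        self._lock = threading.Lock()   # protects the first/last pair

    def set_name(self, first, last):
        with self._lock:                # update both fields atomically
            self._first = first
            self._last = last

    def get_name(self):
        with self._lock:                # read both fields atomically
            return f"{self._first} {self._last}"
```

Without the lock, a reader could observe the first name of one person and the last name of another, which is exactly the mixed-up address-list output from the earlier card.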
62
There are a number of assumptions/conditions that must be true for these tests to be usable. (“The simple task model”) Which? Comment (shortly) on how realistic they are.
* Fixed set of tasks (No sporadic tasks... Not optimal but fair deal) * Periodic tasks, known periods (Realistic) * The threads must be independent. (Not realistic at all in an embedded system) * Overheads, switching times can be ignored (Sometimes yes, sometimes no) * Deadline == Period (Not optimal but Fair deal) * Fixed Worst Case Execution Time. (Not realistic to know a tight estimate here.) * and in addition: Rate-Monotonic Priority ordering. (our choice, so ok)
63
What is an Atomic Action? Which problem(s) is Atomic Actions meant to solve?
Many other good answers exist for the first part here. Pointing out the three boundaries (side, start and end), possibly with the standard mechanisms to achieve them (locking, explicit membership, and a two-phase commit protocol), is reasonable enough. The problem to be solved: if more participants cooperate on something, they must possibly also cooperate in handling errors. AA provides the framework for achieving this, containing the errors and avoiding the domino effect. "A mechanism/infrastructure for error containment when we have cooperating threads" basically covers both questions quite well.
64
What is starvation?
One thread does not get the resources it needs due to unfortunate scheduling.
65
Deadlocks can of course also happen in systems where interaction between threads is message based. Construct an example from this domain.
T1: recv(T2); send(T2); T2: recv(T1); send(T1); Both threads block in recv, each waiting for a message the other will never send. Longer cycles work too: T1 waits for T2, T2 waits for T3, T3 waits for T1...
66
Handling the log gets more difficult if we have more parallel tasks, where some succeeds and some fails, and all generate log. How can we extend the log (compared to the single-thread version) to handle this (still in a backward error recovery perspective)?
No worse than operating with "transaction IDs" that tie the log records to the individual task. The recovery points in the log can then be set per task.
67
Give examples of what one can test for when making acceptance tests.
This is the "learn-by-heart" list from the book.
* Replication checks (N-version programming).
* Timing checks (watchdog, deadline).
* Reversal checks (calculate input from output and compare).
* Coding checks (checksums).
* Reasonableness checks (variable range, assertions).
* Dynamic reasonableness checks (reasonable compared to previous value).
* Structural checks (integrity of data structures).
68
Suggest a general rule for how channels should be named.
Sverre tends to name channels by describing the elements carried on them, but other parts of a channel's context are sometimes just as relevant.
69
Why is “scheduling” so important in real-time programming? — Which features are we hoping to gain in our system by choosing good scheduling strategies?
Predictability in the timing domain, so that systems can be analyzed. Simple analysis that is not too conservative, facilitating high utilization. Ensuring that all threads meet their deadlines is OK. Comments on allocation of resources other than the CPU, and on deadlock avoidance, are also good. Performance and fairness are bad answers.
70
Sverre has an issue with nested ’while(true)’ loops (infinite ’for’ loops, for the go programmers out there :-) ), claiming that it is bad form to nest infinite loops. Argue
What you see is not what you get when a loop gets hijacked by an inner loop (infinite, blocking, spinning, polling...).
71
Give a short description of how process pairs work.
A figure showing the master and backup processes exchanging IAmAlive, status and IAmMaster messages is good. That the backup takes over when the master dies should be clear, and it is very good if the consistency of the service is argued.
72
How does priority inversion influence the schedulability analyses?
It invalidates the assumptions required for the (basic form of the) analyses to be valid. This can partly be compensated for by more complicated analyses, giving more conservative results in terms of utilization. Difficult question, so we can give credit for any mature answer.
73
Compare the Java and Ada mechanisms here, give the main strengths and weaknesses.
Bloom's criteria, expressive power and ease of use, should be included here as the standard way of comparing such mechanisms (but they need not be listed and gone through systematically). Ada's weakness that we cannot have guards on parameters should come out as a minimum. That guards come out favourably on the ease-of-use side is great if mentioned.
74
Why is Asynchronous Transfer of Control seen as relevant for use in Atomic Actions?
Because, for example, when an error is detected by one thread, it might mean that the work done by another thread is not meaningful anymore -> it needs to be interrupted.
75
More of these “resource allocation” problems does not occur in pure message passing systems (“message-based synchronization” as Burns&Wellings calls it). Why?
There are no shared resources.
76
Imagine a system of mobile sensor vehicles sharing updates via a base station, or a process control plant where the operators are allowed to program alarm scripts. In both these cases we may get unpredictable resource allocations. How can we avoid or solve any problems with deadlocks in such systems?
Detection:
- Who owns and asks for what (detect cycles)
- Timeout/watchdog
Recovery:
- Breaking mutual exclusion
- Preemption
- Abort of thread or Atomic Action
77
When an acceptance test fails, we are left with the knowledge that something is wrong, but we do not necessarily know exactly what. How can we recover in such a situation?
Backward or forward error recovery; just get to any consistent state. Merging of failure modes is also relevant
78
You have already made a module for keeping track of names of people. It has, among other parts of the interface, these functions: getFirstName, getLastName, setFirstName and getName. The last one calls the first two before it returns the complete name. The module worked perfectly until a multithreaded version of the program was made... You decide to protect the module with semaphores by allocating (wait) a given binary semaphore at the start of each function, and releasing it (signal) just before returning. This does not work: what happens?
Since getName calls getFirstName and both allocate the semaphore, we will deadlock.
79
What is starvation?
A thread does “by accident” not get necessary resources.
80
What is “Asynchronous Transfer of Control”?
...that one thread can interrupt another thread with termination mode (so that the interrupted thread does not continue where it left off after the interruption).
81
Process pairs is a technique for achieving available (in addition to reliable) processes. Which of the following points are the central building blocks/principles for implementing process pairs? 1 An “I Am Alive” protocol that lets the backup process detect whether the primary process is alive and working. 2 A (two-phase) commit protocol between primary and backup processes ensuring that they have a consistent view of the current state. 3 Sending state updates from the primary to the backup so that the backup will start working in the current state if/when it becomes primary. 4 Highly reliable and available messages must ensure that all IamAlive and State Update messages reach their destination and in the correct order. 5 Some mechanism for letting clients relate to the process that currently is primary (like broadcasting “IamPrimary” also to clients). 6 Process independent message queues that ensures that messages are not lost even though a process restarts.
1,3,5.
82
We can imagine that after some time such a program can generate huge amounts of log. “Checkpoints” is a design pattern that lets us delete old log. How does this work?
Into the log, at suitable intervals, we write a complete snapshot of the program state (non-consistent, since there may be active operations going on) along with a list of the active operations. Log that is older than the last checkpoint listing no operations that are still active can be deleted. Recovery will then start by initializing to this checkpoint, and then executing the relevant log records from there on.
83
What are the benefits of structuring the functionality of your program into Atomic Actions? Any drawbacks?
Good: an error handling framework generalizable to more participants! Errors do not have consequences beyond the AA. Bad: composes badly, some infrastructure is needed, one-design-fits-all? Does not handle RT systems and side effects that well.
84
A shared variable system written in C using semaphores, but here it may be possible to make a system that is guaranteed to be without deadlocks. List a number of approaches to making deadlock-free systems
* The priority ceiling scheduling protocol (if all tasks' resource usage is known at compile time).
* Using a lock manager that avoids risking deadlocks (banker's algorithm) (again, if all tasks' resource usage is known at compile time).
* Formal verification, like LTSA, FSP etc.
* Any deadlock detection/handling scheme.
* Invalidating any of the 4 necessary conditions for deadlocks.
* ...
85
“Shared variable synchronization” has its challenges: A number of problems(classes of bugs) may occur in a shared variable system that are not present in single thread systems. Which?
* Deadlock: the system is blocked in a circular wait.
* Livelock: the system is locked in a subset of states (like deadlock, but we use CPU).
* Starvation: a thread does, by accident, not get the resources it needs. Ex: unfair scheduling. (Ref. the discussion on whether you should make assumptions on how the scheduling works. Ref. Go vs. Occam: who is woken by a signal?)
* Race condition: a bug that surfaces by unfortunate timing or ordering of events.
86
A number of modern programming languages comes with built-in mechanisms for exception handling. What are we hoping to achieve with these mechanisms?
* An easy-to-understand mechanism that makes error handling simple.
* Separation of error handling code and normal-operation code.
* The same mechanism for handling different types of exceptions.
* No overhead in normal operation?
* Allowing recovery actions :-)
87
Gray had the thought that we could build fault tolerant computer systems if we only had reliable data storage, reliable communication and reliable calculations... What is, in this context, the error mode(s) of a "calculation", and how are they detected?
Error mode: not delivering the next correct side effect. Detected by acceptance tests.
88
A general advice on Java programming is to avoid using notify() and rather use notifyAll() combined with while loops around all the wait()’s. Why?
If there are several different reasons for waiting, you cannot easily know whom you awaken with a notify. If you awaken somebody waiting for something other than what became ready, the system may misbehave. Even in a system with only one reason for waiting, this may change due to future reuse, maintenance or inheritance. From the module perspective, using only notify may demand knowledge of the usage patterns of the object - that is, yielding not so good encapsulation.
90
Gray had the thought that we could build fault tolerant computer systems if we only had reliable data storage, reliable communication and reliable calculations... Reliable calculations are the most difficult in this triplet since the failure modes is difficult to describe in the general case. The curriculum sets up “Checkpoint/Restart” as the first alternative: Describe how this works.
Just before any side effect, a set of acceptance tests is run; if OK, the complete state of the program is written to safe storage - the checkpoint. If the tests are not OK, the program is restarted, loads the previous checkpoint, and continues from there.
91
“Transactions” are almost the same as “Atomic Actions” in the Burns&Wellings book, and are relevant when making fault tolerant computer systems. How do transactions contribute to fault tolerance?
The perfect answer here would be the ACID properties of a transaction: Atomicity (it happens all-or-nothing), Consistency (a transformation from one consistent/correct state to another), Isolation (partial results are not propagated out of the transaction before it has finished, so that errors can be handled) and Durability (the result of a calculation is never lost).
92
What do we achieve (in the domain of error handling) by using dynamic redundancy?
(Acceptance tests must take care of error detection...) We increase the probability of being able to successfully recover from errors, and get a flexible framework for how this is done: if the first way of doing something did not work, maybe we can do it in another way. (Or: if the first actor could not do it, maybe a backup actor can, as in process pairs.)
93
What is the “unbounded priority inversion” problem?
That a high-priority thread potentially might be waiting forever (an unbounded time) for lower-priority threads. This happens when a low-priority thread owns a resource that the high-priority thread needs. The unboundedness comes from the unknown number of intermediate-priority threads potentially wanting to run. Mention of the intermediate-priority threads is necessary for a good answer.
94
That the threads are cooperating means that they will have some need for interaction. This interaction usually takes the form of either “synchronization” or “communication”. Explain these two shortly and/or give examples.
Synchronization: access to a common global variable is restricted by mutual exclusion with e.g. semaphores. Communication: One thread sends a message to the other.
95
Describe one way of implementing an Atomic Action.
The clear and systematic approach here is to tell how the three boundaries are made: the start boundary can be implemented by an ActionController keeping track of ActionMembers through some "entry protocol". The side boundary can be implemented by some kind of locking of resources, or by limitations on communication: participants cannot communicate with non-participants. The end boundary can be implemented by a two-phase commit protocol (or another kind of barrier).
96
If we envision a lot of readers and writers, then such use of notifyAll may be unfortunate. Why?
We can have a lot of waiters at a time, and waking all of them, only to loop and go to sleep again is a waste (and bad abstraction).
97
The shared variable synchronization construct "monitors" has a problem with composition; we must be very aware when calling one monitor from the inside of another. What is the problem?
Locking the inner monitor /must/ also lock the outer monitor (this has to do with "releasing control in safe places"). Getting the outer monitor locked in odd places leads to all the same problems that we originally had with semaphores, just one call/abstraction level higher.
98
The failure modes is the ways a system can fail. To “merge failure modes” is a technique: What do we gain by doing this ?
• Simplification of the system. (If we handle the worst-case error anyway, maybe all other errors can be handled the same way.)
• Error modes are part of the module interface: fewer error modes enhance modularity/maintenance/composition by reducing the size of the interface.
• Handling unexpected errors, since merging of failure modes can also encompass unknown error modes...
99
In a system where exception handling is used, the following code may cause a deadlock: { Wait(semaphoreA); f(); Signal(semaphoreA); } Explain what the problem is and how it is solved.
f may throw an exception, causing the semaphore never to be signaled. We solve this with "final wishes": a piece of code that will always be executed - even when an exception is thrown.
100
Good scheduling is important in real-time programming to ensure that all deadlines in the system are satisfied. How do we achieve this by scheduling strategies?
First we can schedule the most "critical" thread first, as in EDF (or even rate-monotonic priority ordering), increasing the chance that all deadlines will be met. But predictability is just as important, paving the way for schedulability analysis.
101
If the program state is big or we have more concurrent ongoing operations that can fail individually, then log might be a good alternative. How does this work in context of backwards error recovery?
Just before any part of the program state is changed, the before-state of that part is stored away as a log record. The records are also typically labeled with the operation they are a part of, so that operations (the atomic actions...) can fail individually. When an error is detected, the log records are "undone" in the opposite order of that in which they were created.
102
An Atomic Action has start, side and end boundaries. How can each of them (start, side and end) be realized?
Start: in static systems this may be hardcoded. If not, some kind of explicit membership list is OK; an action manager can keep track of the members of each action. Any recovery points may also be established at the start boundary, if preparing for backward recovery. Side: typically some kind of locking of resource to action. From transactions we learn that the transaction id should be part of all communication, meaning that all threads wanting to act on a message must join the transaction. End: acceptance tests, and any voting or synchronization; the two-phase commit protocol.
103
In a real-time program we often have more concurrently running, cooperating threads. Why is this a reasonable way of organizing the functionality of a real-time system?
Keeping deadlines is difficult in a coupled system. Dividing the system into one thread per deadline is a simplification.
104
In one of the synchronization problems that Bloom put up, Adas mechanisms fails. Explain.
Ada fails at the case where an entry parameter is necessary for the resource allocation - like when allocating N of a resource. Guards cannot test on parameters (only private variables), leading (supposedly - the book does not demonstrate this) to “double interactions” or to the more complex application of the “requeue” or “entry families” mechanisms.
105
Explain shortly how a Priority Ceiling protocol works. (That is, choose one of them.)
Immediate: allocate a priority value to each shared resource in the system, and let a thread allocating the resource run at the resource's priority (or really: max(current, resource)) while it owns it. The resource priority should equal the max of the priorities of all threads using the resource.
106
Are there any disadvantages by using exceptions in a program, real-time system or programming language. (Answer briefly).
* Error handling gets very expensive.
* We must get the final wishes right; these precautions depend on other parts of the code - breaking encapsulation.
* Does it make the code easier to maintain? - "a goto where you do not know where you came from or where you are going"? And the mechanism is not always simple to understand in itself.
107
With Ada's protected objects, readers/writers locks are very easy to make; explain how.
Ada distinguishes between functions (which cannot change the object state) and procedures (which can), and allows functions to "run in parallel". That is: use functions for readers and procedures for writers.
108
Message based interaction between threads leads to a very different design than shared variable synchronization. How? Describe shortly the difference in designs.
Shared variable synchronization focuses on avoiding the problems of multiple threads sharing common resources. Apart from the added complexity of synchronization, the threads look "normal": programs working on data. Message-passing systems ideally have no shared resources; each resource is managed by a thread, and other threads must access the resource by communicating with it. Most threads in a message-passing system are built around the while-select loop. There are usually far more threads in a message-passing system, since we have more reasons to create them (...like managing resources).
109
Gray had the thought that we could build fault tolerant computer systems if we only had reliable data storage, reliable communication and reliable calculations... Refer shortly how we can achieve reliable storage (that is, built on unreliable storage)
Redundancy is the keyword, but for a full score *some* more details should be mentioned: a refresh thread, write-back on error, something on the error modes, ...
110
Given that we have detected an unexpected error... How can we know what we must do to handle the error ?
The difficulty is that we do not know the cause... But it is often solvable by merging error modes: "I failed; no matter the reason, I now must do..." A reference to AAs or transactions should be included: these let us reason on, and put limits on, the possible consequences of the error. Recovery points/backward error recovery is a catch-all: it lets us go back to a known consistent state (and possibly try again).
111
Optimistic Concurrency Control is a technique that can be used to avoid the overhead of locking. How does this work?
It works by assuming, optimistically, that conflicts will not happen, and detects and handles a conflict as an error if it does happen.
112
What is forward error recovery?
We try to compensate and/or correct the error we have just detected
113
What purpose does process pairs fill?
Obviously fault recovery, but also, importantly, availability: minimizing service downtime. We could speculate further for bonus points; paving the way for online upgrades?
114
“synchronization” or “communication”. Compare shortly and in bullet points how well suited these two are for a real-time system.
Though message-passing systems are well suited for large concurrent systems - they scale well, are easy to modularize as clients and servers, are a fantastic way of dividing a system into modules, etc. - they are not so well developed for real-time applications: no/few schedulability proofs, and they invite a larger number of (cooperating) threads, which again makes scheduling and scheduling predictability harder. Maturity questions again - any well-founded, reasonable answer may receive a score.
115
What is the difference between Transactions and Atomic Actions? How does this difference make Atomic Actions more attractive to use in a real-time system?
Transactions have one and only one error mode: ABORT. An AA, by contrast, is more of a transactional framework where both forward and backward error recovery are possible. ABORT might not be an option when you have a deadline, since we have already wasted some of our time.
116
There are a number of assumptions/conditions that must be true for analysis and utilization-based schedulability tests to be usable. (“The simple task model”) Which?
* Fixed set of tasks.
* Periodic tasks, known periods.
* The threads must be independent.
* Overheads, switching times can be ignored.
* Deadline == Period.
* Fixed, known worst-case execution time.
* And in addition: rate-monotonic priority ordering or earliest deadline first.
117
Readers/writers locks is an interesting case for discussing starvation. Why?
The need for readers/writers locks is motivated by a lot of readers. If there are a lot of readers, they may overlap in execution, starving any writers.
118
Readers/Writers locks is a variant of mutual exclusion where we recognize that we do not need mutual exclusion between more readers. The writers still needs to run under mutual exclusion — with each other and with the readers. How is it that we do not need mutual exclusion between multiple readers, when it is necessary between the writers?
A reader (by definition, here) does not change the (value of the) resource, meaning that all readers will see the same, consistent state - that is, no problem. Multiple writers, if they interrupt each other, may see intermediate (inconsistent) states or overwrite each other's partial writes.
119
Mention some hardware (/assembly) mechanisms that are used for achieving basic synchronization.
Disable interrupts, test-and-set/swap, spin locks.
120
What do we achieve by using process pairs?
Fault tolerance by (dynamic) redundancy. High availability since the switching between replicas happens immediately.
121
Why/When would we use readers/writers locks in place of ordinary mutual exclusion?
This is a performance issue - typically we have a lot of readers/reads and few writers/writes, and locking/serializing these calls hampers performance/removes parallelism.
122
How can we detect these unexpected errors ? (explain shortly.)
Acceptance tests: give demands to the correct state/result rather than testing on error situations. Static redundancy: given intermittent errors or independent systems, this catches errors that would be impossible to catch in other ways.
123
A purpose of a log could be to enable program restart, so that a restarted program can get initialized to a current, consistent state e.g. after a crash. How?
By storing the intended effect/new value in the log just before any side effect or change of program state - typically as part of the same log record as the old value. At restart, all (relevant) log records are "executed" in order.
124
Recommend (give an as general rule as possible) how a function should be named. (A function returns a value but have no side effects.)
Functions should be named after their return values. Code Complete checkpoint: "Is the routine's name a strong, clear verb-plus-object name for a procedure, or a description of the return value for a function?"
125
Generally; List methods you can use to avoid that deadlocks becomes a problem in a software system.
• Deadlock prevention (removing one of the 4 conditions):
– Optimistic concurrency control (!)
– Allocate all resources at once.
– Preemption (timeout, priority...)
– A global standard allocation order.
– Plus this one: global analysis - modelling and proving the absence of deadlocks.
• Deadlock avoidance:
– Resource allocation (banker's algorithm)
– Scheduling algorithms (priority ceiling)
• Deadlock detection & recovery:
– Detection:
∗ Analysis of who owns and requests what
∗ Timeout/watchdog
– Recovery:
∗ Breaking mutual exclusion
∗ Preemption (ex. -> forward error recovery)
∗ Abort of thread or Atomic Action (-> backward error recovery)
126
How does Ada avoid any of these problems with nested monitor calls.
In Ada, blocking - for any reason - from the inside of a monitor is a runtime error. Monitor calls are assumed to be "short". Nested monitor calls *are* possible, just not blocking inside one. Unnecessary detail for full score: blocking on monitor access itself is not defined as blocking, but blocking on a guard is.
127
Message passing systems are not traditionally seen as very suited for implementing systems with real-time demands. Why?
* Schedulability proofs are not well developed.
* Traditionally, in RT systems we have been closer to HW, maybe even without an OS. The message-passing infrastructure might not be available...
* ...or it might be too heavy/slow.
* In synchronization-based RT systems we have "one thread per timing demand" and we handle these threads with priorities, while in process-oriented systems we create threads for other reasons too, possibly making it difficult to assign priorities to them in any meaningful way.
* ... There are many other reasonable arguments here.
128
Gray had the thought that we could build fault tolerant computer systems if we only had reliable data storage, reliable communication and reliable calculations... Reliable calculations are the most difficult in this triplet since the failure modes is difficult to describe in the general case. The curriculum sets up “Checkpoint/Restart” as the first alternative: Describe how this works. What is the error modes, and how are they detected?
Error mode: not delivering the next correct side effect. Detected by acceptance tests. Just before any side effect, a set of acceptance tests is run; if OK, the complete state of the program is written to safe storage - the checkpoint. If the tests are not OK, the program is restarted, loads the previous checkpoint, and continues from there.
129
What is a live-lock?
A livelock is a bug where the system enters a subset of states that does not cover the whole functionality of the system, and where there is no way of leaving this subset.
130
The schedulability proofs probably have not been performed for most real-time systems out there in the world. Why do you think the industry is reluctant to perform these proofs?
The assumptions do not hold. The execution time bounds are all too conservative. The SW is too complex to fit into the standard model. (And the system seems to work well enough after testing.)
131
In a traditional real-time system we often know exactly which threads that uses which resources. This opens up for some techniques that lets us avoid the problem of deadlocks. Which?
Priority ceiling and the banker's algorithm, possibly also a "global allocation order", "allocate all resources at once" and formal verification.
132
Reliable calculations are the most difficult in this triplet since the failure modes is difficult to describe in the general case. The curriculum sets up "Checkpoint/Restart” as the first alternative: Describe how this works. What is the error modes, and how are they detected?
Error mode: not delivering the next correct side effect. Detected by acceptance tests. Just before any side effect is done, a set of acceptance tests is run: if OK, the complete state of the program is written to safe storage - the checkpoint. If the tests are not OK, the program is restarted, loads the previous checkpoint, and continues from there.
133
Interaction between different parallel tasks may lead to “inversion” problems (like priority inversion). Explain what the problem is.
If a task with high priority depends on a resource owned by a low-priority task, it will be blocked waiting for something that may not run for a long time, given the low priority of the resource holder.
134
Explain shortly how process pairs work.
Two programs, the primary and the backup, are run at the same time. The primary does the side effects (like "send answer to the client") and sends the program state/checkpoints to the backup (though in the opposite order!) along with IAmAlive messages. The backup broadcasts IAmMaster when enough IAmAlive messages have been missed - and continues from the last checkpoint.
135
List the techniques you would use together with short explanations of how they contribute to making the system fault tolerant. System 2 is the controller program for a single lift as you know it from the project. That is, there is no coordination of more lifts, but the spec still says that no button presses should be lost, and you should protect against power outages, harddisk crashes etc.
* Redundancy! If one controller/hard disk fails, we should rely on the other one. We have a number of patterns here: static redundancy, process pairs, n-version programming...
* Merging of failure modes: *if* something/anything goes wrong, then fall back on trusting the redundancy.
* Acceptance tests: this way of detecting errors ensures that even unexpected errors are handled.
136
The book sets up four necessary conditions for deadlocks to occur. Which? And how can deadlocks be prevented by seeing to it that each of these are not met?
Necessary conditions:
1. Mutual exclusion
2. Hold & wait
3. No preemption
4. Circular wait
Deadlock prevention (removing the corresponding condition):
1. Optimistic concurrency control.
2. Allocate all resources at once.
3. Preemption.
4. Global allocation order.
137
Backward error recovery is sometimes seen as not suited for a real-time system. What is backward error recovery, and why is it not suited?
• The "giving up to preserve consistency" error mode is often not acceptable, since we have interactions with other systems that may depend on us behaving correctly.
• We have no time to waste; by the time we report back to the module that initiated the failed operation, it might be too late to retry/fix it.
138
Acceptance tests are an "enabling technology" for process pairs. Explain specifically how acceptance tests contribute to the functionality of a process pair.
It is extremely important that no (unexpected) errors propagate from the primary to the backup. The status messages *must* be error free. Acceptance tests are the mechanism that ensures this. A perfect answer should contain the sequence: do work - perform acceptance test - send status to backup - do side effects. If the primary crashes, the backup executes the (possibly duplicate) side effects.
139
When doing forward error recovery in a multi thread setting, the need arises for the different threads to get to know about errors that happen in other threads. List mechanisms that can be used to convey such information.
The voting at the end of an AA. One can poll error-status variables, or, if the system is message based, send error messages. Otherwise, asynchronous transfer of control is what the book covers most: select-then-abort in Ada, AsynchronouslyInterruptedException in (Real-Time) Java, and the setjmp/longjmp trick or pthread_cancel in C/POSIX.
140
Two well-known scheduling strategies are FPS (Fixed Priority Scheduling) and EDF (Earliest Deadline First). Explain the terms FPS and EDF.
FPS: all tasks get a fixed priority, and the scheduler always lets the runnable highest-priority thread run. EDF: no predetermined priorities are given; the scheduler always runs the task with the earliest (absolute) deadline.
141
Explain the domino effect
If there are several interacting participants/threads, and the recovery points we aim to go back to are not synchronized/consistent with each other, we may have to roll back all the way to the beginning of program execution.
142
Operations for locking resources are always assumed to be atomic. Why is this so important?
Locking is often an integral part of the infrastructure allowing error handling (like in an AA). We would like to avoid the lock manager getting involved in error handling together with the action participants. (This would increase the complexity of the error handling, and possibly demand knowledge of the action in the lock manager.)
143
Explain shortly how exception handling works. Use C++, Java or Ada as example language if you want.
Basically: on an error, a program can “throw” an exception. Further up in the call hierarchy a catch clause may trigger and handle the error, and (unless the handler rethrows) the program continues operation after the catch clause.
144
Code a barrier that works.
What about something as simple as this (lots of credit to whoever makes it!):

```
// Signal that I am ready to both of the other two
signal(A);
signal(A);
// Then wait for the others.
wait(B);
wait(C);
```

(This is thread A's code; B and C run the symmetric code with their own semaphores.)
145
We have used the following demands on a principle for dividing a system into modules:
1. We must be able to use a module without knowing its internals.
2. We must be able to maintain a module without knowing its usage patterns.
3. Composition: super-modules can be made out of sub-modules.

In this perspective: criticize (shortly) the following principle for dividing a system into modules. Will it be suited for constructing a large software system? Dividing functionality into threads, like in Java or POSIX.
Shared-variable synchronization is the context here. These are global mechanisms for synchronization, which makes at least point 2 fail, and probably point 1 as well. Point 3 also fails: building super-threads from sub-threads is not a good option.
146
Acceptance tests are seen as an important tool for handling errors. How do acceptance tests differ from the more traditional tests on error conditions?
They place demands on the state being correct, i.e. they put us in a position to detect unforeseen errors: we can detect “unexpected” errors, not just the ones we anticipated. This also has a “merging of failure modes” effect.
147
Real-time software is well and good, but no real-time demands are met when the program is not running. One of the difficult features to give a system is the ability to upgrade to a new version without losing service. Outline shortly how you would approach making a system with this feature.
Process pairs come a long way: we can take down the backup and replace it with one running the new version of the software, before taking down the primary and provoking a take-over into the new version. The only thing to be aware of is that the status messages must be versioned, and that the new version of the software must be able to relate to 'old' status messages. Another challenge with process pairs is when the state of the program is too large to fit in a status message; any reflections on this are great, but outside of scope here.
148
Mention one problem that can be solved by the ’setjmp and longjmp’ calls in C.
Off the top of my head: • Transforming a 'signal', which is 'resumption mode', into 'termination mode' (ATC, which we want). • Zero-overhead error handling / C exception handling. • ... Any example, also outside of curriculum :-)