Blepp Flashcards

(148 cards)

1
Q

Ada's mechanisms (protected objects with functions, procedures, and entries with guards) are not well suited to solving this memory allocation problem. Why?

A

Guards cannot test on the parameters of the entry call, which leads to complicated code: double interaction patterns, entry families, or use of requeue.

2
Q

Asynchronous notification (interrupting threads in what they are doing in order to communicate something) is discussed in relation to atomic actions. Why?

A

When one participant detects an error, *something* must be done. For those errors that may already have spread to more participants (which, by common assumption, is all of them; cf. merging of failure modes), the other threads must be made aware of the error. The better (faster) alternative to polling, or to waiting for the prepare-to-commit, is to notify immediately: asynchronously.

3
Q

Java's wait and notify (and POSIX condition variables) work in many ways similarly to suspend and resume. How can these be seen as better, more high-level synchronization mechanisms?

A

Because they are assumed to be called from inside a monitor (making them uninterruptible and making testing on conditions safe), and they release the monitor when blocking.
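The pattern can be sketched with Python's threading.Condition (a minimal illustration; the bounded buffer and its names are not from the course material): the condition tests run while holding the monitor lock, and wait() releases that lock while blocked.

```python
import threading

class BoundedBuffer:
    """Monitor-style bounded buffer: condition tests happen while
    holding the lock, and wait() releases the lock while blocking."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        self.cond = threading.Condition()  # lock + condition variable

    def put(self, item):
        with self.cond:                    # enter the monitor
            while len(self.items) >= self.capacity:
                self.cond.wait()           # releases the lock while blocked
            self.items.append(item)
            self.cond.notify_all()         # wake waiters to re-test

    def get(self):
        with self.cond:
            while not self.items:
                self.cond.wait()
            item = self.items.pop(0)
            self.cond.notify_all()
            return item
```

Note the `while` (not `if`) around each wait(): the condition must be re-tested after waking, which is exactly the safe condition testing the answer refers to.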

4
Q

Designing parallel systems with message sending is an alternative to basing the design on synchronization. Give, briefly and in bullet points, benefits and drawbacks of these two applied to real-time systems design.

A

Message-based systems:

  • more maintainable,
  • scale better,
  • make better-encapsulated modules,
  • a better abstraction for most situations,
  • but: not as well understood in an RT context,
  • difficult to argue schedulability,
  • require more infrastructure and put demands on the thread model,
  • we leave the one-thread-per-RT-demand design.

Synchronization-based systems:

  • intuitive, low-level primitives,
  • but: scale badly,
  • difficult to get right.

5
Q

How would you rate the names of channels:

  • c, ch
  • incomingMessagesChan
  • udpSendChan
  • notAlive
  • floorChannel
  • primalCh
  • sendMsgCh
  • lightEventChan
  • channel_listen
  • channel_write
  • to_main, from_main
  • messageChan
A

Sverre's answer:

  • 'c' and 'ch' are great if the scope is small (less than 2-3 lines and no nesting, says Code Complete; Sverre is a bit more flexible…).
  • 'udpSendChan' is the winner in Sverre's book: all messages on this channel are sent on UDP. I could add 'incomingMessagesChan' to this category, understanding that incoming messages come from another lift/node. 'sendMsgChannel' and 'channel_listen' are even weaker, and at the bottom here is channel_write.
  • sendMsgCh, channel_listen, channel_write: not specific enough, or too dependent on context? Of course it depends on the context and on which metaphors are already established in the system.
  • messageChan: no, seriously: all channels convey messages…
  • notAlive, floorChannel, primalCh: no, these do not give enough, or require massive amounts of context to be understood.
  • lightEventChan: probably good.
  • to_main, from_main: OK if the 'main' functionality is well established.
6
Q

What is a deadlock ? Give an example of a deadlock with semaphores.

A

A deadlock: parts of the system wait for each other in a circular wait, locking the system in a state it cannot leave. The classic example:

T1: wait(A); wait(B); dowork; signal(B); signal(A)
T2: wait(B); wait(A); dowork; signal(A); signal(B)
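The standard cure for this example is a global lock order. A minimal runnable sketch in Python (the semaphore and worker names are illustrative, not from the card):

```python
import threading

A = threading.Semaphore(1)
B = threading.Semaphore(1)

def worker(name, results):
    # Both threads acquire in the same global order (A before B),
    # so the circular wait of the classic example cannot occur.
    with A:
        with B:
            results.append(name)       # "dowork"

results = []
threads = [threading.Thread(target=worker, args=(n, results))
           for n in ("T1", "T2")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Had T2 instead acquired B before A, the two threads could each hold one semaphore while waiting for the other, which is exactly the deadlock above.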

7
Q
A
8
Q

I have suggested that systems based on shared-variable synchronization scale badly. Can you argue this more generally?

A

At minimum, my comments could be repeated: nested monitor calls, the inheritance anomaly.

Anyway, reasoning is expected to start at maintainability:

Synchronization mechanisms are operating-system mechanisms; global entities, like global variables.

Semaphores are global, and the Java object lock can be reserved from the outside by a "synchronized (o)" block.

Analysing the system for deadlocks (and other race conditions) is a global analysis.

9
Q

Can deadlocks and race conditions happen in a message passing system?

A

Yes.

10
Q

for a message passing system:

How do you assure the absence of deadlocks?

A

To avoid circular "dependencies" would, I would say, be the correct answer here. Look at the communication arrows; use the client-server pattern; turn communication arrows around with buffered "I need to communicate" signals, as in Øyvind Teig's guest lecture.

Going for buffered communication is a common solution and a reasonable answer (though one that we do not like so much…)

11
Q

There are a number of assumptions/conditions that must be true for these tests to be usable ("the simple task model"). Which? Comment (shortly) on how realistic they are.

A
  • Fixed set of tasks (No sporadic tasks… Not optimal but fair deal)
  • Periodic tasks, known periods (Realistic)
  • The threads must be independent. (Not realistic at all in an embedded system)
  • Overheads, switching times can be ignored (Sometimes yes, sometimes no)
  • Deadline == Period (Not optimal but Fair deal)
  • Fixed Worst Case Execution Time. (Not realistic to know a tight estimate here.)
  • and in addition: Rate-Monotonic Priority ordering. (our choice, so ok)
12
Q

Operations for locking resources are always assumed to be atomic. Why is this so important?

A

Locking is often an integral part of the infrastructure that allows error handling (as in an AA). We would like to avoid the lock manager getting involved in error handling together with the action participants (this would increase the complexity of the error handling, and possibly demand knowledge of the action in the lock manager).

13
Q

We can prove that the deadlines in the system will be satisfied by response time analysis and by utilization-based schedulability tests. Explain briefly how these two work.

A

Utilization: we have a formula which guarantees schedulability for N threads if the threads have rate-monotonic priorities and the sum of utilizations is less than a given bound (dependent only on N).

Response time: for each thread we can calculate its worst-case response time from its maximum execution time and the number of times it can be interrupted by higher-priority threads.
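Both tests fit in a few lines of Python (a sketch for illustration; the task sets below are made up, and the simple-task-model assumptions are taken as given):

```python
import math

def utilization_ok(tasks):
    """Liu & Layland utilization test for rate-monotonic priorities.
    tasks: list of (C, T) pairs, C = worst-case execution time, T = period."""
    n = len(tasks)
    u = sum(c / t for c, t in tasks)
    return u <= n * (2 ** (1 / n) - 1)

def response_times(tasks):
    """Response-time analysis; tasks sorted by priority, highest first.
    Fixed point of R_i = C_i + sum over higher-priority j of
    ceil(R_i / T_j) * C_j."""
    result = []
    for i, (c, t) in enumerate(tasks):
        r = c
        while True:
            r_next = c + sum(math.ceil(r / tj) * cj for cj, tj in tasks[:i])
            if r_next == r:
                break
            r = r_next
        result.append(r)
    return result
```

For the made-up task set [(1, 4), (2, 6), (3, 12)], the utilization is about 0.83, above the three-task bound of about 0.78, so the (sufficient but not necessary) utilization test fails; response-time analysis nevertheless gives R = [1, 3, 10], all within the periods, illustrating why the exact test is worth the extra work.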

14
Q

Both forward and backward error recovery can be generalized from single-thread to multi-thread systems. Backward error recovery may, when generalized to multi-thread systems, give the domino effect. Explain. How can we avoid the domino effect?

A

Everything that looks like the “domino effect figure” is great for the first part of the question. Coordinating recovery points is the solution to the second.

15
Q

How can we get «termination mode» asynchronous transfer of control in POSIX/C? What about Java(/RT Java)? And ADA?

A

C: cancelling of threads, or the setjmp/longjmp trick…
RT Java: AsynchronouslyInterruptedException (plain Java: cancelling of threads).
Ada: select-then-abort.

16
Q

The Inheritance Anomaly is a potential problem when we imagine inheriting from a class (in an object-oriented setting like Java or C++) where any methods synchronize. Explain where the problem lies.

A
With object-oriented inheritance we can add to or override any features of the base class. It is not given (and is in fact false) that extending synchronization by the same mechanisms will work at all. The interaction between synchronization code in parent and child classes becomes complex, and in some cases the base-class synchronization code is even impossible to tweak into meaningful child-class synchronization…
Any example will of course also do.
17
Q

List the techniques you would use together with short explanations of how they contribute to making the system fault tolerant.

System 1 is a multi threaded system to be written in C, using semaphores to protect shared resources. The size of the system and the interactions between the threads are such that you see no way of guaranteeing the complete absence of deadlocks. The system should be made tolerant to deadlocks.

A

The task here is to handle the fact that deadlocks happen - that is; detect deadlocks and then loosen the knot in some way.

  • Detection: Watchdog is a simple way. Introducing a lock manager that detects deadlocks is another.
  • Handling: We need to introduce "preemption of resources" in some manner, aborting/restarting either threads or tasks. The problem here is to make this preemption without leaving the system in an inconsistent state. I would say structuring the system's functionality into atomic actions/transactions is the feasible way.
18
Q

Explain the terms “backward error recovery” and “recovery points”.

A

Backward error recovery: if an error is detected we go back to a previous, known-to-be-consistent state. Recovery point: one of these known-to-be-consistent states; typically a complete snapshot of the program's state.

19
Q

We imagine a classical real-time system with threads that synchronize using semaphores: How, typically, is it decided which thread gets to run (to achieve the goals in the previous question)?

A

The threads are classified by priorities; the runnable thread with the highest priority gets to run.

20
Q

How can the domino effect be avoided?

A

Coordinate the making of the recovery points, typically when entering an atomic action.

21
Q

count++;
if (count == 3) {    // All here, signal the others, reset, continue
    signal(A);
    signal(A);
    count = 0;
} else {             // Not everyone has arrived, must wait
    wait(A);
}

This code has at least one problem: Which?

A

If one thread gets interrupted after the count increment, the last thread arriving might reset count before the interrupted thread gets to test it. There is also a race condition around the if that can lead to more threads executing the true branch rather than the false branch. These two are the most important bugs to find.

There is also a potential problem with the count variable itself if count++ is not atomic. Reusability of the barrier is an issue as well: is the barrier properly prepared for the next time it is used? What if a thread tries to enter again before the others have left?
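A correct, reusable barrier needs the count protected by a mutex and some guard against threads re-entering too early. A Python sketch (the class, its names, and the generation-counter technique are illustrative, not taken from the card):

```python
import threading

class Barrier:
    """Reusable barrier: a lock protects count, and a generation counter
    prevents a fast thread from re-entering before the others have left."""
    def __init__(self, n):
        self.n = n
        self.count = 0
        self.generation = 0
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:                  # count is only touched under the lock
            gen = self.generation
            self.count += 1
            if self.count == self.n:     # last arrival: release this generation
                self.count = 0
                self.generation += 1
                self.cond.notify_all()
            else:
                while gen == self.generation:
                    self.cond.wait()
```

The generation counter is what makes the barrier reusable: a thread that races ahead and calls wait() again blocks on the new generation instead of stealing a wakeup from the previous one.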

22
Q

Give some (3-4?) principles guiding variable naming.

A

Sverre's favourites are:

  • "Does the name fully and accurately describe what the variable represents?"
  • "Is the name long enough that you don't have to puzzle it out?"
  • "Does the convention distinguish among local, class, and global data?" and "Does the convention distinguish among type names, named constants, enumerated types, and variables?". Sverre likes this; a name can easily signal a lot of its 'context'.
  • "Are all words abbreviated consistently?". Sverre: more generally (not only abbreviations): consistency on all levels –> guessability!
23
Q

What is “Priority Inversion” ?

A

That one task ends up waiting for a lower-priority task. This may happen when they share a resource that the low-priority task holds.

It is ok if the student also explains unbounded priority inversion, but these two should not be confused.

24
Q

Comment on the danger for starvation regarding the memory allocation task over. Suggest a strategy for memory allocation to waiting threads that does not have this problem.

A

We have the choice here between either blocking threads "unnecessarily" (they are asking for little enough memory that the request could have been fulfilled) or starving the big requests.

First come, first served is a strategy that avoids this starvation (the price is blocking smaller requests unnecessarily…).

25
Gray's third alternative way to make reliable calculations assumes that all data (the whole program state) is safely stored in a database, that all calculations are formulated as transactions, and that we have a system for running these transactions which also automatically handles any errors. In addition to the answer in the previous question: why do we not make all (embedded, real-time) systems like this?
Large, heavy, and less flexible than we need in an embedded setting… and it does not exist, generally.
26
How can the “unbounded priority inversion” problem be solved by scheduling strategies?
Priority inheritance or ceiling protocols: the low-priority thread gets temporarily increased priority (above the intermediate threads) in these situations.
27
28
Here we have timing demands on the system, like in 1-6, but we explore the possibility of making a system that guarantees all deadlines. Explain shortly how you would go about making such a system.
Divide the system into one thread per timing demand, choose a predictable scheduler, make safe estimates of WCET, and perform a schedulability proof.
29
Discussing shared variable synchronization in Java, the ’inheritance anomaly’ comes up. What is the inheritance anomaly?
Integrating the 'Object' of object-oriented programming with the in-many-senses similar construct, the 'Monitor', sounds like a good idea. However, when it comes to classes and inheritance, synchronization really does not make much sense. We cannot 'inherit' synchronization behaviour and achieve any kind of 'hiding the details of the base class'.
30
Our goal is to make a barrier/rendezvous between three threads by using semaphores. That is, all threads should block at the synchronization point until all three are ready to proceed.

if (getWaiting(A) == 2) {    // processes waiting for the semaphore
    // The two others are here already - continue
    signal(A);
    signal(A);
} else {
    // wait for threads
    wait(A);
}

This code has a race condition. What is a race condition? What is the problem with the code?
A race condition is a bug that surfaces under unfortunate timing or ordering of events. Here: if a thread is interrupted just after the call to getWaiting has returned a number less than 2, but before the thread waits, the program will deadlock when the other threads arrive.
31
Writing to log is an alternative to creating recovery points. How does this work in the context of (single-thread) backward error recovery?
The introduction of the log was motivated by getting back up to a consistent state after a restart. Here, however, we ask for the other usage of the log: to also write before-state into log records, and UNDO log records. A recovery point then becomes just a position in the log. It is reasonable (but not expected) to also mix AA/transactions into this answer, since that infrastructure adds much to the log functionality.
32
"Standard" error handling is to test for error situations and then write code that handles the detected errors. The fault-tolerance part of this course is, however, motivated by the fact that this is not good enough: 1. A real-time or embedded system often has higher demands on reliability; we must also handle unexpected errors (the errors you did not think of when making the program, and the bugs you are unaware of). 2. We also often have more cooperating threads in a real-time software system. Sometimes these threads must cooperate on error handling as well. How can we detect these unexpected errors? (Explain shortly.)
Acceptance tests: state demands on the correct state/result rather than testing for error situations. Static redundancy: given intermittent errors or independent systems, this catches errors that would be impossible to catch in other ways.
33
Shared variable synchronization may be criticized for poor scalability. Give, shortly, arguments for this.
The synchronization mechanisms have the character of global variables. Checking whether it works becomes a global analysis: we cannot tell from a code snippet whether it works without knowing all use of, and interaction with, synchronization in the whole program. Other good arguments must also be rewarded.
34
To increase the abstraction level and give flexibility compared to semaphores, POSIX, Java and Ada have landed on different variants of monitors: POSIX combines mutexes and condition variables, Java has synchronized methods and wait/notify/notifyAll, and Ada has guarded entries in protected objects. Describe shortly how these three works
POSIX: In the wait call on a condition variable you specify a mutex that is temporarily released. There is no coupling to any module concept beyond how you choose to use the mutexes. Java: Any object can protect methods against simultaneous access from more threads by marking the methods as synchronized. From inside a synchronized method you can call wait to suspend yourself and temporarily release the object lock. notify wakes one such suspended process for the object, while notifyAll wakes all of them. Ada: Here we have functions, procedures and entries that access a protected object. Blocking from the inside of the object is illegal (which cleans up the composition issue nicely: it simply becomes impossible). Functions can only read internal variables, and can therefore run many at a time. Procedures and entries demand exclusive access, and the entries can be guarded; the guards can only test on the object's private variables. Perhaps also mention requeue?
35
Which downsides exist to using exception handling in a program/real-time system?
* Partly difficult and non-intuitive semantics. * Invisible program-flow paths. * A goto without knowing where you came from or where you are going. * Large and unpredictable overhead when an error occurs.
36
The Suspend() and Resume(Thread) calls have been described as unusable for programming error-free synchronization between threads. Explain why.
Basically, code will end up with race conditions depending on what happens first: one thread suspending itself, or the other one resuming it. Getting around this is very difficult, if not impossible. Testing on a condition before suspending does not work, since we may be interrupted between the execution of the test and the suspend call.
37
Give a short description of how the synchronization primitives in Java (synchronized methods, wait, notify and notifyAll) works.
Java: Any method in a Java object can be marked "synchronized", which means that calls to this method happen under mutual exclusion with other synchronized methods. wait(): a call to wait() suspends the current thread; it will be resumed by a call to one of the notify() calls (or by somebody calling interrupt() on the thread). notify(): wakes an (arbitrary?) thread blocked on this object's lock. notifyAll(): wakes all threads blocked on this object's lock.
38
An Atomic Action has start, side and end boundaries. What is the purpose of these boundaries?
To place clear limits on what consequences an (unexpected) error can have, so that error handling becomes possible. Start: to establish which participants may be affected, and to set a safe, consistent starting point (if an error has occurred, it must have happened after the start point). Side: limiting communication (restricting messages, locking variables…) to members, to hinder errors spreading out of the action. End: ensuring a consistent system before leaving the action, so that any errors do not spread or have consequences after the end of the action.
39
“Resource control” is more than mutual exclusion... What more can we expect from a system for resource control?
Open answer here, but the five applications of Bloom's criteria should be included (different criteria for who gets the resource when more are waiting): 1. The type of the request (e.g. r/w locks) 2. The order of requests (e.g. FIFO/LIFO) 3. Server state (e.g. history, state, mode) 4. Parameters of the request (e.g. size, amount, importance) 5. Priority of the client
40
Occasionally this can happen even inside of words: ... SveOla Normann Osrre Hendseth Trondheimsveien 34 loveien 23 ... What is the problem here? What do we call such faults that occur "occasionally"?
The library that printf comes from is not reentrant. Race Conditions
41
The priority ceiling protocol, in addition to solving the unbounded priority inversion problem, also has the property that it avoids deadlocks in the system. Explain how.
The trick is that since we know beforehand which resources a given thread uses, and that the priority of this resource is set to max+1 of all the using threads, it is impossible for any thread owning a given resource to be interrupted by any other thread also (potentially) wanting the same resource. As soon as T1 has allocated resource A, T2 will not even get to run (so that it can allocate resource B), since it has lower priority than T1 now has.
42
“Resumption model” is used to describe possible implementations of both signals, asynchronous notification in general and exceptions. In most cases it is seen as less useful than “termination model”. What is the difference? Why is resumption model less useful?
The difference is where the execution continues after the signal/exception/notification. If the execution continues where it left off, nothing is gained: the program must still poll to check whether it has been interrupted.
43
Define the terms deadlock and race condition.
Deadlock: more parts of the system wait for each other in a circular wait, locking the system in a state it cannot get out of. Race condition: a bug that surfaces under unfortunate timing or ordering of events.
44
Why can Adas guards only test on the protected object’s private variables?
If it were not for this restriction we would not be able to know when to re-evaluate the guards and wake up sleeping processes.
45
List the techniques you would use, together with short explanations of how they contribute to making the system fault tolerant. System 3 is a system where timing behaviour is critical, but where making a system that guarantees that all deadlines are met is seen as infeasible/too conservative. Your system should be tolerant to timing errors.
This has not been discussed deeply in the lectures; the students should still be able to reason about the problem. Any mature answer should be credited. Also here, detection and handling is a way to go: deadline misses and/or overload situations can be detected by comparing the times of the side effects with their deadlines, or by having timers interrupt us when a task is out of time. Handling must be application-dependent, and if the student's answer hints at understanding this, I think it is very good.
46
Some times the program prints strange errors in the address list (Sverre Hendseth and Ola Normanns entries are mixed up): ... SverreOla Normann Osloveien 23 Hendseth Trondheimsveien 34 ... What is the problem?
One thread gets interrupted while printing a person, leaving another thread to print its person before the first one gets to finish.
47
What do we achieve (in the domain of error handling) by using static redundancy?
Error masking (of all errors as long as the replicas are independent!) and detection. Given modules with failure probabilities, we can throw replicas at the problem until the probabilities are acceptable.
48
If we imagine a larger system of threads that communicate by this two-semaphoresper-channel pattern, the number of semaphores in the system may become very large. Give an example showing how deadlocks can occur in such a message passing system.
Thread1 tries to send something to Thread2 and vice versa; nobody listens.
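With synchronous (unbuffered) channels this send/send cycle blocks forever; buffering is one way out. A minimal Python sketch, with queue.Queue standing in for a buffered channel (all names are illustrative):

```python
import queue
import threading

# One buffered "channel" per direction. With capacity > 0 a send
# completes immediately, so two threads that both send before they
# receive do not deadlock (unlike the synchronous-channel version).
to_t2 = queue.Queue(maxsize=1)
to_t1 = queue.Queue(maxsize=1)

def t1(out):
    to_t2.put("hello from T1")   # would block forever on a synchronous channel
    out.append(to_t1.get())

def t2(out):
    to_t1.put("hello from T2")
    out.append(to_t2.get())

received = []
threads = [threading.Thread(target=t1, args=(received,)),
           threading.Thread(target=t2, args=(received,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

This is also why buffered communication is the "common solution we do not like so much": it hides the cycle rather than removing it, and the deadlock returns the moment a buffer fills up.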
49
Give a short description of how the synchronization primitives in ADA (Protected Objects, functions, procedures and entries with guards) works.
Ada: A **protected object** is a module: a collection of functions, procedures and entries, along with a set of variables. **Functions** are read-only, and can therefore be called concurrently by many tasks, but not concurrently with procedures and entries. **Procedures** may make changes to the state of the object, and therefore run under mutual exclusion with other tasks. **Entries** are, importantly, protected by guards (boolean tests): if the test fails, the entry cannot be called; the caller blocks waiting for the guard to become true. These tests can only be formulated using the object's private variables.
50
A simple and straightforward alternative to checkpoints is to write to the "log". How does this work, and what do we achieve compared to checkpoints?
Rather than writing the complete state to storage, only the changed part of the state is written: as each part of the state (i.e. each variable) is changed, the new value of the variable is written. When doing recovery, the latest state is reconstructed by executing all the log records. This lets us avoid writing the complete state every time, which may be very big. (Many more things to mention here…)
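Recovery from such a log takes only a few lines; a toy Python sketch (the (name, value) redo-record format is an assumption for illustration):

```python
def replay(checkpoint, log):
    """Reconstruct the latest state from a (possibly empty) checkpoint
    plus redo log records of the form (variable_name, new_value)."""
    state = dict(checkpoint)
    for name, value in log:
        state[name] = value      # each record stores only what changed
    return state
```

For example, replay({}, [("x", 1), ("y", 2), ("x", 5)]) reconstructs the state {"x": 5, "y": 2} without the full state ever having been written at once.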
51
A way to achieve reliable calculations is process pairs. Give a short explanation of how this works. What do we gain here compared to checkpoint/restart?
We get availability here; checkpoint/restart is dependent on the restart and any repair. Two programs, the primary and the backup, run at the same time. The primary does the side effects and sends the checkpoints to the backup, along with IAmAlive messages. The backup broadcasts IAmMaster (both to the clients and to any other "servers") when enough IAmAlive messages have been missed, and continues from the last checkpoint, probably starting with the last prepared side effect/reply.
52
When doing forward error recovery in a multi thread setting, the need arises for the different threads to get to know about errors that happen in other threads. List mechanisms that can be used to convey such information between threads.
Avstemmingen i etterkant av en AA. En kan polle feilstatus variable, eller hvis systemet er meldingsbasert kan en sende feilmeldinger. Ellers er asynchronous transfer of control det som er mest behandlet i boken: select then abort i Ada, AsynchrouneslyIterruptedExceptions i Java, og setjump/longjump -trikset eller pthread\_cancel i C/POSIX.
53
54
Gray had the thought that we could build fault tolerant computer systems if we only had reliable data storage, reliable communication and reliable calculations... Refer shortly how we correspondingly can achieve reliable communication.
Redundancy is the keyword here too, in the form of ack/timeout/resending. Some more details should also be included: sessions, checksums, sequence numbers, …
55
What do we achieve by merging error modes?
We achieve a simpler system. Even though it would be possible to handle all the different errors differently / do forward error recovery, it is not always feasible, since errors happen seldom anyway. Some errors might get more drastic consequences than strictly necessary, but again: it happens seldom.
56
Wrong use of semaphores may lead to deadlocks: What is a deadlock? Give an example on how this can happen.
"The system locked in circular waiting", or "a state that the program cannot leave". We expect the standard wait(A);wait(B) / wait(B);wait(A) example here, but other examples must also be accepted.
57
Error detection — to decide whether something is wrong – can be done in more ways than checking for error returns. List principles that can be used to detect errors.
• Replication checks • Timing checks • Reversal checks • Coding checks • Reasonableness checks • Structural checks • Dynamic reasonableness checks
58
Why is FPS more commonly in use than EDF?
This is surprising, since EDF comes out better in more schedulability tests… The book lists: * Easier to implement. * Easier to incorporate tasks without natural or hard deadlines. * Period (as used by FPS/RMS for setting priorities) is not really a good measure of importance, but it is far better than absolute deadlines; FPS priorities can be tweaked… * The overload behaviour of FPS may be preferred (see the previous point). * The FPS/RMS schedulability test is unnecessarily pessimistic. (SH: seriously? Are we making systems that may or may not work?)
59
Criticize the following procedure names shortly (picked from the 2016 projects). Feel free to group them if relevant.
  • void elev_master_running()
  • void start_elevator()
  • int get_q(int floor, int button_type);
  • int elev_get_floor_sensor_signal(void);
  • void* orders_ctrl_main(void* arg);
  • void* comm_ctrl_loop(void* arg);
  • void* manage_slave(void* slave_id_void_ptr);
  • Describe everything the routine does.
  • Avoid meaningless, vague, or wishy-washy verbs.
  • Don't differentiate routine names solely by number.
  • To name a function, use a description of the return value.
  • To name a procedure, use a strong verb followed by an object.
  • Use opposites precisely.
  • Establish conventions for common operations.
60
In the same system: How, typically, is it decided which of more threads waiting on a semaphore, gets to run when the semaphore is signaled?
Unfortunately, priorities are the wrong answer here; first come, first served is the normal behaviour. This is to avoid starvation…
61
You have already made a module for keeping track of names of people. It has, among other parts of the interface, these functions: getFirstName, getLastName, setFirstName and getName. The last one calls the two first ones before it returns the complete name. The module worked perfectly until a multithreaded version of the program was made... What kind of problems with such a module can surface when it is used in a multithreaded program?
The module is not reentrant: any data structures (shared resources) used by the module may be accessed by more threads at the same time, or, even worse, references to internal data structures may be returned to the caller for concurrent access elsewhere in the program.
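The classic fix is to make the composite operations atomic with a lock. A Python sketch of such a module (the class shape and method names are illustrative guesses, not the actual module from the card):

```python
import threading

class Person:
    """getName-style composition: reading first and last name must be
    atomic with respect to updates, or a concurrent setter can
    interleave and produce a mixed-up name."""
    def __init__(self, first, last):
        self._first = first
        self._last = last
        self._lock = threading.Lock()   # protects the first/last pair

    def set_name(self, first, last):
        with self._lock:                # update both fields atomically
            self._first = first
            self._last = last

    def get_name(self):
        with self._lock:                # read both fields atomically
            return f"{self._first} {self._last}"
```

Without the lock, a reader could observe the first name of one person and the last name of another, which is exactly the mixed-up address-list output from the earlier card.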
62
There are a number of assumptions/conditions that must be true for these tests to be usable. (“The simple task model”) Which? Comment (shortly) on how realistic they are.
* Fixed set of tasks (No sporadic tasks... Not optimal but fair deal) * Periodic tasks, known periods (Realistic) * The threads must be independent. (Not realistic at all in an embedded system) * Overheads, switching times can be ignored (Sometimes yes, sometimes no) * Deadline == Period (Not optimal but Fair deal) * Fixed Worst Case Execution Time. (Not realistic to know a tight estimate here.) * and in addition: Rate-Monotonic Priority ordering. (our choice, so ok)
63
What is an Atomic Action? Which problem(s) is Atomic Actions meant to solve?
Many other good answers exist for the first part here. Pointing out the three boundaries (side, start and end), possibly with the standard mechanisms to achieve them (locking, explicit membership, and a two-phase commit protocol), is reasonable enough. The problem to be solved: if more participants cooperate on something, they must possibly also cooperate in handling errors. AA provides the framework for achieving this, containing the errors and avoiding the domino effect. "A mechanism/infrastructure for error containment when we have cooperating threads" basically covers both questions quite well.
64
What is starvation?
One thread does not get the resources it needs due to unfortunate scheduling.
65
Deadlocks can of course also happen in systems where interaction between threads is message based. Construct an example from this domain.
T1: recv(T2); send(T2); T2: recv(T1); send(T1); Both threads block in recv, each waiting for a message the other will never send. Longer cycles work too: T1 waits for T2, T2 waits for T3, T3 waits for T1...
66
Handling the log gets more difficult if we have more parallel tasks, where some succeeds and some fails, and all generate log. How can we extend the log (compared to the single-thread version) to handle this (still in a backward error recovery perspective)?
No worse than operating with "transaction IDs" that tie the log records to the individual task. The recovery points in the log can then be set per task.
67
Give examples of what one can test for when making acceptance tests.
This is the "learn-by-heart" list from the book.
* Replication checks (N-version programming).
* Timing checks (watchdog, deadline).
* Reversal checks (calculate input from output and compare).
* Coding checks (checksums).
* Reasonableness checks (variable range, assertions).
* Dynamic reasonableness checks (reasonable compared to previous value).
* Structural checks (integrity of data structures).
68
Suggest a general rule for how channels should be named.
Sverre tends to name channels by describing the elements carried on them, but other parts of a channel's context are sometimes just as relevant.
69
Why is “scheduling” so important in real-time programming? — Which features are we hoping to gain in our system by choosing good scheduling strategies?
Predictability in the timing domain, so that systems can be analyzed. Simple analysis that is not too conservative, facilitating high utilization. Ensuring that all threads meet their deadlines is OK. Comments on allocation of resources other than the CPU, and on deadlock avoidance, are also good. Performance and fairness are bad answers.
70
Sverre has an issue with nested ’while(true)’ loops (infinite ’for’ loops, for the go programmers out there :-) ), claiming that it is bad form to nest infinite loops. Argue
What you see is not what you get when a loop gets hijacked by an inner loop (infinite, blocking, spinning, polling...).
71
Give a short description of how process pairs work.
A figure showing the master and backup processes exchanging IAmAlive, status and IAmMaster messages is good. That the backup takes over when the master dies should be clear, and it is very good if the consistency of the service is argued.
72
How does priority inversion influence the schedulability analyses?
It invalidates the assumptions required for the (basic form of the) analyses to be valid. This can partly be compensated for by more complicated analyses, giving more conservative results in terms of utilization. Difficult question, so we can give credit for any mature answer.
73
Compare the Java and Ada mechanisms here, give the main strengths and weaknesses.
Bloom's criteria, expressive power and ease of use, should be included here as the standard way of comparing such mechanisms (but they need not be listed and gone through systematically). Ada's weakness that we cannot have guards on parameters should come out as a minimum. That guards come out favourably on the ease-of-use side is great if mentioned.
74
Why is Asynchronous Transfer of Control seen as relevant for use in Atomic Actions?
Because, for example, when an error is detected by one thread, it might mean that the work done by another thread is not meaningful anymore -> it needs to be interrupted.
75
More of these “resource allocation” problems does not occur in pure message passing systems (“message-based synchronization” as Burns&Wellings calls it). Why?
There are no shared resources.
76
Imagine a system of mobile sensor vehicles sharing updates via a base station, or a process control plant where the operators are allowed to program alarm scripts. In both these cases we may get unpredictable resource allocations. How can we avoid or solve any problems with deadlocks in such systems?
Detection:
- Who owns and asks for what (detect cycles)
- Timeout/watchdog
Recovery:
- Breaking mutual exclusion
- Preemption
- Abort of thread or Atomic Action
77
When an acceptance test fails, we are left with the knowledge that something is wrong, but we do not necessarily know exactly what. How can we recover in such a situation?
Backward or forward error recovery; just get to any consistent state. Merging of failure modes is also relevant
78
You have already made a module for keeping track of names of people. It has, among other parts of the interface, these functions: getFirstName, getLastName, setFirstName and getName. The last one calls the first two before it returns the complete name. The module worked perfectly until a multithreaded version of the program was made... You decide to protect the module with semaphores by allocating (wait) a given binary semaphore at the start of each function, and releasing it (signal) just before returning. This does not work: what happens?
Since getName calls getFirstName and both allocate the semaphore, we will deadlock.
79
What is starvation?
A thread does “by accident” not get necessary resources.
80
What is “Asynchronous Transfer of Control”?
...that one thread can interrupt another thread with termination mode (so that the interrupted thread does not continue where it left off after the interruption).
81
Process pairs is a technique for achieving available (in addition to reliable) processes. Which of the following points are the central building blocks/principles for implementing process pairs? 1 An “I Am Alive” protocol that lets the backup process detect whether the primary process is alive and working. 2 A (two-phase) commit protocol between primary and backup processes ensuring that they have a consistent view of the current state. 3 Sending state updates from the primary to the backup so that the backup will start working in the current state if/when it becomes primary. 4 Highly reliable and available messages must ensure that all IamAlive and State Update messages reach their destination and in the correct order. 5 Some mechanism for letting clients relate to the process that currently is primary (like broadcasting “IamPrimary” also to clients). 6 Process independent message queues that ensures that messages are not lost even though a process restarts.
1,3,5.
82
We can imagine that after some time such a program can generate huge amounts of log. “Checkpoints” is a design pattern that lets us delete old log. How does this work?
Into the log, at suitable intervals, we write a complete snapshot of the program state (non-consistent, since there may be active operations going on) along with a list of the active operations. Log that is older than the last checkpoint listing no operations that are still active can be deleted. Recovery will then start by initializing to this checkpoint, and then executing the relevant log records from there on.
83
What are the benefits of structuring the functionality of your program into Atomic Actions? Any drawbacks?
Good: an error handling framework generalizable to more participants! Errors do not have consequences beyond the AA. Bad: composes badly, some infrastructure is needed, one-design-fits-all? Does not handle RT systems and side effects that well.
84
A shared variable system written in C using semaphores, but here it may be possible to make a system that is guaranteed to be without deadlocks. List a number of approaches to making deadlock-free systems
* The priority ceiling scheduling protocol (if all tasks' resource usage is known at compile time).
* Using a lock manager that avoids risking deadlocks (banker's algorithm) (again, if all tasks' resource usage is known at compile time).
* Formal verification, like LTSA, FSP etc.
* Any deadlock detection/handling scheme.
* Invalidating any of the 4 necessary conditions for deadlocks.
* ...
85
“Shared variable synchronization” has its challenges: A number of problems(classes of bugs) may occur in a shared variable system that are not present in single thread systems. Which?
* Deadlock: the system is blocked in a circular wait.
* Livelock: the system is locked in a subset of states (like deadlock, but we use CPU).
* Starvation: a thread does, by accident, not get the resources it needs. Ex: unfair scheduling. (Ref. the discussion on whether you should make assumptions on how the scheduling works. Ref. Go vs. Occam: who is woken by a signal?)
* Race condition: a bug that surfaces by unfortunate timing or ordering of events.
86
A number of modern programming languages comes with built-in mechanisms for exception handling. What are we hoping to achieve with these mechanisms?
* An easy-to-understand mechanism that makes error handling simple.
* Separation of error handling code and normal-operation code.
* The same mechanism for handling different types of exceptions.
* No overhead in normal operation?
* Allowing recovery actions :-)
87
Gray had the thought that we could build fault tolerant computer systems if we only had reliable data storage, reliable communication and reliable calculations... What is, in this context, the error mode(s) of a "calculation", and how are they detected?
Error mode: not delivering the next correct side effect. Detected by acceptance tests.
88
A general advice on Java programming is to avoid using notify() and rather use notifyAll() combined with while loops around all the wait()’s. Why?
If there are several different reasons for waiting, you cannot easily know whom you awaken with a notify. If you awaken somebody waiting for something other than what became ready, the system may misbehave. Even in a system with only one reason for waiting, this may change due to future reuse, maintenance or inheritance. From the module perspective, using only notify may demand knowledge of the usage patterns of the object - that is, yielding not so good encapsulation.
90
Gray had the thought that we could build fault tolerant computer systems if we only had reliable data storage, reliable communication and reliable calculations... Reliable calculations are the most difficult in this triplet since the failure modes is difficult to describe in the general case. The curriculum sets up “Checkpoint/Restart” as the first alternative: Describe how this works.
Just before any side effect, a set of acceptance tests is run; if OK, the complete state of the program is written to safe storage - the checkpoint. If the tests are not OK, the program is restarted, loads the previous checkpoint, and continues from there.
91
“Transactions” are almost the same as “Atomic Actions” in the Burns&Wellings book, and are relevant when making fault tolerant computer systems. How do transactions contribute to fault tolerance?
The perfect answer here would be the ACID properties of a transaction: Atomicity (it happens all-or-nothing), Consistency (a transformation from one consistent/correct state to another), Isolation (partial results are not propagated out of the transaction before it has finished, so that errors can be handled) and Durability (the result of a calculation is never lost).
92
What do we achieve (in the domain of error handling) by using dynamic redundancy?
(Acceptance tests must take care of error detection...) We increase the probability of being able to successfully recover from errors, and get a flexible framework for how this is done: if the first way of doing something did not work, maybe we can do it in another way. (Or: if the first actor could not do it, maybe a backup actor can, as in process pairs.)
93
What is the “unbounded priority inversion” problem?
That a high-priority thread potentially might be waiting forever (an unbounded time) for lower-priority threads. This happens when a low-priority thread owns a resource that the high-priority thread needs. The unboundedness comes from the unknown number of intermediate-priority threads potentially wanting to run. Mention of the intermediate-priority threads is necessary for a good answer.
94
That the threads are cooperating means that they will have some need for interaction. This interaction usually takes the form of either “synchronization” or “communication”. Explain these two shortly and/or give examples.
Synchronization: access to a common global variable is restricted by mutual exclusion with e.g. semaphores. Communication: One thread sends a message to the other.
95
Describe one way of implementing an Atomic Action.
The clear and systematic approach here is to tell how the three boundaries are made: the start boundary can be implemented by an ActionController keeping track of ActionMembers through some "entry protocol". The side boundary can be implemented by some kind of locking of resources, or by limitations on communication: participants cannot communicate with non-participants. The end boundary can be implemented by a two-phase commit protocol (or another kind of barrier).
96
If we envision a lot of readers and writers, then such use of notifyAll may be unfortunate. Why?
We can have a lot of waiters at a time, and waking all of them, only to loop and go to sleep again is a waste (and bad abstraction).
97
The shared variable synchronization construct "monitors" has a problem with composition; we must be very aware when calling one monitor from the inside of another. What is the problem?
Locking the inner monitor /must/ also lock the outer monitor (this has to do with "releasing control in safe places"). Getting the outer monitor locked in odd places leads to all the same problems that we originally had with semaphores, just one call/abstraction level higher.
98
The failure modes is the ways a system can fail. To “merge failure modes” is a technique: What do we gain by doing this ?
• Simplification of the system. (If we handle the worst-case error anyway, maybe all other errors can be handled the same way.)
• Error modes are part of the module interface: fewer error modes enhance modularity/maintenance/composition by reducing the size of the interface.
• Handling unexpected errors, since merging of failure modes can also encompass unknown error modes...
99
In a system where exception handling is used, the following code may cause a deadlock: { Wait(semaphoreA); f(); Signal(semaphoreA); } Explain what the problem is and how it is solved.
f may throw an exception, causing the semaphore never to be signaled. We solve this with "final wishes": a piece of code that will always be executed - even when an exception is thrown.
100
Good scheduling is important in real-time programming to ensure that all deadlines in the system are satisfied. How do we achieve this by scheduling strategies?
First we can schedule the most "critical" thread first, as in EDF (or even rate-monotonic priority ordering), increasing the chance that all deadlines will be met. But predictability is just as important, paving the way for schedulability analysis.
101
If the program state is big or we have more concurrent ongoing operations that can fail individually, then log might be a good alternative. How does this work in context of backwards error recovery?
Just before any part of the program state is changed, the before-state of that part is stored away as a log record. The records are also typically labeled with the operation they are a part of, so that operations (the atomic actions...) can fail individually. When an error is detected, the log records are "undone" in the opposite order of that in which they were created.
102
An Atomic Action has start, side and end boundaries. How can each of them (start, side and end) be realized?
Start: in static systems this may be hardcoded. If not, some kind of explicit membership list is OK; an action manager can keep track of the members of each action. Any recovery points may also be established at the start boundary, if preparing for backward recovery. Side: typically some kind of locking of resource to action. From transactions we learn that the transaction id should be part of all communication, meaning that all threads wanting to act on a message must join the transaction. End: acceptance tests, and any voting or synchronization; the two-phase commit protocol.
103
In a real-time program we often have more concurrently running, cooperating threads. Why is this a reasonable way of organizing the functionality of a real-time system?
Keeping deadlines is difficult in a coupled system. Dividing the system into one thread per deadline is a simplification.
104
In one of the synchronization problems that Bloom put up, Adas mechanisms fails. Explain.
Ada fails at the case where an entry parameter is necessary for the resource allocation - like when allocating N of a resource. Guards cannot test on parameters (only private variables), leading (supposedly - the book does not demonstrate this) to “double interactions” or to the more complex application of the “requeue” or “entry families” mechanisms.
105
Explain shortly how a Priority Ceiling protocol works. (That is, choose one of them.)
Immediate: allocate a priority value to each shared resource in the system, and let a thread allocating the resource run at the resource's priority (or really: max(current, resource)) while it owns it. The resource priority should equal the max of the priorities of all threads using the resource.
106
Are there any disadvantages by using exceptions in a program, real-time system or programming language. (Answer briefly).
* Error handling gets very expensive.
* We must get the final wishes right; these precautions depend on other parts of the code - breaking encapsulation.
* Does it make the code easier to maintain? - "a goto where you do not know where you came from or where you are going"? And the mechanism is not always simple to understand in itself.
107
With Ada's protected objects, readers/writers locks are very easy to make; explain how.
Ada distinguishes between functions (which cannot change the object state) and procedures (which can), and allows functions to "run in parallel". That is: use functions for readers and procedures for writers.
108
Message based interaction between threads leads to a very different design than shared variable synchronization. How? Describe shortly the difference in designs.
Shared variable synchronization focuses on avoiding the problems of multiple threads sharing common resources. Apart from the added complexity of synchronization, the threads look "normal": programs working on data. Message-passing systems ideally have no shared resources; each resource is managed by a thread, and other threads must access the resource by communicating with it. Most threads in a message-passing system are built around the while-select loop. There are usually far more threads in a message-passing system, since we have more reasons to create them (...like managing resources).
109
Gray had the thought that we could build fault tolerant computer systems if we only had reliable data storage, reliable communication and reliable calculations... Refer shortly how we can achieve reliable storage (that is, built on unreliable storage)
Redundancy is the keyword, but for a full score *some* more details should be mentioned: a refresh thread, write-back on error, something on the error modes, ...
110
Given that we have detected an unexpected error... How can we know what we must do to handle the error ?
The difficulty is that we do not know the cause... But it is often solvable by merging error modes: "I failed; no matter the reason, I now must do..." A reference to AAs or transactions should be included: these let us reason on, and put limits on, the possible consequences of the error. Recovery points/backward error recovery is a catch-all: it lets us go back to a known consistent state (and possibly try again).
111
Optimistic Concurrency Control is a technique that can be used to avoid the overhead of locking. How does this work?
It works by assuming, optimistically, that conflicts will not happen, and detects and handles a conflict as an error if it does happen.
112
What is forward error recovery?
We try to compensate and/or correct the error we have just detected
113
What purpose does process pairs fill?
Obviously fault recovery, but also, importantly, availability: minimizing service downtime. We could speculate further for bonus points; paving the way for online upgrades?
114
“synchronization” or “communication”. Compare shortly and in bullet points how well suited these two are for a real-time system.
Though message-passing systems are well suited for large concurrent systems - they scale well, are easy to modularize as clients and servers, are a fantastic way of dividing a system into modules, etc. - they are not so well developed for real-time applications: no/few schedulability proofs, and they invite a larger number of (cooperating) threads, which again makes scheduling and scheduling predictability harder. Maturity questions again - any well-founded, reasonable answer may receive a score.
115
What is the difference between Transactions and Atomic Actions? How does this difference make Atomic Actions more attractive to use in a real-time system?
Transactions have one and only one error mode: ABORT. An AA, by contrast, is more of a transactional framework where both forward and backward error recovery are possible. ABORT might not be an option when you have a deadline, since we have already wasted some of our time.
116
There are a number of assumptions/conditions that must be true for analysis and utilization-based schedulability tests to be usable. (“The simple task model”) Which?
* Fixed set of tasks.
* Periodic tasks, known periods.
* The threads must be independent.
* Overheads, switching times can be ignored.
* Deadline == Period.
* Fixed, known worst-case execution time.
* And in addition: rate-monotonic priority ordering or earliest deadline first.
117
Readers/writers locks is an interesting case for discussing starvation. Why?
The need for readers/writers locks is motivated by a lot of readers. If there are a lot of readers, they may overlap in execution, starving any writers.
118
Readers/Writers locks is a variant of mutual exclusion where we recognize that we do not need mutual exclusion between more readers. The writers still needs to run under mutual exclusion — with each other and with the readers. How is it that we do not need mutual exclusion between multiple readers, when it is necessary between the writers?
A reader (by definition, here) does not change the (value of the) resource, meaning that all readers will see the same, consistent state - that is, no problem. Multiple writers, if they interrupt each other, may see intermediate (inconsistent) states or overwrite each other's partial writes.
119
Mention some hardware (/assembly) mechanisms that are used for achieving basic synchronization.
Disable interrupts, test-and-set/swap, spin locks.
120
What do we achieve by using process pairs?
Fault tolerance by (dynamic) redundancy. High availability since the switching between replicas happens immediately.
121
Why/When would we use readers/writers locks in place of ordinary mutual exclusion?
This is a performance issue - typically we have a lot of readers/reads and few writers/writes, and locking/serializing these calls hampers performance/removes parallelism.
122
How can we detect these unexpected errors ? (explain shortly.)
Acceptance tests: give demands to the correct state/result rather than testing on error situations. Static redundancy: given intermittent errors or independent systems, this catches errors that would be impossible to catch in other ways.
123
A purpose of a log could be to enable program restart, so that a restarted program can get initialized to a current, consistent state e.g. after a crash. How?
By storing the intended effect/new value in the log just before any side effect or change of program state - typically as part of the same log record as the old value. At restart, all (relevant) log records are "executed" in order.
124
Recommend (give an as general rule as possible) how a function should be named. (A function returns a value but have no side effects.)
Functions should be named after their return values. Code Complete checkpoint: "Is the routine's name a strong, clear verb-plus-object name for a procedure, or a description of the return value for a function?"
125
Generally; List methods you can use to avoid that deadlocks becomes a problem in a software system.
• Deadlock prevention (removing one of the 4 conditions):
– Optimistic concurrency control (!)
– Allocate all resources at once.
– Preemption (timeout, priority...)
– A global standard allocation order.
– Plus this one: global analysis - modelling and proving the absence of deadlocks.
• Deadlock avoidance:
– Resource allocation (banker's algorithm)
– Scheduling algorithms (priority ceiling)
• Deadlock detection & recovery:
– Detection:
∗ Analysis of who owns and requests what
∗ Timeout/watchdog
– Recovery:
∗ Breaking mutual exclusion
∗ Preemption (ex. -> forward error recovery)
∗ Abort of thread or Atomic Action (-> backward error recovery)
126
How does Ada avoid any of these problems with nested monitor calls.
In Ada, blocking - for any reason - from the inside of a monitor is a runtime error. Monitor calls are assumed to be "short". Nested monitor calls *are* possible, just not blocking inside one. Unnecessary detail for full score: blocking on monitor access itself is not defined as blocking, but blocking on a guard is.
127
Message passing systems are not traditionally seen as very suited for implementing systems with real-time demands. Why?
* Schedulability proofs are not well developed.
* Traditionally, in RT systems we have been closer to HW, maybe even without an OS. The message-passing infrastructure might not be available...
* ...or it might be too heavy/slow.
* In synchronization-based RT systems we have "one thread per timing demand" and we handle these threads with priorities, while in process-oriented systems we create threads for other reasons too, possibly making it difficult to assign priorities to them in any meaningful way.
* ... There are many other reasonable arguments here.
128
Gray had the thought that we could build fault tolerant computer systems if we only had reliable data storage, reliable communication and reliable calculations... Reliable calculations are the most difficult in this triplet since the failure modes is difficult to describe in the general case. The curriculum sets up “Checkpoint/Restart” as the first alternative: Describe how this works. What is the error modes, and how are they detected?
Error mode: not delivering the next correct side effect. Detected by acceptance tests. Just before any side effect, a set of acceptance tests is run; if OK, the complete state of the program is written to safe storage - the checkpoint. If the tests are not OK, the program is restarted, loads the previous checkpoint, and continues from there.
129
What is a live-lock?
A livelock is a bug where the system enters a subset of states that does not cover the whole functionality of the system, and where there is no way of leaving this subset.
130
The schedulability proofs probably have not been performed for most real-time systems out there in the world. Why do you think the industry is reluctant to perform these proofs?
The assumptions do not hold. The execution time bounds are all too conservative. The SW is too complex to fit into the standard model. (And the system seems to work well enough after testing.)
131
In a traditional real-time system we often know exactly which threads that uses which resources. This opens up for some techniques that lets us avoid the problem of deadlocks. Which?
Priority ceiling and the banker's algorithm, possibly also a "global allocation order", "allocate all resources at once" and formal verification.
132
Reliable calculations are the most difficult in this triplet since the failure modes is difficult to describe in the general case. The curriculum sets up "Checkpoint/Restart” as the first alternative: Describe how this works. What is the error modes, and how are they detected?
Error mode: not delivering the next correct side effect. Detected by acceptance tests. Just before any side effect is done, a set of acceptance tests is run: if OK, the complete state of the program is written to safe storage - the checkpoint. If the tests are not OK, the program is restarted, loads the previous checkpoint, and continues from there.
133
Interaction between different parallel tasks may lead to “inversion” problems (like priority inversion). Explain what the problem is.
If a task with high priority depends on a resource owned by a low-priority task, it will be blocked waiting for something that may not run for a long time, given the low priority of the resource holder.
134
Explain shortly how process pairs work.
Two programs, the primary and the backup, are run at the same time. The primary does the side effects (like "send answer to the client") and sends the program state/checkpoints to the backup (though in the opposite order!) along with IAmAlive messages. The backup broadcasts IAmMaster when enough IAmAlive messages have been missed - and continues from the last checkpoint.
135
List the techniques you would use together with short explanations of how they contribute to making the system fault tolerant. System 2 is the controller program for a single lift as you know it from the project. That is, there is no coordination of more lifts, but the spec still says that no button presses should be lost, and you should protect against power outages, harddisk crashes etc.
* Redundancy! If one controller/hard disk fails, we should rely on the other one. We have a number of patterns here: static redundancy, process pairs, n-version programming...
* Merging of failure modes: *if* something/anything goes wrong, then fall back on trusting the redundancy.
* Acceptance tests: this way of detecting errors ensures that even unexpected errors are handled.
136
The book sets up four necessary conditions for deadlocks to occur. Which? And how can deadlocks be prevented by seeing to it that each of these are not met?
Necessary conditions:
1. Mutual exclusion
2. Hold & wait
3. No preemption
4. Circular wait
Deadlock prevention (removing the corresponding condition):
1. Optimistic concurrency control.
2. Allocate all resources at once.
3. Preemption.
4. Global allocation order.
137
Backward error recovery is sometimes seen as not suited for a real-time system. What is backward error recovery, and why is it not suited?
• The "giving up to preserve consistency" error mode is often not acceptable, since we have interactions with other systems that may depend on us behaving correctly.
• We have no time to waste; by the time we report back to the module that initiated the failed operation, it might be too late to retry/fix it.
138
Acceptance tests are an "enabling technology" for process pairs. Explain specifically how acceptance tests contribute to the functionality of a process pair.
It is extremely important that no (unexpected) errors propagate from the primary to the backup. The status messages *must* be error free. Acceptance tests are the mechanism that ensures this. A perfect answer should contain the sequence: do work - perform acceptance test - send status to backup - do side effects. If the primary crashes, the backup executes the (possibly duplicate) side effects.
139
When doing forward error recovery in a multi thread setting, the need arises for the different threads to get to know about errors that happen in other threads. List mechanisms that can be used to convey such information.
The voting at the end of an AA. One can poll error-status variables, or, if the system is message based, send error messages. Otherwise, asynchronous transfer of control is what the book covers most: select-then-abort in Ada, AsynchronouslyInterruptedException in (Real-Time) Java, and the setjmp/longjmp trick or pthread_cancel in C/POSIX.
140
Two well-known scheduling strategies are FPS (Fixed Priority Scheduling) and EDF (Earliest Deadline First). Explain the terms FPS and EDF.
FPS: all tasks get a fixed priority, and the scheduler always lets the runnable highest-priority thread run. EDF: no predetermined priorities are given; the scheduler always runs the task with the earliest (absolute) deadline.
141
Explain the domino effect
If there are several interacting participants/threads, and the recovery points we aim to go back to are not synchronized/consistent with each other, we may have to roll back all the way to the beginning of program execution.
142
Operations for locking resources are always assumed to be atomic. Why is this so important?
Locking is often an integral part of the infrastructure allowing error handling (like in an AA). We would like to avoid the lock manager getting involved in error handling together with the action participants. (This would increase the complexity of the error handling, and possibly demand knowledge of the action in the lock manager.)
143
Explain shortly how exception handling works. Use C++, Java or Ada as example language if you want.
Basically: on an error, a program can “throw” an exception. Further up in the call hierarchy a catch clause may trigger and handle the error, and (unless the handler rethrows) the program continues operation after the catch clause.
144
Code a barrier that works.
What about something as simple as this (lots of credit to whoever makes it!):

```
// Signal that I am ready to both of the other two
signal(A);
signal(A);
// Then wait for the others.
wait(B);
wait(C);
```

(This is thread A's code; B and C run the symmetric code with their own semaphores.)
145
We have used the following demands on a principle for dividing a system into modules:
1. We must be able to use a module without knowing its internals.
2. We must be able to maintain a module without knowing its usage patterns.
3. Composition: super-modules can be made out of sub-modules.

In this perspective: criticize (shortly) the following principle for dividing a system into modules. Will it be suited for constructing a large software system? Dividing functionality into threads, like in Java or POSIX.
Shared-variable synchronization is the context here. These are global mechanisms for synchronization, which makes at least point 2 fail, and probably point 1 as well. Point 3 also fails: building super-threads from sub-threads is not a good option.
146
Acceptance tests are seen as an important tool for handling errors. How do acceptance tests differ from the more traditional tests on error conditions?
They place demands on the state being correct, i.e. they put us in a position to detect unforeseen errors: we can detect “unexpected” errors, not just the ones we anticipated. This also has a “merging of failure modes” effect.
147
Real-time software is well and good, but no real-time demands are met when the program is not running. One of the difficult features to give a system is the ability to upgrade to a new version without losing service. Outline shortly how you would approach making a system with this feature.
Process pairs come a long way: we can take down the backup and replace it with one running the new version of the software, before taking down the primary and provoking a take-over into the new version. The only thing to be aware of is that the status messages must be versioned, and that the new version of the software must be able to relate to 'old' status messages. Another challenge with process pairs is when the state of the program is too large to fit in a status message; any reflections on this are great, but outside of scope here.
148
Mention one problem that can be solved by the ’setjmp and longjmp’ calls in C.
Off the top of my head: • Transforming a 'signal', which is 'resumption mode', into 'termination mode' (ATC, which we want). • Zero-overhead error handling / C exception handling. • ... Any example, also outside of curriculum :-)