Chapter 5 Flashcards

1
Q

In a system with a single CPU core

A

only one process can run at a time. Others must wait until the CPU's core is free and can be rescheduled.

2
Q

The objective of multiprogramming is

A

to have some process running at all times, in order to maximize CPU utilization. The idea is relatively simple. A process is executed until it must wait, typically for the completion of some I/O request. In a simple computer system, the CPU then just sits idle. All this waiting time is wasted; no useful work is accomplished. With multiprogramming, we try to use this time productively. Several processes are kept in memory at one time. When one process has to wait, the operating system takes the CPU away from that process and gives the CPU to another process. This pattern continues. Every time one process has to wait, another process can take over use of the CPU. On a multicore system, this concept of keeping the CPU busy is extended to all processing cores on the system.
Scheduling of this kind is a fundamental operating-system function.
Almost all computer resources are scheduled before use. The CPU is, of course, one of the primary computer resources. Thus, its scheduling is central to operating-system design.

3
Q

Process execution consists of a

A

a cycle of CPU execution and I/O wait. Processes alternate between these two states. Process execution begins with a CPU burst.
That is followed by an I/O burst, which is followed by another CPU burst, then another I/O burst, and so on. Eventually, the final CPU burst ends with a system request to terminate execution (Figure 5.1).
The durations of CPU bursts have been measured extensively. Although they vary greatly from process to process and from computer to computer, they tend to have a frequency curve similar to that shown in Figure 5.2. The curve is generally characterized as exponential or hyperexponential, with a large number of short CPU bursts and a small number of long CPU bursts.
An I/O-bound program typically has many short CPU bursts. A CPU-bound program might have a few long CPU bursts. This distribution can be important when implementing a CPU-scheduling algorithm.

4
Q

CPU Scheduler

A

Whenever the CPU becomes idle, the operating system must select one of the processes in the ready queue to be executed. The selection process is carried out by the CPU scheduler, which selects a process from the processes in memory that are ready to execute and allocates the CPU to that process.

5
Q

CPU-scheduling decisions may take place under the following four circumstances

A
  1. When a process switches from the running state to the waiting state (for example, as the result of an I/O request or an invocation of wait() for the termination of a child process)
  2. When a process switches from the running state to the ready state (for example, when an interrupt occurs)
  3. When a process switches from the waiting state to the ready state (for example, at completion of I/O)
  4. When a process terminates

For situations 1 and 4, there is no choice in terms of scheduling. A new process (if one exists in the ready queue) must be selected for execution. There is a choice, however, for situations 2 and 3.

6
Q

A scheduling scheme is nonpreemptive or cooperative when

A
  1. When a process switches from the running state to the waiting state (for example, as the result of an I/O request or an invocation of wait() for the termination of a child process)
  4. When a process terminates

When scheduling takes place only under circumstances 1 and 4, we say that the scheduling scheme is nonpreemptive or cooperative. Otherwise, it is preemptive. Under nonpreemptive scheduling, once the CPU has been allocated to a process, the process keeps the CPU until it releases it either by terminating or by switching to the waiting state. Virtually all modern operating systems, including Windows, macOS, Linux, and UNIX, use preemptive scheduling algorithms.

7
Q

Preemptive scheduling

A

Scheduling is preemptive unless it takes place only when a process terminates or switches to the waiting state. Under nonpreemptive scheduling, once the CPU has been allocated to a process, the process keeps the CPU until it releases it either by terminating or by switching to the waiting state. Virtually all modern operating systems, including Windows, macOS, Linux, and UNIX, use preemptive scheduling algorithms.
Unfortunately, preemptive scheduling can result in race conditions when data are shared among several processes. Consider the case of two processes that share data. While one process is updating the data, it is preempted so that the second process can run. The second process then tries to read the data, which are in an inconsistent state. This issue will be explored in detail in Chapter 6.
Preemption also affects the design of the operating-system kernel. During the processing of a system call, the kernel may be busy with an activity on behalf of a process. Such activities may involve changing important kernel data (for instance, I/O queues). What happens if the process is preempted in the middle of these changes and the kernel (or the device driver) needs to read or modify the same structure? Chaos ensues. As will be discussed in Section 6.2, operating-system kernels can be designed as either nonpreemptive or preemptive. A nonpreemptive kernel will wait for a system call to complete or for a process to block while waiting for I/O to complete before doing a context switch. This scheme ensures that the kernel structure is simple, since the kernel will not preempt a process while the kernel data structures are in an inconsistent state. Unfortunately, this kernel-execution model is a poor one for supporting real-time computing, where tasks must complete execution within a given time frame. In Section 5.6, we explore the scheduling demands of real-time systems. A preemptive kernel requires mechanisms such as mutex locks to prevent race conditions when accessing shared kernel data structures.
Most modern operating systems are now fully preemptive when running in kernel mode.

8
Q

Because interrupts can, by definition, occur at any time, and because they cannot always be ignored by the kernel, the sections of code affected by interrupts must be guarded from simultaneous use

A

The operating system needs to accept interrupts at almost all times. Otherwise, input might be lost or output overwritten. So that these sections of code are not accessed concurrently by several processes, they disable interrupts at entry and reenable interrupts at exit. It is important to note that sections of code that disable interrupts do not occur very often and typically contain few instructions.

9
Q

Dispatcher

A

Another component involved in the CPU-scheduling function is the dispatcher.
The dispatcher is the module that gives control of the CPU’s core to the process selected by the CPU scheduler. This function involves the following:
•Switching context from one process to another
•Switching to user mode
•Jumping to the proper location in the user program to resume that program
The dispatcher should be as fast as possible, since it is invoked during every context switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency.

10
Q

voluntary and nonvoluntary context switches

A

A voluntary context switch occurs when a process has given up control of the CPU because it requires a resource that is currently unavailable (such as blocking for I/O). A nonvoluntary context switch occurs when the CPU has been taken away from a process, such as when its time slice has expired or it has been preempted by a higher-priority process.

11
Q

Criteria for comparing CPU-scheduling algorithms

A

•CPU utilization. We want to keep the CPU as busy as possible. Conceptually, CPU utilization can range from 0 to 100 percent. In a real system, it should range from 40 percent (for a lightly loaded system) to 90 percent (for a heavily loaded system). (CPU utilization can be obtained by using the top command on Linux, macOS, and UNIX systems.)
•Throughput. If the CPU is busy executing processes, then work is being done. One measure of work is the number of processes that are completed per time unit, called throughput. For long processes, this rate may be one process over several seconds; for short transactions, it may be tens of processes per second.
•Turnaround time. From the point of view of a particular process, the important criterion is how long it takes to execute that process. The interval from the time of submission of a process to the time of completion is the turnaround time. Turnaround time is the sum of the periods spent waiting in the ready queue, executing on the CPU, and doing I/O.
•Waiting time. The CPU-scheduling algorithm does not affect the amount of time during which a process executes or does I/O. It affects only the amount of time that a process spends waiting in the ready queue. Waiting time is the sum of the periods spent waiting in the ready queue.
•Response time. In an interactive system, turnaround time may not be the best criterion. Often, a process can produce some output fairly early and can continue computing new results while previous results are being output to the user. Thus, another measure is the time from the submission of a request until the first response is produced. This measure, called response time, is the time it takes to start responding, not the time it takes to output the response.
It is desirable to maximize CPU utilization and throughput and to minimize turnaround time, waiting time, and response time. In most cases, we optimize the average measure. However, under some circumstances, we prefer to optimize the minimum or maximum values rather than the average. For example, to guarantee that all users get good service, we may want to minimize the maximum response time.
Investigators have suggested that, for interactive systems (such as a PC desktop or laptop system), it is more important to minimize the variance in the response time than to minimize the average response time. A system with reasonable and predictable response time may be considered more desirable than a system that is faster on the average but is highly variable.

12
Q

First-Come, First-Served Scheduling

A

By far the simplest CPU-scheduling algorithm is the first-come, first-served (FCFS) scheduling algorithm. With this scheme, the process that requests the CPU first is allocated the CPU first. The implementation of the FCFS policy is easily managed with a FIFO queue. When a process enters the ready queue, its PCB is linked onto the tail of the queue. When the CPU is free, it is allocated to the process at the head of the queue. The running process is then removed from the queue. The code for FCFS scheduling is simple to write and understand.
On the negative side, the average waiting time under the FCFS policy is often quite long.
In addition, consider the performance of FCFS scheduling in a dynamic situation. Assume we have one CPU-bound process and many I/O-bound processes. As the processes flow around the system, the following scenario may result. The CPU-bound process will get and hold the CPU. During this time, all the other processes will finish their I/O and will move into the ready queue, waiting for the CPU. While the processes wait in the ready queue, the I/O devices are idle. Eventually, the CPU-bound process finishes its CPU burst and moves to an I/O device. All the I/O-bound processes, which have short CPU bursts, execute quickly and move back to the I/O queues. At this point, the CPU sits idle. The CPU-bound process will then move back to the ready queue and be allocated the CPU. Again, all the I/O processes end up waiting in the ready queue until the CPU-bound process is done. There is a convoy effect as all the other processes wait for the one big process to get off the CPU. This effect results in lower CPU and device utilization than might be possible if the shorter processes were allowed to go first.
Note also that the FCFS scheduling algorithm is nonpreemptive. Once the CPU has been allocated to a process, that process keeps the CPU until it releases the CPU, either by terminating or by requesting I/O. The FCFS algorithm is thus particularly troublesome for interactive systems, where it is important that each process get a share of the CPU at regular intervals. It would be disastrous to allow one process to keep the CPU for an extended period.
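A worked sketch may help here: assuming three hypothetical bursts of 24, 3, and 3 ms that are all in the ready queue at time 0, the following C fragment adds up FCFS waiting times in arrival order; reordering the array shows how much a long process ahead of short ones inflates the average.

#include <stdio.h>

int main(void) {
    int bursts[] = {24, 3, 3};              /* hypothetical CPU bursts (ms), in arrival order */
    int n = sizeof bursts / sizeof bursts[0];
    int elapsed = 0, total_wait = 0;

    for (int i = 0; i < n; i++) {
        total_wait += elapsed;              /* this process waits for everything ahead of it */
        elapsed += bursts[i];               /* later processes wait this much longer */
    }
    printf("average waiting time = %.2f ms\n", (double)total_wait / n);
    /* prints 17.00 for the order 24, 3, 3; the order 3, 3, 24 gives 3.00 */
    return 0;
}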

13
Q

Shortest-Job-First Scheduling

A

A different approach to CPU scheduling is the shortest-job-first (SJF) scheduling algorithm. This algorithm associates with each process the length of the process's next CPU burst. When the CPU is available, it is assigned to the process that has the smallest next CPU burst. If the next CPU bursts of two processes are the same, FCFS scheduling is used to break the tie. Note that a more appropriate term for this scheduling method would be the shortest-next-CPU-burst algorithm, because scheduling depends on the length of the next CPU burst of a process, rather than its total length. We use the term SJF because most people and textbooks use this term to refer to this type of scheduling. The SJF scheduling algorithm is provably optimal, in that it gives the minimum average waiting time for a given set of processes. Moving a short process before a long one decreases the waiting time of the short process more than it increases the waiting time of the long process. Consequently, the average waiting time decreases.
Although the SJF algorithm is optimal, it cannot be implemented at the level of CPU scheduling, as there is no way to know the length of the next CPU burst.
One approach to this problem is to try to approximate SJF scheduling. We may not know the length of the next CPU burst, but we may be able to predict its value. We expect that the next CPU burst will be similar in length to the previous ones. By computing an approximation of the length of the next CPU burst, we can pick the process with the shortest predicted CPU burst. The prediction is typically computed as an exponential average of the measured lengths of previous CPU bursts. The SJF algorithm can be either preemptive or nonpreemptive. The choice arises when a new process arrives at the ready queue while a previous process is still executing. The next CPU burst of the newly arrived process may be shorter than what is left of the currently executing process. A preemptive SJF algorithm will preempt the currently executing process, whereas a nonpreemptive SJF algorithm will allow the currently running process to finish its CPU burst. Preemptive SJF scheduling is sometimes called shortest-remaining-time-first scheduling.
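The exponential average mentioned above is usually written as tau(n+1) = alpha * t(n) + (1 - alpha) * tau(n). A minimal C sketch follows; the value alpha = 0.5, the initial guess of 10 ms, and the observed burst lengths are illustrative assumptions, not values required by the technique.

#include <stdio.h>

/* Exponential-average prediction of the next CPU burst:
 * tau_next = alpha * t_observed + (1 - alpha) * tau_previous. */
static double predict_next(double tau_prev, double t_observed, double alpha) {
    return alpha * t_observed + (1.0 - alpha) * tau_prev;
}

int main(void) {
    double tau = 10.0;                                   /* initial guess (ms), an assumption */
    double observed[] = {6.0, 4.0, 6.0, 4.0, 13.0, 13.0, 13.0};
    int n = sizeof observed / sizeof observed[0];

    for (int i = 0; i < n; i++) {
        tau = predict_next(tau, observed[i], 0.5);
        printf("after burst %d (%.0f ms): predicted next burst = %.2f ms\n",
               i + 1, observed[i], tau);
    }
    return 0;
}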

14
Q

Round-Robin Scheduling

A

The round-robin (RR) scheduling algorithm is similar to FCFS scheduling, but preemption is added to enable the system to switch between processes. A small unit of time, called a time quantum or time slice, is defined. A time quantum is generally from 10 to 100 milliseconds in length. The ready queue is treated as a circular queue. The CPU scheduler goes around the ready queue, allocating the CPU to each process for a time interval of up to 1 time quantum.
To implement RR scheduling, we again treat the ready queue as a FIFO queue of processes. New processes are added to the tail of the ready queue.
The CPU scheduler picks the first process from the ready queue, sets a timer to interrupt after 1 time quantum, and dispatches the process.
One of two things will then happen. The process may have a CPU burst of less than 1 time quantum. In this case, the process itself will release the CPU voluntarily. The scheduler will then proceed to the next process in the ready queue. If the CPU burst of the currently running process is longer than 1 time quantum, the timer will go off and will cause an interrupt to the operating system. A context switch will be executed, and the process will be put at the tail of the ready queue. The CPU scheduler will then select the next process in the ready queue.
The average waiting time under the RR policy is often long.
The performance of the RR algorithm depends heavily on the size of the time quantum. At one extreme, if the time quantum is extremely large, the RR policy is the same as the FCFS policy. In contrast, if the time quantum is extremely small (say, 1 millisecond), the RR approach can result in a large number of context switches. Assume, for example, that we have only one process of 10 time units. If the quantum is 12 time units, the process finishes in less than 1 time quantum, with no overhead. If the quantum is 6 time units, however, the process requires 2 quanta, resulting in a context switch. If the time quantum is 1 time unit, then nine context switches will occur, slowing the execution of the process accordingly (Figure 5.5).
Thus, we want the time quantum to be large with respect to the context-switch time. If the context-switch time is approximately 10 percent of the time quantum, then about 10 percent of the CPU time will be spent in context switching. In practice, most modern systems have time quanta ranging from 10 to 100 milliseconds. The time required for a context switch is typically less than 10 microseconds; thus, the context-switch time is a small fraction of the time quantum.
Turnaround time also depends on the size of the time quantum. As we can see from Figure 5.6, the average turnaround time of a set of processes does not necessarily improve as the time-quantum size increases. In general, the average turnaround time can be improved if most processes finish their next CPU burst in a single time quantum. For example, given three processes of 10 time units each and a quantum of 1 time unit, the average turnaround time is 29. If the time quantum is 10, however, the average turnaround time drops to 20. If context-switch time is added in, the average turnaround time increases even more for a smaller time quantum, since more context switches are required.
Although the time quantum should be large compared with the context-switch time, it should not be too large. As we pointed out earlier, if the time quantum is too large, RR scheduling degenerates to an FCFS policy. A rule of thumb is that 80 percent of the CPU bursts should be shorter than the time quantum.
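A minimal simulation can make the mechanics concrete. The sketch below assumes all processes arrive at time 0, perform no I/O, and that context-switch time is ignored; the burst lengths and the 4 ms quantum are hypothetical.

#include <stdio.h>

#define N 3                              /* number of processes (hypothetical) */

int main(void) {
    int remaining[N] = {24, 3, 3};       /* remaining CPU burst per process (ms) */
    int finish[N] = {0};
    int quantum = 4, now = 0, done = 0;

    /* Sweep the processes in a fixed circular order, giving each up to one quantum. */
    while (done < N) {
        for (int i = 0; i < N; i++) {
            if (remaining[i] == 0)
                continue;
            int slice = remaining[i] < quantum ? remaining[i] : quantum;
            now += slice;
            remaining[i] -= slice;
            if (remaining[i] == 0) {
                finish[i] = now;         /* arrival is 0, so completion time = turnaround */
                done++;
            }
        }
    }
    for (int i = 0; i < N; i++)
        printf("P%d turnaround = %d ms\n", i + 1, finish[i]);
    return 0;
}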

15
Q

Priority Scheduling

A

The SJF algorithm is a special case of the general priority-scheduling algorithm.
A priority is associated with each process, and the CPU is allocated to the process with the highest priority. Equal-priority processes are scheduled in FCFS order. An SJF algorithm is simply a priority algorithm where the priority (p) is the inverse of the (predicted) next CPU burst. The larger the CPU burst, the lower the priority, and vice versa.
Note that we discuss scheduling in terms of high priority and low priority.
Priorities are generally indicated by some fixed range of numbers, such as 0 to 7 or 0 to 4,095. However, there is no general agreement on whether 0 is the highest or lowest priority. Some systems use low numbers to represent low priority; others use low numbers for high priority. This difference can lead to confusion. In this text, we assume that low numbers represent high priority.

16
Q

Priorities can be defined either internally or externally

A

Internally defined priorities use some measurable quantity or quantities to compute the priority of a process. For example, time limits, memory requirements, the number of open files, and the ratio of average I/O burst to average CPU burst have been used in computing priorities. External priorities are set by criteria outside the operating system, such as the importance of the process, the type and amount of funds being paid for computer use, the department sponsoring the work, and other, often political, factors.

17
Q

Priority scheduling can be either preemptive or nonpreemptive.

A

When a process arrives at the ready queue, its priority is compared with the priority of the currently running process. A preemptive priority scheduling algorithm will preempt the CPU if the priority of the newly arrived process is higher than the priority of the currently running process. A nonpreemptive priority scheduling algorithm will simply put the new process at the head of the ready queue.

18
Q

A major problem with priority scheduling algorithms is indefinite blocking, or starvation.

A

A process that is ready to run but waiting for the CPU can be considered blocked. A priority scheduling algorithm can leave some low-priority processes waiting indefinitely. In a heavily loaded computer system, a steady stream of higher-priority processes can prevent a low-priority process from ever getting the CPU. Generally, one of two things will happen. Either the process will eventually be run (at 2 A.M. Sunday, when the system is finally lightly loaded), or the computer system will eventually crash and lose all unfinished low-priority processes. (Rumor has it that when they shut down the IBM 7094 at MIT in 1973, they found a low-priority process that had been submitted in 1967 and had not yet been run.) A solution to the problem of indefinite blockage of low-priority processes is aging. Aging involves gradually increasing the priority of processes that wait in the system for a long time. For example, if priorities range from 127 (low) to 0 (high), we could periodically (say, every second) increase the priority of a waiting process by 1. Eventually, even a process with an initial priority of 127 would have the highest priority in the system and would be executed. In fact, it would take a little over 2 minutes for a priority-127 process to age to a priority-0 process.
Another option is to combine round-robin and priority scheduling in such a way that the system executes the highest-priority process and runs processes with the same priority using round-robin scheduling.

19
Q

Multilevel Queue Scheduling

A

With both priority and round-robin scheduling, all processes may be placed in a single queue, and the scheduler then selects the process with the highest priority to run. Depending on how the queues are managed, an O(n) search may be necessary to determine the highest-priority process. In practice, it is often easier to have separate queues for each distinct priority, and priority scheduling simply schedules the process in the highest-priority queue. This is illustrated in Figure 5.7. This approach—known as multilevel queue—also works well when priority scheduling is combined with round-robin: if there are multiple processes in the highest-priority queue, they are executed in round-robin order. In the most generalized form of this approach, a priority is assigned statically to each process, and a process remains in the same queue for the duration of its runtime.
A multilevel queue scheduling algorithm can also be used to partition processes into several separate queues based on the process type (Figure 5.8). For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. In addition, foreground processes may have priority (externally defined) over background processes. Separate queues might be used for foreground and background processes, and each queue might have its own scheduling algorithm. The foreground queue might be scheduled by an RR algorithm, for example, while the background queue is scheduled by an FCFS algorithm.
In addition, there must be scheduling among the queues, which is commonly implemented as fixed-priority preemptive scheduling. For example, the real-time queue may have absolute priority over the interactive queue.
Let’s look at an example of a multilevel queue scheduling algorithm with four queues, listed below in order of priority:
1. Real-time processes
2. System processes
3. Interactive processes
4. Batch processes
Each queue has absolute priority over lower-priority queues. No process in the batch queue, for example, could run unless the queues for real-time processes, system processes, and interactive processes were all empty. If an interactive process entered the ready queue while a batch process was running, the batch process would be preempted.

20
Q

Multilevel Feedback Queue Scheduling

A

Normally, when the multilevel queue scheduling algorithm is used, processes are permanently assigned to a queue when they enter the system. If there are separate queues for foreground and background processes, for example, processes do not move from one queue to the other, since processes do not change their foreground or background nature. This setup has the advantage of low scheduling overhead, but it is inflexible.
The multilevel feedback queue scheduling algorithm, in contrast, allows a process to move between queues. The idea is to separate processes according to the characteristics of their CPU bursts. If a process uses too much CPU time, it will be moved to a lower-priority queue. This scheme leaves I/O-bound and interactive processes—which are typically characterized by short CPU bursts—in the higher-priority queues. In addition, a process that waits too long in a lower-priority queue may be moved to a higher-priority queue. This form of aging prevents starvation.
For example, consider a multilevel feedback queue scheduler with three queues, numbered from 0 to 2 (Figure 5.9). The scheduler first executes all processes in queue 0. Only when queue 0 is empty will it execute processes in queue 1. Similarly, processes in queue 2 will be executed only if queues 0 and 1 are empty. A process that arrives for queue 1 will preempt a process in queue 2. A process in queue 1 will in turn be preempted by a process arriving for queue 0.
An entering process is put in queue 0. A process in queue 0 is given a time quantum of 8 milliseconds. If it does not finish within this time, it is moved to the tail of queue 1. If queue 0 is empty, the process at the head of queue 1 is given a quantum of 16 milliseconds. If it does not complete, it is preempted and is put into queue 2. Processes in queue 2 are run on an FCFS basis but are run only when queues 0 and 1 are empty. To prevent starvation, a process that waits too long in a lower-priority queue may gradually be moved to a higher-priority queue.
This scheduling algorithm gives highest priority to any process with a CPU burst of 8 milliseconds or less. Such a process will quickly get the CPU, finish its CPU burst, and go off to its next I/O burst. Processes that need more than 8 but less than 24 milliseconds are also served quickly, although with lower priority than shorter processes. Long processes automatically sink to queue 2 and are served in FCFS order with any CPU cycles left over from queues 0 and 1.
In general, a multilevel feedback queue scheduler is defined by the following parameters:
•The number of queues
•The scheduling algorithm for each queue
•The method used to determine when to upgrade a process to a higher-priority queue
•The method used to determine when to demote a process to a lower-priority queue
•The method used to determine which queue a process will enter when that process needs service
The definition of a multilevel feedback queue scheduler makes it the most general CPU-scheduling algorithm. It can be configured to match a specific system under design. Unfortunately, it is also the most complex algorithm, since defining the best scheduler requires some means by which to select values for all the parameters.
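As a rough illustration of the demotion and aging rules described above, the sketch below hard-codes the three queues with 8 ms and 16 ms quanta and an FCFS bottom queue; the task structure and the aging threshold are assumptions made for illustration, not any particular kernel's design.

#include <stdio.h>

enum { QUEUES = 3 };
static const int quantum[QUEUES] = {8, 16, -1};   /* -1: bottom queue runs FCFS, no quantum */

struct task {
    int queue;        /* current queue index; 0 is the highest priority */
    int waited_ms;    /* time spent waiting without running */
};

/* Demote a task that used its whole quantum without finishing its burst. */
static void demote(struct task *t) {
    if (t->queue < QUEUES - 1)
        t->queue++;
}

/* Simple aging rule (assumed): promote a task that has waited too long. */
static void maybe_promote(struct task *t, int threshold_ms) {
    if (t->queue > 0 && t->waited_ms > threshold_ms) {
        t->queue--;
        t->waited_ms = 0;
    }
}

int main(void) {
    struct task t = {0, 0};
    demote(&t);                    /* burned its 8 ms quantum: moves to queue 1 */
    demote(&t);                    /* burned its 16 ms quantum: moves to queue 2 */
    t.waited_ms = 5000;
    maybe_promote(&t, 4000);       /* ages back up to queue 1 */
    printf("task now in queue %d (quantum %d ms)\n", t.queue, quantum[t.queue]);
    return 0;
}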

21
Q

Thread Scheduling

A

On most modern operating systems it is kernel-level threads—not processes—that are being scheduled by the operating system. User-level threads are managed by a thread library, and the kernel is unaware of them. To run on a CPU, user-level threads must ultimately be mapped to an associated kernel-level thread, although this mapping may be indirect and may use a lightweight process (LWP). In this section, we explore scheduling issues involving user-level and kernel-level threads and offer specific examples of scheduling for Pthreads.

22
Q

Contention Scope

A

One distinction between user-level and kernel-level threads lies in how they are scheduled. On systems implementing the many-to-one (Section 4.3.1) and many-to-many (Section 4.3.3) models, the thread library schedules user-level threads to run on an available LWP. This scheme is known as process-contention scope (PCS), since competition for the CPU takes place among threads belonging to the same process. (When we say the thread library schedules user threads onto available LWPs, we do not mean that the threads are actually running on a CPU, as that further requires the operating system to schedule the LWP's kernel thread onto a physical CPU core.) To decide which kernel-level thread to schedule onto a CPU, the kernel uses system-contention scope (SCS). Competition for the CPU with SCS scheduling takes place among all threads in the system. Systems using the one-to-one model (Section 4.3.2), such as Windows and Linux, schedule threads using only SCS.
Typically, PCS is done according to priority—the scheduler selects the runnable thread with the highest priority to run. User-level thread priorities are set by the programmer and are not adjusted by the thread library, although some thread libraries may allow the programmer to change the priority of a thread. It is important to note that PCS will typically preempt the thread currently running in favor of a higher-priority thread; however, there is no guarantee of time slicing (Section 5.3.3) among threads of equal priority.

23
Q

Pthread Scheduling

A

We provided a sample POSIX Pthread program in Section 4.4.1, along with an introduction to thread creation with Pthreads. Now, we highlight the POSIX Pthread API that allows specifying PCS or SCS during thread creation. Pthreads identifies the following contention scope values:
•PTHREAD_SCOPE_PROCESS schedules threads using PCS scheduling.
•PTHREAD_SCOPE_SYSTEM schedules threads using SCS scheduling.
On systems implementing the many-to-many model, the PTHREAD_SCOPE_PROCESS policy schedules user-level threads onto available LWPs. The number of LWPs is maintained by the thread library, perhaps using scheduler activations (Section 4.6.5). The PTHREAD_SCOPE_SYSTEM scheduling policy will create and bind an LWP for each user-level thread on many-to-many systems, effectively mapping threads using the one-to-one policy.
The Pthread API provides two functions for setting—and getting—the contention scope policy:
•pthread_attr_setscope(pthread_attr_t *attr, int scope)
•pthread_attr_getscope(pthread_attr_t *attr, int *scope)
The first parameter for both functions contains a pointer to the attribute set for the thread. The second parameter for the pthread_attr_setscope() function is passed either the PTHREAD_SCOPE_SYSTEM or the PTHREAD_SCOPE_PROCESS value, indicating how the contention scope is to be set. In the case of pthread_attr_getscope(), this second parameter contains a pointer to an int value that is set to the current value of the contention scope. If an error occurs, each of these functions returns a nonzero value.
In Figure 5.10, we illustrate a Pthread scheduling API. The program first determines the existing contention scope and sets it to PTHREAD_SCOPE_SYSTEM. It then creates five separate threads that will run using the SCS scheduling policy. Note that on some systems, only certain contention scope values are allowed. For example, Linux and macOS systems allow only PTHREAD_SCOPE_SYSTEM.
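Figure 5.10 itself is not reproduced in these cards, but a program in its spirit might look like the sketch below (compile with -pthread). The thread body and the thread count are illustrative; as noted above, the call to pthread_attr_setscope() may fail on systems that allow only one scope value.

#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 5

/* Each thread just reports that it is running; the body is illustrative. */
static void *runner(void *param) {
    (void)param;
    printf("thread running under SCS scheduling\n");
    pthread_exit(0);
}

int main(void) {
    pthread_t tid[NUM_THREADS];
    pthread_attr_t attr;
    int scope;

    pthread_attr_init(&attr);

    /* Report the default contention scope, then request system scope (SCS). */
    if (pthread_attr_getscope(&attr, &scope) != 0)
        fprintf(stderr, "unable to get scheduling scope\n");
    else
        printf("default scope: %s\n",
               scope == PTHREAD_SCOPE_PROCESS ? "PTHREAD_SCOPE_PROCESS"
                                              : "PTHREAD_SCOPE_SYSTEM");

    if (pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM) != 0)
        fprintf(stderr, "unable to set scheduling scope\n");

    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&tid[i], &attr, runner, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);

    return 0;
}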

24
Q

Multi-Processor Scheduling

A

Our discussion thus far has focused on the problems of scheduling the CPU in a system with a single processing core. If multiple CPUs are available, load sharing, where multiple threads may run in parallel, becomes possible; however, scheduling issues become correspondingly more complex. Many possibilities have been tried; and as we saw with CPU scheduling with a single-core CPU, there is no one best solution.
Traditionally, the term multiprocessor referred to systems that provided multiple physical processors, where each processor contained one single-core CPU. However, the definition of multiprocessor has evolved significantly, and on modern computing systems, multiprocessor now applies to the following system architectures:
•Multicore CPUs
•Multithreaded cores
•NUMA systems
•Heterogeneous multiprocessing
Here, we discuss several concerns in multiprocessor scheduling in the context of these different architectures. In the first three examples we concentrate on systems in which the processors are identical—homogeneous—in terms of their functionality. We can then use any available CPU to run any process in the queue. In the last example we explore a system where the processors are not identical in their capabilities.

25
Q

Approaches to Multiple-Processor Scheduling

A

One approach to CPU scheduling in a multiprocessor system has all scheduling decisions, I/O processing, and other system activities handled by a single processor—the master server. The other processors execute only user code.
This asymmetric multiprocessing is simple because only one core accesses the system data structures, reducing the need for data sharing. The drawback of this approach is that the master server becomes a potential bottleneck, which can reduce overall system performance.
The standard approach for supporting multiprocessors is symmetric multiprocessing (SMP), where each processor is self-scheduling. Scheduling proceeds by having the scheduler for each processor examine the ready queue and select a thread to run. Note that this provides two possible strategies for organizing the threads eligible to be scheduled:
1. All threads may be in a common ready queue.
2. Each processor may have its own private queue of threads.

26
Q

memory stall

A

Researchers have discovered that when a processor accesses memory, it spends a significant amount of time waiting for the data to become available. This situation, known as a memory stall, occurs primarily because modern processors operate at much faster speeds than memory. However, a memory stall can also occur because of a cache miss (accessing data that are not in cache memory). Figure 5.12 illustrates a memory stall. In this scenario, the processor can spend up to 50 percent of its time waiting for data to become available from memory.
To remedy this situation, many recent hardware designs have implemented multithreaded processing cores in which two (or more) hardware threads are assigned to each core. That way, if one hardware thread stalls while waiting for memory, the core can switch to another thread. Figure 5.13 illustrates a dual-threaded processing core on which the execution of thread 0 and the execution of thread 1 are interleaved. From an operating system perspective, each hardware thread maintains its architectural state, such as instruction pointer and register set, and thus appears as a logical CPU that is available to run a software thread. This technique—known as chip multithreading (CMT) —is illustrated in Figure 5.14. Here, the processor contains four computing cores, with each core containing two hardware threads. From the perspective of the operating system, there are eight logical CPUs.
Intel processors use the term hyper-threading (also known as simultaneous multithreading or SMT) to describe assigning multiple hardware threads to a single processing core. Contemporary Intel processors—such as the i7—support two threads per core, while the Oracle Sparc M7 processor supports eight threads per core, with eight cores per processor, thus providing the operating system with 64 logical CPUs.

27
Q

In general, there are two ways to multithread a processing core: coarse-grained and fine-grained multithreading.

A

With coarse-grained multithreading, a thread executes on a core until a long-latency event such as a memory stall occurs. Because of the delay caused by the long-latency event, the core must switch to another thread to begin execution. However, the cost of switching between threads is high, since the instruction pipeline must be flushed before the other thread can begin execution on the processor core. Once this new thread begins execution, it begins filling the pipeline with its instructions.
Fine-grained (or interleaved) multithreading switches between threads at a much finer level of granularity—typically at the boundary of an instruction. However, the architectural design of fine-grained systems includes logic for thread switching. As a result, the cost of switching between threads is small.
It is important to note that the resources of the physical core (such as caches and pipelines) must be shared among its hardware threads, and therefore a processing core can only execute one hardware thread at a time. Consequently, a multithreaded, multicore processor actually requires two different levels of scheduling, as shown in Figure 5.15, which illustrates a dual-threaded processing core.

28
Q

Levels of scheduling decisions

A

On one level are the scheduling decisions that must be made by the operating system as it chooses which software thread to run on each hardware thread (logical CPU). For all practical purposes, such decisions have been the primary focus of this chapter. Therefore, for this level of scheduling, the operating system may choose any scheduling algorithm, including those described in Section 5.3.
A second level of scheduling specifies how each core decides which hardware thread to run. There are several strategies to adopt in this situation. One approach is to use a simple round-robin algorithm to schedule a hardware thread to the processing core. This is the approach adopted by the UltraSPARC T3. Another approach is used by the Intel Itanium, a dual-core processor with two hardware-managed threads per core. Assigned to each hardware thread is a dynamic urgency value ranging from 0 to 7, with 0 representing the lowest urgency and 7 the highest. The Itanium identifies five different events that may trigger a thread switch. When one of these events occurs, the thread-switching logic compares the urgency of the two threads and selects the thread with the highest urgency value to execute on the processor core.
Note that the two different levels of scheduling shown in Figure 5.15 are not necessarily mutually exclusive. In fact, if the operating system scheduler (the first level) is made aware of the sharing of processor resources, it can make more effective scheduling decisions. As an example, assume that a CPU has two processing cores, and each core has two hardware threads. If two software threads are running on this system, they can be running either on the same core or on separate cores. If they are both scheduled to run on the same core, they have to share processor resources and thus are likely to proceed more slowly than if they were scheduled on separate cores. If the operating system is aware of the level of processor resource sharing, it can schedule software threads onto logical processors that do not share resources.

29
Q

Load Balancing

A

On SMP systems, it is important to keep the workload balanced among all processors to fully utilize the benefits of having more than one processor. Otherwise, one or more processors may sit idle while other processors have high workloads, along with ready queues of threads awaiting the CPU. Load balancing attempts to keep the workload evenly distributed across all processors in an SMP system. It is important to note that load balancing is typically necessary only on systems where each processor has its own private ready queue of eligible threads to execute. On systems with a common run queue, load balancing is unnecessary, because once a processor becomes idle, it immediately extracts a runnable thread from the common ready queue.

30
Q

There are two general approaches to load balancing: push migration and pull migration.

A

With push migration, a specific task periodically checks the load on each processor and—if it finds an imbalance—evenly distributes the load by moving (or pushing) threads from overloaded to idle or less-busy processors. Pull migration occurs when an idle processor pulls a waiting task from a busy processor. Push and pull migration need not be mutually exclusive and are, in fact, often implemented in parallel on load-balancing systems. For example, the Linux CFS scheduler (described in Section 5.7.1) and the ULE scheduler available for FreeBSD systems implement both techniques.
The concept of a “balanced load” may have different meanings. One view of a balanced load may require simply that all queues have approximately the same number of threads. Alternatively, balance may require an equal distribution of thread priorities across all queues.

31
Q

processor affinity

A

Because of the high cost of invalidating and repopulating caches, most operating systems with SMP support try to avoid migrating a thread from one processor to another and instead attempt to keep a thread running on the same processor and take advantage of a warm cache. This is known as processor affinity—that is, a process has an affinity for the processor on which it is currently running.
The two strategies described in Section 5.5.1 for organizing the queue of threads available for scheduling have implications for processor affinity. If we adopt the approach of a common ready queue, a thread may be selected for execution by any processor. Thus, if a thread is scheduled on a new processor, that processor's cache must be repopulated. With private, per-processor ready queues, a thread is always scheduled on the same processor and can therefore benefit from the contents of a warm cache. Essentially, per-processor ready queues provide processor affinity for free!
Processor affinity takes several forms. When an operating system has a policy of attempting to keep a process running on the same processor—but not guaranteeing that it will do so—we have a situation known as soft affinity.
Here, the operating system will attempt to keep a process on a single processor, but it is possible for a process to migrate between processors during load balancing. In contrast, some systems provide system calls that support hard affinity, thereby allowing a process to specify a subset of processors on which it can run. Many systems provide both soft and hard affinity. For example, Linux implements soft affinity, but it also provides the sched_setaffinity() system call, which supports hard affinity by allowing a thread to specify the set of CPUs on which it is eligible to run.
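A hedged, Linux-specific sketch of hard affinity using sched_setaffinity() follows; pinning to CPU 0 is just an example, and the call may fail if that CPU is not available to the process.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(0, &set);                 /* allow this process to run only on CPU 0 */

    /* pid 0 means "the calling thread"; returns 0 on success. */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    if (sched_getaffinity(0, sizeof(set), &set) == 0)
        printf("pinned to CPU 0: %s\n", CPU_ISSET(0, &set) ? "yes" : "no");

    return 0;
}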

32
Q

Heterogeneous Multiprocessing

A

Some systems are now designed using cores that run the same instruction set, yet vary in terms of their clock speed and power management, including the ability to adjust the power consumption of a core to the point of idling the core. Such systems are known as heterogeneous multiprocessing (HMP). Note this is not a form of asymmetric multiprocessing as described in Section 5.5.1, as both system and user tasks can run on any core. Rather, the intention behind HMP is to better manage power consumption by assigning tasks to certain cores based upon the specific demands of the task.
For ARM processors that support it, this type of architecture is known as big.LITTLE, where higher-performance big cores are combined with energy-efficient LITTLE cores. Big cores consume greater energy and therefore should only be used for short periods of time. Likewise, little cores use less energy and can therefore be used for longer periods.
There are several advantages to this approach. By combining a number of slower cores with faster ones, a CPU scheduler can assign tasks that do not require high performance but may need to run for longer periods (such as background tasks) to little cores, thereby helping to preserve a battery charge.
Similarly, interactive applications, which require more processing power but may run for shorter durations, can be assigned to big cores. Additionally, if the mobile device is in a power-saving mode, energy-intensive big cores can be disabled and the system can rely solely on energy-efficient little cores. Windows 10 supports HMP scheduling by allowing a thread to select a scheduling policy that best supports its power management demands.

33
Q

CPU scheduling for real-time operating systems involves special issues. In general, we can distinguish between soft real-time systems and hard real-time systems.

A

Soft real-time systems provide no guarantee as to when a critical real-time process will be scheduled. They guarantee only that the process will be given preference over noncritical processes. Hard real-time systems have stricter requirements. A task must be serviced by its deadline; service after the deadline has expired is the same as no service at all. In this section, we explore several issues related to process scheduling in both soft and hard real-time operating systems.

34
Q

Minimize latency

A

Consider the event-driven nature of a real-time system. The system is typically waiting for an event in real time to occur. Events may arise either in software—as when a timer expires—or in hardware—as when a remote-controlled vehicle detects that it is approaching an obstruction. When an event occurs, the system must respond to and service it as quickly as possible. We refer to event latency as the amount of time that elapses from when an event occurs to when it is serviced (Figure 5.17). Usually, different events have different latency requirements. For example, the latency requirement for an antilock brake system might be 3 to 5 milliseconds. That is, from the time a wheel first detects that it is sliding, the system controlling the antilock brakes has 3 to 5 milliseconds to respond to and control the situation. Any response that takes longer might result in the automobile's veering out of control. In contrast, an embedded system controlling radar in an airliner might tolerate a latency period of several seconds.

35
Q

Two types of latencies affect the performance of real-time systems:
1. Interrupt latency
2. Dispatch latency

A

Interrupt latency refers to the period of time from the arrival of an interrupt at the CPU to the start of the routine that services the interrupt. When an interrupt occurs, the operating system must first complete the instruction it is executing and determine the type of interrupt that occurred. It must then save the state of the current process before servicing the interrupt using the specific interrupt service routine (ISR). The total time required to perform these tasks is the interrupt latency (Figure 5.18).
Obviously, it is crucial for real-time operating systems to minimize interrupt latency to ensure that real-time tasks receive immediate attention. Indeed, for hard real-time systems, interrupt latency must not simply be minimized, it must be bounded to meet the strict requirements of these systems.
One important factor contributing to interrupt latency is the amount of time interrupts may be disabled while kernel data structures are being updated. Real-time operating systems require that interrupts be disabled for only very short periods of time.
The amount of time required for the scheduling dispatcher to stop one process and start another is known as dispatch latency. Providing real-time tasks with immediate access to the CPU mandates that real-time operating systems minimize this latency as well. The most effective technique for keeping dispatch latency low is to provide preemptive kernels. For hard real-time systems, dispatch latency is typically measured in several microseconds.
In Figure 5.19, we diagram the makeup of dispatch latency. The conflict phase of dispatch latency has two components:
1. Preemption of any process running in the kernel
2. Release by low-priority processes of resources needed by a high-priority process
Following the conflict phase, the dispatch phase schedules the high-priority process onto an available CPU.

36
Q

Priority-Based Scheduling

A

The most important feature of a real-time operating system is to respond immediately to a real-time process as soon as that process requires the CPU. As a result, the scheduler for a real-time operating system must support a priority-based algorithm with preemption. Recall that priority-based scheduling algorithms assign each process a priority based on its importance; more important tasks are assigned higher priorities than those deemed less important. If the scheduler also supports preemption, a process currently running on the CPU will be preempted if a higher-priority process becomes available to run.
Preemptive, priority-based scheduling algorithms are discussed in detail in Section 5.3.4, and Section 5.7 presents examples of the soft real-time scheduling features of the Linux, Windows, and Solaris operating systems. Each of these systems assigns real-time processes the highest scheduling priority. For example, Windows has 32 different priority levels. The highest levels—priority values 16 to 31—are reserved for real-time processes. Solaris and Linux have similar prioritization schemes.
Note that providing a preemptive, priority-based scheduler only guarantees soft real-time functionality. Hard real-time systems must further guarantee that real-time tasks will be serviced in accord with their deadline requirements, and making such guarantees requires additional scheduling features. In the remainder of this section, we cover scheduling algorithms appropriate for hard real-time systems.
Before we proceed with the details of the individual schedulers, however, we must define certain characteristics of the processes that are to be scheduled.
First, the processes are considered periodic. That is, they require the CPU at constant intervals (periods). Once a periodic process has acquired the CPU, it has a fixed processing time t, a deadline d by which it must be serviced by the CPU, and a period p. The relationship of the processing time, the deadline, and the period can be expressed as 0 ≤ t ≤ d ≤ p. The rate of a periodic task is 1/p.
Figure 5.20 illustrates the execution of a periodic process over time. Schedulers can take advantage of these characteristics and assign priorities according to a process’s deadline or rate requirements.
What is unusual about this form of scheduling is that a process may have to announce its deadline requirements to the scheduler. Then, using a technique known as an admission-control algorithm, the scheduler does one of two things. It either admits the process, guaranteeing that the process will complete on time, or rejects the request as impossible if it cannot guarantee that the task will be serviced by its deadline.

37
Q

Rate-Monotonic Scheduling

A

The rate-monotonic scheduling algorithm schedules periodic tasks using a static priority policy with preemption. If a lower-priority process is running and a higher-priority process becomes available to run, it will preempt the lower-priority process. Upon entering the system, each periodic task is assigned a priority inversely based on its period. The shorter the period, the higher the priority; the longer the period, the lower the priority. The rationale behind this policy is to assign a higher priority to tasks that require the CPU more often. Furthermore, rate-monotonic scheduling assumes that the processing time of a periodic process is the same for each CPU burst. That is, every time a process acquires the CPU, the duration of its CPU burst is the same.
Let's consider an example. We have two processes, P1 and P2. The periods for P1 and P2 are 50 and 100, respectively—that is, p1 = 50 and p2 = 100. The processing times are t1 = 20 for P1 and t2 = 35 for P2. The deadline for each process requires that it complete its CPU burst by the start of its next period.
We must first ask ourselves whether it is possible to schedule these tasks so that each meets its deadlines. If we measure the CPU utilization of a process Pi as the ratio of its burst to its period—ti/pi—the CPU utilization of P1 is 20/50 = 0.40 and that of P2 is 35/100 = 0.35, for a total CPU utilization of 75 percent. Therefore, it seems we can schedule these tasks in such a way that both meet their deadlines and still leave the CPU with available cycles.
Suppose we assign P2 a higher priority than P1. The execution of P1 and P2 in this situation is shown in Figure 5.21. As we can see, P2 starts execution first and completes at time 35. At this point, P1 starts; it completes its CPU burst at time 55. However, the first deadline for P1 was at time 50, so the scheduler has caused P1 to miss its deadline.
Now suppose we use rate-monotonic scheduling, in which we assign P1 a higher priority than P2 because the period of P1 is shorter than that of P2.
The execution of these processes in this situation is shown in Figure 5.22. P1 starts first and completes its CPU burst at time 20, thereby meeting its first deadline. P2 starts running at this point and runs until time 50. At this time, it is preempted by P1, although it still has 5 milliseconds remaining in its CPU burst.
P1 completes its CPU burst at time 70, at which point the scheduler resumes P2.
P2 completes its CPU burst at time 75, also meeting its first deadline. The system is idle until time 100, when P1 is scheduled again.
Rate-monotonic scheduling is considered optimal in that if a set of processes cannot be scheduled by this algorithm, it cannot be scheduled by any other algorithm that assigns static priorities. Let's next examine a set of processes that cannot be scheduled using the rate-monotonic algorithm.
Assume that process P1 has a period of p1 = 50 and a CPU burst of t1 = 25.
For P2, the corresponding values are p2 = 80 and t2 = 35. Rate-monotonic scheduling would assign process P1 a higher priority, as it has the shorter period. The total CPU utilization of the two processes is (25/50) + (35/80) = 0.94, and it therefore seems logical that the two processes could be scheduled and still leave the CPU with 6 percent available time. Figure 5.23 shows the scheduling of processes P1 and P2. Initially, P1 runs until it completes its CPU burst at time 25. Process P2 then begins running and runs until time 50, when it is preempted by P1. At this point, P2 still has 10 milliseconds remaining in its CPU burst. Process P1 runs until time 75; consequently, P2 finishes its burst at time 85, after the deadline for completion of its CPU burst at time 80.
Despite being optimal, then, rate-monotonic scheduling has a limitation:
CPU utilization is bounded, and it is not always possible to maximize CPU resources fully. The worst-case CPU utilization for scheduling N processes is N(2^(1/N) − 1).
With one process in the system, CPU utilization is 100 percent, but it falls to approximately 69 percent as the number of processes approaches infinity. With two processes, CPU utilization is bounded at about 83 percent. Combined CPU utilization for the two processes scheduled in Figure 5.21 and Figure 5.22 is 75 percent; therefore, the rate-monotonic scheduling algorithm is guaranteed to schedule them so that they can meet their deadlines. For the two processes scheduled in Figure 5.23, combined CPU utilization is approximately 94 percent; therefore, rate-monotonic scheduling cannot guarantee that they can be scheduled so that they meet their deadlines.
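A quick computation of the bound N(2^(1/N) − 1) reproduces the figures quoted above (100 percent for one process, about 83 percent for two, approaching ln 2 ≈ 69 percent); the small C program below is only a convenience for evaluating the formula (compile with -lm).

#include <math.h>
#include <stdio.h>

/* Worst-case schedulable CPU utilization under rate-monotonic scheduling:
 * U(N) = N * (2^(1/N) - 1). */
int main(void) {
    for (int n = 1; n <= 5; n++)
        printf("N = %d: bound = %.3f\n", n, n * (pow(2.0, 1.0 / n) - 1.0));
    printf("limit as N grows: ln 2 = %.3f\n", log(2.0));
    return 0;
}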

38
Q

Earliest-Deadline-First Scheduling

A

Earliest-deadline-first (EDF) scheduling assigns priorities dynamically according to deadline. The earlier the deadline, the higher the priority; the later the deadline, the lower the priority. Under the EDF policy, when a process becomes runnable, it must announce its deadline requirements to the system. Priorities may have to be adjusted to reflect the deadline of the newly runnable process.
Note how this differs from rate-monotonic scheduling, where priorities are fixed.
To illustrate EDF scheduling, we again schedule the processes shown in Figure 5.23, which failed to meet deadline requirements under rate-monotonic scheduling. Recall that P1 has values of p1 = 50 and t1 = 25 and that P2 has values of p2 = 80 and t2 = 35. The EDF scheduling of these processes is shown in Figure 5.24. Process P1 has the earliest deadline, so its initial priority is higher than that of process P2. Process P2 begins running at the end of the CPU burst for P1. However, whereas rate-monotonic scheduling allows P1 to preempt P2 at the beginning of its next period at time 50, EDF scheduling allows process P2 to continue running. P2 now has a higher priority than P1 because its next deadline (at time 80) is earlier than that of P1 (at time 100). Thus, both P1 and P2 meet their first deadlines. Process P1 again begins running at time 60 and completes its second CPU burst at time 85, also meeting its second deadline at time 100. P2 begins running at this point, only to be preempted by P1 at the start of its next period at time 100. P2 is preempted because P1 has an earlier deadline (time 150) than P2 (time 160). At time 125, P1 completes its CPU burst and P2 resumes execution, finishing at time 145 and meeting its deadline as well. The system is idle until time 150, when P1 is scheduled to run once again.
Unlike the rate-monotonic algorithm, EDF scheduling does not require that processes be periodic, nor must a process require a constant amount of CPU time per burst. The only requirement is that a process announce its deadline to the scheduler when it becomes runnable. The appeal of EDF scheduling is that it is theoretically optimal—theoretically, it can schedule processes so that each process can meet its deadline requirements and CPU utilization will be 100 percent. In practice, however, it is impossible to achieve this level of CPU utilization due to the cost of context switching between processes and interrupt handling.
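The core of EDF is the selection rule itself: among the ready tasks, run the one whose absolute deadline is earliest. The sketch below encodes just that rule; the task structure and the snapshot values (taken from the example at time 50) are illustrative assumptions.

#include <stdio.h>

struct task {
    const char *name;
    int ready;            /* 1 if the task is runnable */
    int deadline;         /* absolute deadline (time units) */
};

/* EDF decision rule: pick the ready task with the earliest deadline. */
static int pick_edf(const struct task *tasks, int n) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (!tasks[i].ready)
            continue;
        if (best < 0 || tasks[i].deadline < tasks[best].deadline)
            best = i;
    }
    return best;           /* -1 if nothing is ready */
}

int main(void) {
    /* Hypothetical snapshot at time 50, mirroring the example above:
     * P1's next deadline is 100, P2's is 80, so P2 keeps running. */
    struct task tasks[] = { {"P1", 1, 100}, {"P2", 1, 80} };
    int i = pick_edf(tasks, 2);
    printf("run %s\n", i >= 0 ? tasks[i].name : "idle");
    return 0;
}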

39
Q

Proportional Share Scheduling

A

Proportional share schedulers operate by allocating T shares among all applications. An application can receive N shares of time, thus ensuring that the application will have N/T of the total processor time. As an example, assume that a total of T = 100 shares is to be divided among three processes, A, B, and C. A is assigned 50 shares, B is assigned 15 shares, and C is assigned 20 shares.
This scheme ensures that A will have 50 percent of total processor time, B will have 15 percent, and C will have 20 percent.
Proportional share schedulers must work in conjunction with an admission-control policy to guarantee that an application receives its allocated shares of time. An admission-control policy will admit a client requesting a particular number of shares only if sufficient shares are available. In our current example, we have allocated 50 + 15 + 20 = 85 shares of the total of 100 shares. If a new process D requested 30 shares, the admission controller would deny D entry into the system.
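The admission-control check reduces to simple bookkeeping: admit a request for N shares only if it still fits within the T total. A minimal sketch, using the share counts from the example above:

#include <stdio.h>

struct share_pool {
    int total;       /* T */
    int allocated;   /* shares already handed out */
};

/* Admit a request for the given number of shares only if it fits. */
static int admit(struct share_pool *pool, int requested) {
    if (pool->allocated + requested > pool->total)
        return 0;                    /* reject: not enough shares left */
    pool->allocated += requested;
    return 1;                        /* admit */
}

int main(void) {
    struct share_pool pool = {100, 0};
    printf("A (50): %s\n", admit(&pool, 50) ? "admitted" : "rejected");
    printf("B (15): %s\n", admit(&pool, 15) ? "admitted" : "rejected");
    printf("C (20): %s\n", admit(&pool, 20) ? "admitted" : "rejected");
    printf("D (30): %s\n", admit(&pool, 30) ? "admitted" : "rejected");  /* only 15 left */
    return 0;
}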

40
Q

POSIX Real-Time Scheduling

A

The POSIX standard also provides extensions for real-time computing—POSIX.1b. Here, we cover some of the POSIX API related to scheduling real-time threads. POSIX defines two scheduling classes for real-time threads:
•SCHED_FIFO
•SCHED_RR
SCHED_FIFO schedules threads according to a first-come, first-served policy using a FIFO queue as outlined in Section 5.3.1. However, there is no time slicing among threads of equal priority. Therefore, the highest-priority real-time thread at the front of the FIFO queue will be granted the CPU until it terminates or blocks. SCHED_RR uses a round-robin policy. It is similar to SCHED_FIFO except that it provides time slicing among threads of equal priority. POSIX provides an additional scheduling class—SCHED_OTHER—but its implementation is undefined and system specific; it may behave differently on different systems.
The POSIX API specifies the following two functions for getting and setting the scheduling policy:
•pthread_attr_getschedpolicy(pthread_attr_t *attr, int *policy)
•pthread_attr_setschedpolicy(pthread_attr_t *attr, int policy)
The first parameter to both functions is a pointer to the set of attributes for the thread. The second parameter is either (1) a pointer to an integer that is set to the current scheduling policy (for pthread_attr_getschedpolicy()) or (2) an integer value (SCHED_FIFO, SCHED_RR, or SCHED_OTHER) for the pthread_attr_setschedpolicy() function. Both functions return nonzero values if an error occurs.
In Figure 5.25, we illustrate a POSIX Pthread program using this API. This program first determines the current scheduling policy and then sets the scheduling algorithm to SCHED_FIFO.
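Figure 5.25 is not reproduced here, but a program in its spirit might look like the following sketch. It queries the default policy and then requests SCHED_FIFO; note that on many systems the new policy only takes effect for created threads if PTHREAD_EXPLICIT_SCHED inheritance is also set, and real-time policies may require elevated privileges. Compile with -pthread.

#include <pthread.h>
#include <sched.h>
#include <stdio.h>

int main(void) {
    pthread_attr_t attr;
    int policy;

    pthread_attr_init(&attr);

    /* Report the current scheduling policy for threads created with attr. */
    if (pthread_attr_getschedpolicy(&attr, &policy) != 0)
        fprintf(stderr, "unable to get policy\n");
    else if (policy == SCHED_FIFO)
        printf("SCHED_FIFO\n");
    else if (policy == SCHED_RR)
        printf("SCHED_RR\n");
    else
        printf("SCHED_OTHER\n");

    /* Request first-come, first-served real-time scheduling. */
    if (pthread_attr_setschedpolicy(&attr, SCHED_FIFO) != 0)
        fprintf(stderr, "unable to set policy to SCHED_FIFO\n");

    pthread_attr_destroy(&attr);
    return 0;
}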