Midterm Flashcards

Question

Do you understand the need for using a while() loop for the predicate check in the critical section entry code examples in the lessons?

Answer 1

When a condition variable is signaled and a waiting thread is to be "woken up", two things occur: (1) the thread is removed from the wait queue, and (2) the thread reacquires the mutex. These two steps are logically independent, which means that another thread may be given CPU time (and potentially the mutex) in between them. As a result, the shared state may be updated before the waking thread reacquires the mutex. Therefore, the thread must check the predicate again once it actually holds the mutex, to ensure that the condition it needs in order to proceed is still true. POSIX also permits spurious wakeups, in which pthread_cond_wait() returns without any signal having been sent, which is another reason the check must be a while loop rather than an if.
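A minimal C/pthreads sketch of that entry code follows. It assumes a hypothetical shared counter `items` guarded by mutex `m`. The names `consume_one`, `produce_one`, and `run_demo` are illustrative, not code from the lessons. The `while` loop re-tests the predicate every time `pthread_cond_wait()` returns with the mutex reacquired.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical shared state: a count of items ready to be consumed. */
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t item_available = PTHREAD_COND_INITIALIZER;
static int items = 0;

/* Critical section entry: the predicate is re-checked in a while loop. */
void consume_one(void) {
    pthread_mutex_lock(&m);
    while (items == 0)   /* not `if`: state may have changed before m was reacquired */
        pthread_cond_wait(&item_available, &m);  /* releases m while waiting, reacquires it before returning */
    assert(items > 0);   /* the predicate holds while we hold m */
    items--;
    pthread_mutex_unlock(&m);
}

void produce_one(void) {
    pthread_mutex_lock(&m);
    items++;
    pthread_cond_signal(&item_available);
    pthread_mutex_unlock(&m);
}

static void *consumer(void *arg) {
    (void)arg;
    consume_one();
    return NULL;
}

/* Starts n consumers (at most 16), produces n items, joins the consumers,
   and returns how many items are left over. */
int run_demo(int n) {
    pthread_t t[16];
    if (n > 16) n = 16;
    for (int i = 0; i < n; i++) pthread_create(&t[i], NULL, consumer, NULL);
    for (int i = 0; i < n; i++) produce_one();
    for (int i = 0; i < n; i++) pthread_join(t[i], NULL);
    return items;
}
```

If the `while` were replaced with an `if`, a consumer that lost the race for the mutex (or woke spuriously) could decrement `items` below zero, which the `assert` is there to catch.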

Answer 2

The simplest way to prevent deadlocks is to maintain a lock order: every thread acquires mutexes in the same global order. This prevents a cycle in the wait-for graph, and for mutexes (single-instance resources) such a cycle is both necessary and sufficient for a deadlock. Two threads can deadlock if each is trying to acquire a mutex the other holds. A thread can also deadlock with itself if it tries to acquire a mutex it already holds. Enforcing a strict lock order, in which a thread only acquires mutexes that come later in the order than any it already holds, ensures that neither of these scenarios occurs.
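A sketch of lock ordering in C follows. It assumes a hypothetical `account_t` with one mutex per account. The global order used here is simply the mutexes' addresses, which is one common convention among several. As a result, two threads transferring in opposite directions always lock the same mutex first.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical account type used only for illustration. */
typedef struct {
    pthread_mutex_t lock;
    long balance;
} account_t;

/* Acquire both locks in one global order (lower address first), so no two
   threads can each hold one lock while waiting for the other. */
static void lock_pair(account_t *a, account_t *b) {
    assert(a != b);  /* relocking the same non-recursive mutex would self-deadlock */
    if ((uintptr_t)a < (uintptr_t)b) {
        pthread_mutex_lock(&a->lock);
        pthread_mutex_lock(&b->lock);
    } else {
        pthread_mutex_lock(&b->lock);
        pthread_mutex_lock(&a->lock);
    }
}

static void unlock_pair(account_t *a, account_t *b) {
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}

void transfer(account_t *from, account_t *to, long amount) {
    lock_pair(from, to);
    from->balance -= amount;
    to->balance += amount;
    unlock_pair(from, to);
}

static account_t acct_x = { PTHREAD_MUTEX_INITIALIZER, 1000 };
static account_t acct_y = { PTHREAD_MUTEX_INITIALIZER, 1000 };

static void *x_to_y(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) transfer(&acct_x, &acct_y, 1);
    return NULL;
}

static void *y_to_x(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) transfer(&acct_y, &acct_x, 1);
    return NULL;
}

/* Opposite-direction transfers; returns the total balance afterwards. */
long run_demo(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, x_to_y, NULL);
    pthread_create(&t2, NULL, y_to_x, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return acct_x.balance + acct_y.balance;
}
```

With a naive "lock `from`, then lock `to`" rule, the two threads in `run_demo` could each lock their own `from` account and then wait forever for the other's. The address comparison removes that cycle.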

Answer 3

User-level threads are associated with a user-level library (like pthreads), while kernel-level threads are associated with the kernel's threading implementation (like NPTL). For a user-level thread to run, it must be associated with a kernel-level thread, which in turn must be scheduled on the CPU.
In the one-to-one model, there is one kernel-level thread for every user-level thread, so when a user-level thread is created, a kernel-level thread is also created. This 1:1 model is the current situation in Linux and is supported at the kernel level by the task struct. The benefit of this approach is that the kernel understands that the process is multithreaded, and it also understands what those threads need. Since the operating system already supports threading mechanisms to manage its own threads, the user-level library can benefit directly from the multithreading support available in the kernel. One downside of this approach is that it is expensive: for every operation we must go to the kernel and pay the cost of a system call. Another downside is that since we rely on the mechanisms and policies supported by the kernel, we are limited to only those policies and mechanisms. As well, our applications may behave differently when run on different operating systems.
In the many-to-many model, one or more user-level threads can be scheduled onto one or more kernel-level threads. The kernel is aware that the process is multithreaded, since it has assigned multiple kernel-level threads to the process. This means that if one kernel-level thread blocks on an operation, we can context switch to another, and the process as a whole can proceed. One of the downsides of this model is that it requires extra coordination between the user-level and kernel-level thread managers. Signaling in a many-to-many scenario is one source of this complexity: if the kernel-level thread has a signal enabled but the user-level thread currently running on it does not, the user-level threading library may have to send directed signals back down into the kernel to get the right user-level thread to respond to the signal.

Answer 4

The mechanisms described in the Solaris paper were all mechanisms for rationing the number of threads that are created at any given point in time. Configuring the degree of concurrency implied that a process could not automatically be given as many kernel-level threads as it needed. Having LWPs multiplex over kernel threads indicated that kernel threads needed to be shared. The rationing was based on real concerns about the memory available in a system at any given time. The native implementation of threads in Linux is the Native POSIX Thread Library (NPTL). This is a 1:1 model, meaning that there is a kernel-level task for each user-level thread, so the kernel sees every user-level thread. This is acceptable because trapping into the kernel has become much cheaper, so user/kernel crossings are much more affordable. Also, modern platforms have more memory, removing the constraint to keep the number of kernel threads as small as possible.
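On Linux, the 1:1 mapping is directly observable because each pthread has its own kernel task ID. This Linux-specific sketch calls `syscall(SYS_gettid)` and assumes glibc headers. It records the task ID of the calling thread and of n workers, which a barrier keeps alive simultaneously. Under NPTL every ID should be distinct, and the main thread's task ID equals the process ID. The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux-specific: the kernel task ID of the calling thread. */
static pid_t kernel_tid(void) { return (pid_t)syscall(SYS_gettid); }

static pthread_barrier_t all_alive;

static void *record_tid(void *out) {
    *(pid_t *)out = kernel_tid();
    pthread_barrier_wait(&all_alive);  /* stay alive until every ID is recorded */
    return NULL;
}

/* Counts distinct kernel task IDs across the calling thread and n
   (at most 16) pthreads. Under a 1:1 model this is n + 1. */
int distinct_kernel_tids(int n) {
    pid_t tids[17];
    pthread_t threads[16];
    if (n > 16) n = 16;
    if (n < 0) n = 0;
    pthread_barrier_init(&all_alive, NULL, (unsigned)n + 1);
    tids[0] = kernel_tid();
    for (int i = 0; i < n; i++)
        pthread_create(&threads[i], NULL, record_tid, &tids[i + 1]);
    pthread_barrier_wait(&all_alive);
    for (int i = 0; i < n; i++)
        pthread_join(threads[i], NULL);
    pthread_barrier_destroy(&all_alive);

    int distinct = 0;
    for (int i = 0; i <= n; i++) {
        int seen = 0;
        for (int j = 0; j < i; j++)
            if (tids[j] == tids[i]) seen = 1;
        distinct += !seen;
    }
    return distinct;
}
```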

Answer 5

Interrupts
Interrupts are notifications that hardware components, such as I/O devices and timers, send to the CPU to signal that something has occurred. For example, when a user-level application tries to perform an illegal operation using the hardware, the kernel is notified via an interrupt (strictly, a trap raised by the CPU itself, which is dispatched the same way). Interrupts are handled on a per-CPU basis, and the operating system maintains an interrupt table, which maps interrupts by number to handling procedures. When an interrupt occurs, the kernel jumps to the associated interrupt handler and executes that code. Which interrupts can occur is a function of the platform on which you are running; how those interrupts are handled is a function of the OS running on top of the physical system.
Signals
Signals are notifications that are delivered from the kernel to a user process. For example, if a process tries to access memory it has not allocated, the operating system may send it a SIGSEGV (segmentation fault). Signals are operating system specific. Each process maintains its own signal handling table, which is very similar to the kernel-level interrupt table: each entry maps a signal to its handling code. When a signal comes in, the process jumps to the handling code. Individual threads cannot have their own handlers, although each thread can set its own signal mask to disable signals it doesn't want to receive.

Answer 6

Since a signal handler executes on the stack of the thread it interrupts, a thread can deadlock with itself if the handler tries to lock a mutex that the interrupted code has already acquired. The handler's stack frame waits for the mutex to be released, but the code in the lower stack frame that holds it cannot resume until the handler returns. One solution is to have signal handlers execute in another thread. This way, the signal handling code can contend for a mutex like any other thread, which removes the possibility of this deadlock. Another solution is to have threads alter their signal masks before entering and after exiting their critical sections. Masking costs fewer SPARC instructions per event than creating a new thread to handle a signal, but that cost is paid on every mutex acquisition, and mutex acquisition happens much more frequently than signal delivery. Paying the larger cost only on the rare signal path is therefore cheaper overall. This is another example of optimizing for the common case!

Answer 7

Multiprocess
The main benefit of the MP model is its simplicity. Each request is handled in a separate process with its own address space, so there is no need for any (explicit) synchronization code. However, this benefit comes at the cost of performance. First, the memory footprint of the MP implementation is large. Processes have separate address spaces, which means that memory usage grows quickly with the number of concurrent requests. As a result, less memory is available for caching, so the server becomes disk-bound more quickly. Second, context switching slows down the MP implementation: switching between processes consumes CPU cycles that could otherwise be spent handling requests.
Multithreaded
The main benefit of the MT model, compared to the MP model, is efficiency. Threads share the address space of the process they belong to, so MT applications are more memory efficient than MP applications. This sharing also allows for cheaper context switches (and often leaves hardware caches hotter), which can help performance. Unfortunately, this benefit comes at the cost of complexity. MT applications require explicit synchronization code (mutexes, condition variables, etc.). In addition, MT applications require kernel multithreading support, which wasn't a sure bet at the time of the paper.
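The address-space difference behind these trade-offs can be shown in a few lines of C. The sketch assumes POSIX, and the function names are hypothetical. A write made by a thread is visible to its creator because they share one address space. A write made by a forked child lands in the child's own copy.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stands in for server state such as a cache. */
static int cached_value = 0;

static void *thread_update(void *arg) {
    (void)arg;
    cached_value = 42;     /* same address space as the creator */
    return NULL;
}

/* MT: returns what the creator sees after a thread writes the variable. */
int value_after_thread(void) {
    cached_value = 0;
    pthread_t t;
    pthread_create(&t, NULL, thread_update, NULL);
    pthread_join(t, NULL); /* join orders the thread's write before this read */
    return cached_value;
}

/* MP: returns what the parent sees after a child process writes it. */
int value_after_child(void) {
    cached_value = 0;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        cached_value = 42; /* changes only the child's private copy */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return cached_value;
}
```

The same separation is why MP needs no explicit synchronization and why each extra process carries its own copy of any cached data.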

Answer 8

The event-based model operated primarily in one thread, which made its memory footprint smaller than those of both the MT and MP models. As well, the cost of context switching was largely absent. That being said, helper processes were involved, but they were used only as needed: they weren't created blindly for every new request entering the system. The benefit of the smaller memory footprint was that more memory was available for the various types of caching done in the system. This meant that Flash could delay becoming disk-bound longer than the MT/MP models. A limitation of the main approach behind Flash was that not every kernel supported asynchronous I/O. As a result, Flash had to fake it in a sense, using helper processes to make I/O look asynchronous. I think that the AMTED model would perform better than the AMPED model, as the MT model uses less memory than the MP model, which again allows for a bigger cache. I think the main reason for choosing process-based helpers was that kernel support for multithreading was spotty at the time the paper was written; thus, a process-based approach was the most portable. That being said, I think the performance differential would be on par with that between the standard MP/MT models, which is to say, not that much.
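A heavily simplified sketch of the helper idea follows, assuming POSIX pipes and `fork()`. The main process sends a fixed-size request to a helper process, which does the blocking file read. The main process then waits for the reply with `poll()`, as it would in its event loop. `helper_loop` and `size_via_helper` are hypothetical names. Flash's actual helper protocol is not described in these notes. A real server would also keep helpers alive and poll many descriptors at once.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MSG_LEN 256  /* fixed-size requests: one atomic pipe write each */

/* Helper process: performs the potentially blocking file read for each
   request and reports the byte count (or -1) back to the main process. */
static void helper_loop(int req_fd, int resp_fd) {
    char path[MSG_LEN];
    while (read(req_fd, path, sizeof path) == (ssize_t)sizeof path) {
        path[MSG_LEN - 1] = '\0';
        long total = -1;
        int fd = open(path, O_RDONLY);
        if (fd >= 0) {
            char buf[4096];
            ssize_t r;
            total = 0;
            while ((r = read(fd, buf, sizeof buf)) > 0)
                total += r;
            close(fd);
        }
        if (write(resp_fd, &total, sizeof total) != (ssize_t)sizeof total)
            break;
    }
}

/* Main process: hands one request to a helper, then waits for completion
   with poll(), the way an event loop would alongside its client sockets.
   Returns the size the helper reported, or -1 on error. */
long size_via_helper(const char *path) {
    int req[2], resp[2];
    if (pipe(req) < 0) return -1;
    if (pipe(resp) < 0) { close(req[0]); close(req[1]); return -1; }
    pid_t pid = fork();
    if (pid < 0) { close(req[0]); close(req[1]); close(resp[0]); close(resp[1]); return -1; }
    if (pid == 0) {                /* helper */
        close(req[1]);
        close(resp[0]);
        helper_loop(req[0], resp[1]);
        _exit(0);
    }
    close(req[0]);
    close(resp[1]);

    char msg[MSG_LEN] = {0};
    strncpy(msg, path, MSG_LEN - 1);
    long size = -1;
    struct pollfd pfd = { .fd = resp[0], .events = POLLIN };
    if (write(req[1], msg, sizeof msg) != (ssize_t)sizeof msg ||
        poll(&pfd, 1, -1) != 1 ||
        read(resp[0], &size, sizeof size) != (ssize_t)sizeof size)
        size = -1;

    close(req[1]);                 /* EOF on the request pipe tells the helper to exit */
    close(resp[0]);
    waitpid(pid, NULL, 0);
    return size;
}
```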

Answer 9

Single File Test
This was a "best case" test, showing what performance looks like across the servers when the request pattern is optimal. To achieve this, requests were made for a single file whose size was varied. SPED performed better than AMPED, which performs an extra check to see whether the file is in memory (unnecessary here, because there is only one file and it is always in memory). Both performed better than MT/MP, which were slowed by context switching overhead.
Owlnet Trace
This trace tests a workload that can mostly, but not completely, be served from cache. AMPED performs slightly better than SPED here: since some of the work is disk-bound, SPED blocks where AMPED context switches. Both of these approaches are better than the MT/MP models, which have more memory allocated to them (leaving less for the cache) and still incur the overhead of context switching.
CS Trace
This trace tests a workload that becomes disk-bound very quickly. In this case, AMPED and MT/MP smoke SPED. Since SPED is single-threaded with no helpers, it blocks a lot when the workload is disk-bound. AMPED only has enough helper processes to keep the disk busy, so it has a smaller memory footprint and less context-switching overhead than MT/MP.
Optimizations
Looking at connection rate as a function of file size across the various combinations of optimizations (pathname lookup caching, response header caching, and mapped file caching) shows that the combination containing all optimizations can handle the highest connection rate.
Performance Under WAN
MT, AMPED, and SPED all showed stable performance improvements as more clients were added. The per-process overhead of the MP model caused its performance to degrade.
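AMPED's extra check can be sketched with `mmap()` plus `mincore()`, which reports whether each page of a mapping is resident in memory. The Flash paper describes using `mincore()` for this purpose. The sketch assumes Linux, where the vector argument is `unsigned char *`, and `file_is_resident` is a hypothetical name. If any page is missing, an AMPED server would hand the request to a helper rather than risk blocking.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if every page of the file is resident in memory, 0 if some page
   is not (where an AMPED-style server would hand off to a helper), or -1 on
   error. Empty files count as resident. */
int file_is_resident(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 1; }
    size_t len = (size_t)st.st_size;
    void *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                       /* the mapping outlives the descriptor */
    if (addr == MAP_FAILED) return -1;

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t pages = (len + page - 1) / page;
    unsigned char *vec = malloc(pages);
    int result = -1;
    if (vec != NULL && mincore(addr, len, vec) == 0) {
        result = 1;
        for (size_t i = 0; i < pages; i++)
            if ((vec[i] & 1) == 0) { result = 0; break; }  /* low bit set = resident */
    }
    free(vec);
    munmap(addr, len);
    return result;
}
```

This check is pure overhead in the single-file test, where the answer is always "resident", which matches the small gap between SPED and AMPED there.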

Answer 10

Single file
Adding more threads would likely increase the throughput of the GETFILE server. Files need to be read from disk into memory before they can be served, but once the single file has been read into memory it can stay there until its contents need to be replaced. After the first request, requests are served from memory, so more threads can serve more concurrent requests.
Random requests across a large pool of large files
Adding more threads may not have a huge impact on throughput in this scenario. Any given new request is unlikely to be served from main memory, since the range of inputs is so wide. This means that this application would likely be disk-bound from the get-go. Increasing the number of threads allows more concurrent requests to be processed, but each thread will likely have to pull content from disk. I would say that the graph of throughput vs. number of threads in both cases would be logarithmically increasing, with the graph for the single file example rising much more sharply than the graph for the large pool.
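In both scenarios, "adding more threads" corresponds to raising the worker count of a boss/worker pool. The minimal sketch below assumes a hypothetical bounded request queue. A placeholder `serve_request()` stands in for the real GETFILE file transfer. The sketch shows the structure only and does not measure throughput.

```c
#include <assert.h>
#include <pthread.h>

#define QUEUE_CAP 64
#define MAX_WORKERS 32

/* Hypothetical bounded queue of request IDs shared by one boss and N workers. */
static int queue[QUEUE_CAP];
static int head, tail, count, done;
static long served;
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;

/* Placeholder for "read the file and send it to the client". */
static void serve_request(int request_id) { (void)request_id; }

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&qlock);
        while (count == 0 && !done)
            pthread_cond_wait(&not_empty, &qlock);
        if (count == 0) {             /* done and drained: exit */
            pthread_mutex_unlock(&qlock);
            return NULL;
        }
        int request_id = queue[head];
        head = (head + 1) % QUEUE_CAP;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&qlock);

        serve_request(request_id);    /* outside the lock, so workers overlap */

        pthread_mutex_lock(&qlock);
        served++;
        pthread_mutex_unlock(&qlock);
    }
}

/* Boss: starts n_workers workers (1..MAX_WORKERS), enqueues n_requests
   requests, then signals shutdown. Returns how many requests were served. */
long run_pool(int n_workers, int n_requests) {
    pthread_t threads[MAX_WORKERS];
    if (n_workers < 1) n_workers = 1;
    if (n_workers > MAX_WORKERS) n_workers = MAX_WORKERS;
    head = tail = count = done = 0;
    served = 0;
    for (int i = 0; i < n_workers; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int r = 0; r < n_requests; r++) {
        pthread_mutex_lock(&qlock);
        while (count == QUEUE_CAP)
            pthread_cond_wait(&not_full, &qlock);
        queue[tail] = r;
        tail = (tail + 1) % QUEUE_CAP;
        count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&qlock);
    }
    pthread_mutex_lock(&qlock);
    done = 1;
    pthread_cond_broadcast(&not_empty);  /* wake idle workers so they can exit */
    pthread_mutex_unlock(&qlock);
    for (int i = 0; i < n_workers; i++)
        pthread_join(threads[i], NULL);
    return served;
}
```

Whether more workers help depends on what `serve_request()` waits on. If every request hits the disk, extra workers mostly queue up behind the same device.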

(34 cards)