Part 2 Flashcards

1
Q

Process vs. thread, describe the distinctions. What happens on a process vs. thread context switch.

A

Process

  • Any user application running on the operating system.
  • Has its own memory space: both heap and stack allocations in virtual memory belong to the process. The address space includes the code (text), the data available when the process is initialized, the heap, and the stack. As the process executes, dynamic data is added to the heap. The stack is a LIFO (last in, first out) structure, useful when execution needs to jump to another location and later return to the prior position.
  • During a process context switch, all the state the CPU is tracking for the preempted process is saved into that process's process control block (PCB), and the PCB of the next process is loaded; the same happens in reverse when the first process is switched back in. Data may also have to be evicted from the processor cache to make room for the other process.

A thread is similar to a process, except that each thread has its own program counter, stack pointer, stack, and thread-specific registers, while sharing the virtual address space with the other threads of the process. When a thread context switch occurs, the running thread stores its execution context (its program counter, stack pointer, and register values) in memory, and the new thread's execution context is loaded from memory onto the processor. This is faster than a context switch between processes, because the costly virtual-to-physical address mappings do not need to be swapped out or recreated, and the caches stay hotter across the switch, which helps performance.

2
Q

Describe the states in a lifetime of a process?

A

New - first state, when the process is created.
Ready - the OS has admitted the process, giving it a PCB and some initial resources; a process also returns to Ready when it is preempted while Running (context switch) or when the I/O or event it was Waiting on completes.
Running - the OS scheduler gives the CPU to the process.
Waiting - the process is blocked on I/O or some other long-running event/operation; it moves back to Ready once the I/O or event completes.
Terminated - the process finishes all operations or encounters an error.

3
Q

lifetime of a thread

A

Threads are created when a parent process/thread calls a thread-creation mechanism. The threads then run concurrently (unless blocked/waiting). They can be signaled or broadcast to, to check whether they should keep blocking or continue executing. The parent process/thread can call a join mechanism to block itself until a child completes, after which the child's result can be returned to the parent.
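
A minimal sketch of that lifecycle with POSIX threads (the worker function and its doubled-integer result are made up purely for illustration):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* hypothetical worker: does some work and returns a result to the parent */
void *worker(void *arg) {
    int input = *(int *)arg;
    int *result = malloc(sizeof(int));
    *result = input * 2;              /* stand-in for real work */
    return result;                    /* handed back to whoever joins this thread */
}

int main(void) {
    pthread_t tid;
    int input = 21;

    /* creation: the parent spawns a child thread */
    pthread_create(&tid, NULL, worker, &input);

    /* join: the parent blocks until the child completes,
       then receives the child's return value */
    void *result;
    pthread_join(tid, &result);
    printf("child returned %d\n", *(int *)result);
    free(result);
    return 0;
}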

4
Q

Describe all the steps which take place for a process to transition from a waiting (blocked) state to a running (executing on the CPU) state.

A

A waiting process waits until the event or operation that caused the WAITING state finishes. It then transitions to the READY state. Once READY, the process can be scheduled by the scheduler onto a CPU, at which point it enters the RUNNING state.

5
Q

What are the pros-and-cons of message-based vs. shared-memory-based IPC.

A

Message-based IPC uses an OS-provided communication channel that lets processes send messages to each other. This is good because the operating system maintains the channel for the processes, so the API is more standardized and universally available. The downside is overhead: every message must be copied from the sender into a buffer in kernel memory via the OS, and then copied out again to the receiver.

Shared-memory-based IPC is implemented by having the OS map a shared memory region into the address spaces of the participating processes. Both/all processes can then access the region as if it were their own memory. This gets the OS out of the way, which is good for performance, but it can be bad because the OS no longer manages that region: the processes themselves must agree on how to structure and synchronize access to the shared memory, which is bug prone.
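
A rough sketch of the shared-memory side using POSIX shared memory (the object name and size are arbitrary; error checking and the synchronization the processes would still need are omitted):

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    /* create (or open) a named shared memory object; the name is arbitrary */
    int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, 4096);                       /* size the shared region */

    /* map it into this process's address space; another process that opens
       "/demo_shm" and maps it sees the very same memory */
    char *shared = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);

    strcpy(shared, "hello from process A");    /* no copies through the kernel */

    munmap(shared, 4096);
    close(fd);
    shm_unlink("/demo_shm");                   /* remove the name when finished */
    return 0;
}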

6
Q

What are benefits of multithreading? When is it useful to add more threads, when does adding threads lead to pure overhead? What are the possible sources of overhead associated with multithreading?

A

Multithreading enables parallelization, which increases overall performance and reduces execution time. Multiple threads of an application can be at different points in execution, handling more input at the same time (especially on multi-core systems). Threads can also be assigned long-running or blocking tasks, so the main application or other threads can keep processing input while those threads wait on slower devices like I/O. Multithreading also requires less memory than using multiple processes because threads share an address space, which can mean fewer memory swaps.

Depending on the input and tasks of an application, it can be beneficial to add more threads. For example, in a pipeline pattern it can make sense to match the number of threads to the number of stages, or to assign several threads to the longer/more involved stages. In a boss-worker pattern it can be detrimental to keep adding threads dynamically if there isn't enough work for those threads to do; the extra threads are pure overhead.

Additional overhead in multithreading comes from managing and synchronizing threads: a boss thread has to manage a pool of workers without knowing exactly what each worker is doing or did last, which makes it hard to tell at run time which threads would be more or less efficient at certain tasks, and the application must also pay the cost of keeping shared buffers synchronized.

7
Q

Describe the boss-worker multithreading pattern. If you need to improve a performance metric like throughput or response time, what could you do in a boss-worker model? What are the limiting factors in improving performance with this pattern?

A

The boss-worker pattern is a multithreading pattern with one boss/main thread that creates one or more worker threads to complete tasks. The pattern assumes each worker handles a task end to end, i.e., every portion/subtask of that task. The boss thread then joins the workers, waiting for them to finish before completing execution or moving on to another part of the application logic.

One way to improve the boss-worker pattern is to have the boss simply add work to a shared queue that the worker threads pull from. The boss no longer interacts with individual workers one-on-one; it just keeps the queue fed. The queue can be filled before the workers are created, or the workers can be created first and the boss can start adding work immediately, for even better response time/throughput.
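
A minimal sketch of such a shared work queue with POSIX threads (the queue layout and names are assumptions for illustration; a full-queue check and error handling are omitted):

#include <pthread.h>

#define QUEUE_CAP 64
int queue[QUEUE_CAP], head = 0, tail = 0, count = 0;

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

void boss_add_work(int task) {
    pthread_mutex_lock(&m);
    queue[tail] = task;                /* the boss only touches the queue */
    tail = (tail + 1) % QUEUE_CAP;
    count++;
    pthread_mutex_unlock(&m);
    pthread_cond_signal(&not_empty);   /* wake one waiting worker */
}

int worker_get_work(void) {
    pthread_mutex_lock(&m);
    while (count == 0)                 /* wait until the boss enqueues work */
        pthread_cond_wait(&not_empty, &m);
    int task = queue[head];
    head = (head + 1) % QUEUE_CAP;
    count--;
    pthread_mutex_unlock(&m);
    return task;                       /* the worker then handles the task fully */
}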

A limiting factor of this pattern is the overhead of synchronizing access to the shared buffer. In addition, the pattern ignores locality: the boss does not try to exploit hot-cache opportunities by steering similar tasks to the same worker.

8
Q

Describe the pipelined multithreading pattern. If you need to improve a performance metric like throughput or response time, what could you do in a pipelined model? What are the limiting factors in improving performance with this pattern?

A

The pipeline pattern is a multithreading pattern with some number of threads, each handling one subtask in a series of subtasks that together accomplish the overall work of the application. The first thread(s) pass tasks on to the second thread(s), which perform a separate subtask that depends on the first being complete, and so on until the end of the overall process.

Pipeline patterns can be improved by gathering more information about how long each subtask takes, or at least how long the subtasks take relative to one another. For example, if subtask_1 takes 10 ms and subtask_2 takes 30 ms, it makes sense to have three times as many subtask_2 threads as subtask_1 threads.

A limiting factor of this design pattern is the overhead of synchronizing between stages, as well as balancing the pipeline (accurately determining how many threads should perform each subtask).

9
Q

What are mutexes?

A

A mutex is a mechanism used in multithreaded programs to enforce mutual exclusion among concurrently executing threads. In other words, mutexes protect shared state from being updated simultaneously by different threads. A mutex is used like a lock to ensure that accesses to shared state happen exclusively.
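
For example, with POSIX threads a mutex can guard updates to a shared counter; a minimal sketch, where the counter is just a stand-in for any shared state:

#include <pthread.h>

int shared_counter = 0;                            /* shared state */
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

void safe_increment(void) {
    pthread_mutex_lock(&m);    /* enter critical section: one thread at a time */
    shared_counter++;          /* exclusive update of the shared variable */
    pthread_mutex_unlock(&m);  /* exit critical section */
}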

10
Q

What are condition variables?

A

Condition variables are mechanisms used together with mutexes to give finer-grained control over the behaviour of multiple executing threads. A condition variable keeps track of which threads are waiting for a certain condition to become true. A thread waits on a condition variable via the threading library's wait() call; other threads can then signal or broadcast on that condition variable to wake one or all of the threads waiting on the same condition.
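
A sketch of the canonical wait/signal pattern with POSIX condition variables (data_ready is a made-up stand-in for whatever condition the threads actually care about; note the while-loop re-check):

#include <pthread.h>

pthread_mutex_t m    = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
int data_ready = 0;            /* the condition the waiting threads care about */

void wait_for_data(void) {
    pthread_mutex_lock(&m);
    while (!data_ready)                  /* re-check the condition after every wake-up */
        pthread_cond_wait(&cond, &m);    /* releases m while waiting, reacquires on wake */
    /* ... consume the data ... */
    pthread_mutex_unlock(&m);
}

void publish_data(void) {
    pthread_mutex_lock(&m);
    data_ready = 1;
    pthread_mutex_unlock(&m);
    pthread_cond_signal(&cond);          /* wake one waiter; broadcast would wake all */
}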

11
Q

Read/Write Critical Sections

A

Read/Write Critical Sections

Mutex m;
Condition read_phase, write_phase;
int num_readers = 0, num_writers = 0;

// READER(S)

// lock the mutex - enter critical section
Lock(m) {
	// make sure there are no writers; wait if there are
	while (num_writers > 0) {
		Wait(m, read_phase);
	}
	num_readers++;
}
// read data
// during this time num_readers > 0
// lock the mutex again - exit critical section
Lock(m) {
	num_readers--;
}
// signal a waiting writer
Signal(write_phase);

// WRITER

// lock the mutex - enter critical section
Lock(m) {
	// make sure there are no readers or another writer; wait if there are
	while (num_readers > 0 || num_writers > 0) {
		Wait(m, write_phase);
	}
	num_writers++;
}
// write data
// during this time num_writers == 1
// lock the mutex again - exit critical section
Lock(m) {
	num_writers--;
}

// broadcast to any waiting readers
Broadcast(read_phase);
// signal a waiting writer
Signal(write_phase);

12
Q

Read/Write Critical Sections with Priority Read

A

Mutex m;
Condition read_phase, write_phase;
int num_readers = 0, num_writers = 0;

// READER(S)

// lock the mutex - enter critical section
Lock(m) {
	// make sure there are no writers; wait if there are
	while (num_writers > 0) {
		Wait(m, read_phase);
	}
	num_readers++;
}
// read data
// during this time num_readers > 0
// lock the mutex again - exit critical section
Lock(m) {
	num_readers--;
	// only signal a writer once no readers remain
	if (num_readers == 0) {
		Signal(write_phase);
	} else {
		Broadcast(read_phase);
	}
}

// WRITER

// lock the mutex - enter critical section
Lock(m) {
	// make sure there are no readers or another writer; wait if there are
	while (num_readers > 0 || num_writers > 0) {
		Wait(m, write_phase);
	}
	num_writers++;
}
// write data
// during this time num_writers == 1
// lock the mutex again - exit critical section
Lock(m) {
	num_writers--;
}

// broadcast to waiting readers first (readers have priority)
Broadcast(read_phase);
// then signal a waiting writer
Signal(write_phase);

13
Q

Read/Write Critical Sections with Priority Write

A

Similar to Priority Read, except the signal/broadcast decisions must happen while the mutex is still held, because they need to read shared state (num_writers, or a count of waiting writers). If another writer is waiting, it gets signaled first, taking priority over broadcasting to waiting readers.

14
Q

Spurious wake-ups

A

Spurious wake-ups occur when a thread, before unlocking a mutex, signals or broadcasts to other threads that will need to acquire that same mutex to continue. Since the signaling thread still holds the lock, the woken threads cannot make any progress and immediately go back to waiting, now on the mutex. You can avoid this by unlocking the mutex first and then making the signal/broadcast calls. That isn't always possible, however: if you need to prioritize which threads get notified first, you will probably have to read shared variables (inside the lock) to decide which condition variable(s) to notify.
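
A sketch of the two orderings with POSIX threads (done is a made-up stand-in for the shared state the waiters check):

#include <pthread.h>

pthread_mutex_t m    = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
int done = 0;

/* wasteful: waiters wake while we still hold m and immediately block again */
void finish_while_holding_lock(void) {
    pthread_mutex_lock(&m);
    done = 1;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&m);
}

/* better (when the notification doesn't need shared state): unlock first */
void finish_then_notify(void) {
    pthread_mutex_lock(&m);
    done = 1;
    pthread_mutex_unlock(&m);
    pthread_cond_broadcast(&cond);
}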

15
Q

while() loops in the critical_section_enter of a thread task are required because

A

because even when a thread is signaled and then acquires the mutex, the shared (proxy) variable may have changed just before the lock was acquired: another thread may have been signaled as well, acquired the lock first, and consumed or changed the condition. Re-checking the condition in a while loop (rather than a single if) ensures the thread only proceeds when the condition actually holds, preserving mutual exclusion over the protected state.

16
Q

simple way to prevent deadlocks? Why?

A

A simple way (if the application's design allows it) is to use only one lock/mutex. This works because, as long as the mutex is locked/unlocked correctly, only one thread holds it at any time, so there is no cycle of threads each holding one lock while waiting for another. It is quite restrictive, though, and limits parallelism.

A more difficult approach is to maintain a lock order. Example: every thread must lock mutex A before it is allowed to lock mutex B.
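
A sketch of that lock-order rule with two hypothetical mutexes:

#include <pthread.h>

pthread_mutex_t m_a = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t m_b = PTHREAD_MUTEX_INITIALIZER;

/* Every thread that needs both locks takes them in the same order: A, then B.
   If one thread did A-then-B and another B-then-A, each could end up holding
   one lock while waiting forever for the other. */
void update_both(void) {
    pthread_mutex_lock(&m_a);
    pthread_mutex_lock(&m_b);
    /* ... touch the state protected by both locks ... */
    pthread_mutex_unlock(&m_b);
    pthread_mutex_unlock(&m_a);
}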

17
Q

Can you explain the relationship between kernel-level and user-level threads? Think through a general m:n scenario (as described in the Solaris papers), and the current Linux model. What happens during scheduling, synchronization and signaling in these cases?

A

Kernel-level and user-level threads are essentially abstractions (represented by data structures). In a general many-to-many scenario, kernel-level threads back lightweight processes (LWPs), and each LWP has one or more user-level threads associated with it. When a user-level thread is created, the library returns a thread ID, which is not a pointer but an index into a table that points to the user-level thread's data structure. The size of that structure is known at compile time, so thread structures can be allocated in contiguous memory, which improves locality and memory access. The OS/kernel does not know about the existence or properties of the individual user-level threads.

On the kernel side there are several data structures:

Process - tracks the list of kernel-level threads, signal handlers, the virtual address space, and user credentials.

Lightweight process (LWP) - similar to a user-level thread but visible to the kernel. It is not needed (and can be swapped out) when the process it belongs to is not running. It contains the user-level registers, system call arguments, resource usage info (metrics), and a signal mask, and it reflects the state of the one or more user-level threads currently running on it.

Kernel-level thread - contains the kernel-level registers, a stack pointer, scheduling info, and pointers to the associated LWP, process, and CPU structures. This information is always required to be in memory (not swappable).

CPU - keeps track of its current (kernel-level) thread, the list of kernel-level threads, and dispatching/interrupt-handling information.

Special signals and system calls allow the user-level and kernel-level threading layers to coordinate. Even in many-to-many scenarios, a user-level thread can be permanently attached to a kernel-level thread (a "bound" thread), similar to how the kernel may "pin" a kernel-level thread to a specific CPU.

In Linux, user-level threads are always bound 1:1 to kernel-level threads. This makes signal handling and scheduling much simpler, since each user-level thread corresponds directly to one kernel-level thread.

18
Q

Can you explain why some of the mechanisms described in the Solaris papers (for configuring the degree of concurrency, for signaling, the use of LWPs…) are not used or necessary in the current threads model in Linux?

A

Linux uses 1:1 user-to-kernel threading, so the extra functionality/mechanisms provided by LWPs aren't necessary. Signaling is also more straightforward in this design: because each user-level thread corresponds directly to one kernel-level thread, the additional flag-based protocol the Solaris paper describes for coordinating signal delivery with user-level critical sections isn't needed.

“At the time of the Solaris paper memory was very constrained, and only a few kernel threads could be stored in physical memory at a time.” - Jonathan Cox

“Nowadays it’s possible to store a large number of kernel-level thread data structures in physical memory without running out of space. That’s why most modern operating systems opt for a 1:1 threading model. The advantages (the kernel knows about all user-level threads and can make smarter scheduling decisions) outweigh the disadvantages (more memory usage).” - Jonathan Cox

19
Q

What’s an interrupt?

A

An interrupt is an event from some device external to the CPU, notifying the CPU that something has occurred: for example, a timer signaling that a timeout has expired, or an I/O device announcing the arrival of a network packet. Interrupts happen asynchronously.

When a device interrupts the CPU, it sends a unique, hardware-defined message. That message is looked up in an interrupt handler table and the corresponding handler is invoked: the program counter is set to the address of the handler code and execution continues from there.

20
Q

What’s a signal?

A

A signal is an event that comes from the software running on the OS or from the CPU itself. Signals can occur both asynchronously and synchronously.

Similarly to interrupts, when a signal event happens the OS signals a process (instead of a device interrupting the CPU). The process has signal-specific handlers in its signal handler table, keyed by the OS-defined signals; the OS also provides default handlers for any signals the process does not handle itself.
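
A sketch of a process installing its own handler (here for SIGINT) in place of the OS default, using the POSIX sigaction call:

#include <signal.h>
#include <string.h>
#include <unistd.h>

/* hypothetical handler installed in this process's signal handler table */
void handle_sigint(int sig) {
    (void)sig;
    write(STDOUT_FILENO, "caught SIGINT\n", 14);   /* async-signal-safe output */
}

int main(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_sigint;     /* process-specific handler for SIGINT */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, NULL);      /* replaces the OS default (terminate) */

    pause();                           /* wait until a signal is delivered */
    return 0;
}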

21
Q

Can each process configure their own signal handler? Can each thread have their own signal handler?

A

Each process can install its own signal handlers, since the signal handler table is per-process. Threads cannot install their own handlers; they can only set their own signal mask. A kernel-level thread can call a library wrapper routine that has visibility into all the threads of the process and can find a user-level thread able to handle the signal, even if the user-level thread currently running on the kernel-level thread where the signal arrived has that signal masked.
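
A sketch of the per-thread mask side of this (the worker function and the choice of SIGUSR1 are arbitrary): a thread cannot install a handler of its own, but it can block a signal for itself with pthread_sigmask so the signal is delivered to some other thread whose mask allows it:

#include <pthread.h>
#include <signal.h>

/* hypothetical worker: the handler table is per-process, but this thread
   can mask SIGUSR1 for itself */
void *worker(void *arg) {
    (void)arg;
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &set, NULL);   /* masked in this thread only */
    /* ... do work; SIGUSR1 can be delivered to another thread in the
       process whose mask allows it ... */
    return NULL;
}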

22
Q

What’s the potential issue if an interrupt or signal handler needs to lock a mutex? What’s the workaround described in the Solaris papers?

A

If an interrupt or signal handler needs to lock a mutex that is already held by the thread it interrupted, the result is a deadlock: the handler waits for a mutex that can only be released by the code it preempted.

The Solaris paper describes a solution where the threads library sets/clears a special flag in the thread's structure whenever it enters/exits a critical section. While the flag is set, all signals are masked for that thread. (Implementing Lightweight Threads, D. Stein and D. Shah, “Signal Safe Critical Sections”, p. 8.)

23
Q

Contrast the pros-and-cons of a multithreaded (MT) and multiprocess (MP) implementation of a webserver, as described in the Flash paper.

A

An MP implementation of a simple webserver has one main benefit: its simplicity. The downsides are numerous. It requires a lot of context switching (since parts of handling a request involve long operations like I/O). Each process has its own address space, which is good in that synchronization matters less, but it means a larger memory footprint the more concurrent requests you want to handle. The separate address spaces also prevent the processes from sharing cached state.

An MT implementation requires that the operating system support kernel-level threads. Each thread handles a full HTTP request in this implementation. The pros and cons are essentially the inverse of the MP server: a shared address space gives a smaller memory footprint and cheaper context switches, but the implementation is not as simple; it requires more complicated programming and synchronization around the incoming HTTP requests.

24
Q

What are the benefits of the event-based model described in the Flash paper over MT and MP? What are the limitations? Would you convert the AMPED model into an AMTED (async multi-threaded event-driven) model? How do you think an AMTED version of Flash would compare to the AMPED version of Flash?

A

An event-based model is essentially a state machine: a looping event dispatcher that, based on incoming notifications/events, calls specific event handlers to carry out each stage of processing an HTTP request. Handlers execute to completion; if a handler would block, the dispatcher initiates the blocking operation and continues looping over events from other HTTP requests.

The largest benefit of an event-based model is that there are no context switches; everything happens in one process and one address space, with no synchronization required. There is no need to context switch just to hide latency (which only pays off when t_idle > 2 * t_ctx_switch). With more than one CPU the event-based model still works, because one event-based process can run per CPU, each handling many requests. The limitations are loss of locality and cache pollution (each event handler may need different code and data loaded), and the fact that a blocking I/O call can stall the entire event loop.

An AMPED model is an improvement on the SPED model. It makes use of helpers that deal directly with the blocking I/O operations, which allows the main dispatcher to keep picking up requests and handing them to handlers. The helpers are separate processes that communicate with the dispatcher via pipes.

The AMTED model is similar to AMPED except that the helpers are threads instead of processes. This is useful when the kernel supports threading, though it is somewhat more complicated to program than an AMPED implementation, for the same reasons threads are more complicated than processes.

“An AMTED model would probably be more efficient than an AMPED model because switching between threads is less costly than switching between processes.” - Jonathan Cox