Chapter 3A Flashcards


1
Q

A process

A

is a program in execution, and the status of the current activity of a process is represented by
the program counter, as well as other registers.

2
Q

A process is independent

A

if it does not share data with any other processes executing in the system

3
Q

A process is cooperating

A

if it can affect or be affected by the other processes executing in the system.
Clearly, any process that shares data with other processes is a cooperating process

4
Q

As a process executes, it changes state

A

The state of a process is defined in part by the current activity of
that process

5
Q

Process states:

A
  • New. The process is being created.
  • Running. Instructions are being executed.
  • Waiting. The process is waiting for some event to occur (such as an I/O completion or reception of a
    signal).
  • Ready. The process is waiting to be assigned to a processor.
  • Terminated. The process has finished execution
6
Q

Each process is represented in the operating system by a process control block (PCB)—also called a task
control block

A

A PCB is shown in Figure 3.3. It contains many pieces of information associated with a specific process, including these:
  • Process state. The state may be new, ready, running, waiting, halted, and so on.
  • Program counter. The counter indicates the address of the next instruction to be executed for this process.

7
Q
  • CPU registers.
A

The registers vary in number and type, depending on the computer architecture. They
include accumulators, index registers, stack pointers, and general-purpose registers, plus any
condition-code information. Along with the program counter, this state information must be saved when
an interrupt occurs, to allow the process to be continued correctly afterward when it is rescheduled to run.

8
Q
  • CPU-scheduling information
A

This information includes a process priority, pointers to scheduling queues, and any other scheduling parameters. (Chapter 5 describes process scheduling.)

9
Q

Memory-management information

A

This information may include such items as the value of the base and
limit registers and the page tables, or the segment tables, depending on the memory system used by the
operating system (Chapter 9).

10
Q
  • Accounting information.
A

This information includes the amount of CPU
and real time used, time limits, account numbers, job or process numbers, and so on.

11
Q
  • I/O status information.
A

This information includes the list of I/O devices allocated to the process, a list of open files,
and so on.

12
Q

Threads

A

The process model discussed so far has implied that a process is a program that performs a single
thread of execution. For example, when a process is running a word-processor program, a single thread of
instructions is being executed. This single thread of control allows the process to perform only one task at
a time. Thus, the user cannot simultaneously type in characters and run the spell checker. Most modern
operating systems have extended the process concept to allow a process to have multiple threads of
execution and thus to perform more than one task at a time. This feature is especially beneficial on
multicore systems, where multiple threads can run in parallel. A multithreaded word processor could, for
example, assign one thread to manage user input while another thread runs the spell checker. On systems
that support threads, the PCB is expanded to include information for each thread. Other changes
throughout the system are also needed to support threads.

13
Q

The number of processes currently in memory is known as the degree of multiprogramming.

A

Balancing
the objectives of multiprogramming and time sharing also requires taking the general behavior of a
process into account. In general, most processes can be described as either I/O bound or CPU bound. An
I/O-bound process is one that spends more of its time doing I/O than it spends doing computations. A
CPU-bound process, in contrast, generates I/O requests infrequently, using more of its time doing
computations.

14
Q

As processes enter the system, they are put into a ready queue, where they are ready and waiting to
execute on a CPU’s core

A

This queue is generally stored as a linked list; a ready-queue header contains
pointers to the first PCB in the list, and each PCB includes a pointer field that points to the next PCB in
the ready queue. The system also includes other queues. When a process is allocated a CPU core, it
executes for a while and eventually terminates, is interrupted, or waits for the occurrence of a particular
event, such as the completion of an I/O request. Suppose the process makes an I/O request to a device
such as a disk. Since devices run significantly slower than processors, the process will have to wait for the
I/O to become available. Processes that are waiting for a certain event to occur — such as completion of
I/O — are placed in a wait queue (Figure 3.4). A common representation of process scheduling is a
queueing diagram, such as that in Figure 3.5. Two types of queues are present: the ready queue and a set
of wait queues. The circles represent the resources that serve the queues, and the arrows indicate the flow
of processes in the system.

15
Q

Once the process is allocated a CPU core and is executing, one of several events could occur:

A
  • The process could issue an I/O request and then be placed in an I/O wait queue.
  • The process could create a new child process and then be placed in a wait queue while it awaits the child’s termination.
  • The process could be removed forcibly from the core, as a result of an interrupt or having its time slice expire, and be put back in the ready queue.
In the first two cases, the process eventually switches from the waiting state to the ready state and is then put back in the ready queue. A process continues this cycle until it terminates, at which time it is removed from all queues and has its PCB and resources deallocated.
16
Q

Tree of processes

A

During the course of execution, a process may create several new processes. As
mentioned earlier, the creating process is called a parent process, and the new processes are called the
children of that process. Each of these new processes may in turn create other processes, forming a tree of
processes
PID -> Most operating systems (including UNIX, Linux, and Windows) identify processes according to a
unique process identifier (or pid), which is typically an integer number. The pid provides a unique value
for each process in the system, and it can be used as an index to access various attributes of a process
within the kernel.

17
Q

In general, when a process creates a child process,

A

that child process will need certain resources (CPU
time, memory, files, I/O devices) to accomplish its task. A child process may be able to obtain its
resources directly from the operating system, or it may be constrained to a subset of the resources of the
parent process. The parent may have to partition its resources among its children, or it may be able to
share some resources (such as memory or files) among several of its children. Restricting a child process
to a subset of the parent’s resources prevents any process from overloading the system by creating too
many child processes.

18
Q

When a process creates a new process, two possibilities for execution exist:

A
  1. The parent continues to execute concurrently with its children.
  2. The parent waits until some or all of its children have terminated.
19
Q

There are also two address-space possibilities for the new process:

A
  1. The child process is a duplicate of the parent process (it has the same program and data as the parent).
  2. The child process has a new program loaded into it.
20
Q

FORK() SYSTEM CALL

A

A new process is created by the fork() system call. The new process consists of
a copy of the address space of the original process. This mechanism allows the parent process to
communicate easily with its child process. Both processes (the parent and the child) continue execution at
the instruction after the fork(), with one difference: the return code for the fork() is zero for the new
(child) process, whereas the (nonzero) process identifier of the child is returned to the parent.
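
A minimal sketch of this behavior on a POSIX system (the printed messages are illustrative):

    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();            /* create a copy of the calling process */

        if (pid < 0) {                 /* fork failed */
            perror("fork");
            return 1;
        }
        else if (pid == 0) {           /* child: fork() returned 0 */
            printf("child: my pid is %d\n", (int) getpid());
        }
        else {                         /* parent: fork() returned the child's pid */
            printf("parent: created child %d\n", (int) pid);
        }
        return 0;
    }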

21
Q

EXEC() SYSTEM CALL

A

The exec() system call is typically used after a fork() to replace the process’s memory space with a new program.
The exec() system call loads a binary file into memory (destroying the memory image of the program
containing the exec() system call) and starts its execution. In this manner, the two processes are able to
communicate and then go their separate ways. The parent can then create more children; or, if it has
nothing else to do while the child runs, it can issue a wait() system call to move itself off the ready queue
until the termination of the child. Because the call to exec() overlays the process’s address space with a
new program, exec() does not return control unless an error occurs.
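
A common UNIX idiom combining fork(), exec(), and wait() might look like the following sketch (the /bin/ls program is just an illustrative choice of binary to load):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();

        if (pid < 0) {
            fprintf(stderr, "fork failed\n");
            return 1;
        }
        else if (pid == 0) {
            /* child: overlay this process with a new program */
            execlp("/bin/ls", "ls", (char *) NULL);
            fprintf(stderr, "exec failed\n");    /* reached only if exec fails */
            exit(1);
        }
        else {
            wait(NULL);                          /* parent waits for the child */
            printf("child complete\n");
        }
        return 0;
    }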

22
Q

The parent waits for the child process

A

to complete with the wait() system call. When the child process
completes (by either implicitly or explicitly invoking exit()), the parent process resumes from the call to
wait(), where it completes using the exit() system call.

23
Q

Windows system calls

A

As an alternative example, we next consider process creation in Windows.
Processes are created in the Windows API using the CreateProcess() function, which is similar to fork() in
that a parent creates a new child process. However, whereas fork() has the child process inheriting the
address space of its parent, CreateProcess() requires loading a specified program into the address space of
the child process at process creation. Furthermore, whereas fork() is passed no parameters,
CreateProcess() expects no fewer than ten parameters. The C program shown in Figure 3.10 illustrates the
CreateProcess() function, which creates a child process that loads the application mspaint.exe. We opt for
many of the default values of the ten parameters passed to CreateProcess(). Readers interested in pursuing
the details of process creation and management in the Windows API are encouraged to consult the
bibliographical notes at the end of this chapter. The two parameters passed to the CreateProcess() function
are instances of the STARTUPINFO and PROCESS_INFORMATION structures. STARTUPINFO
specifies many properties of the new process, such as window size and appearance and handles to
standard input and output files. The PROCESS_INFORMATION structure contains a handle and the
identifiers to the newly created process and its thread. We invoke the ZeroMemory() function to allocate
memory for each of these structures before proceeding with CreateProcess(). The first two parameters
passed to CreateProcess() are the application name and command-line parameters. If the application name
is NULL (as it is in this case), the command-line parameter specifies the application to load.
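
A sketch along the lines of the Figure 3.10 program described above (error handling is abbreviated, and the mspaint.exe path follows the textbook's example):

    #include <stdio.h>
    #include <windows.h>

    int main(void)
    {
        STARTUPINFO si;
        PROCESS_INFORMATION pi;
        char cmd[] = "C:\\WINDOWS\\system32\\mspaint.exe";

        /* zero the two structures before use */
        ZeroMemory(&si, sizeof(si));
        si.cb = sizeof(si);
        ZeroMemory(&pi, sizeof(pi));

        /* create a child process that loads mspaint.exe */
        if (!CreateProcess(NULL,   /* application name: use the command line instead */
                cmd,               /* command line */
                NULL, NULL,        /* process and thread security attributes */
                FALSE,             /* do not inherit handles */
                0,                 /* no creation flags */
                NULL, NULL,        /* use the parent's environment and directory */
                &si, &pi)) {
            fprintf(stderr, "CreateProcess failed\n");
            return 1;
        }

        /* parent waits for the child to complete */
        WaitForSingleObject(pi.hProcess, INFINITE);
        printf("child complete\n");

        CloseHandle(pi.hProcess);
        CloseHandle(pi.hThread);
        return 0;
    }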

24
Q

Process Termination

A

A process terminates when it finishes executing its final statement and asks the
operating system to delete it by using the exit() system call. At that point, the process may return a status
value (typically an integer) to its waiting parent process (via the wait() system call). All the resources of
the process —including physical and virtual memory, open files, and I/O buffers—are deallocated and
reclaimed by the operating system. Termination can occur in other circumstances as well. A process can
cause the termination of another process via an appropriate system call (for example, TerminateProcess()
in Windows). Usually, such a system call can be invoked only by the parent of the process that is to be
terminated. Otherwise, a user— or a misbehaving application—could arbitrarily kill another user’s
processes. Note that a parent needs to know the identities of its children if it is to terminate them. Thus,
when one process creates a new process, the identity of the newly created process is passed to the parent.
A parent may terminate the execution of one of its children for a variety of reasons, such as these:
  • The child has exceeded its usage of some of the resources that it has been allocated. (To determine whether this has occurred, the parent must have a mechanism to inspect the state of its children.)
  • The task assigned to the child is no longer required.
  • The parent is exiting, and the operating system does not allow a child to continue if its parent terminates.

25
Cascading termination
If a parent terminates and the system does not allow its children to continue without it, then all of the parent’s children (and their descendants) must also be terminated. This cascading termination is normally initiated by the operating system. To illustrate process execution and termination, consider that, in Linux and UNIX systems, we can terminate a process by using the exit() system call, providing an exit status as a parameter: exit(1); /* exit with status 1 */
26
under normal termination -> exit()
will be called either directly (as shown above) or indirectly, as the C run-time library (which is added to UNIX executable files) will include a call to exit() by default
27
wait() system call
A parent process may wait for the termination of a child process by using the wait() system call. The wait() system call is passed a parameter that allows the parent to obtain the exit status of the child. This system call also returns the process identifier of the terminated child so that the parent can tell which of its children has terminated: pid_t pid; int status; pid = wait(&status); When a process terminates, its resources are deallocated by the operating system. However, its entry in the process table must remain there until the parent calls wait(), because the process table contains the process’s exit status.
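A minimal sketch of this pattern (the child's exit status of 1 and the message are illustrative):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();

        if (pid == 0)
            exit(1);                          /* child terminates with status 1 */

        int status;
        pid_t child = wait(&status);          /* blocks until a child terminates */
        if (WIFEXITED(status))
            printf("child %d exited with status %d\n",
                   (int) child, WEXITSTATUS(status));
        return 0;
    }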
28
Orphans -
Now consider what would happen if a parent did not invoke wait() and instead terminated, thereby leaving its child processes as orphans. Traditional UNIX systems addressed this scenario by assigning the init process as the new parent to orphan processes. (Recall from Section 3.3.1 that init serves as the root of the process hierarchy in UNIX systems.) The init process periodically invokes wait(), thereby allowing the exit status of any orphaned process to be collected and releasing the orphan’s process identifier and process-table entry. Although most Linux systems have replaced init with systemd, the latter process can still serve the same role, although Linux also allows processes other than systemd to inherit orphan processes and manage their termination.
29
Android Process Hierarchy
Because mobile systems such as Android have limited memory, the operating system may need to terminate existing processes to reclaim resources. Android classifies processes into an importance hierarchy and, when it must free resources, terminates the least important processes first. The classifications, from most to least important, appear in the following cards.
30
* Foreground process—
The current process visible on the screen, representing the application the user is currently interacting with
31
* Visible process—
A process that is not directly visible on the foreground but that is performing an activity that the foreground process is referring to (that is, a process performing an activity whose status is displayed on the foreground process)
32
* Service process
—A process that is similar to a background process but is performing an activity that is apparent to the user (such as streaming music)
33
* Background process—
A process that may be performing an activity but is not apparent to the user
34
* Empty process
A process that holds no active components associated with any application
35
Context switch
As mentioned in Section 1.2.1, interrupts cause the operating system to change a CPU core from its current task and to run a kernel routine. Such operations happen frequently on general-purpose systems. When an interrupt occurs, the system needs to save the current context of the process running on the CPU core so that it can restore that context when its processing is done, essentially suspending the process and then resuming it. The context is represented in the PCB of the process. It includes the value of the CPU registers, the process state (see Figure 3.2), and memory-management information. Generically, we perform a state save of the current state of the CPU core, be it in kernel or user mode, and then a state restore to resume operations. Switching the CPU core to another process requires performing a state save of the current process and a state restore of a different process. This task is known as a context switch and is illustrated in Figure 3.6. When a context switch occurs, the kernel saves the context of the old process in its PCB and loads the saved context of the new process scheduled to run. Context-switch time is pure overhead, because the system does no useful work while switching. Switching speed varies from machine to machine, depending on the memory speed, the number of registers that must be copied, and the existence of special instructions (such as a single instruction to load or store all registers). A typical speed is a few microseconds.
There are several reasons for providing an environment that allows process cooperation:
36
* Information sharing
Since several applications may be interested in the same piece of information (for instance, copying and pasting), we must provide an environment to allow concurrent access to such information.
37
* Computation speedup
If we want a particular task to run faster, we must break it into subtasks, each of which will be executing in parallel with the others. Notice that such a speedup can be achieved only if the computer has multiple processing cores.
38
* Modularity.
We may want to construct the system in a modular fashion, dividing the system functions into separate processes or threads, as we discussed in Chapter 2.
39
Cooperating processes require an interprocess communication (IPC)
mechanism that will allow them to exchange data— that is, send data to and receive data from each other
40
Shared memory -
In the shared-memory model, a region of memory that is shared by the cooperating processes is established. Processes can then exchange information by reading and writing data to the shared region. Shared memory can be faster than message passing, since message-passing systems are typically implemented using system calls and thus require the more time-consuming task of kernel intervention. In shared-memory systems, system calls are required only to establish shared-memory regions. Once shared memory is established, all accesses are treated as routine memory accesses, and no assistance from the kernel is required.
41
Message-passing model
Communication takes place by means of messages exchanged between the cooperating processes. Message passing is useful for exchanging smaller amounts of data, because no conflicts need be avoided. Message passing is also easier to implement in a distributed system than shared memory.
42
Interprocess communication using shared memory requires communicating processes to establish a region of shared memory
Typically, a shared-memory region resides in the address space of the process creating the shared-memory segment. Other processes that wish to communicate using this shared-memory segment must attach it to their address space. Recall that, normally, the operating system tries to prevent one process from accessing another process’s memory. Shared memory requires that two or more processes agree to remove this restriction. They can then exchange information by reading and writing data in the shared areas. The form of the data and the location are determined by these processes and are not under the operating system’s control. The processes are also responsible for ensuring that they are not writing to the same location simultaneously
43
the producer–consumer problem
- which is a common paradigm for cooperating processes. A producer process produces information that is consumed by a consumer process. For example, a compiler may produce assembly code that is consumed by an assembler. The assembler, in turn, may produce object modules that are consumed by the loader. The producer–consumer problem also provides a useful metaphor for the client–server paradigm. We generally think of a server as a producer and a client as a consumer. For example, a web server produces (that is, provides) web content such as HTML files and images, which are consumed (that is, read) by the client web browser requesting the resource. One solution to the producer–consumer problem uses shared memory. To allow producer and consumer processes to run concurrently, we must have available a buffer of items that can be filled by the producer and emptied by the consumer. This buffer will reside in a region of memory that is shared by the producer and consumer processes. A producer can produce one item while the consumer is consuming another item. The producer and consumer must be synchronized, so that the consumer does not try to consume an item that has not yet been produced
44
Two kinds of buffers for the producer–consumer problem
45
- The unbounded buffer
places no practical limit on the size of the buffer. The consumer may have to wait for new items, but the producer can always produce new items. The bounded buffer assumes a fixed buffer size. In this case, the consumer must wait if the buffer is empty, and the producer must wait if the buffer is full.
46
The shared buffer
is implemented as a circular array with two logical pointers: in and out. The variable in points to the next free position in the buffer; out points to the first full position in the buffer. The buffer is empty when in == out; the buffer is full when ((in + 1) % BUFFER_SIZE) == out. The code for the producer process is shown in Figure 3.12, and the code for the consumer process is shown in Figure 3.13. The producer process has a local variable next_produced in which the new item to be produced is stored. The consumer process has a local variable next_consumed in which the item to be consumed is stored. This scheme allows at most BUFFER_SIZE − 1 items in the buffer at the same time. We leave it as an exercise for you to provide a solution in which BUFFER_SIZE items can be in the buffer at the same time. In Section 3.7.1, we illustrate the POSIX API for shared memory.
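A sketch of the producer and consumer routines built on this circular buffer, following the logic of Figures 3.12 and 3.13 (the item type and function names here are illustrative, and synchronization is deliberately omitted, as in the text):

    #define BUFFER_SIZE 10

    typedef struct { int value; } item;    /* illustrative item type */

    item buffer[BUFFER_SIZE];              /* shared circular buffer */
    int in = 0;                            /* next free position */
    int out = 0;                           /* first full position */

    /* producer side: insert one item */
    void produce(item next_produced)
    {
        while (((in + 1) % BUFFER_SIZE) == out)
            ;                              /* buffer is full: do nothing */
        buffer[in] = next_produced;
        in = (in + 1) % BUFFER_SIZE;
    }

    /* consumer side: remove one item */
    item consume(void)
    {
        item next_consumed;
        while (in == out)
            ;                              /* buffer is empty: do nothing */
        next_consumed = buffer[out];
        out = (out + 1) % BUFFER_SIZE;
        return next_consumed;
    }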
47
Message passing provides a mechanism
to allow processes to communicate and to synchronize their actions without sharing the same address space. It is particularly useful in a distributed environment, where the communicating processes may reside on different computers connected by a network. For example, an Internet chat program could be designed so that chat participants communicate with one another by exchanging messages. A message-passing facility provides at least two operations: send(message) and receive(message). Messages sent by a process can be either fixed or variable in size. If only fixed-sized messages can be sent, the system-level implementation is straightforward. This restriction, however, makes the task of programming more difficult. Conversely, variable-sized messages require a more complex system-level implementation, but the programming task becomes simpler.
48
Direct communication-
Under direct communication, each process that wants to communicate must explicitly name the recipient or sender of the communication. In this scheme, the send() and receive() primitives are defined as:
  • send(P, message)—Send a message to process P.
  • receive(Q, message)—Receive a message from process Q.
49
A communication link in this scheme has the following properties
  • A link is established automatically between every pair of processes that want to communicate. The processes need to know only each other’s identity to communicate.
  • A link is associated with exactly two processes.
  • Between each pair of processes, there exists exactly one link.
50
Indirect communication (mailboxes): a process can communicate with another process via a number of different mailboxes, but two processes can communicate only if they have a shared mailbox. The send() and receive() primitives are defined as follows:
  • send(A, message)—Send a message to mailbox A.
  • receive(A, message)—Receive a message from mailbox A.
In this scheme, a communication link has the following properties:
  • A link is established between a pair of processes only if both members of the pair have a shared mailbox.
  • A link may be associated with more than two processes.
  • Between each pair of communicating processes, a number of different links may exist, with each link corresponding to one mailbox.
Now suppose that processes P1, P2, and P3 all share mailbox A. Process P1 sends a message to A, while both P2 and P3 execute a receive() from A. Which process will receive the message sent by P1? The answer depends on which of the following methods we choose:
  • Allow a link to be associated with two processes at most.
  • Allow at most one process at a time to execute a receive() operation.
  • Allow the system to select arbitrarily which process will receive the message (that is, either P2 or P3, but not both, will receive the message). The system may define an algorithm for selecting which process will receive the message (for example, round robin, where processes take turns receiving messages). The system may identify the receiver to the sender.
51
Communication between processes takes place through calls to send() and receive() primitives. There are different design options for implementing each primitive. Message passing may be either blocking or nonblocking— also known as synchronous and asynchronous.
(Throughout this text, you will encounter the concepts of synchronous and asynchronous behavior in relation to various operating-system algorithms.)
  • Blocking send. The sending process is blocked until the message is received by the receiving process or by the mailbox.
  • Nonblocking send. The sending process sends the message and resumes operation.
  • Blocking receive. The receiver blocks until a message is available.
  • Nonblocking receive. The receiver retrieves either a valid message or a null.
52
When both send() and receive() are blocking, we have a rendezvous between the sender and the receiver.
The solution to the producer–consumer problem becomes trivial when we use blocking send() and receive() statements. The producer merely invokes the blocking send() call and waits until the message is delivered to either the receiver or the mailbox. Likewise, when the consumer invokes receive(), it blocks until a message is available
53
messages exchanged by communicating processes reside in a temporary queue
Basically, such queues can be implemented in three ways:
  • Zero capacity. The queue has a maximum length of zero; thus, the link cannot have any messages waiting in it. In this case, the sender must block until the recipient receives the message.
  • Bounded capacity. The queue has finite length n; thus, at most n messages can reside in it. If the queue is not full when a new message is sent, the message is placed in the queue (either the message is copied or a pointer to the message is kept), and the sender can continue execution without waiting. The link’s capacity is finite, however. If the link is full, the sender must block until space is available in the queue.
  • Unbounded capacity. The queue’s length is potentially infinite; thus, any number of messages can wait in it. The sender never blocks.
The zero-capacity case is sometimes referred to as a message system with no buffering. The other cases are referred to as systems with automatic buffering.
54
POSIX shared memory (one of the earliest forms of IPC)
is organized using memory-mapped files, which associate the region of shared memory with a file. A process must first create a shared-memory object using the shm_open() system call, as follows: fd = shm_open(name, O_CREAT | O_RDWR, 0666); The first parameter specifies the name of the shared-memory object. Processes that wish to access this shared memory must refer to the object by this name. The subsequent parameters specify that the shared-memory object is to be created if it does not yet exist (O_CREAT) and that the object is open for reading and writing (O_RDWR). The last parameter establishes the file-access permissions of the shared-memory object. A successful call to shm_open() returns an integer file descriptor for the shared-memory object. Once the object is established, the ftruncate() function is used to configure the size of the object in bytes. The call ftruncate(fd, 4096); sets the size of the object to 4,096 bytes. Finally, the mmap() function establishes a memory-mapped file containing the shared-memory object. It also returns a pointer to the memory-mapped file that is used for accessing the shared-memory object.
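A sketch of a producer that creates and writes to such a shared-memory object (the object name "OS" and the message are illustrative; some systems require linking with -lrt):

    #include <stdio.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        const char *name = "OS";   /* illustrative name of the shared-memory object */
        const int SIZE = 4096;     /* size of the object in bytes */

        /* create the shared-memory object */
        int fd = shm_open(name, O_CREAT | O_RDWR, 0666);
        if (fd == -1) { perror("shm_open"); return 1; }

        /* configure its size */
        ftruncate(fd, SIZE);

        /* establish the memory-mapped file */
        char *ptr = mmap(0, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (ptr == MAP_FAILED) { perror("mmap"); return 1; }

        /* write to the shared region as ordinary memory */
        sprintf(ptr, "Hello from the producer");
        return 0;
    }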
55
Mach Message Passing
Mach was especially designed for distributed systems, but was shown to be suitable for desktop and mobile systems as well, as evidenced by its inclusion in the macOS and iOS operating systems, as discussed in Chapter 2. The Mach kernel supports the creation and destruction of multiple tasks, which are similar to processes but have multiple threads of control and fewer associated resources. Most communication in Mach—including all intertask communication—is carried out by messages. Messages are sent to, and received from, mailboxes, which are called ports in Mach. Ports are finite in size and unidirectional; for two-way communication, a message is sent to one port, and a response is sent to a separate reply port. Each port may have multiple senders, but only one receiver. Mach uses ports to represent resources such as tasks, threads, memory, and processors, while message passing provides an object-oriented approach for interacting with these system resources and services. Message passing may occur between any two ports on the same host or on separate hosts on a distributed system. Associated with each port is a collection of port rights that identify the capabilities necessary for a task to interact with the port. For example, for a task to receive a message from a port, it must have the capability MACH_PORT_RIGHT_RECEIVE for that port. The task that creates a port is that port’s owner, and the owner is the only task that is allowed to receive messages from that port. A port’s owner may also manipulate the capabilities for a port.
56
When a task is created, two special ports— the Task Self port and the Notify port—are also created
The kernel has receive rights to the Task Self port, which allows a task to send messages to the kernel. The kernel can send notification of event occurrences to a task’s Notify port (to which, of course, the task has receive rights). The mach_port_allocate() function call creates a new port and allocates space for its queue of messages. It also identifies the rights for the port. Each port right represents a name for that port, and a port can only be accessed via a right. Port names are simple integer values and behave much like UNIX file descriptors.
57
All messages are delivered reliably and have the same priority. Mach guarantees that multiple messages from the same sender are queued in first-in, first-out (FIFO)
order but does not guarantee an absolute ordering. For instance, messages from two senders may be queued in any order. Mach messages contain the following two fields:
  • A fixed-size message header containing metadata about the message, including the size of the message as well as source and destination ports. Commonly, the sending thread expects a reply, so the port name of the source is passed on to the receiving task, which can use it as a “return address” in sending a reply.
  • A variable-sized body containing data.
58
Messages may be either simple or complex. A simple message contains ordinary, unstructured user data that are not interpreted by the kernel.
A complex message may contain pointers to memory locations containing data (known as “out-of-line” data) or may also be used for transferring port rights to another task. Out-of-line data pointers are especially useful when a message must pass large chunks of data. A simple message would require copying and packaging the data in the message; out-of-line data transmission requires only a pointer that refers to the memory location where the data are stored. The function mach_msg() is the standard API for both sending and receiving messages. The value of one of the function’s parameters—either MACH_SEND_MSG or MACH_RCV_MSG—indicates if it is a send or receive operation. We now illustrate how it is used when a client task sends a simple message to a server task. Assume there are two ports—client and server—associated with the client and server tasks, respectively. The code in Figure 3.18 shows the client task constructing a header and sending a message to the server, as well as the server task receiving the message sent from the client. The mach_msg() function call is invoked by user programs for performing message passing. mach_msg() then invokes the function mach_msg_trap(), which is a system call to the Mach kernel. Within the kernel, mach_msg_trap() next calls the function mach_msg_overwrite_trap(), which then handles the actual passing of the message.
59
If the port’s queue is full, the sender has several options (specified via parameters to mach_msg()):
  1. Wait indefinitely until there is room in the queue.
  2. Wait at most n milliseconds.
  3. Do not wait at all but rather return immediately.
  4. Temporarily cache a message. Here, a message is given to the operating system to keep, even though the queue to which that message is being sent is full. When the message can be put in the queue, a notification message is sent back to the sender. Only one message to a full queue can be pending at any time for a given sending thread.
60
The message-passing facility in Windows is called the advanced local procedure call (ALPC) facility. It is used for communication between two processes on the same machine.
It is similar to the standard remote procedure call (RPC) mechanism that is widely used, but it is optimized for and specific to Windows. (Remote procedure calls are covered in detail in Section 3.8.2.) Like Mach, Windows uses a port object to establish and maintain a connection between two processes. Windows uses two types of ports: connection ports and communication ports. Server processes publish connection-port objects that are visible to all processes. When a client wants services from a subsystem, it opens a handle to the server’s connection-port object and sends a connection request to that port. The server then creates a channel and returns a handle to the client. The channel consists of a pair of private communication ports: one for client–server messages, the other for server–client messages. Additionally, communication channels support a callback mechanism that allows the client and server to accept requests when they would normally be expecting a reply
61
When an ALPC channel is created, one of three message-passing techniques is chosen:
  1. For small messages (up to 256 bytes), the port’s message queue is used as intermediate storage, and the messages are copied from one process to the other.
  2. Larger messages must be passed through a section object, which is a region of shared memory associated with the channel.
  3. When the amount of data is too large to fit into a section object, an API is available that allows server processes to read and write directly into the address space of a client.
62
A pipe acts as a conduit allowing two processes to communicate. Pipes were one of the first IPC mechanisms in early UNIX systems. They typically provide one of the simpler ways for processes to communicate with one another, although they also have some limitations. In implementing a pipe, four issues must be considered
  1. Does the pipe allow bidirectional communication, or is communication unidirectional?
  2. If two-way communication is allowed, is it half duplex (data can travel only one way at a time) or full duplex (data can travel in both directions at the same time)?
  3. Must a relationship (such as parent–child) exist between the communicating processes?
  4. Can the pipes communicate over a network, or must the communicating processes reside on the same machine?
63
Ordinary pipes allow two processes to communicate
in standard producer–consumer fashion: the producer writes to one end of the pipe (the write end) and the consumer reads from the other end (the read end). As a result, ordinary pipes are unidirectional, allowing only one-way communication. If two-way communication is required, two pipes must be used, with each pipe sending data in a different direction. On UNIX systems, the parent process creates a pipe and then calls fork() to create the child process. What occurs after the fork() call depends on how the data are to flow through the pipe. In this instance, the parent writes to the pipe, and the child reads from it. It is important to notice that both the parent process and the child process initially close their unused ends of the pipe.
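A sketch of this parent-writes/child-reads pattern (the message string is illustrative):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    #define READ_END  0
    #define WRITE_END 1

    int main(void)
    {
        int fd[2];
        char write_msg[] = "Greetings";    /* illustrative message */
        char read_msg[64];

        if (pipe(fd) == -1) {              /* create the pipe */
            perror("pipe");
            return 1;
        }

        pid_t pid = fork();

        if (pid == 0) {                    /* child: reads from the pipe */
            close(fd[WRITE_END]);          /* close the unused write end */
            read(fd[READ_END], read_msg, sizeof(read_msg));
            printf("child read: %s\n", read_msg);
            close(fd[READ_END]);
        } else {                           /* parent: writes to the pipe */
            close(fd[READ_END]);           /* close the unused read end */
            write(fd[WRITE_END], write_msg, strlen(write_msg) + 1);
            close(fd[WRITE_END]);
            wait(NULL);
        }
        return 0;
    }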
64
Ordinary pipes on Windows systems are termed anonymous pipes, and they behave similarly to their UNIX counterparts: they are unidirectional and employ parent–child relationships between the communicating processes
In addition, reading from and writing to the pipe can be accomplished with the ordinary ReadFile() and WriteFile() functions. The Windows API for creating pipes is the CreatePipe() function, which is passed four parameters: separate handles for (1) reading and (2) writing to the pipe, (3) a pointer to a SECURITY_ATTRIBUTES structure (marked so that the pipe handles can be inherited by the child process created with CreateProcess()), and (4) the suggested size of the pipe’s buffer in bytes.
65
Named Pipes
Named pipes provide a much more powerful communication tool. Communication can be bidirectional, and no parent–child relationship is required. Once a named pipe is established, several processes can use it for communication. In fact, in a typical scenario, a named pipe has several writers. Additionally, named pipes continue to exist after communicating processes have finished. Both UNIX and Windows systems support named pipes, although the details of implementation differ greatly. Next, we explore named pipes in each of these systems. Named pipes are referred to as FIFOs in UNIX systems. Once created, they appear as typical files in the file system. A FIFO is created with the mkfifo() system call and manipulated with the ordinary open(), read(), write(), and close() system calls. It will continue to exist until it is explicitly deleted. Although FIFOs allow bidirectional communication, only half-duplex transmission is permitted. If data must travel in both directions, two FIFOs are typically used. Additionally, the communicating processes must reside on the same machine. If intermachine communication is required, sockets (Section 3.8.1) must be used. Named pipes on Windows systems provide a richer communication mechanism than their UNIX counterparts. Full-duplex communication is allowed, and the communicating processes may reside on either the same or different machines. Additionally, only byte-oriented data may be transmitted across a UNIX FIFO, whereas Windows systems allow either byte- or message-oriented data. Named pipes are created with the CreateNamedPipe() function, and a client can connect to a named pipe using ConnectNamedPipe(). Communication over the named pipe can be accomplished using the ReadFile() and WriteFile() functions.
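A sketch of a UNIX FIFO writer (the /tmp/my_fifo path and the message are illustrative; a separate process would open the same path for reading):

    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "/tmp/my_fifo";   /* illustrative FIFO name */

        /* create the named pipe; it appears as a file in the file system */
        if (mkfifo(path, 0666) == -1)
            perror("mkfifo");                /* it may already exist */

        /* open() blocks until a reader opens the other end */
        int fd = open(path, O_WRONLY);
        if (fd == -1) { perror("open"); return 1; }

        const char *msg = "hello over the FIFO";
        write(fd, msg, strlen(msg) + 1);
        close(fd);

        /* unlink(path) would delete the FIFO explicitly */
        return 0;
    }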
66
A socket
is defined as an endpoint for communication. A pair of processes communicating over a network employs a pair of sockets—one for each process. A socket is identified by an IP address concatenated with a port number. In general, sockets use a client–server architecture. The server waits for incoming client requests by listening to a specified port. Once a request is received, the server accepts a connection from the client socket to complete the connection. Servers implementing specific services (such as SSH, FTP, and HTTP) listen to well-known ports (an SSH server listens to port 22; an FTP server listens to port 21; and a web, or HTTP, server listens to port 80). All ports below 1024 are considered well known and are used to implement standard services.
67
Java provides three different types of sockets.
Connection-oriented (TCP) sockets are implemented with the Socket class. Connectionless (UDP) sockets use the DatagramSocket class. Finally, the MulticastSocket class is a subclass of the DatagramSocket class. A multicast socket allows data to be sent to multiple recipients. Our example describes a date server that uses connection-oriented TCP sockets. The operation allows clients to request the current date and time from the server. The server listens to port 6013, although the port could have any arbitrary, unused number greater than 1024. When a connection is received, the server returns the date and time to the client. The IP address 127.0.0.1 is a special IP address known as the loopback. When a computer refers to IP address 127.0.0.1, it is referring to itself. This mechanism allows a client and server on the same host to communicate using the TCP/IP protocol.
A port in this context is simply a number included at the start of a message packet. Whereas a system normally has one network address, it can have many ports within that address to differentiate the many network services it supports. If a remote process needs a service, it addresses a message to the proper port. For instance, if a system wished to allow other systems to be able to list its current users, it would have a daemon supporting such an RPC attached to a port—say, port 3027. Any remote system could obtain the needed information (that is, the list of current users) by sending an RPC message to port 3027 on the server. The data would be received in a reply message.
The semantics of RPCs allows a client to invoke a procedure on a remote host as it would invoke a procedure locally. The RPC system hides the details that allow communication to take place by providing a stub on the client side. Typically, a separate stub exists for each separate remote procedure. When the client invokes a remote procedure, the RPC system calls the appropriate stub, passing it the parameters provided to the remote procedure. This stub locates the port on the server and marshals the parameters. The stub then transmits a message to the server using message passing. A similar stub on the server side receives this message and invokes the procedure on the server. If necessary, return values are passed back to the client using the same technique. On Windows systems, stub code is compiled from a specification written in the Microsoft Interface Definition Language (MIDL), which is used for defining the interfaces between client and server programs.
Parameter marshaling addresses the issue concerning differences in data representation on the client and server machines. Consider the representation of 32-bit integers. Some systems (known as big-endian) store the most significant byte first, while other systems (known as little-endian) store the least significant byte first. Neither order is “better” per se; rather, the choice is arbitrary within a computer architecture. To resolve differences like this, many RPC systems define a machine-independent representation of data. One such representation is known as external data representation (XDR). On the client side, parameter marshaling involves converting the machine-dependent data into XDR before they are sent to the server. On the server side, the XDR data are unmarshaled and converted to the machine-dependent representation for the server.
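A small C illustration of the byte-order issue discussed above: the standard htonl()/ntohl() functions convert a 32-bit integer between the host's byte order (which may be little-endian) and network byte order (big-endian), the same idea XDR applies to whole parameter lists:

    #include <stdio.h>
    #include <stdint.h>
    #include <arpa/inet.h>   /* htonl(), ntohl() */

    int main(void)
    {
        uint32_t host_value = 0x12345678;

        /* convert to network (big-endian) order before sending */
        uint32_t wire_value = htonl(host_value);

        /* the receiver converts back to its own host byte order */
        uint32_t received = ntohl(wire_value);

        printf("host 0x%x -> network 0x%x -> host 0x%x\n",
               (unsigned) host_value, (unsigned) wire_value, (unsigned) received);
        return 0;
    }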
68
The Android operating system has a rich set of IPC mechanisms contained in its binder framework, including RPCs that allow one process to request services from another process.
Android defines an application component as a basic building block that provides utility to an Android application, and an app may combine multiple application components to provide functionality to an app. One example of an application component is a service, which has no user interface but instead runs in the background while executing long-running operations or performing work for remote processes. Examples of services include playing music in the background and retrieving data over a network connection on behalf of another process, thereby preventing the other process from blocking as the data are being downloaded. When a client app invokes the bindService() method of a service, that service is “bound” and available to provide client-server communication using either message passing or RPCs. A bound service must extend the Android class Service and must implement the method onBind(), which is invoked when a client calls bindService(). In the case of message passing, the onBind() method returns a Messenger service, which is used for sending messages from the client to the service. The Messenger service is only one-way; if the service must send a reply back to the client, the client must also provide a Messenger service, which is contained in the replyTo field of the Message object sent to the service. The service can then send messages back to the client.
69
A thread
is a basic unit of CPU utilization; it comprises a thread ID, a program counter (PC), a register set, and a stack. It shares with other threads belonging to the same process its code section, data section, and other operating-system resources, such as open files and signals. A traditional process has a single thread of control. If a process has multiple threads of control, it can perform more than one task at a time. Figure 4.1 illustrates the difference between a traditional single-threaded process and a multithreaded process.
70
The benefits of multithreaded programming can be broken down into four major categories:
  1. Responsiveness. Multithreading an interactive application may allow a program to continue running even if part of it is blocked or is performing a lengthy operation, thereby increasing responsiveness to the user. This quality is especially useful in designing user interfaces. For instance, consider what happens when a user clicks a button that results in the performance of a time-consuming operation. A single-threaded application would be unresponsive to the user until the operation had been completed. In contrast, if the time-consuming operation is performed in a separate, asynchronous thread, the application remains responsive to the user.
  2. Resource sharing. Processes can share resources only through techniques such as shared memory and message passing. Such techniques must be explicitly arranged by the programmer. However, threads share the memory and the resources of the process to which they belong by default. The benefit of sharing code and data is that it allows an application to have several different threads of activity within the same address space.
  3. Economy. Allocating memory and resources for process creation is costly. Because threads share the resources of the process to which they belong, it is more economical to create and context-switch threads. Empirically gauging the difference in overhead can be difficult, but in general thread creation consumes less time and memory than process creation. Additionally, context switching is typically faster between threads than between processes.
  4. Scalability. The benefits of multithreading can be even greater in a multiprocessor architecture, where threads may be running in parallel on different processing cores. A single-threaded process can run on only one processor, regardless of how many are available.
71
concurrency vs parallelism:
A concurrent system supports more than one task by allowing all the tasks to make progress. In contrast, a parallel system can perform more than one task simultaneously. Thus, it is possible to have concurrency without parallelism. Before the advent of multiprocessor and multicore architectures, most computer systems had only a single processor, and CPU schedulers were designed to provide the illusion of parallelism by rapidly switching between processes, thereby allowing each process to make progress. Such processes were running concurrently, but not in parallel.
Five challenges in multicore programming:
  1. Identifying tasks. This involves examining applications to find areas that can be divided into separate, concurrent tasks. Ideally, tasks are independent of one another and thus can run in parallel on individual cores.
  2. Balance. While identifying tasks that can run in parallel, programmers must also ensure that the tasks perform equal work of equal value. In some instances, a certain task may not contribute as much value to the overall process as other tasks. Using a separate execution core to run that task may not be worth the cost.
  3. Data splitting. Just as applications are divided into separate tasks, the data accessed and manipulated by the tasks must be divided to run on separate cores.
  4. Data dependency. The data accessed by the tasks must be examined for dependencies between two or more tasks. When one task depends on data from another, programmers must ensure that the execution of the tasks is synchronized to accommodate the data dependency. We examine such strategies in Chapter 6.
  5. Testing and debugging. When a program is running in parallel on multiple cores, many different execution paths are possible. Testing and debugging such concurrent programs is inherently more difficult than testing and debugging single-threaded applications.
72
Data parallelism focuses
on distributing subsets of the same data across multiple computing cores and performing the same operation on each core. Consider, for example, summing the contents of an array of size N. On a single-core system, one thread would simply sum the elements [0] ...[N − 1]. On a dual-core system, however, thread A, running on core 0, could sum the elements [0] . . . [N∕2 − 1] while thread B, running on core 1, could sum the elements [N∕2] . . . [N − 1]. The two threads would be running in parallel on separate computing cores
73
Task parallelism involves
distributing not data but tasks (threads) across multiple computing cores. Each thread is performing a unique operation. Different threads may be operating on the same data, or they may be operating on different data. Consider again our example above. In contrast to that situation, an example of task parallelism might involve two threads, each performing a unique statistical operation on the array of elements. The threads again are operating in parallel on separate computing cores, but each is performing a unique operation.
74
Kernel vs. user threads
Support for threads may be provided either at the user level, for user threads, or by the kernel, for kernel threads. User threads are supported above the kernel and are managed without kernel support, whereas kernel threads are supported and managed directly by the operating system. Virtually all contemporary operating systems—including Windows, Linux, and macOS—support kernel threads.
75
Many-to-One Model
The many-to-one model (Figure 4.7) maps many user-level threads to one kernel thread. Thread management is done by the thread library in user space, so it is efficient (we discuss thread libraries in Section 4.4). However, the entire process will block if a thread makes a blocking system call. Also, because only one thread can access the kernel at a time, multiple threads are unable to run in parallel on multicore systems. Green threads—a thread library available for Solaris systems and adopted in early versions of Java—used the many-to-one model. However, very few systems continue to use the model because of its inability to take advantage of multiple processing cores, which have now become standard on most computer systems.
76
One-to-One Model
The one-to-one model (Figure 4.8) maps each user thread to a kernel thread. It provides more concurrency than the many-to-one model by allowing another thread to run when a thread makes a blocking system call. It also allows multiple threads to run in parallel on multiprocessors. The only drawback to this model is that creating a user thread requires creating the corresponding kernel thread, and a large number of kernel threads may burden the performance of a system. Linux, along with the family of Windows operating systems, implement the one-to-one model.
77
Many-to-Many Model
The many-to-many model (Figure 4.9) multiplexes many user-level threads to a smaller or equal number of kernel threads. The number of kernel threads may be specific to either a particular application or a particular machine (an application may be allocated more kernel threads on a system with eight processing cores than a system with four cores). Although the many-to-many model appears to be the most flexible of the models discussed, in practice it is difficult to implement. In addition, with an increasing number of processing cores appearing on most systems, limiting the number of kernel threads has become less important. As a result, most operating systems now use the one-to-one model.
78
Two-level model
Let’s consider the effect of this design on concurrency. Whereas the many-to-one model allows the developer to create as many user threads as she wishes, it does not result in parallelism, because the kernel can schedule only one kernel thread at a time. The one-to-one model allows greater concurrency, but the developer has to be careful not to create too many threads within an application. (In fact, on some systems, she may be limited in the number of threads she can create.) The many-to-many model suffers from neither of these shortcomings: developers can create as many user threads as necessary, and the corresponding kernel threads can run in parallel on a multiprocessor. Also, when a thread performs a blocking system call, the kernel can schedule another thread for execution. One variation on the many-to-many model still multiplexes many user-level threads to a smaller or equal number of kernel threads but also allows a user-level thread to be bound to a kernel thread.
79
A thread library provides
the programmer with an API for creating and managing threads. There are two primary ways of implementing a thread library. The first approach is to provide a library entirely in user space with no kernel support. All code and data structures for the library exist in user space. This means that invoking a function in the library results in a local function call in user space and not a system call. The second approach is to implement a kernel-level library supported directly by the operating system. In this case, code and data structures for the library exist in kernel space. Invoking a function in the API for the library typically results in a system call to the kernel.
80
For POSIX and Windows threading,
any data declared globally— that is, declared outside of any function—are shared among all threads belonging to the same process. Because Java has no equivalent notion of global data, access to shared data must be explicitly arranged between threads.
81
With asynchronous threading
once the parent creates a child thread, the parent resumes its execution, so that the parent and child execute concurrently and independently of one another. Because the threads are independent, there is typically little data sharing between them. Asynchronous threading is the strategy used in the multithreaded server illustrated in Figure 4.2 and is also commonly used for designing responsive user interfaces.
82
Synchronous threading
occurs when the parent thread creates one or more children and then must wait for all of its children to terminate before it resumes. Here, the threads created by the parent perform work concurrently, but the parent cannot continue until this work has been completed. Once each thread has finished its work, it terminates and joins with its parent. Only after all of the children have joined can the parent resume execution. Typically, synchronous threading involves significant data sharing among threads. For example, the parent thread may combine the results calculated by its various children. All of the following examples use synchronous threading
83
Pthreads refers to the POSIX standard (IEEE 1003.1c)
defining an API for thread creation and synchronization. This is a specification for thread behavior, not an implementation. Operating-system designers may implement the specification in any way they wish.
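A minimal C sketch of the Pthreads calls this card refers to, loosely following the summation example the chapter uses (the global sum and the routine name runner are illustrative, not part of the standard):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

int sum; /* shared by the parent and child threads */

/* thread start routine: sums 1..upper and stores the result in sum */
void *runner(void *param) {
    int upper = atoi(param);
    sum = 0;
    for (int i = 1; i <= upper; i++)
        sum += i;
    pthread_exit(0);
}

int main(int argc, char *argv[]) {
    pthread_t tid;        /* thread identifier */
    pthread_attr_t attr;  /* thread attributes */

    pthread_attr_init(&attr);                      /* default attributes */
    pthread_create(&tid, &attr, runner, argv[1]);  /* create the thread */
    pthread_join(tid, NULL);                       /* wait for it to finish */
    printf("sum = %d\n", sum);
    return 0;
}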
84
Windows Threads
Threads are created in the Windows API using the CreateThread() function, and—just as in Pthreads—a set of attributes for the thread is passed to this function. These attributes include security information, the size of the stack, and a flag that can be set to indicate whether the thread is to start in a suspended state. In this program, we use the default values for these attributes. (The default values do not initially set the thread to a suspended state and instead make it eligible to be run by the CPU scheduler.)

Once the summation thread is created, the parent must wait for it to complete before outputting the value of Sum, as the value is set by the summation thread. Recall that the Pthread program (Figure 4.11) had the parent thread wait for the summation thread using the pthread_join() statement. We perform the equivalent of this in the Windows API using the WaitForSingleObject() function, which causes the creating thread to block until the summation thread has exited.

In situations that require waiting for multiple threads to complete, the WaitForMultipleObjects() function is used. This function is passed four parameters:
1. The number of objects to wait for
2. A pointer to the array of objects
3. A flag indicating whether all objects have been signaled
4. A timeout duration (or INFINITE)

For example, if THandles is an array of thread HANDLE objects of size N, the parent thread can wait for all its child threads to complete with this statement:

WaitForMultipleObjects(N, THandles, TRUE, INFINITE);
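A rough, self-contained C sketch of the calls described above; the routine name Summation, the global Sum, and the upper bound are illustrative choices, not required by the API:

#include <windows.h>
#include <stdio.h>

DWORD Sum; /* shared data, set by the summation thread */

/* thread start routine */
DWORD WINAPI Summation(LPVOID Param) {
    DWORD Upper = *(DWORD *)Param;
    for (DWORD i = 1; i <= Upper; i++)
        Sum += i;
    return 0;
}

int main(void) {
    DWORD ThreadId;
    DWORD Param = 5;
    /* default security attributes, stack size, and creation flags */
    HANDLE ThreadHandle = CreateThread(NULL, 0, Summation, &Param, 0, &ThreadId);
    if (ThreadHandle != NULL) {
        WaitForSingleObject(ThreadHandle, INFINITE); /* block until the thread exits */
        CloseHandle(ThreadHandle);
        printf("sum = %lu\n", Sum);
    }
    return 0;
}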
85
Java Threads
Threads are the fundamental model of program execution in a Java program, and the Java language and its API provide a rich set of features for the creation and management of threads. All Java programs comprise at least a single thread of control—even a simple Java program consisting of only a main() method runs as a single thread in the JVM. Java threads are available on any system that provides a JVM, including Windows, Linux, and macOS. The Java thread API is available for Android applications as well.

There are two techniques for explicitly creating threads in a Java program. One approach is to create a new class that is derived from the Thread class and to override its run() method. An alternative—and more commonly used—technique is to define a class that implements the Runnable interface. This interface defines a single abstract method with the signature public void run(). The code in the run() method of a class that implements Runnable is what executes in a separate thread. An example is shown below:

class Task implements Runnable
{
   public void run() {
      System.out.println("I am a thread.");
   }
}

Because Runnable defines only a single abstract method, a lambda expression can be used in place of a separately defined class in recent versions of Java.
86
Thread creation in Java involves
creating a Thread object and passing it an instance of a class that implements Runnable, followed by invoking the start() method on the Thread object. This appears in the following example:

Thread worker = new Thread(new Task());
worker.start();

Invoking the start() method for the new Thread object does two things:
1. It allocates memory and initializes a new thread in the JVM.
2. It calls the run() method, making the thread eligible to be run by the JVM.

(Note again that we never call the run() method directly. Rather, we call the start() method, and it calls the run() method on our behalf.)
87
What problems does thread pool solve?
Consider a multithreaded server: whenever the server receives a request, it creates a separate thread to service the request. Whereas creating a separate thread is certainly superior to creating a separate process, a multithreaded server nonetheless has potential problems. The first issue concerns the amount of time required to create the thread, together with the fact that the thread will be discarded once it has completed its work. The second issue is more troublesome. If we allow each concurrent request to be serviced in a new thread, we have not placed a bound on the number of threads concurrently active in the system. Unlimited threads could exhaust system resources, such as CPU time or memory. One solution to this problem is to use a thread pool.
88
Thread pools
work well when the tasks submitted to the pool can be executed asynchronously. Thread pools offer these benefits:
1. Servicing a request with an existing thread is often faster than waiting to create a thread.
2. A thread pool limits the number of threads that exist at any one point. This is particularly important on systems that cannot support a large number of concurrent threads.
3. Separating the task to be performed from the mechanics of creating the task allows us to use different strategies for running the task. For example, the task could be scheduled to execute after a time delay or to execute periodically.

The number of threads in the pool can be set heuristically based on factors such as the number of CPUs in the system, the amount of physical memory, and the expected number of concurrent client requests. More sophisticated thread-pool architectures can dynamically adjust the number of threads in the pool according to usage patterns. Such architectures provide the further benefit of having a smaller pool—thereby consuming less memory—when the load on the system is low.
89
Java Thread Pools
The java.util.concurrent package includes an API for several varieties of thread-pool architectures. Here, we focus on the following three models:
1. Single thread executor—newSingleThreadExecutor()—creates a pool of size 1.
2. Fixed thread executor—newFixedThreadPool(int size)—creates a thread pool with a specified number of threads.
3. Cached thread executor—newCachedThreadPool()—creates an unbounded thread pool, reusing threads in many instances.

We have, in fact, already seen the use of a Java thread pool in Section 4.4.3, where we created a newSingleThreadExecutor in the program example shown in Figure 4.14. In that section, we noted that the Java executor framework can be used to construct more robust threading tools. We now describe how it can be used to create thread pools. A thread pool is created using one of the factory methods in the Executors class:
* static ExecutorService newSingleThreadExecutor()
* static ExecutorService newFixedThreadPool(int size)
* static ExecutorService newCachedThreadPool()

Each of these factory methods creates and returns an object instance that implements the ExecutorService interface.
90
THE JVM AND THE HOST OPERATING SYSTEM
The JVM is typically implemented on top of a host operating system (see Figure 18.10). This setup allows the JVM to hide the implementation details of the underlying operating system and to provide a consistent, abstract environment that allows Java programs to operate on any platform that supports a JVM. The specification for the JVM does not indicate how Java threads are to be mapped to the underlying operating system, instead leaving that decision to the particular implementation of the JVM. For example, the Windows operating system uses the one-to-one model; therefore, each Java thread for a JVM running on Windows maps to a kernel thread. In addition, there may be a relationship between the Java thread library and the thread library on the host operating system. For example, implementations of a JVM for the Windows family of operating systems might use the Windows API when creating Java threads; Linux and macOS systems might use the Pthreads API.

Thread pool: the general idea behind a thread pool is to create a number of threads at start-up and place them into a pool, where they sit and wait for work. When a server receives a request, rather than creating a thread, it instead submits the request to the thread pool and resumes waiting for additional requests. If there is an available thread in the pool, it is awakened, and the request is serviced immediately. If the pool contains no available thread, the task is queued until one becomes free. Once a thread completes its service, it returns to the pool and awaits more work.
91
Fork Join
The strategy for thread creation covered in Section 4.4 is often known as the fork-join model. Recall that with this method, the main parent thread creates (forks) one or more child threads and then waits for the children to terminate and join with it, at which point it can retrieve and combine their results. This synchronous model is often characterized as explicit thread creation, but it is also an excellent candidate for implicit threading. In the latter situation, threads are not constructed directly during the fork stage; rather, parallel tasks are designated. This model is illustrated in Figure 4.16. A library manages the number of threads that are created and is also responsible for assigning tasks to threads. In some ways, this fork-join model is a synchronous version of thread pools in which a library determines the actual number of threads to create— for example, by using the heuristics described in Section 4.5.1.
92
OpenMP
OpenMP is a set of compiler directives as well as an API for programs written in C, C++, or FORTRAN that provides support for parallel programming in shared-memory environments. OpenMP identifies parallel regions as blocks of code that may run in parallel. Application developers insert compiler directives into their code at parallel regions, and these directives instruct the OpenMP run-time library to execute the region in parallel.
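A small C sketch of two OpenMP directives, compiled with an OpenMP-capable compiler (for example, gcc -fopenmp); the array size N and the work done in the loop are arbitrary illustrative values:

#include <omp.h>
#include <stdio.h>
#define N 16

int main(void) {
    int a[N], i;

    /* the block following this directive runs once in every thread */
    #pragma omp parallel
    {
        printf("I am a parallel region.\n");
    }

    /* iterations of the loop are divided among the threads */
    #pragma omp parallel for
    for (i = 0; i < N; i++)
        a[i] = i * i;

    printf("a[%d] = %d\n", N - 1, a[N - 1]);
    return 0;
}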
93
Grand Central Dispatch
Grand Central Dispatch (GCD) is a technology developed by Apple for its macOS and iOS operating systems. It is a combination of a run-time library, an API, and language extensions that allow developers to identify sections of code (tasks) to run in parallel. Like OpenMP, GCD manages most of the details of threading. GCD schedules tasks for run-time execution by placing them on a dispatch queue. When it removes a task from a queue, it assigns the task to an available thread from a pool of threads that it manages.

GCD identifies two types of dispatch queues: serial and concurrent. Tasks placed on a serial queue are removed in FIFO order. Once a task has been removed from the queue, it must complete execution before another task is removed. Each process has its own serial queue (known as its main queue), and developers can create additional serial queues that are local to a particular process. (This is why serial queues are also known as private dispatch queues.) Serial queues are useful for ensuring the sequential execution of several tasks.

Tasks placed on a concurrent queue are also removed in FIFO order, but several tasks may be removed at a time, thus allowing multiple tasks to execute in parallel. There are several system-wide concurrent queues (also known as global dispatch queues), which are divided into four primary quality-of-service classes:
* QOS_CLASS_USER_INTERACTIVE—The user-interactive class represents tasks that interact with the user, such as the user interface and event handling, to ensure a responsive user interface. Completing a task belonging to this class should require only a small amount of work.
* QOS_CLASS_USER_INITIATED—The user-initiated class is similar to the user-interactive class in that tasks are associated with a responsive user interface; however, user-initiated tasks may require longer processing times. Opening a file or a URL is a user-initiated task, for example. Tasks belonging to this class must be completed for the user to continue interacting with the system, but they do not need to be serviced as quickly as tasks in the user-interactive queue.
* QOS_CLASS_UTILITY—The utility class represents tasks that require a longer time to complete but do not demand immediate results. This class includes work such as importing data.
* QOS_CLASS_BACKGROUND—Tasks belonging to the background class are not visible to the user and are not time sensitive. Examples include indexing a mailbox system and performing backups.
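A hedged C sketch of submitting tasks to GCD queues; this builds only on Apple platforms (or with the blocks extension enabled), and the serial-queue label and the messages printed are purely illustrative:

#include <dispatch/dispatch.h>
#include <stdio.h>

int main(void) {
    /* a system-wide concurrent queue in the utility QoS class */
    dispatch_queue_t concurrent = dispatch_get_global_queue(QOS_CLASS_UTILITY, 0);
    /* a private serial queue created by the application */
    dispatch_queue_t serial = dispatch_queue_create("com.example.serial", NULL);
    dispatch_group_t group = dispatch_group_create();

    /* tasks are expressed as blocks (the ^{ ... } syntax) */
    dispatch_group_async(group, concurrent, ^{ printf("task on a concurrent queue\n"); });
    dispatch_group_async(group, serial, ^{ printf("task on a serial queue\n"); });

    /* wait for both tasks to finish before exiting */
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    return 0;
}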
94
The fork() and exec() System Calls
In Chapter 3, we described how the fork() system call is used to create a separate, duplicate process. The semantics of the fork() and exec() system calls change in a multithreaded program. If one thread in a program calls fork(), does the new process duplicate all threads, or is the new process single-threaded? Some UNIX systems have chosen to have two versions of fork(), one that duplicates all threads and another that duplicates only the thread that invoked the fork() system call. The exec() system call typically works in the same way as described in Chapter 3. That is, if a thread invokes the exec() system call, the program specified in the parameter to exec() will replace the entire process—including all threads. Which of the two versions of fork() to use depends on the application. If exec() is called immediately after forking, then duplicating all threads is unnecessary, as the program specified in the parameters to exec() will replace the process. In this instance, duplicating only the calling thread is appropriate. If, however, the separate process does not call exec() after forking, the separate process should duplicate all threads.
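A minimal C sketch of the fork()/exec() pattern discussed above; because exec() immediately replaces the child's address space (including all threads), duplicating only the calling thread on fork() would suffice in this case. The program /bin/ls is just an example:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    pid_t pid = fork();                      /* create a duplicate process */
    if (pid == 0) {
        /* child: replace the entire process image */
        execlp("/bin/ls", "ls", (char *)NULL);
    } else if (pid > 0) {
        wait(NULL);                          /* parent waits for the child */
        printf("child complete\n");
    }
    return 0;
}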
95
A signal
is used in UNIX systems to notify a process that a particular event has occurred. A signal may be received either synchronously or asynchronously, depending on the source of and the reason for the event being signaled. All signals, whether synchronous or asynchronous, follow the same pattern:
1. A signal is generated by the occurrence of a particular event.
2. The signal is delivered to a process.
3. Once delivered, the signal must be handled.

Examples of synchronous signals include illegal memory access and division by 0. If a running program performs either of these actions, a signal is generated. Synchronous signals are delivered to the same process that performed the operation that caused the signal (that is the reason they are considered synchronous). When a signal is generated by an event external to a running process, that process receives the signal asynchronously. Examples of such signals include terminating a process with specific keystrokes (such as control-C) and having a timer expire. Typically, an asynchronous signal is sent to another process.

A signal may be handled by one of two possible handlers:
1. A default signal handler
2. A user-defined signal handler
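A small C sketch of installing a user-defined handler for an asynchronous signal (SIGINT, generated by control-C); the handler name and message are illustrative:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* user-defined handler overriding the default action for SIGINT */
void handler(int signo) {
    write(STDOUT_FILENO, "caught SIGINT\n", 14);  /* async-signal-safe output */
}

int main(void) {
    struct sigaction sa;
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, NULL);   /* register the handler */

    pause();                        /* wait until a signal is delivered */
    return 0;
}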
96
Every signal has a default signal handler that the kernel runs when handling that signal. This default action can be overridden by a user-defined signal handler that is called to handle the signal.
Signals are handled in different ways. Some signals may be ignored, while others (for example, an illegal memory access) are handled by terminating the program. Handling signals in single-threaded programs is straightforward: signals are always delivered to a process. However, delivering signals is more complicated in multithreaded programs, where a process may have several threads. Where, then, should a signal be delivered? In general, the following options exist:
1. Deliver the signal to the thread to which the signal applies.
2. Deliver the signal to every thread in the process.
3. Deliver the signal to certain threads in the process.
4. Assign a specific thread to receive all signals for the process.

The method for delivering a signal depends on the type of signal generated. For example, synchronous signals need to be delivered to the thread causing the signal and not to other threads in the process. However, the situation with asynchronous signals is not as clear. Some asynchronous signals—such as a signal that terminates a process (control-C, for example)—should be sent to all threads.
97
The standard UNIX function for delivering a signal is kill(pid_t pid, int signal)
This function specifies the process (pid) to which a particular signal (signal) is to be delivered. Most multithreaded versions of UNIX allow a thread to specify which signals it will accept and which it will block. Therefore, in some cases, an asynchronous signal may be delivered only to those threads that are not blocking it. However, because signals need to be handled only once, a signal is typically delivered only to the first thread found that is not blocking it. POSIX Pthreads provides the following function, which allows a signal to be delivered to a specified thread (tid):

pthread_kill(pthread_t tid, int signal)

Although Windows does not explicitly provide support for signals, it allows us to emulate them using asynchronous procedure calls (APCs). The APC facility enables a user thread to specify a function that is to be called when the user thread receives notification of a particular event. As indicated by its name, an APC is roughly equivalent to an asynchronous signal in UNIX. However, whereas UNIX must contend with how to deal with signals in a multithreaded environment, the APC facility is more straightforward, since an APC is delivered to a particular thread rather than a process.
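A hedged C sketch of directing a signal at one specific thread with pthread_kill(); SIGUSR1, the worker routine, and the one-second delay (a crude way to let the worker reach pause() first) are illustrative choices:

#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

void handler(int signo) {
    /* empty handler: receiving SIGUSR1 simply interrupts pause() below */
}

void *worker(void *arg) {
    pause();                         /* block until a signal is delivered to this thread */
    printf("worker received the signal\n");
    return NULL;
}

int main(void) {
    struct sigaction sa;
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR1, &sa, NULL);   /* process-wide disposition for SIGUSR1 */

    pthread_t tid;
    pthread_create(&tid, NULL, worker, NULL);
    sleep(1);                        /* give the worker time to call pause() */
    pthread_kill(tid, SIGUSR1);      /* kill(pid, SIGUSR1) would target the whole process instead */
    pthread_join(tid, NULL);
    return 0;
}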
98
Thread cancellation
involves terminating a thread before it has completed. For example, if multiple threads are concurrently searching through a database and one thread returns the result, the remaining threads might be canceled. Another situation might occur when a user presses a button on a web browser that stops a web page from loading any further. Often, a web page loads using several threads—each image is loaded in a separate thread. When a user presses the stop button on the browser, all threads loading the page are canceled.
99
A thread that is to be canceled is often referred to as the target thread. Cancellation of a target thread may occur in two different scenarios
1. Asynchronous cancellation. One thread immediately terminates the target thread. 2. Deferred cancellation. The target thread periodically checks whether it should terminate, allowing it an opportunity to terminate itself in an orderly fashion.
100
The difficulty with cancellation occurs in situations where resources have been allocated to a canceled thread or where a thread is canceled while in the midst of updating data it is sharing with other threads.
This becomes especially troublesome with asynchronous cancellation. Often, the operating system will reclaim system resources from a canceled thread but will not reclaim all resources. Therefore, canceling a thread asynchronously may not free a necessary system-wide resource. With deferred cancellation, in contrast, one thread indicates that a target thread is to be canceled, but cancellation occurs only after the target thread has checked a flag to determine whether or not it should be canceled. The thread can perform this check at a point at which it can be canceled safely. In Pthreads, thread cancellation is initiated using the pthread_cancel() function. The identifier of the target thread is passed as a parameter to the function.
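A minimal C sketch of deferred cancellation with pthread_cancel(); the worker's loop body and the two-second delay are illustrative:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

void *worker(void *arg) {
    while (1) {
        /* sleep() (like read() and many other blocking calls) is a cancellation point */
        sleep(1);
    }
    return NULL;
}

int main(void) {
    pthread_t tid;
    pthread_create(&tid, NULL, worker, NULL);

    sleep(2);                 /* let the worker run for a while */
    pthread_cancel(tid);      /* request (deferred) cancellation of the target thread */

    void *result;
    pthread_join(tid, &result);
    if (result == PTHREAD_CANCELED)
        printf("worker was canceled\n");
    return 0;
}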
101
Pthreads supports three cancellation modes. Each mode is defined as a state and a type, as illustrated in the table below. A thread may set its cancellation state and type using an API
* Off: cancellation state disabled (cancellation requests remain pending)
* Deferred: cancellation state enabled, cancellation type deferred
* Asynchronous: cancellation state enabled, cancellation type asynchronous
102
The default cancellation type is deferred cancellation. However, cancellation occurs only when a thread reaches a cancellation point.
Most of the blocking system calls in the POSIX and standard C library are defined as cancellation points, and these are listed when invoking the command man pthreads on a Linux system. For example, the read() system call is a cancellation point that allows canceling a thread that is blocked while awaiting input from read(). One technique for establishing a cancellation point is to invoke the pthread_testcancel() function. If a cancellation request is found to be pending, the call to pthread_testcancel() will not return, and the thread will terminate; otherwise, the call to the function will return, and the thread will continue to run. Additionally, Pthreads allows a function known as a cleanup handler to be invoked if a thread is canceled. This function allows any resources a thread may have acquired to be released before the thread is terminated.
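A hedged C sketch of a thread that creates its own cancellation point with pthread_testcancel() and registers a cleanup handler to release a resource; the buffer and the cleanup() routine are illustrative names:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* cleanup handler: releases the resource if the thread is canceled */
void cleanup(void *arg) {
    free(arg);
}

void *worker(void *arg) {
    char *buffer = malloc(1024);
    pthread_cleanup_push(cleanup, buffer);   /* runs cleanup() if the thread is canceled below */

    while (1) {
        /* ... do work containing no blocking calls ... */
        pthread_testcancel();                /* explicit cancellation point */
    }

    pthread_cleanup_pop(1);                  /* pairs with the push; not reached in this sketch */
    return NULL;
}

int main(void) {
    pthread_t tid;
    pthread_create(&tid, NULL, worker, NULL);
    pthread_cancel(tid);
    pthread_join(tid, NULL);
    return 0;
}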
103
Thread-Local Storage
Threads belonging to a process share the data of the process. Indeed, this data sharing provides one of the benefits of multithreaded programming. However, in some circumstances, each thread might need its own copy of certain data. We will call such data thread-local storage (or TLS). For example, in a transaction-processing system, we might service each transaction in a separate thread. Furthermore, each transaction might be assigned a unique identifier. To associate each thread with its unique transaction identifier, we could use thread-local storage.

It is easy to confuse TLS with local variables. However, local variables are visible only during a single function invocation, whereas TLS data are visible across function invocations. Additionally, when the developer has no control over the thread creation process—for example, when using an implicit technique such as a thread pool—then an alternative approach is necessary. In some ways, TLS is similar to static data; the difference is that TLS data are unique to each thread. (In fact, TLS is usually declared as static.)

Most thread libraries and compilers provide support for TLS. For example, Java provides a ThreadLocal class with set() and get() methods for ThreadLocal objects. Pthreads includes the type pthread_key_t, which provides a key that is specific to each thread. This key can then be used to access TLS data. Microsoft's C# language simply requires adding the storage attribute [ThreadStatic] to declare thread-local data. The gcc compiler provides the storage class keyword __thread for declaring TLS data. For example, if we wished to assign a unique identifier for each thread, we would declare it as follows:

static __thread int threadID;
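A small C sketch of gcc-style TLS using __thread: each thread gets its own copy of threadID, so the two assignments below do not interfere. The identifier values are illustrative (the Pthreads alternative would use pthread_key_create() and pthread_setspecific()):

#include <pthread.h>
#include <stdio.h>

static __thread int threadID;     /* one private copy per thread */

void *worker(void *arg) {
    threadID = (int)(long)arg;    /* sets only this thread's copy */
    printf("my threadID is %d\n", threadID);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1L);
    pthread_create(&t2, NULL, worker, (void *)2L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}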
104
Scheduler Activations
A final issue to be considered with multithreaded programs concerns communication between the kernel and the thread library, which may be required by the many-to-many and two-level models discussed in Section 4.3.3. Such coordination allows the number of kernel threads to be dynamically adjusted to help ensure the best performance. Many systems implementing either the many-to-many or the two-level model place an intermediate data structure between the user and kernel threads. This data structure—typically known as a lightweight process, or LWP—is shown in Figure 4.20. To the user-thread library, the LWP appears to be a virtual processor on which the application can schedule a user thread to run. Each LWP is attached to a kernel thread, and it is kernel threads that the operating system schedules to run on physical processors. If a kernel thread blocks (such as while waiting for an I/O operation to complete), the LWP blocks as well. Up the chain, the user-level thread attached to the LWP also blocks.

An application may require any number of LWPs to run efficiently. Consider a CPU-bound application running on a single processor. In this scenario, only one thread can run at a time, so one LWP is sufficient. An application that is I/O intensive may require multiple LWPs to execute, however. Typically, an LWP is required for each concurrent blocking system call. Suppose, for example, that five different file-read requests occur simultaneously. Five LWPs are needed, because all could be waiting for I/O completion in the kernel. If a process has only four LWPs, then the fifth request must wait for one of the LWPs to return from the kernel.

One scheme for communication between the user-thread library and the kernel is known as scheduler activation. It works as follows: The kernel provides an application with a set of virtual processors (LWPs), and the application can schedule user threads onto an available virtual processor. Furthermore, the kernel must inform an application about certain events. This procedure is known as an upcall. Upcalls are handled by the thread library with an upcall handler, and upcall handlers must run on a virtual processor.

One event that triggers an upcall occurs when an application thread is about to block. In this scenario, the kernel makes an upcall to the application informing it that a thread is about to block and identifying the specific thread. The kernel then allocates a new virtual processor to the application. The application runs an upcall handler on this new virtual processor, which saves the state of the blocking thread and relinquishes the virtual processor on which the blocking thread is running. The upcall handler then schedules another thread that is eligible to run on the new virtual processor. When the event that the blocking thread was waiting for occurs, the kernel makes another upcall to the thread library informing it that the previously blocked thread is now eligible to run. The upcall handler for this event also requires a virtual processor, and the kernel may allocate a new virtual processor or preempt one of the user threads and run the upcall handler on its virtual processor. After marking the unblocked thread as eligible to run, the application schedules an eligible thread to run on an available virtual processor.
105
Windows Threads
A Windows application runs as a separate process, and each process may contain one or more threads. The Windows API for creating threads is covered in Section 4.4.2. Additionally, Windows uses the one-to-one mapping described in Section 4.3.2, where each user-level thread maps to an associated kernel thread.

The general components of a thread include:
* A thread ID uniquely identifying the thread
* A register set representing the status of the processor
* A program counter
* A user stack, employed when the thread is running in user mode, and a kernel stack, employed when the thread is running in kernel mode
* A private storage area used by various run-time libraries and dynamic link libraries (DLLs)

The register set, stacks, and private storage area are known as the context of the thread. The primary data structures of a thread include:
* ETHREAD—executive thread block
* KTHREAD—kernel thread block
* TEB—thread environment block

The key components of the ETHREAD include a pointer to the process to which the thread belongs and the address of the routine in which the thread starts control. The ETHREAD also contains a pointer to the corresponding KTHREAD. The KTHREAD includes scheduling and synchronization information for the thread. In addition, the KTHREAD includes the kernel stack (used when the thread is running in kernel mode) and a pointer to the TEB. The ETHREAD and the KTHREAD exist entirely in kernel space; this means that only the kernel can access them. The TEB is a user-space data structure that is accessed when the thread is running in user mode. Among other fields, the TEB contains the thread identifier, a user-mode stack, and an array for thread-local storage. The structure of a Windows thread is illustrated in Figure 4.21.
106
Linux Threads
Linux provides the fork() system call with the traditional functionality of duplicating a process, as described in Chapter 3. Linux also provides the ability to create threads using the clone() system call. However, Linux does not distinguish between processes and threads. In fact, Linux uses the term task—rather than process or thread—when referring to a flow of control within a program.

When clone() is invoked, it is passed a set of flags that determine how much sharing is to take place between the parent and child tasks. Some of these flags are listed in Figure 4.22. For example, suppose that clone() is passed the flags CLONE_FS, CLONE_VM, CLONE_SIGHAND, and CLONE_FILES. The parent and child tasks will then share the same file-system information (such as the current working directory), the same memory space, the same signal handlers, and the same set of open files. Using clone() in this fashion is equivalent to creating a thread as described in this chapter, since the parent task shares most of its resources with its child task. However, if none of these flags is set when clone() is invoked, no sharing takes place, resulting in functionality similar to that provided by the fork() system call.

The varying level of sharing is possible because of the way a task is represented in the Linux kernel. A unique kernel data structure (specifically, struct task_struct) exists for each task in the system. This data structure, instead of storing data for the task, contains pointers to other data structures where these data are stored—for example, data structures that represent the list of open files, signal-handling information, and virtual memory. When fork() is invoked, a new task is created, along with a copy of all the associated data structures of the parent process. A new task is also created when the clone() system call is made. However, rather than copying all data structures, the new task points to the data structures of the parent task, depending on the set of flags passed to clone().

Finally, the flexibility of the clone() system call can be extended to the concept of containers, a virtualization topic introduced in Chapter 1. Recall from that chapter that a container is a virtualization technique provided by the operating system that allows creating multiple Linux systems (containers) under a single Linux kernel that run in isolation from one another. Just as certain flags passed to clone() can distinguish between creating a task that behaves more like a process or a thread based upon the amount of sharing between the parent and child tasks, other flags can be passed to clone() that allow a Linux container to be created. Containers will be covered more fully in Chapter 18.
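A hedged C sketch of clone() with thread-like sharing flags (Linux-specific); the stack size and the child routine are illustrative, and error checking is omitted:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 0;

/* child task entry point */
int child_fn(void *arg) {
    counter++;               /* visible to the parent because CLONE_VM shares memory */
    return 0;
}

int main(void) {
    const size_t STACK_SIZE = 1024 * 1024;
    char *stack = malloc(STACK_SIZE);

    /* share memory, file-system info, open files, and signal handlers,
       so the child behaves like a thread of the parent task */
    int flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD;
    pid_t tid = clone(child_fn, stack + STACK_SIZE, flags, NULL);

    waitpid(tid, NULL, 0);   /* SIGCHLD in the flags lets the parent wait normally */
    printf("counter = %d\n", counter);
    free(stack);
    return 0;
}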