Final Flashcards

1
Q

How does scheduling work? What are the basic steps and data structures involved in scheduling a thread on the CPU?

A

An OS scheduler is responsible for selecting a task (process or thread) and having it run on a CPU. Scheduling occurs when the CPU becomes idle and the scheduler chooses a task from the ready queue. How a task is selected depends on the scheduling policy/algorithm (e.g., FIFO, priority, time slicing). The run queue is the data structure that implements the scheduling mechanism, and it is tightly coupled with the scheduling algorithm.

2
Q

What are the overheads associated with scheduling?

A

The time spent running the scheduling algorithm and context switching to the selected task: cycles during which no useful application work is done.

3
Q

Do you understand the tradeoffs associated with the frequency of preemption and scheduling/what types of workloads benefit from frequent vs. infrequent intervention of the scheduler (short vs. long timeslices)?

A

CPU-bound tasks benefit the most from long timeslices: this limits the context-switching overhead and keeps CPU utilization and throughput high. I/O-bound tasks prefer shorter timeslices: this keeps both CPU and device utilization high, and in most cases an I/O-bound task will yield for an I/O operation before its timeslice expires anyway. Shorter timeslices also give the user the perception that the system is more responsive.

4
Q

Can you work through a scenario describing some workload mix (few threads, their compute and I/O phases) and for a given scheduling discipline compute various metrics like average time to completion, system throughput, wait time of the tasks…

A

Compute phases: pay attention to the timeslice. I/O phases: pay attention to the frequency of I/O operations.
Throughput = # tasks / total time to complete them all
Avg. completion time = sum over tasks of (completion time - arrival time) / # tasks
Avg. wait time = sum over tasks of (time from arrival until the task first runs) / # tasks
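As a rough illustration (not from the course materials), here is a minimal C sketch that computes these metrics for a hypothetical set of tasks that all arrive at t = 0 and run to completion one after another, in the order given:

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical execution times in seconds, already ordered shortest-first
     * (the order SJF would choose); all tasks arrive at t = 0. */
    double exec[] = {1.0, 1.0, 10.0};
    int n = sizeof(exec) / sizeof(exec[0]);
    double t = 0.0, total_completion = 0.0, total_wait = 0.0;

    for (int i = 0; i < n; i++) {
        total_wait += t;          /* time task i spent waiting before running */
        t += exec[i];             /* task i completes at time t               */
        total_completion += t;
    }

    printf("throughput          = %.2f tasks/s\n", n / t);
    printf("avg completion time = %.2f s\n", total_completion / n);
    printf("avg wait time       = %.2f s\n", total_wait / n);
    return 0;
}
```

With these numbers it prints 0.25 tasks/s, 5s, and 1s, matching the SJF quiz later in the deck.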

5
Q

Do you understand the motivation behind the multi-level feedback queue, why different queues have different timeslices, how do threads move between these queues… Can you contrast this with the O(1) scheduler?

A

A multi-level feedback queue lets a single structure serve both CPU- and I/O-bound tasks without knowing the task type in advance. Since different timeslice values benefit CPU- and I/O-bound tasks to different degrees, the MLFQ has multiple queues, each with a different timeslice; this is the motivation for the structure. Tasks enter at the topmost (shortest-timeslice) queue: if a task yields before its timeslice expires, it stays at that level; if it uses up its timeslice, it gets pushed down to a lower level. Tasks get a priority boost when they release the CPU for I/O. The O(1) scheduler is similar in that it associates timeslice values with priority levels. The biggest differences are that it relies on two arrays of queues (active and expired) and that its feedback is based on sleep time (time spent waiting/idling): longer sleep implies an I/O-intensive task and earns a priority boost (-5), while little sleep implies a CPU-intensive task and lowers the priority (+5).
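A minimal sketch (illustrative only) of the feedback rule described above; the number of levels and the timeslice values are made up, and real schedulers track much more state:

```c
#include <stdio.h>

#define NUM_LEVELS 3

/* Hypothetical timeslices: short at the top, long at the bottom. */
static const int timeslice_ms[NUM_LEVELS] = {10, 20, 40};

struct task { int level; };   /* 0 = topmost (highest-priority) queue */

void feedback(struct task *t, int ran_ms, int yielded_for_io) {
    if (yielded_for_io) {
        if (t->level > 0) t->level--;                   /* I/O-bound: boost  */
    } else if (ran_ms >= timeslice_ms[t->level]) {
        if (t->level < NUM_LEVELS - 1) t->level++;      /* CPU-bound: demote */
    }
}

int main(void) {
    struct task t = { .level = 0 };
    feedback(&t, 10, 0);   /* used its whole timeslice -> pushed down */
    printf("after CPU burst: level %d\n", t.level);
    feedback(&t, 3, 1);    /* yielded for I/O -> boosted back up      */
    printf("after I/O yield: level %d\n", t.level);
    return 0;
}
```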

6
Q

Do you understand what were the problems with the O(1) scheduler which led to the CFS?

A

The problem was that tasks in the active queue had to wait until all tasks had exhausted their timeslices before the active and expired queues were swapped. This created too much jitter for applications that needed realtime-like performance, such as Skype. CFS instead relies on a balanced (red-black) tree for its runqueue, which allows much more frequent updates of task priority/runtime.

7
Q

Thinking about Fedorova’s paper on scheduling for chip multi processors, what’s the goal of the scheduler she’s arguing for?

A

To better handle resource contention in order to deliver better application performance.

8
Q

What are some performance counters that can be useful in identifying the workload properties (compute vs. memory bound) and the ability of the scheduler to maximize the system throughput.

A

Traditionally, we look at instructions per cycle (IPC). Compute-bound tasks have an IPC close to 1, while memory-bound tasks are closer to 0. Fedorova proposed using cycles per instruction (CPI): compute-bound tasks have a CPI close to 1, while memory-bound tasks have a CPI much higher than 1. To maximize system throughput, she proposed giving each core a mixed-CPI workload, which leads to a well-utilized processor pipeline and high overall IPC. Workloads with similar CPI values on every core either create resource contention (all memory-bound) or leave resources idle and waste cycles on other cores.

9
Q

Linux Scheduler Quiz: What was the main reason the Linux O(1) scheduler was replaced by the CFS scheduler?

A) Scheduling a task under high loads took an unpredictable amount of time.

B) Low-priority tasks could wait indefinitely and starve.

C) Interactive tasks could wait unpredictable amounts of time to be scheduled.

A

C

10
Q

Shortest Job First (SJF) Performance Quiz:

Assume SJF is used to schedule tasks T1, T2, T3. Also, make the following assumptions:

  • the scheduler does not preempt tasks
  • known execution times: T1=1s, T2=10s, T3=1s
  • all tasks arrive at the same time, t=0

Calculate the throughput, avg. completion time, and avg. wait time.

A

In SJF order, the tasks run T1, T3, T2, completing at t = 1s, 2s, and 12s.
Throughput: 3 tasks / 12s = 0.25 tasks/s
Avg. completion time: (1 + 2 + 12) / 3 = 5s
Avg. wait time: (0 + 1 + 2) / 3 = 1s

11
Q

How do we deal with the fact that processes address more memory than physically available? What’s demand paging?

A

Because a process's virtual address space can be larger than the available physical memory, we rely on page swapping and demand paging. Processes will rarely need the full theoretical amount of virtual memory at once, so with demand paging pages are swapped in and out of physical memory and in and out of a swap partition (e.g., disk, flash device) as they are actually needed.

12
Q

How does page replacement work?

A

When a page is referenced but not present in memory, the MMU will see that the page table entry has the present bit set to 0, raise a fault, and trap to the OS. The OS determines whether the page was swapped out to disk and, if so, issues the disk access required to bring the page back in. To decide which pages should be swapped out, the OS uses history-based prediction such as Least Recently Used (LRU), which relies on the access bit.

13
Q

What happens when a process tries to modify a page that’s write protected/how does COW work?

A

The Copy-On-Write (COW) mechanism avoids unnecessary copying of a process's virtual address space. On process creation, the new process's virtual address space is mapped to the original pages, and the shared memory is write-protected. For reads, this saves memory and the time to copy. When either process issues a write, the MMU raises a page fault; the OS then creates a copy of the affected pages and updates the page tables of each process. The copy cost is paid only on demand and only if necessary.

14
Q

How does address translation work?

A

Every CPU package is equipped with a Memory Management Unit (MMU). The CPU issues virtual addresses to the MMU, and the MMU is responsible for translating them into physical addresses (or generating a fault). Registers are used to store pointers to the active page tables or, for segments, details such as the segment size and number.
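A toy illustration of the translation itself, assuming 32-bit virtual addresses, 4KB pages, and a flat page table that maps virtual page numbers straight to physical frame numbers (a real MMU does this in hardware, and real page table entries carry more than a frame number):

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12                       /* 4KB pages -> 12 offset bits */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

uint32_t translate(const uint32_t *page_table, uint32_t vaddr) {
    uint32_t vpn    = vaddr >> PAGE_SHIFT;      /* virtual page number      */
    uint32_t offset = vaddr & (PAGE_SIZE - 1);  /* offset within the page   */
    uint32_t pfn    = page_table[vpn];          /* physical frame number    */
    return (pfn << PAGE_SHIFT) | offset;        /* physical address         */
}

int main(void) {
    uint32_t page_table[16] = {0};
    page_table[2] = 7;                          /* map VPN 2 -> PFN 7       */
    printf("0x%x\n", translate(page_table, 0x2abc));   /* prints 0x7abc     */
    return 0;
}
```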

15
Q

What’s the role of the TLB?

A

The Translation Lookaside Buffer (TLB) is a cache of valid virtual-to-physical address translations that is consulted by the MMU. The TLB is used to speed up address translation.

16
Q

How does the OS map the memory allocated to a process to the underlying physical memory?

A

The OS creates a page table per process. The page table contains the entries that map a page from virtual memory to a page frame in physical memory. Memory is allocated only when it is needed so the mapping doesn’t occur until a process attempts to access it.

17
Q

What happens when a process tries to access a page not present in physical memory?

A

The page table entry will show, via the "valid bit", that the page is not present in physical memory. When the process accesses the page, the hardware MMU (memory management unit) sees the valid bit is 0, raises a fault, and traps to the OS. The OS assumes control and decides what to do; for example, if the page was swapped out to disk, the OS pulls the page back in from disk.

18
Q

What happens when a process tries to access a page that hasn’t been allocated to it?

A

The page table entry will show, via the "valid bit", that no physical memory has been allocated for that page. When the process accesses it, the hardware MMU (memory management unit) sees the valid bit is 0, raises a fault, and traps to the OS. The OS assumes control and decides what to do; if the access is permitted, the OS allocates physical memory, updates the mapping, and lets the process continue.

19
Q

Do you understand the relationships between the size of an address, the size of the address space, the size of a page, the size of the page table…

A

For an address size of X bits (e.g., a 64-bit architecture), a process can address a 2^X address space (2^64). For a page size of 2^Y bytes (4KB = 2^12 bytes), Y bits are used for the offset, so a flat page table needs 2^(X-Y) entries (2^(64-12) = 2^52). Since each entry is X bits (X/8 bytes), the page table size is 2^(X-Y) * X/8 bytes (2^52 * 8B = 32PB).

20
Q

Do you understand the benefits of hierarchical page tables? For a given address format, can you work out the sizes of the page table structures in different layers?

A

Multi-level page tables don't require a page table entry for every virtual address. As a result, we have smaller internal page tables/directories and can cover the address space with more granularity. The downside is that address translation requires more steps (more tables to walk). For example, take a 32-bit address broken into a 12-bit segment, a 10-bit segment, and another 10-bit segment. The first segment indexes the outer directory, which points to up to 2^12 page tables. Each inner page table maintains 2^10 entries, and each entry can address 2^10 bytes, so one inner page table covers 2^10 x 2^10 = 2^20 bytes (1MB) of the address space.

21
Q

Page Table Size Quiz:

On a 12-bit architecture, what is the number of entries in the page table if the page size is 32 bytes? How about 512 bytes? (Assume a single-level page table.)

A

128 entries and 8 entries, respectively. From the page size, determine the number of offset bits: 32 bytes = 2^5, so 5 bits are used for the offset. Subtract the offset bits from the address size (12 - 5 = 7) and raise 2 to that power to get the number of entries: 2^7 = 128. For 512-byte pages: 512 = 2^9, 12 - 9 = 3, so 2^3 = 8 entries.
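The same arithmetic as a small C sketch; the inner loop just computes log2 of the page size to get the number of offset bits:

```c
#include <stdio.h>

int main(void) {
    int addr_bits = 12;                       /* 12-bit architecture        */
    int page_sizes[] = {32, 512};             /* page sizes in bytes        */

    for (int i = 0; i < 2; i++) {
        int offset_bits = 0;
        for (int p = page_sizes[i]; p > 1; p >>= 1)
            offset_bits++;                    /* log2(page size)            */
        printf("page size %3dB -> %d entries\n",
               page_sizes[i], 1 << (addr_bits - offset_bits));
    }
    return 0;
}
```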

22
Q

For processes to share memory, what does the OS need to do? Do they use the same virtual addresses to access the same memory?

A

The OS establishes a shared channel between the processes by mapping certain physical memory pages into the virtual address space of each process. The virtual addresses used by each process don't need to be the same.
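One common way to set up such a channel is the POSIX shared-memory API; a minimal sketch, with an arbitrary object name and size (error handling trimmed to a bare minimum):

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    const char *name = "/demo_shm";           /* arbitrary example name     */
    size_t size = 4096;

    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);  /* create/open object */
    if (fd < 0) return 1;
    if (ftruncate(fd, size) < 0) return 1;            /* set its size       */

    /* Map it into this process's address space; a peer doing the same gets
     * the same physical pages, possibly at a different virtual address.    */
    char *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) return 1;

    strcpy(p, "hello");                       /* visible to the peer        */

    munmap(p, size);
    close(fd);
    shm_unlink(name);                         /* remove the object when done */
    return 0;
}
```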

23
Q

For processes to communicate using a shared memory-based communication channel, do they still have to copy data from one location to another?

A

Data copies are potentially reduced but not completely eliminated. A process doesn't need to make a local copy if it only needs to read the data in the shared memory. However, there will likely be situations where the data in shared memory has to be copied into the process's own memory in order to operate on it.

24
Q

What are the costs associated with copying vs. (re-/m)mapping?

A

For copying data via messages, a cost in CPU cycles is paid to copy the data to/from the message channel (ports) on every exchange. For memory mapping, most of the cost is the CPU cycles needed to map the physical memory into the process's virtual address space, but this is paid only at setup time (or amortized across many uses); after that, placing data in the channel is an ordinary memory access, so the copy cost is minimal.

25
Q

What are the tradeoffs between message-based vs. shared-memory-based communication?

A

Message-based IPC pros: simplicity; everything is handled by the OS (e.g., channel management, synchronization).
Message-based IPC cons: overhead of user/kernel crossings, which require context switching; data is copied twice (into and out of the kernel).
Shared-memory IPC pros: after the initial setup, the OS is out of the way; no more user/kernel crossings; data copies are potentially reduced (e.g., for read-only use cases).
Shared-memory IPC cons: more complexity; the developer must handle synchronization, the communication protocol, shared buffer management, etc.

26
Q

What are different ways you can implement synchronization between different processes (think what kinds of options you had in Project 3).

A

Pthread mechanisms like mutexes and condition variables (placed in the shared memory region so both processes can use them). Binary semaphores, which use values of 0 and 1 to provide mutex-like behavior for controlling access to the shared memory. Also message queues, which can provide protocol-like behavior by executing an operation only once a confirmation message has been sent/received.
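As one concrete option, a named POSIX semaphore initialized to 1 behaves like the binary semaphore mentioned above; a minimal sketch with an arbitrary semaphore name:

```c
#include <fcntl.h>
#include <semaphore.h>

int main(void) {
    /* Both processes open the same name; initial value 1 means "unlocked". */
    sem_t *sem = sem_open("/demo_sem", O_CREAT, 0600, 1);
    if (sem == SEM_FAILED) return 1;

    sem_wait(sem);      /* "lock": decrements 1 -> 0, or blocks if already 0 */
    /* ... critical section touching the shared-memory region ...            */
    sem_post(sem);      /* "unlock": increments 0 -> 1, wakes one waiter     */

    sem_close(sem);
    return 0;
}
```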

27
Q

IPC Comparison Quiz :

Consider using IPC to communicate between processes. You can either use a message-passing or a memory-based API. Which one do you think will perform better?

a) message-passing
b) shared memory
c) neither; it depends

A

c) neither; it depends

28
Q

To implement a synchronization mechanism, at the lowest level you need to rely on a hardware atomic instruction. Why? What are some examples?

A

Because with concurrent threads, a purely software solution cannot efficiently guarantee mutual exclusion. Whether your code uses a while loop or an if/else statement, checking the lock value and setting the lock value are separate operations that multiple threads can interleave. Hardware atomic instructions such as test_and_set, read_and_increment, or compare_and_swap perform the check and the update as a single indivisible operation, which is what a correct lock acquisition loop is built on.
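A minimal spinlock sketch built directly on an atomic test-and-set, here using the GCC/Clang __atomic builtins as the hardware-provided primitive (an illustration, not code from the course):

```c
#include <stdbool.h>

static bool lock_flag = false;

void spin_lock(void) {
    /* __atomic_test_and_set atomically sets the flag and returns its old
     * value; keep spinning until we are the thread that flipped 0 -> 1.   */
    while (__atomic_test_and_set(&lock_flag, __ATOMIC_ACQUIRE))
        ;   /* spin */
}

void spin_unlock(void) {
    __atomic_clear(&lock_flag, __ATOMIC_RELEASE);
}

int main(void) {
    spin_lock();
    /* ... critical section ... */
    spin_unlock();
    return 0;
}
```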

29
Q

Why are spinlocks useful? Would you use a spinlock in every place where you’re currently using a mutex?

A

Spinlocks are useful because the waiting thread keeps spinning, burning CPU cycles while repeatedly checking whether the lock has become available. Unlike a mutex, a spinlock doesn't block and wait to be signaled, so for small critical sections, or when the thread has no other work to do, acquiring the lock can be faster. Spinlocks also serve as a basic building block for more complex synchronization constructs. No: spinlocks provide mutual exclusion just as mutexes do, so they overlap, and there are many cases where it makes more sense for a thread to block so that another thread can do useful work, which is exactly what a mutex allows.

30
Q

Do you understand why it is useful to have more powerful synchronization constructs, like reader-writer locks or monitors? What about them makes them more powerful than using spinlocks, or mutexes and condition variables?

A

Higher-level synchronization constructs like reader-writer locks or monitors are more powerful because they abstract away much of the complexity involved in using basic constructs like mutexes, condition variables, and spinlocks. Using those low-level constructs directly is error prone, which hurts correctness and ease of use, and they lack built-in ways to express additional semantics such as the type of access (read vs. write) or priority.

31
Q

Can you work through the evolution of the spinlock implementations described in the Anderson paper, from basic test-and-set to the queuing lock? Do you understand what issue with an earlier implementation is addressed with a subsequent spinlock implementation?

A

Test-and-Set: Low latency and low delay, but very bad contention. The lock holder has to contend with all the other test_and_set operations, and there is no clear way to give it priority. In addition, the atomic instruction bypasses the cache and goes directly to memory on every spin. It isn't even included in the paper's comparison figure.
Test-and-Test-and-Set (spin on read): Spins on the cached value of the lock first and only executes test_and_set once the lock looks free. Latency and delay are a little worse, and it performs well under light load. With a write-update coherence strategy, performance is acceptable because the cache gets updated with the new lock value; with write-invalidate, it is the worst, creating a lot of contention and coherence traffic.
Delay locks: A delay is introduced either after the lock is released or after every memory reference to the lock. The goal is to spread out the threads; this improves contention but makes delay worse. Static delays do a bit better under heavy load, dynamic delays under lighter load. Delaying after each memory reference is better than delaying only after the lock is freed.
Queuing lock: Uses an array of flags with one element per thread, holding the values must_wait or has_lock, and relies on the atomic read_and_increment to assign each arriving thread a slot in the queue. This lock is great for contention and delay and performs best under heavy load; under light load it is the worst due to the overhead of read_and_increment and the queue setup.
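For comparison with the plain test-and-set sketch earlier in the deck, a minimal "spin on read" (test-and-test-and-set) sketch, again using the GCC/Clang __atomic builtins as a stand-in for the paper's primitives:

```c
#include <stdbool.h>

static bool lock_flag = false;

void ttas_lock(void) {
    for (;;) {
        /* test: a plain read that can be served from the local cache */
        while (__atomic_load_n(&lock_flag, __ATOMIC_RELAXED))
            ;   /* spin on the cached value */
        /* test-and-set: only now issue the expensive atomic operation */
        if (!__atomic_test_and_set(&lock_flag, __ATOMIC_ACQUIRE))
            return;   /* we acquired the lock */
    }
}

void ttas_unlock(void) {
    __atomic_clear(&lock_flag, __ATOMIC_RELEASE);
}

int main(void) {
    ttas_lock();
    /* ... critical section ... */
    ttas_unlock();
    return 0;
}
```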

32
Q

What are the steps in sending a command to a device (say packet, or file block)? What are the steps in receiving something from a device?

A

Starting with the user process, a system call (e.g., send data, read file) goes to the kernel. The kernel runs the in-kernel stack for the applicable device (the TCP/IP stack to form a packet, or the filesystem to determine which disk block stores the file data). The kernel then invokes the applicable device driver, which performs the device-specific configuration of the request (e.g., transmit the packet data, issue the disk head movement). Once configured, the device performs the request (transmits the packet, reads the block from disk), and the results traverse the same steps in reverse. There is also another path, called OS bypass, in which the user process interacts with the device directly, without going through the kernel.

33
Q

What are the basic differences in using programmed I/O vs. DMA support?

A

With programmed I/O (PIO), the CPU writes the command register and then performs every subsequent data-register access needed for the transfer. For the same transfer with DMA support, the CPU writes the command register once and makes one setup call to the DMA controller describing the memory address and size of the buffer to be transferred; the DMA controller then handles the data movement. DMA is preferred for large transfers, while programmed I/O is preferred for frequent transfers of small amounts of data, since it avoids the cost of the DMA setup.

34
Q

For block storage devices, do you understand the basic virtual file system stack, the purpose of the different entities?

A

At the top of the stack, user applications interface with files via the POSIX API. Next is the kernel file system (FS), which takes application-level reads/writes and determines where and how to find the file blocks and access them. The FS relies on the generic block layer to interact with a particular device driver and to interpret its responses, and the device driver speaks the device-specific API to the device. The file is the main abstraction of the VFS and is represented to applications by a file descriptor. Each file has an inode structure, which holds an index of all the blocks of that file. A dentry (directory entry) tracks a single path component of a file/directory that has been accessed, and a superblock tracks information about how the filesystem is laid out on disk.

35
Q

Do you understand the relationship between the various data structures (block sizes, addressing scheme, etc.) and the total size of the files or the file system that can be supported on a system?

A

The size of the inode directly determines the limit on file size. An inode contains direct pointers to file blocks, so with 4B pointers to 1KB blocks, a 128B inode would hold 32 pointers (128/4) and the file size limit would be 32KB. To address this, inodes also have indirect pointers that point to a block of pointers: with 1KB blocks and 4B block pointers, each indirect block holds 256 pointers (1KB/4B). The maximum file size is then (number of blocks reachable via direct + single-indirect + double-indirect + triple-indirect pointers) x block size. Using the previous example with 12 direct pointers: (12 + 256 + 256^2 + 256^3) x 1KB ≈ 16GB.
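The same calculation as a small C sketch, assuming 12 direct pointers, 4-byte block pointers, and 1KB blocks (the numbers used above):

```c
#include <stdio.h>

int main(void) {
    unsigned long long block = 1024;             /* block size in bytes       */
    unsigned long long ppb   = block / 4;        /* pointers per block = 256  */

    unsigned long long blocks = 12               /* direct pointers           */
                              + ppb              /* single indirect           */
                              + ppb * ppb        /* double indirect           */
                              + ppb * ppb * ppb; /* triple indirect           */

    printf("max file size = %llu bytes (~%llu GB)\n",
           blocks * block, (blocks * block) >> 30);
    return 0;
}
```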

36
Q

For the virtual file system stack, we mention several optimizations that can reduce the overheads associated with accessing the physical device. Do you understand how each of these optimizations changes how or how much we need to access the device?

A

Buffer caches in main memory: files are read/written through the cache and periodically flushed to disk, reducing the number of disk accesses.
I/O scheduling: maximizes sequential over random disk access by reordering requests to reduce disk head movement. For example, if the disk head is at block 15 and writes for blocks 25 and 17 are pending, block 17 is written first and then block 25.
Prefetching: increases cache hits by exploiting locality and reading additional blocks; for example, when block 17 is read, blocks 18 and 19 are read as well, avoiding future disk reads.
Journaling/logging: writes are first recorded sequentially in a log rather than going straight to their (random) disk locations; the log is periodically applied to the proper locations on disk, reducing random disk access.

37
Q

What is virtualization? What’s the history behind it?

A

Virtualization originated in the 1960s at IBM, where a few large mainframes were shared by many users and business services. Virtualization allows concurrent execution of multiple OSs (and their apps) on the same physical machine. Each OS thinks it "owns" the hardware resources, though what it is actually presented with is a virtual machine. The virtualization layer that manages the physical hardware is referred to as the virtual machine monitor (VMM) or hypervisor. Examples are Xen and ESX.

38
Q

What’s hosted vs. bare-metal virtualization?

A

Bare-metal (also known as hypervisor-based) virtualization relies on a VMM that manages all hardware resources and supports the execution of VMs; for device interactions, it relies on a service (privileged) VM. Hosted virtualization is where the host OS owns the interaction with devices via its own device drivers (no service VM) and a special VMM kernel module provides the hardware interfaces to the VMs. A hosted setup can run native applications directly alongside the VMs. An example is KVM (kernel-based VM).

39
Q

What’s paravirtualization, why is it useful?

A

Paravirtualization gives up on the idea that guest VMs run unmodified. Instead, each guest knows it is running virtualized and can make explicit calls (hypercalls) to the hypervisor. The goal is to avoid the overhead of inspecting and rewriting the guest binary and thereby improve performance. It was originally adopted and made popular by Xen.

40
Q

What were the problems with virtualizing x86? How does protection of x86 used to work and how does it work now? How were/are the virtualization problems on x86 fixed?

A

Pre-2005, x86 had only the 4 protection rings, with the hypervisor in ring 0 and the guest OS in ring 1. When called from ring 1, privileged operations on the flags register (e.g., POPF and PUSHF) failed silently: the hypervisor was never notified, so it never changed its settings, and the guest OS assumed the operations had succeeded. One solution was binary translation, which rewrites the VM binary so it never executes the 17 instructions that cause this problem. Since 2005, hardware extensions (AMD-V, Intel VT) add root/non-root modes and close these holes so that trap-and-emulate works.

41
Q

How does device virtualization work? What's a passthrough vs. a split-device model?

A

Device virtualization is how a guest VM accesses hardware devices. In the passthrough model, the VMM gives the guest VM's device driver direct access permissions to the hardware device, so the guest accesses the device directly and bypasses the VMM. Unfortunately, this makes it difficult to share the device across VMs, requires that the device exactly match what the guest VM's driver expects, and makes VM migration harder because the guest is not decoupled from the hardware. In the split-device driver model, device access is split between a front-end driver in the guest VM and a back-end driver in the service VM (or host). The guest must install the modified front-end driver, so only paravirtualized guests are supported. This model eliminates the emulation overhead of the hypervisor-direct model and allows better management of shared devices.

42
Q

What’s the motivation for RPC?

A

In reviewing the requirements of typical IPC-based applications, it was observed that a lot of common boilerplate code was being rewritten again and again; the primary difference between applications was the protocol definition. RPC gives the developer the ability to define the protocol details while the necessary boilerplate code is auto-generated.

43
Q

What are the various design points that have to be sorted out in implementing an RPC runtime (e.g., binding process, failure semantics, interface specification… )? What are some of the options and associated tradeoffs?

A

Binding: the mechanism that lets a client determine which server to connect to and how. Clients can consult a registry, which can be an online distributed service that any server registers with, or a dedicated process running on every machine (in which case the client must know the machine address and port number).
Interface Definition Language (IDL): defines how to describe the procedures and how to package arguments and results for communication between client and server. The IDL can be language-agnostic (e.g., XDR in SunRPC) or language-specific (e.g., Java in Java RMI); a language-specific IDL mainly benefits those who already use that language.
Pointers as arguments: either disallow them or serialize the pointed-to data. Pointers cause problems because they refer to a location in another process's or machine's address space, so the only way to make them work is to send the pointed-to data as part of the call.
Partial failures: it is hard to identify the cause of an error because so many components could have failed. RPC provides a special error notification that serves as a catch-all for any type of failure without specifying what caused it.

44
Q

What’s specifically done in Sun RPC for these design points – you should easily understand this from your project?

A

IDL: XDR (language-agnostic).
Pointers: allowed; the pointed-to data is serialized.
Failures: a retry mechanism when a connection times out; errors are returned with as much meaningful information as possible.

45
Q

What’s marshalling/unmarshaling?

A

Marshalling is when a procedure and its arguments are serialized/encoded into a contiguous buffer that is sent in the RPC call, laid out so that the receiver can identify the procedure and its arguments in order. Unmarshalling involves reading that buffer, identifying the procedure and the data types, and parsing out the data; as a result, the arguments are placed in the receiver's address space and initialized. Marshalling/unmarshalling routines aren't written by hand; they are generated by the RPC system's compiler.

46
Q

How does an RPC runtime serialize and deserialize complex variable size data structures? What’s specifically done in Sun RPC/XDR?

A

For XDR encoding, all data types are encoded in multiples of 4 bytes. For variable-length data such as strings or arrays, the transmission buffer requires 4 bytes for the length, X bytes for the data (1 byte per character), and any padding bytes needed to bring the total to a multiple of 4 bytes.
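A tiny sketch of that sizing rule for a variable-length string/opaque field (illustrative only; in practice the encoding is done by the XDR routines generated by rpcgen):

```c
#include <stdio.h>

/* Wire size of an XDR string/opaque field: a 4-byte length word, the data
 * itself, then padding so the data occupies a multiple of 4 bytes. */
unsigned xdr_string_wire_size(unsigned len) {
    unsigned padded = (len + 3) & ~3u;   /* round data length up to 4 bytes */
    return 4 + padded;                   /* length field + padded data      */
}

int main(void) {
    printf("%u bytes\n", xdr_string_wire_size(10));   /* 4 + 12 = 16 */
    return 0;
}
```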

47
Q

When sharing state, what are the tradeoffs associated with the sharing granularity?

A

Sharing at the granularity of either the cache line or the variable is too fine-grained and would lead to too much coherence traffic. Instead, it is preferred to share at the granularity of a page which makes sense for the OS. However, we need to be careful of false sharing. Sharing at the granularity of the object also works but it is dependent on the language runtime.

48
Q

For distributed state management systems (think distributed shared memory) what are the basic mechanisms needed to maintain consistency – e.g., do you know why it is useful to use ‘home nodes’, why do we differentiate between a global index structure to find the home nodes and local index structures used by the home nodes to track information about the portion of the state they are responsible for.

A

A global index structure provides a map to all ‘home nodes’ so that we know where the most up-to-date copy of a piece of state resides. A ‘home node’ maintains a local index for the portion of state it is responsible for and drives coherence with the other nodes; it relies on that local index to tell it who else has a copy of the data.

49
Q

What’s a consistency model?

A

It’s a guarantee that state/memory will behave correctly (access ordered and memory propagated) if and only if the software follows specific rules (eg use of locks, atomic operations, counters).

50
Q

What are the different guarantees that change in the different models we mentioned – strict, sequential, causal, weak…?

A

Strict consistency: updates are visible everywhere immediately and in the same order. In practice this can't be guaranteed even on a single SMP without locking and synchronization; in distributed systems, latency and message reordering/loss make it impossible.
Sequential consistency: memory updates from different processors may be arbitrarily interleaved, but all processes see the same ordering of updates, and updates from the same process appear in the order they were issued.
Causal consistency: writes that are causally related (e.g., P2 read P1's write before issuing its own write) are guaranteed to be seen in the correct order; concurrent, unrelated writes have no ordering guarantee.
Weak consistency: relies on synchronization points; a process sees other processes' updates when it synchronizes, and how often synchronization points occur determines the consistency observed. This limits data movement and coherence operations, but extra state must be maintained to support the additional operations.

51
Q

When managing large-scale distributed systems and services, what are the pros and cons with adopting a homogeneous vs. a heterogeneous design?

A

Homogeneous pros: keeps the front end (e.g., the load balancer) simple.
Homogeneous cons: can't benefit from per-node caching, since any node may serve any request.
Heterogeneous pros: different nodes handle different tasks/requests, so they benefit from locality and caching.
Heterogeneous cons: requires a more complex front end and more complex node management.
