SWE Flashcards

Question

SWE: What Python data structure implements the hash table?

Answer 1

Dictionaries (or defaultdicts, which are convenient)

Answer 2

Our hash table is an array of linked lists. So each entry in the array is its own linked list. We use a hash function to map keys to an entry in the array, and this mapping is used for both insertion and retrieval. To insert pair (key,val), get your hash value h(key), and then find your array index using h(key)%m, where m is the length of the array. We use modulus to keep the index within bounds. We then add our (key,value) pair to the linked list: if the key is already there, we update the value; otherwise, we add a new node with that key-value pair. To check the value for a key, we get our index h(key)%m again, and look for our key in the linked list at that index. If it isn't there, then that key hasn't been inserted.
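
The insert and lookup steps above can be sketched roughly as follows. This is a minimal illustration, not a production table: the class name `ChainedHashTable` and its method names are made up for this card, Python lists stand in for the linked lists, and Python's built-in `hash()` plays the role of h(key).

```python
class ChainedHashTable:
    """Hash table with chaining; each bucket is a list standing in for a linked list."""

    def __init__(self, m=8):
        self.m = m                                # length of the array
        self.buckets = [[] for _ in range(m)]     # one chain per array entry

    def _index(self, key):
        return hash(key) % self.m                 # h(key) % m keeps the index in bounds

    def put(self, key, val):
        chain = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                          # key already there: update the value
                chain[i] = (key, val)
                return
        chain.append((key, val))                  # otherwise add a new pair to the chain

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)                       # key hasn't been inserted
```

With a tiny `m` (say 2) you can force collisions and check that both keys in a chain are still retrievable.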

Answer 3

A hash function maps some value to an integer in a deterministic (non-random) way: the same input always gives the same output. A good hash function will "evenly distribute" values across the different integers. In the context of a hash table, it should distribute the keys evenly across the cells of the array. This is good because it leads to fewer long chains in the hash table.

Answer 4

Because of collisions: if 2 pairs are mapped to the same cell, they collide, and we can't put them both in the same cell. Hence, a linked list is used.

Answer 5

The load factor is the number of keys in the hash table divided by the length of the array. It is **n/m**. It describes how "full" the array is, or how long on average the chains are: if you have a load factor well over 1, then you have several collisions and longer chains.

Answer 6

The worst-case runtime is O(n). This would happen if your array isn't large enough, and as a result you have some long chains in some of the array indexes. When looking for a key in a chain, it is O(k), where k is the number of keys in the chain. If you have a lot of keys in the chain -- an amount that becomes substantively "on the order of" n, such as say n/10, then you're looking at linear time.

Answer 7

You need to keep your load factor low; you need to have a large enough array so there are few collisions, leading to chains that are at most a few elements long.

Answer 8

In traditional hash tables, you hash the key, then insert the key-value pair at the corresponding location. But there's no reason you need a value: you could just insert the key. (If you have a hash table set up like a dictionary, you could also just give every key a value of NULL which you ignore.) (This is pretty simple, but I want to remember you can also easily make a hash table that works like a set!)

Answer 9

A stack is first-in-last-out. A queue is first-in-first-out.

Answer 10

A stack is a data structure which stores elements. You can add an element (or "push" it), and you can remove an element (or "pop" it). Elements are removed in reverse order of how they're added, or first-in-last-out. This is like a stack of blocks where you continually add a block to the top when you want to add, and remove a block from the top when you want to remove.

Answer 11

You need a linked list implementation. Recall that a linked list has a null node as the "final node" on the right with no descendants, and the "first node" being the root node which has nothing pointing to it. To add an element to the linked list, you make it the root and make the old root its descendant. To remove an element, you make its descendant the root. In other words, *a linked list is basically already an implementation of a stack*. To push, you just add the value to the LL. To pop, you just remove the root from the LL. To peek, you just look at the value in the root of the LL. To check if it's empty, you see if the root node is the null node.
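
A small sketch of that idea, assuming hypothetical `Node` and `LinkedStack` classes (the names are not from the card) and using Python's `None` as the null node:

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedStack:
    def __init__(self):
        self.root = None                      # None plays the role of the null node

    def is_empty(self):
        return self.root is None

    def push(self, value):
        # The new node becomes the root and points at the old root.
        self.root = Node(value, self.root)

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty stack")
        value = self.root.value
        self.root = self.root.next            # the old root's descendant becomes the root
        return value

    def peek(self):
        if self.is_empty():
            raise IndexError("peek at empty stack")
        return self.root.value
```

All four operations touch only the root, so each runs in constant time.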

Answer 12

A queue is a data structure which stores elements. You can add an element (or "push" it), and you can remove an element (or "pop" it). Elements are removed in the same order as they're added, or first-in-first-out. This is like a queue of people who are waiting in line, where the first person to get in line is the first person to get out.

Answer 13

We will implement the queue using a slight variation of a linked list. *With a normal linked list*, the list is visualized as several nodes with arrows pointing left to right. On the far left is the root node; the linked list data structure keeps track of the root node, and a newly inserted value enters on the left and becomes the new root. The linked list also has a null node to the far right. In a queue, we are going to insert elements on the *right side* of the linked list, or the *back*, and we (same as a normal linked list) remove elements from the *left side* of the linked list, or the *front*. So the arrows between nodes point from *front to back*, which may seem unintuitive. Thus, we now *keep track of two nodes* rather than just one. We keep track of the *front*, or the root, as well as the *back*, which is the last node in the list. The back node doesn't point to anything; it's easier in a queue to *not have a null node* and instead keep an "empty" bool in the data structure. When the queue's empty, set the front and back pointers to just point to NULL. To push a value onto the back of the list, we make a new node, have the current back point to it, and set the new node as the back (if the queue was empty, the new node is also the front). To pop a value from the list, we set the front's descendant as the new front. To peek, we return the value at the front. For isEmpty, we check the bool.
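
A rough sketch of the two-pointer queue described above. The class names `Node` and `LinkedQueue` are hypothetical, and as a simplification this version checks `front is None` instead of keeping a separate "empty" bool:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None                 # arrow pointing toward the back


class LinkedQueue:
    def __init__(self):
        self.front = None                # where we pop from
        self.back = None                 # where we push to

    def is_empty(self):
        return self.front is None

    def push(self, value):
        node = Node(value)
        if self.is_empty():
            self.front = node            # first element is both front and back
        else:
            self.back.next = node        # current back points to the new node
        self.back = node

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty queue")
        value = self.front.value
        self.front = self.front.next     # front's descendant becomes the new front
        if self.front is None:
            self.back = None             # queue is empty again
        return value

    def peek(self):
        if self.is_empty():
            raise IndexError("peek at empty queue")
        return self.front.value
```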

Answer 14

A tree is a data structure which organizes nodes in terms of parent-child relationships, where each node can have at most one parent, but nodes could have several children. A tree is basically a DAG, or directed acyclic graph, that is also connected (meaning it's not two or more graphs that don't touch each other) and in which no node has more than one parent.

Answer 15

Simply, if node A is directly above node B, A is the parent of B and B is the child of A.

Answer 16

A descendant of node A is any node you can get to by following a branch down from A. An ancestor of A is any node for which A is a descendant.

Answer 17

A branch is a "path" from start node A down through multiple parent-child relationships, always moving from parent to child, until we reach a leaf node.

Answer 18

A leaf is a node with no descendants; the root is the node with no parent.

Answer 19

It's simply node A as the root of a tree, together with all of its descendants. It's basically the tree you get if you erase anything from the original tree that wasn't A or a descendant of A.

Answer 20

A binary tree is where each node can have at most two children (but only 1 parent). A ternary tree is where each node can have at most three children (but only 1 parent). And so on.

Answer 21

It's a binary tree where, for every node n, you have the property that: *All left descendants of n* **\<** *n* **\<** *all right descendants of n.* (There can be differing opinions on equality in the tree. Sometimes the above will read \<= n.)

Answer 22

To search for an element x, you follow a branch down the tree using the information that the tree is in a way "sorted". So if you're at node y and x \< y, you move down the left branch, because x then would definitionally need to be on the left. If y \< x, you move down the right branch. To insert, you do the same thing and move down the tree in this way. If you find x, you typically wouldn't insert it again. If you reach a node that has no child on the side you need to move to, you insert x there as a new leaf.
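
A minimal sketch of BST search and insert, assuming a hypothetical `BSTNode` class with `data`, `left`, and `right` attributes, and assuming duplicates are simply ignored:

```python
class BSTNode:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None


def bst_search(node, x):
    """Return True if x is in the subtree rooted at node."""
    while node is not None:
        if x == node.data:
            return True
        # Smaller values live on the left, larger values on the right.
        node = node.left if x < node.data else node.right
    return False


def bst_insert(node, x):
    """Insert x and return the root of the (possibly new) subtree."""
    if node is None:
        return BSTNode(x)            # empty spot found: x becomes a new leaf
    if x < node.data:
        node.left = bst_insert(node.left, x)
    elif x > node.data:
        node.right = bst_insert(node.right, x)
    return node                      # x == node.data: already present, do nothing
```

Usage would look like `root = bst_insert(root, value)` starting from `root = None`.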

Answer 23

Because of the ordering properties of the BST, you only need to search down one branch rather than the whole tree. This means that if the tree is balanced, you can search for an element in only log(n) time.

Answer 24

A complete binary tree has every level fully filled out, except for potentially the last level. If the last level isn't full, it's filled in from left to right.

Answer 25

The height of a tree is the length of its longest branch. Specifically, it's the *number of edges along the longest branch*. We say that just a root node has a height of 0, because the longest path has no edges. To find the height of an arbitrary tree, just follow all the branches down to their leaves, adding 1 to that branch's length for each node past the root, and thus for each edge. Take the max branch length as the height.

Answer 26

Technically, a balanced tree is a tree such that, for every subtree present, the height of *its* left subtree and *its* right subtree differ by at most 1. Colloquially, it's basically a tree that is balanced "enough" among all subtrees such that we can expect traversals of a single branch to take O(log n) time, rather than linear time.

Answer 27

You will need a treeNode class, which has a *data* attribute holding its value and a *descendants* attribute which holds a *list* of its direct children. You can also include a Tree class, whose only attribute is a root node, but it's usually not helpful for interviews. If you want specifically a binary tree, your treeNode class would have three attributes: data, leftDescendant, and rightDescendant.

Answer 28

In an in-order traversal, at every node, you explore the left subtree of the node (and visit all of the nodes in that subtree), then visit the node itself, then visit all of the nodes in the right subtree. A recursive function is as follows, which implicitly handles the base case by doing nothing if we've reached a null node:
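
One way such a function might look in Python, assuming a hypothetical `TreeNode` class with `data`, `left`, and `right` attributes and a caller-supplied `visit` callback:

```python
class TreeNode:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def in_order(node, visit):
    if node is not None:             # null node: do nothing (implicit base case)
        in_order(node.left, visit)   # left subtree first
        visit(node.data)             # then the node itself
        in_order(node.right, visit)  # then the right subtree
```

On a binary search tree, this visits the values in sorted order.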

Answer 29

In a pre-order traversal, at every node, you visit that node, *then* you explore the left subtree of the node (and visit all of the nodes in that subtree), and then visit all of the nodes in the right subtree. A recursive function is as follows, which implicitly handles the base case by doing nothing if we've reached a null node:
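
One way such a function might look in Python, again assuming a hypothetical `TreeNode` class with `data`, `left`, and `right` attributes and a caller-supplied `visit` callback:

```python
class TreeNode:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def pre_order(node, visit):
    if node is not None:              # null node: do nothing (implicit base case)
        visit(node.data)              # the node itself first
        pre_order(node.left, visit)   # then the left subtree
        pre_order(node.right, visit)  # then the right subtree
```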

Answer 30

In a post-order traversal, at every node, you explore the left subtree of the node (and visit all of the nodes in that subtree), then visit all of the nodes in the right subtree, and then *lastly* you visit the node itself. A recursive function is as follows, which implicitly handles the base case by doing nothing if we've reached a null node:
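
One way such a function might look in Python, assuming the same kind of hypothetical `TreeNode` class (`data`, `left`, `right`) and a caller-supplied `visit` callback:

```python
class TreeNode:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def post_order(node, visit):
    if node is not None:               # null node: do nothing (implicit base case)
        post_order(node.left, visit)   # left subtree first
        post_order(node.right, visit)  # then the right subtree
        visit(node.data)               # the node itself last
```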

Answer 31

In a min heap, all nodes' descendants are larger than the node. For a max heap it's simply the opposite: all nodes' descendants are smaller, so the max is at the top. So the implementation is basically the exact same, you just do the orderings in reverse.

Answer 32

A *complete* binary tree where each node is smaller than its children, so the min of the tree is always at the root.

Answer 33

A heap must be a complete tree, meaning the bottom layer is filled out from left to right; so, we start by placing our new node in the bottom layer, to the right of the rightmost node in that layer (or at the start of a new layer if the bottom one is full). Then, we continually swap that new node with its *parent*, until its parent has a lower value than it or it reaches the root. This restores the min heap to its correct form.
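
A sketch of that insertion, under an assumption the card doesn't state: the heap is stored as a Python list in level order, so the parent of index i sits at (i - 1) // 2 and appending to the list fills the bottom layer left to right. The function name `heap_push` is made up here.

```python
def heap_push(heap, value):
    """Insert value into a min heap stored as a level-order list."""
    heap.append(value)                   # next free slot on the bottom layer
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] <= heap[i]:      # parent is not larger: ordering restored
            break
        heap[parent], heap[i] = heap[i], heap[parent]   # swap up the branch
        i = parent
```

Python's standard library offers the same operation as `heapq.heappush`.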

Answer 34

We first remove the minimum value from the top (it's always at the top). A min heap must be a complete tree, so next we take the rightmost node on the bottom layer and place it *at the top, where the min value originally was*. Now we need to restore the ordering. We continually look at the node we just moved to the top and its two children. If the node we moved is the smallest of those three, we stop. Otherwise, we swap it with whichever of its children is the smallest of the three, and repeat from its new position.
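
A sketch of the removal, assuming (this is not stated on the card) that the heap is stored as a Python list in level order, so the children of index i sit at 2i + 1 and 2i + 2. The function name `heap_pop_min` is hypothetical.

```python
def heap_pop_min(heap):
    """Remove and return the smallest value from a min heap stored as a list."""
    if not heap:
        raise IndexError("pop from empty heap")
    smallest = heap[0]
    last = heap.pop()                    # rightmost node on the bottom layer
    if heap:
        heap[0] = last                   # place it where the min used to be
        i = 0
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            child = i
            if left < len(heap) and heap[left] < heap[child]:
                child = left
            if right < len(heap) and heap[right] < heap[child]:
                child = right
            if child == i:               # node is the smallest of the three: stop
                break
            heap[i], heap[child] = heap[child], heap[i]   # swap down the branch
            i = child
    return smallest
```

Python's standard library offers the same operation as `heapq.heappop`.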

Answer 35

When inserting a new element, we put it at the bottom so as to preserve the completeness of the tree, then swap it up a branch until the ordering condition has been restored. Because the tree is complete, it is balanced, and so the branch is at most O(log n) in length. Thus we have an **O(log n)** solution.

Answer 36

A complete tree is always balanced. A balanced tree is not always complete.

Answer 37

Constant time: it's always at the top.

Answer 38

When removing the min element, you take an element from the bottom level and put it at the top where the min element was, then swap it down a branch until the ordering condition is restored. Because the min heap is a complete tree, it is balanced, and so the branch is at most length O(log n). Thus, it's an **O(log n)** algorithm.

Answer 39

A min heap is somewhat fast to maintain, as inserting an element takes O(log n) time, and you can always find the min element in constant time. Also, removing the min element takes only O(log n) time. This means that you always have the "most important" or "most pressing" element in the heap ready to go at a moment's notice. This gives the name priority queue: drawing from a queue returns elements, and this one does so in order of priority, as several draws will yield the highest priority elements, in order. This would be useful in a hospital, for example, as we want to continually admit the patient in the most critical condition.

Answer 40

A trie is a tree with a letter at each node. As you move down a branch, you find a word, or a prefix for a longer word. Branches sometimes have \* values at the leaves to signify the end of a word. They're often used for interview problems where you need to look up words, or several words with the same prefix, etc. You can see if a word is present in the trie in O(length of word) time, because you can follow the prefix down the tree.
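
A small sketch of a trie, with hypothetical `TrieNode` and `Trie` classes. Instead of a \* leaf, this version marks the end of a word with an `is_word` flag on the node, which is a common alternative and an assumption of this example:

```python
class TrieNode:
    def __init__(self):
        self.children = {}       # letter -> TrieNode
        self.is_word = False     # plays the role of the * end-of-word marker


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_word = True

    def _walk(self, prefix):
        """Follow prefix down the tree; return the final node or None."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def has_prefix(self, prefix):
        return self._walk(prefix) is not None
```

Both lookups follow one letter per step, which is where the O(length of word) bound comes from.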

Answer 41

A list of nodes (or vertices), and a list of edges connecting those nodes.

Answer 42

In a directed graph, the edges have arrows from one node to another. In an undirected graph, the edges don't have arrows, and count as going in both directions.

Answer 43

A sequence of vertices connected by edges. For directed graphs, a path must follow the arrows.

Answer 44

A graph where no node has a path out and *back to itself.*

Answer 45

Adjacency list, and adjacency matrix.

Answer 46

An adjacency matrix is one way to store a graph in memory. For a graph with n vertices, an adjacency matrix is an nxn matrix, where the ijth entry is a 1 if there is an edge from i to j, and a 0 if not. For undirected graphs, the matrix is symmetric, because an edge from i to j also goes from j to i.
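
As a quick illustration, here is a sketch that builds such a matrix from a list of (i, j) edge pairs. The function name `build_adjacency_matrix` and the `directed` flag are made up for this example, and vertices are assumed to be numbered 0 to n-1:

```python
def build_adjacency_matrix(n, edges, directed=True):
    """Return an n x n list of lists with 1 where an edge i -> j exists."""
    matrix = [[0] * n for _ in range(n)]
    for i, j in edges:
        matrix[i][j] = 1
        if not directed:
            matrix[j][i] = 1     # undirected: mirror the entry, giving a symmetric matrix
    return matrix
```

Checking for an edge is then just `matrix[i][j] == 1`.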

Answer 47

An adjacency list is one way to store a graph in memory. For each of your n nodes i, you have a list of the nodes j for which there is an edge from i to j. (If your graph is undirected, there will be some redundancy in your adj. list, as each edge needs to appear twice.)

Answer 48

To do it object-oriented, you could have a Graph class, with objects having a list of nodes as an attribute, as well as a Node class, with objects having a name attribute as well as a ListOfEdges attribute, which is just a list of the nodes into which it has an outgoing edge. To do it with just arrays, you could assume your n vertices are indexed from 0 to n-1 and use a 2d list: each entry at position i in the high-level list would be a list of indices j into which i has an outgoing edge.
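
The array-only version can be sketched like this. The function name `build_adjacency_list` and the `directed` flag are hypothetical, and edges are assumed to arrive as (i, j) index pairs:

```python
def build_adjacency_list(n, edges, directed=True):
    """Return a 2d list where entry i holds the indices j with an edge i -> j."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        if not directed:
            adj[j].append(i)     # undirected: each edge appears twice
    return adj
```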

Answer 49

An adjacency list only needs to store the edges that do exist, whereas an adjacency matrix stores an entry for each possible edge from i to j. Adjacency lists are thus more space efficient, especially if the graph is sparse. With an adjacency matrix, you can check for an edge in constant time; for an adjacency list, you need to iterate through all of the edges in one of the nodes, taking O(# neighbors) time. Conversely, with an adjacency list, you can get a list of out-neighbors in constant time, which is useful for things like searches from A to B. In an adjacency matrix, you need to iterate through all possible neighbors to see which are actually neighbors.

Answer 50

Depth-first search is a way of searching through a graph, to either visit all of its nodes, or to find a specific node. In depth-first search, you *move fully down a branch* before moving to the next one, hence "depth-first." A depth-first search starts at a specific node in the graph; you may be given a specific start node, or it may be arbitrary. You search through each of the node's neighbors, but you *move fully down a branch* before moving to the next. You must keep track of all of the nodes you have visited, because if you see a node you've already visited while exploring a branch, you don't visit it again.

Answer 51

A recursive approach works best:
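
A sketch of what a recursive version might look like, assuming the graph is given as an adjacency structure `adj` where `adj[node]` lists the node's out-neighbors (a dict or 2d list both work). The function names are made up for this example:

```python
def dfs(adj, start):
    """Return the nodes reachable from start, in depth-first visiting order."""
    visited = set()      # nodes we've already seen, so we never visit one twice
    order = []

    def explore(node):
        visited.add(node)
        order.append(node)
        for neighbor in adj[node]:
            if neighbor not in visited:
                explore(neighbor)    # go fully down this branch before the next

    explore(start)
    return order
```

For very deep graphs, Python's recursion limit can become an issue; an explicit stack is the usual workaround.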

Answer 52

Breadth-first search is a way to search through a graph, in order to either traverse all of the nodes, or to find a path from one node to another. You're given a starting node (or choose one randomly), and then *you visit all of its neighbors before visiting any of* their *neighbors*, hence "breadth-first." You can think of it as moving out from the root layer-by-layer.

Answer 53

We use an *iterative* solution (unlike DFS, which is recursive) which takes advantage of a queue. The queue stores nodes we need to visit, in the order in which we need to visit them. We "mark" a node before pushing it onto the queue, and when iterating through a node's neighbors, we only push a neighbor onto the queue if it isn't marked. This is because we always push a node on right after we mark it, so if it's already marked, that means it's already been put in the queue, and we don't want to put it in the queue twice and thus visit it twice.
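
A minimal sketch of that loop, assuming the graph is an adjacency structure `adj` where `adj[node]` lists the node's out-neighbors, and using `collections.deque` as the queue. The function name `bfs` is just for this example:

```python
from collections import deque


def bfs(adj, start):
    """Return the nodes reachable from start, in breadth-first visiting order."""
    marked = {start}             # mark before pushing onto the queue
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()   # visit nodes in the order they were queued
        order.append(node)
        for neighbor in adj[node]:
            if neighbor not in marked:
                marked.add(neighbor)
                queue.append(neighbor)
    return order
```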

Answer 54

Bubble sort sorts the elements of a list by continually finding the largest element and moving it to the end of the list. So you find the largest element and put it last; then you find the second-largest element and put it second-last; and so on. You do this by continually passing forward through the list and switching an element with the next element if it's larger than the next element.
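
A short sketch of those passes (the function name `bubble_sort` is just for illustration; it sorts the list in place and also returns it):

```python
def bubble_sort(items):
    n = len(items)
    # After each pass, the largest remaining element has bubbled to position `end`.
    for end in range(n - 1, 0, -1):
        for i in range(end):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
    return items
```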

Answer 55

Time is O(n²), Space is O(1).

Answer 56

You sort an array by scanning through the whole array, finding the smallest value, and moving it to the front; then, you scan through the final n-1 elements and move the smallest remaining element to the second position; and then the 3rd smallest element from the remaining elements; and so on.
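
A short sketch of that procedure (the function name `selection_sort` is just for illustration; "moving to the front" is done here by swapping with the element currently in that position):

```python
def selection_sort(items):
    n = len(items)
    for start in range(n - 1):
        smallest = start
        # Scan the unsorted remainder for its smallest element.
        for i in range(start + 1, n):
            if items[i] < items[smallest]:
                smallest = i
        items[start], items[smallest] = items[smallest], items[start]
    return items
```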

Answer 57

Time is O(n²), Space is O(1).

Answer 58

You split the array into two halves, sort those halves recursively, and then combine them using a merge() function, which can combine two sorted lists into one sorted list containing the elements from both. Your base case is if a list has length \<= 1. To merge two lists, just continually add the smallest element in either list to a helper list, with pointers in each list keeping track of which elements you've already viewed.
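
A compact sketch of both pieces. The function names follow the card's merge() but are otherwise illustrative, and this version returns a new list rather than sorting in place:

```python
def merge(left, right):
    """Combine two sorted lists into one sorted list."""
    merged = []
    i = j = 0                            # pointers into left and right
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])              # at most one of these is non-empty
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    if len(items) <= 1:                  # base case
        return list(items)
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))
```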

Answer 59

The time complexity is O(n log n). Here's how to think about this: we split the array into singles; then we merge into doubles, then quadruples, etc. In order to merge two lists, we need to traverse both lists. So when combining singles into doubles, we must traverse each single once; when combining doubles into quadruples, we must traverse each double once; and so on. This means that at every "level", we need to traverse all the data. And how many levels are there? We merge 1's, then 2's, then 4's...up to n. So log n levels, each traversing the whole dataset of length n, gives O(n log n). The space complexity is O(n). We need to store temporary arrays to fill with the sorted values, and the max length of one of these arrays is n. We can reuse the same space for each array assuming we are solving the problem with a single processor, and thus only working on a single recursive call at a time. Thus, O(n) space.

Answer 60

It's recursive: if the length of the list is \<= 1, you just return the list. Otherwise, you pick a "pivot", which is just any of the elements of the list. You could pick it randomly (preferred), or have some default like always picking the first element in the list. Then you move through the list, putting each element in one of 3 new lists: less than the pivot, equal to the pivot, and greater than the pivot. Lastly, you recursively sort the elements on either side of the pivot, then you combine the lists.
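
The three-list version described above can be sketched like this (the function name `quicksort` is illustrative; this variant builds new lists rather than partitioning in place, which costs extra memory):

```python
import random


def quicksort(items):
    if len(items) <= 1:                  # base case
        return list(items)
    pivot = random.choice(items)         # random pivot (the preferred option above)
    less = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    greater = [x for x in items if x > pivot]
    # Recursively sort either side of the pivot, then combine.
    return quicksort(less) + equal + quicksort(greater)
```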

Answer 61

The average-case runtime, or the expected runtime, is O(n log n). This is because on average, the element you choose as the pivot splits the list into reasonably balanced parts (it doesn't have to be the exact median; even something like a 25/75 split is good enough). So you split the list into roughly even parts, then recurse and split the sublists the same way, etc. So we now have a setup that looks a lot like mergesort: you have on the order of log n levels, and at each level you handle each element in the original list at most once, giving O(n log n). The worst-case runtime is O(n²). If you always pick the biggest element in an array as the pivot, for example, then your splits of the list are always uneven, and you sort lists of size n, n-1, n-2... as opposed to n, n/2, n/4..... and this leads to O(n²).

Answer 62

Nope, I don't know, it's weird and complicated and different in different sources. May want to look it up.

Answer 63

Radix sort sorts an array of numbers using their individual bits, or individual digits. First you sort the list based *only on the first digit* (meaning the lowest digit), and you *maintain the original order if the first digits are equal*. So [3,9,0,6,33,23] becomes [0,3,33,23,6,9]. Then you sort by *only the second digit*, and you *again preserve ordering if the second digit is equal*. This means that now they will be sorted based on the last *two* digits: because we're sorting based on the tens place second, it takes precedence; but, within numbers with the same tens digit, the ones digit is implicitly used to make further distinctions, as we already sorted by it earlier. Then we sort by the 100's place, then 1000's, and so on until all digits have been used. https://www.youtube.com/watch?v=XiuSW\_mEn7g This shit is crazy.

Answer 64

O(kn), where n is the number of elements in an array, and k is the max number of digits of an element in the array. We need to do k sorts: first by the one's place, then the ten's, all the way up to the largest digit that's present. But how long does each sort take? Well, if we're working in base 10, and we only look at one digit in each number for each sort, then there are *effectively only 10 unique numbers in the list during any of the sorts*. *This means that each of the sorts can be done in O(n) time*. I'm not sure the most elegant way to do it, but here's one way that works: for base 10, just create 10 lists, with one for each digit. Pass through the list, and add each element to the list with the relevant digit. Then just append all 10 lists, and you're done. Because we have a constant number of lists, or *a constant number of possible values*, we can do a sort in O(n) time. Thus, we have k sorts of time O(n), for O(kn) runtime.
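
The bucket approach described above can be sketched as follows, assuming non-negative integers only (negative numbers would need extra handling). The function name `radix_sort` and the `base` parameter are made up for this example:

```python
def radix_sort(nums, base=10):
    """Least-significant-digit radix sort for non-negative integers."""
    nums = list(nums)
    if not nums:
        return nums
    largest = max(nums)
    place = 1                                    # 1's place, then 10's, 100's, ...
    while place <= largest:
        buckets = [[] for _ in range(base)]      # one list per possible digit
        for x in nums:
            buckets[(x // place) % base].append(x)   # appending preserves order
        nums = [x for bucket in buckets for x in bucket]
        place *= base
    return nums
```

Each pass is O(n) and there is one pass per digit of the largest number, matching the O(kn) bound.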

Answer 65

It's useful when we have a lot of numbers in the list, but *all of the numbers are small*, or have a small number of digits. This is because radix sort takes O(kn) time where k is the max number of digits for an element, whereas other sorting algorithms take at best O(n log n) time. So if k is significantly less than log n, radix sort will be faster; and k will be significantly less than log n if there are lots of numbers, but they're all small numbers.

Answer 66

You are searching for an element in a *sorted* array. You basically continually divide the array in half. You look at the middle element, and if it's larger than the target element, you search in the half of the array below the middle; if the middle is smaller than the target element, you search in the half of the array above the middle; and if you found it you're done. You recurse like this, and if you run out of list, then it's not there.
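
A sketch of the search, written with a loop instead of recursion (a deliberate swap that avoids extra call-stack space; the halving logic is the same). The function name and the convention of returning -1 for "not found" are choices made for this example:

```python
def binary_search(sorted_items, target):
    """Return an index of target in sorted_items, or -1 if it's not there."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:                      # still some list left to search
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] > target:
            hi = mid - 1                 # search the half below the middle
        else:
            lo = mid + 1                 # search the half above the middle
    return -1                            # ran out of list
```

Python's `bisect` module provides the same kind of search for sorted lists.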

Answer 67

Binary search is O(log n) if you have a sorted list, otherwise you can't run it. Binary search is great if you're searching the same list for elements over and over again. In this case, you sort the list first, which takes O(n log n), then you can make lots of searches all at O(log n). If you didn't sort, you would save that O(n log n) cost, but all of your searches would be O(n) each.

Answer 68

If we're computing n of something, we can oftentimes compute the nth given the (n-1)th, and this lends itself to recursion.

Answer 69

1. Top-down: start by figuring out how to solve for the nth by calling your function for the n-1st value, and solve that using the n-2nd, and so on.
2. Bottom-up: figure out how to first solve for your first value, then use that to find your second, and on up until you reach n.
3. Half-and-half: figure out how to divide the problem of size n into two problems of size n/2, as with merge sort.

Answer 70

Recursive solutions are often much less space-efficient than iterative solutions. Say we're generating n numbers. In an iterative solution, we probably generate the first, then the second, then the third, *and once we're done with a number, we can reuse the space that computed it for the next.* Conversely, if we have a recursive solution where we base the nth number on a call for the n-1st number, and base that on a call for the n-2nd number and so on, *all n calls must be in memory at the same time*. This can often lead to O(n) space (or even worse), when the iterative solution may have worked in constant space.

Answer 71

Drawing a tree of the recursive calls, then trying to figure out how much runtime each of the calls in the tree takes on average.

Answer 72

Sometimes in recursive algorithms, we need to solve the same subproblem more than once. An example of this is fibonacci numbers: because f(n) builds on two different subproblems via f(n) = f(n-2) + f(n-1), the tree may have a ton of repeats, and this can cause giant slowdowns in runtime (recursing fibonacci takes *exponential time*). To solve this, we use memoization, which simply means whenever we calculate the answer to a sub-problem, we *store the answer*, in something like an array or a hash table (more often hash table in my experience). And whenever we go to calculate a sub-problem, we first check to see if the answer is already stored, in which case we can just pull the answer. This means we *only need to calculate every subproblem once*, which greatly helps runtime (memoizing fibonacci numbers using an array, for example, brings the runtime down to O(n)).
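
A minimal sketch of memoized fibonacci, using a dict as the hash table and assuming the convention f(0) = 0, f(1) = 1 (the card doesn't fix a convention):

```python
def fib(n, memo=None):
    if memo is None:
        memo = {}                        # hash table of already-solved subproblems
    if n < 2:
        return n                         # base cases: f(0) = 0, f(1) = 1
    if n not in memo:                    # only compute each subproblem once
        memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
    return memo[n]
```

In everyday Python, `functools.lru_cache` can do the same bookkeeping for you.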

Answer 73

When calculating a big value recursively, *we start at the beginning* with the base cases, and work our way up the recursion, storing the answers to progressively larger values until we get to our target. We can store the answers in an array or, more commonly in my experience from classes like 210, a 2d-array.
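
For comparison with a recursive approach, here is a sketch of the bottom-up idea on fibonacci numbers, assuming f(0) = 0 and f(1) = 1 and using a 1d array (the function name `fib_bottom_up` is hypothetical):

```python
def fib_bottom_up(n):
    if n < 2:
        return n
    table = [0] * (n + 1)                # table[i] will hold f(i)
    table[1] = 1                         # base cases: table[0] = 0, table[1] = 1
    for i in range(2, n + 1):            # work up from the base cases to the target
        table[i] = table[i - 1] + table[i - 2]
    return table[n]
```

Since each entry only needs the previous two, the array could be replaced with two variables to get constant space.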

Answer 74

The stack and the heap

Answer 75

The stack is a region of computer memory that *stores local variables for your function calls*. It is also a literal stack: each function call pushes a frame holding its local variables onto the stack, and that frame is popped when the call returns. A variable in the current frame is read directly at a known position within the frame; nothing has to be popped to reach it. It is also a *temporary* memory storage; when your function call ends, all of the local variables from the function call are popped off the stack and lost forever. Because of this, you don't need to worry about deleting things from memory yourself; *the memory is reclaimed automatically* for the stack.

Answer 76

The heap stores *dynamically allocated memory*, which is not tied to the function call that created it and can be used by any function that has a pointer to it. (Global variables themselves typically live in a separate static region rather than on the heap.) In order to allocate on the heap, you must use a function like malloc() or calloc(). To access something on the heap, you need a pointer to it. On the heap, *cleanup is not handled automatically for you*. You must take care to free values when you are done using them, otherwise you'll have a memory leak.

Answer 77

The stack has very fast access, and you don't have to worry about memory leaks. However, variables on the stack only live as long as the function call that created them, and on most systems the stack has a size limit that is much less than the heap.

Answer 78

Malloc takes one argument: the *number of bytes* that must be allocated. Calloc takes two arguments: the *number of objects* that must be allocated, and the *size of each object in bytes*. It then calculates the total number of bytes needed by multiplying. Calloc also *sets the allocated memory to zero*, whereas malloc leaves it uninitialized. So malloc is marginally faster, but calloc is a bit more intuitive to use as a programmer.

Answer 79

You're given a situation that contains a variety of different ***objects***, as well as ***relationships between objects***, and ***processes involving these objects***. You will need to design an object-oriented framework that will represent these objects/relationships and carry out these processes. For example, say you need to represent a restaurant. You'll have objects like customers, tables, servers, and orders, with servers having their customers, and processes including seating guests, taking orders, paying for food, etc. These questions will often involve a fair amount of ambiguity. It will be up to you to brainstorm the most important objects, relationships and processes, as well as ask the interviewer for clarification on the use case and its key elements.

Answer 80

A class is an object type, and subclasses are just specific subsets of that object type. If B is a subclass of A, then every instance of B is an instance of A, and we can do A-related operations on instances of B, but not vice-versa. For example, maybe we have a Vehicle class, and we have Truck, Car, and Motorcycle subclasses. An instance of any of these subclasses is still an instance of Vehicle, and we can still get Vehicle attributes like mpg, top speed, etc. But the subclasses also have additional information: maybe Truck has truckbed\_size, or Motorcycle has gang or something. For one, subclasses save us from retyping lots of code. Rather than saying that each of Truck, Car and Motorcycle have an mpg attribute (or a common method), we say that Vehicle does, and then they all inherit it. *Inheritance decreases duplicate code.* The class-subclass system allows us to keep track of similar objects, while also giving needed attributes to individual object types, in a neat and intuitive way.

Answer 81

\_\_hash\_\_() and \_\_eq\_\_(). So you need to be able to hash an instance, and check if an instance is equal to some other variable. For hashing, we typically hash one of the instance's attributes. If class A has a field A.id, we might have \_\_hash\_\_(self) return hash(self.id). For equality, we first check if the other object is an instance of class A, then we check equality of one or more fields to see that the instances are equivalent (if we don't check isinstance, we might draw an attribute from a datatype that doesn't have that attribute, causing an error): \_\_eq\_\_(self, other) returns isinstance(other, A) and self.id == other.id.
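
Written out as a runnable sketch, using the card's class A with an `id` field (the extra `name` field is added here only to show that it doesn't affect equality):

```python
class A:
    def __init__(self, id, name=""):
        self.id = id
        self.name = name                 # hypothetical extra field, ignored by hash/eq

    def __hash__(self):
        return hash(self.id)             # hash one of the instance's attributes

    def __eq__(self, other):
        # Check the type first so we never read .id off an unrelated object.
        return isinstance(other, A) and self.id == other.id
```

With both methods defined, instances behave sensibly as dictionary keys and set members: two instances with the same `id` count as the same key.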

Answer 82

This happens automatically. You can just write "class B(A): pass" and you would have a subclass B with all the functionality of A.

Answer 83

Simply redefine g() in subclass B. If you don't specify a g() when creating subclass B, it'll use class A's g(), but if you do specify a g(), then that is what will be used.

Answer 84

\_\_repr\_\_(self)

Answer 85

a.f() This is kinda obvious, because it's a method, but just remember this.

Answer 86

1. **Listen carefully** for any *unique information* about the problem (i.e. maybe the list is sorted, or maybe your algorithm is run many times), and ask any necessary **clarifying questions**.
2. Draw and think through a **specific example**, which is *large enough* to think through the problem and is *not a special case*.
3. Talk through a **brute-force solution** to establish a baseline, but *don't code it up*. Also explain baseline *time and space complexity*.
4. **Find a better algorithm**, through speaking and consideration. Will expand on this in other cards.
5. Walk through the algorithm again to **make sure you understand it** and *can code it effectively.* Potentially use *brief pseudo-code* here.
6. **Code your solution** elegantly, with good style. Start in top-left of board and write neatly, *modularize code* as much as possible, use good variable names *but potentially abbreviate long ones after 1st use*. Also, keep a *list of things to improve or test* as you code.
7. **Test your code** and iteratively **improve** it. Read through lines, and look for *weird-looking code* as well as *code that tends to cause issues* (base cases in recursion, integer division, etc). Then, try *small test cases* you can get through quickly, as well as *edge cases.*

Answer 87

BUD is perhaps the most important method for figuring out how to improve your existing solution. It stands for **Bottleneck, Unnecessary Work, Duplicated Work**. You look for these things, in this order, and try to make them faster or better. **Bottleneck**: Identify the part of your algorithm that is costing the most in terms of time complexity, or find the part that is "causing the big O to be what it is." If your algorithm is to first sort an array in n log n time, and then to search for an element in log n time, then there's not much use trying to optimize the searching part: the first step of sorting is the bottleneck, so try to fix this. **Unnecessary Work**: Find work your algorithm is doing, or can do in certain instances, that isn't contributing to finding a solution. For example, maybe your algorithm is iterating through a list, and it could stop the iteration when it finds the answer in the middle, but doesn't: here you should institute a break condition in the loop to avoid unnecessary work. Or, maybe your algorithm is iterating through all possible triples to solve an equation like a+b+c = 0, when it could just iterate through pairs, and then check the only possible value for the remaining number c. Look for things like that. **Duplicated Work**: Maybe your brute force/nested loop solution checks the same possible answer twice, or performs the same process twice because it occurs in two different parts of the problem. Look for these and figure out how to only do work once: maybe you memoize, or otherwise come up with a system.
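
One way to picture the a+b+c = 0 example: a sketch that assumes a, b, and c are each drawn (with reuse allowed) from a list of distinct candidate values. The function names are hypothetical.

```python
from itertools import product


def solutions_by_triples(values):
    # Baseline: try every (a, b, c) triple, O(n^3).
    return sorted(
        (a, b, c) for a, b, c in product(values, repeat=3) if a + b + c == 0
    )


def solutions_by_pairs(values):
    # Less unnecessary work: once a and b are chosen, c is forced to be
    # -(a + b), so we only check whether that value is a candidate. O(n^2).
    candidates = set(values)
    found = []
    for a, b in product(values, repeat=2):
        c = -(a + b)
        if c in candidates:
            found.append((a, b, c))
    return sorted(found)


values = [-2, -1, 0, 1, 3]  # assumed distinct
```

Both functions return the same triples; the second simply skips the loop over c.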

Answer 88

**DIY** (do it yourself): Walk through another example, maybe one that's a bit bigger, and rather than thinking about computery and algorithmic jargon, just try to solve the example in the way your human brain intuitively does it. Often you default to something that's pretty good, so turn your brain off and see what that default is. **Simplify and generalize:** Try to solve a simplified, easier version of the problem, and then see how the ideas from that simplified version can be applied to the original problem. **Base case and build:** First, just solve the problem for an instance that is "size 1", whatever that means for your problem. Then, try to solve it for "size 2", then "size 3", etc. As you do this, try to find ways to use your size 2 solution during size 3, or use size 3 during size 4, or wherever you can see it work. This can lead to a good recursive solution. **Data structure brainstorm:** Run through a bunch of data structures (array, hash table, heap, sorted list, tree, graph, linked list, etc) and try to think of how you could apply it. Maybe one of them has the runtime you're looking for for a particular action, etc.

Answer 89

For a given task, best conceivable runtime (BCR) is *a runtime or big O for the algorithm that you think can't be beaten by any algorithm*. For example, printing all elements in a list has a BCR of O(n), since you have to "touch" every element at least once. This is often the BCR for a problem: just "how long does it take to look at all of the items?" **It's not even necessarily a runtime that you think can be achieved** (making the name a bit misleading), it's just a runtime that you think **we definitely can't beat.**

Answer 90

You have the big O of your current solution, and the big O of the best conceivable runtime (BCR), and you know that the solution must fall somewhere between these two big O's. So, *you can brainstorm possible runtimes between these two big O's*. If it's between O(n^2) and O(n), maybe brainstorm how to get it to O(n log n). If our current solution is basically two O(n) processes multiplied together, how can we turn one of those into an O(log n) process? Suppose we then get it to O(n log n), and our new goal is now O(n), since that's the BCR. We now know that we probably need to take the O(log n) part to constant time; how do we do that? **But, there is risk with spending too much time searching for a solution with a specific runtime due to process-of-elimination**. Solutions sometimes have weird runtimes: maybe it's O(n log log n), or maybe there's some subset of n called k and it's O(nk), or O(k^2 log n). We don't want to be blinded from finding these solutions. That said, this can still be a useful tool for thinking. How do I get closer to the BCR, knowing that I can't actually beat the BCR? Can I try to take x mechanism in my algorithm from y runtime to z runtime?

Answer 91

Yes. If you don't, it might come off as dishonest.

Answer 92

**Use data structures liberally**. If your algorithm needs to deal with objects that have several parts, it's likely you should make a class. **Write modular code**. Try to make a lot of your processes into functions, helper functions, sub-functions, etc. It decreases reused code, helps with testing for errors, and improves readability and prettiness of code. **Relatedly, don't rewrite code more than once, put it in a function**. **Don't get discouraged or overwhelmed if you can't find the optimal solution**. A lot of these questions are designed to be difficult for strong applicants, and many people won't find a perfect, bug-free solution and will still have done well. Stay calm, keep brainstorming, and try to use your different techniques to keep improving your solution. **Check your function inputs**. Rather than assuming your function gets the input in the format it's looking for, check for the format, and have the function raise errors or return NaN or -1 if format is incorrect. **Write neatly on the board.** (Say your understanding here was 4, so it comes up more.)

Answer 93

And: x & y Or: x | y XOR: x ^ y Not: ~x
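
A quick sketch of the four operators in Python on two arbitrary 4-bit example values. Note that Python integers are unbounded, so ~x comes out negative; masking with 0b1111 is an assumption made here to show the flipped 4-bit pattern.

```python
x = 0b1100
y = 0b1010

and_result = x & y       # 0b1000: 1 only where both bits are 1
or_result = x | y        # 0b1110: 1 where either bit is 1
xor_result = x ^ y       # 0b0110: 1 where the bits differ
not_result = ~x          # -13 in Python, i.e. -(x + 1)
not_4bit = ~x & 0b1111   # 0b0011: the flipped bits within 4 bits
```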

Answer 94

When we store binary values in memory, the first bit (meaning the leftmost bit) denotes a number's sign: if a binary number starts with 0, it's positive, and if it starts with a 1, it's negative. (Note that some places do it the other way around, but it's not a huge deal, you'd just switch it.) The rest of the digits describe the value of the number. For positive numbers it's normal binary, so 0100 is 4, for example. How do we represent negative numbers? *Using two's complement.* Specifically, the number is represented with *the binary representation of the two's complement of the absolute value of the number.* So if we're representing -3, the digits (except for the leading 1 denoting sign) are the two's complement of 3.

Answer 95

From various sources "The two's complement of an N-bit number K is the complement of a number with respect to 2^N; it is the number X such that K + X = 2^N" This can be confusing because **here, N-bit number doesn't mean the number of bits needed to represent the number, it's the number of available bits that the computer uses to store any integer, minus the sign bit.** So if your computer is storing 16-bit integers, then one of those is the sign so they're really 15-bit integers, so the two's complement of K is X such that K + X = 2¹⁵. Lastly, **we can quickly calculate the two's complement of K by flipping all the bits and adding 1.** For example, in the following table, we're representing values between +7 and -7 as 4-bit numbers, meaning we have 3 bits for the number and 1 for the sign, so here N = 3. So, say we're trying to represent -3 in binary. Now K=3, and for K = 3, the two's complement is X such that 3 + X = 2³ = 8. So the two's complement of 3 is 5, and thus the representation for -3 is going to be the representation for 5, plus a 1 at the beginning for sign. To find this, we can either know off the top of our head that 5 is 101, or we can do it by flipping the bits in 3 and adding 1. *This is more scalable for when you have lots of digits, and also you can implement it in code.* With 3 digits to represent it, 3 is 011; flipped that's 100, and adding 1 you get 101, which is 5, the two's complement. Tacking on a 1 at the beginning for sign, you have 1101, the binary representation of -3.
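
The flip-and-add-one rule can be sketched in code. The function names are hypothetical, and n_bits is the number of value bits (N in the card), not counting the sign bit.

```python
def twos_complement(k, n_bits):
    """Return X such that k + X == 2**n_bits, by flipping bits and adding 1."""
    mask = (1 << n_bits) - 1       # n_bits ones, e.g. 0b111 for n_bits = 3
    flipped = k ^ mask             # flip every one of the n_bits bits
    return (flipped + 1) & mask    # add 1; the mask keeps k = 0 mapping to 0


def encode_negative(k, n_bits):
    """Bit pattern for -k: a leading 1 for sign, then the two's complement of k."""
    return (1 << n_bits) | twos_complement(k, n_bits)


x = twos_complement(3, 3)          # 5, since 3 + 5 == 8
pattern = encode_negative(3, 3)    # 0b1101
```

Python's own integers agree: masking -3 to 4 bits with `-3 & 0b1111` gives the same pattern.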

Answer 96

In a left shift, you just shift all the values over by the listed number of places and *fill in zeroes after*. So 101 \<\< 1 is 1010, and 111 \<\< 3 is 111000.

Answer 97

In base 10, you can multiply a number by 10 by adding a zero at the end; for multiplying by 100, you can add two zeroes; etc. It's the same in binary. To multiply by 2, simply left shift by 1, filling in a 0 after. To multiply by 4, left shift by 2. To multiply by 32, bit shift by 5, as 2⁵ = 32.
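
A short check of the left-shift examples and the multiply-by-a-power-of-two rule, using Python's `<<` operator on non-negative integers.

```python
shift_one = 0b101 << 1      # 0b1010
shift_three = 0b111 << 3    # 0b111000

n = 5
times_two = n << 1          # 5 * 2
times_four = n << 2         # 5 * 4
times_thirty_two = n << 5   # 5 * 32, since 2**5 == 32
```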

Answer 98

The first is to just *use the gradeschool base 10 algorithm, but just in binary*. In the base 10 algorithm, you multiply the top number by each of the digits in the bottom number, and you add an extra zero in every row. You do this here, remembering that 1\*1 = 1, 1\*0 = 0, and 0\*0 = 0. There's no carrying or anything. This is shown in the example below. But, you can *also* use this idea that to multiply by a power of two 2^N, you can just shift left by N bits. So you can also take each of the digits, which are 2^N numbers, do all of those shifts, and then add them up. *This is exactly what we're doing in the grade school algorithm, but a different way of doing it.* This is nice because we can recognize tricks quickly and save ourselves work. To multiply 1111111 by 10000010 would take a while, until you realize you need to just do two bit shifts, and add those two results.
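
A sketch of the shift-and-add idea for non-negative integers: for every 1 bit in the second number, add a copy of the first number shifted left by that bit's position. The function name is hypothetical.

```python
def multiply_by_shifts(a, b):
    """Multiply non-negative integers a and b using only shifts and adds."""
    result = 0
    shift = 0
    while b:
        if b & 1:                  # this bit of b is set
            result += a << shift   # add a * 2**shift
        b >>= 1                    # move on to the next bit of b
        shift += 1
    return result


a = 0b1111111
b = 0b10000010                     # only bits 7 and 1 are set
product = multiply_by_shifts(a, b)
two_shifts = (a << 7) + (a << 1)   # the shortcut: two shifts and one add
```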

Answer 99

"0s" and "1s"

Answer 100

When we negate numbers, we negate all bits, *including all of the zeroes at the beginning of short numbers.* So ~0 is a string of 1's, notated 1s, and ~10 is 1.......101. Context is important because you do as many preceding zeroes as your computer would use to represent numbers; so for 16-bit numbers (not leaving a bit for sign), we have as many as 16 zeroes to flip. **But often, in little problems, we assume that there are infinite zeroes being flipped.** 1001 & (~0 \<\< 2) = 1001 & (1s \<\< 2) = 000...0001001 & 111......11100 = 1000
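
Python integers behave like the "infinite zeroes" convention: ~0 acts as an endless string of 1s, which Python displays as -1. A sketch of the example above, with a 16-bit mask added as an assumption to show a fixed-width view.

```python
all_ones = ~0                       # -1, i.e. ...1111 in two's complement
mask = ~0 << 2                      # ...11100
result = 0b1001 & mask              # clears the two lowest bits

not_two_16bit = ~0b10 & 0xFFFF      # ~10 viewed as a 16-bit pattern
```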

Answer 101

The most common way is to shift all bits to the right and *fill in the vacated leftmost bits with 0s.* This basically *floor-divides by 2*, or by some other power of 2, for positive numbers: 11111 is 31, and 11111 \>\> 2 is 111, or 7. But for negative numbers, it's weird. In 8 bits, 10110101 is -75; right-shifted by 1 it's 01011010, which is 90. The second way to bit shift is to *fill in with copies of the sign bit*, so with 1s for a negative number. So 10110101 becomes 11011010. *This is good for negative numbers, because it still floor-divides them*, rounding toward negative infinity: 10110101 is -75, and 11011010 is -38. These methods are called "logical right-shift" and "arithmetic right-shift" respectively.
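
A sketch of both shifts in Python. Python's built-in `>>` is an arithmetic shift; the logical shift below is emulated by first masking to an assumed fixed width of 8 bits. The function name is hypothetical.

```python
def logical_right_shift(n, shift, width=8):
    """Right shift that fills with 0s, treating n as a width-bit pattern."""
    mask = (1 << width) - 1
    return (n & mask) >> shift


positive = 0b11111 >> 2                 # 31 >> 2
arithmetic = -75 >> 1                   # sign is preserved
logical = logical_right_shift(-75, 1)   # 10110101 -> 01011010
```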

Answer 102

We first AND N with a string S that is all 0's except for a 1 in the ith bit; the result will be a string of 0's with a 1 in the ith bit iff N has a 1 there. (You can generate S by bitshifting 1 to the left until you have S.) Then, we just compare the result to 0: if it equals 0, then the ith bit was 0, otherwise it was 1.
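
A minimal sketch of this check, counting bit positions from 0 at the rightmost bit. The function name is hypothetical.

```python
def get_bit(n, i):
    """Return True if the ith bit of n (0 = rightmost) is 1."""
    s = 1 << i             # all 0s except a 1 in the ith bit
    return (n & s) != 0    # nonzero only if n also has a 1 there
```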

Answer 103

A mask M is a way to get the values of only specific digits in a number N, by setting the rest of them to 0. You do this by *ANDing N with the mask M.* Say we only want the 0th, 2nd, and 4th digits of N (counting from right to left). Our mask is then 10101, and we take N & 10101. Because digits 1 and 3 are zeroes, when we and them, they are automatically 0 in the resulting number, regardless of those digits' values in N.
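
The 10101 mask applied to two arbitrary example values of N:

```python
mask = 0b10101                 # keep bits 0, 2, and 4

kept_all = 0b11111 & mask      # bits 1 and 3 are zeroed out
kept_some = 0b01110 & mask     # only bit 2 survives
```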

Answer 104

To set it to 0, AND it with a string like 1111101111, where the ith value is 0. To set it to 1, OR it with a value like 00001000, where the ith value is 1. Setting multiple is the same: if you're setting multiple bits to 0, make a string of 1's where the bits in question are 0's, and AND. If you're setting to 1, make a string of 0's where the bits in question are 1's, and OR.
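
A sketch of both operations, counting bit positions from 0 at the rightmost bit. The function names are hypothetical.

```python
def clear_bit(n, i):
    """Set the ith bit of n to 0 by ANDing with 1s everywhere except bit i."""
    return n & ~(1 << i)


def set_bit(n, i):
    """Set the ith bit of n to 1 by ORing with 0s everywhere except bit i."""
    return n | (1 << i)


# Several bits at once: build one mask covering all of them.
bits = 0b0101
cleared_many = 0b1111 & ~bits
set_many = 0b1000 | bits
```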

Answer 105

Clear the ith bit of N by anding it with a number like 11110111, where the ith bit is a 0. Then, bit shift v until the value of v is in the ith bit. Lastly, add the masked version of N to the bit-shifted version of v.
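
A sketch of these three steps, assuming v is 0 or 1 and bit positions count from 0 at the rightmost bit. The function name is hypothetical.

```python
def update_bit(n, i, v):
    """Set the ith bit of n to v, where v is 0 or 1."""
    cleared = n & ~(1 << i)   # step 1: clear the ith bit
    shifted = v << i          # step 2: move v into the ith position
    # Step 3: the ith bit of `cleared` is 0, so adding cannot carry;
    # `cleared | shifted` would give the same result.
    return cleared + shifted
```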

Answer 106

A MapReduce program takes a dataset containing many points (or just a set of objects containing many objects). It first maps each data point to a (key, value) pair. Then, it takes all of the pairs with the same key and "reduces them" in some way, emitting a new, single (key, value) pair. As a programmer, you just need to implement a map() function and a reduce() function that makes sense for your use case. **The reduce function takes only two values, like in 210; it doesn't take the whole list of values you need to reduce. You reduce a list by continually reducing pairwise, which is what the machine will do. This is important to note in many cases when considering how to implement the reduce.** (It's often helpful to first think about how to implement the reduce step, and what that needs to be like, and then figure out the map step based on that.) The computer first splits the data across several machines, or processors. Each processor runs the map() function on each of its points. Then, the computer does an automatic "shuffle" of the pairs, so that all pairs with the same key end up on the same processor. Lastly, each processor reduces the pairs with the same key using the reduce() function. For the example where we're counting the number of appearances of a word, we map a word to the pair (word, 1), and reduce the pairs by summing the 1's and keeping the word as the key. This is shown visually, executed by the computer, below.
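
The word-count example can be simulated in a single process. This is only a sketch of the map, shuffle, and reduce phases, not a real distributed framework; all function names are hypothetical.

```python
from collections import defaultdict
from functools import reduce


def map_word(word):
    # Map step: each word becomes a (key, value) pair.
    return (word, 1)


def reduce_counts(a, b):
    # Reduce step: takes only two values at a time.
    return a + b


def run_mapreduce(words):
    pairs = [map_word(w) for w in words]

    # Shuffle: group the values of all pairs that share a key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)

    # Reduce each group pairwise down to a single value.
    return {key: reduce(reduce_counts, values) for key, values in groups.items()}


counts = run_mapreduce(["a", "b", "a", "c", "a"])
```

Here `functools.reduce` stands in for the machine repeatedly applying the two-argument reduce function.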

Answer 107

It allows you to *parallelize* your data processing. At each step in your processing, you're using multiple computers rather than just 1. This helps with speed and *improves scalability*. Take the example of counting the words in a dataset of words. We could easily just send all the words to a hash table keeping track of number of appearances, but this is an iterative solution. Using a parallelized solution, we can do it faster.

Answer 108

The work of an algorithm is its time complexity with *one processor*; the span of an algorithm is its time complexity with *infinite processors*, or when it is *maximally parallelized.* To find the work, you count up the total actions that need to be done by one processor. To find the span, you find the *longest chain of dependencies, or chain of work*. You find the longest sequence of actions that must be done in a specific order, with each action waiting for previous actions to be completed before executing.

Answer 109

If you're looking for an answer of a specific type, you try every answer of that type and see which is the best. For example, the brute force solution for the shortest path problem is to try every possible path and see which is shortest. The brute-force solution is often a useful naive/baseline solution on which you try to improve.

Answer 110

You solve a problem by splitting it into n sub-problems, solving each of the sub-problems (typically in parallel), and then combining the sub-problem answers to get an answer for the original problem. Often, we split into two sub-problems. We typically want the sub-problems to be of roughly equal size in order to get certain speed-up benefits. Mergesort is an example of a divide-and-conquer algorithm, and is also an example of why you want roughly equal subproblems.
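
Since mergesort is the example named here, a minimal sequential sketch: split into two roughly equal halves, solve each, then combine by merging. A parallel version would sort the two halves at the same time.

```python
def merge_sort(items):
    """Return a new sorted list using divide and conquer."""
    if len(items) <= 1:
        return list(items)

    # Divide into two sub-problems of roughly equal size.
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Combine: merge the two sorted halves.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```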

Answer 111

In a greedy algorithm, you "greedily" choose the first element that is best at this step, then the second that is best at that step, and so on until you have a size n solution. Greedy algorithms are a common way of implementing *approximation algorithms for maximization problems*. Rather than looking at exponentially many possibilities, you approximate such a brute-force solution by doing a greedy algorithm.

SWE Flashcards

(143 cards)