MIT 6.006 - Data Structures Flashcards

(69 cards)

1
Q

Define a computational problem

A

a binary relation between two sets, the set of inputs and the set of outputs, usually expressed as a predicate.

For example: given the birthdays of all students, find two students who were born on the same day.

2
Q

What are the course goals?

A
  1. Solve computational problems
  2. Prove correctness
  3. Argue efficiency
  4. Communication (be able to convince people it’s correct and efficient)
3
Q

What’s the difference between an interface and a data structure?

A

An interface is the set of operations supported = what I want to do.

A data structure is how I do those operations (the algorithms). E.g., different data structures can support the same interface.

The interface is the “problem” and a specific data structure is the “solution”, or implementation.

4
Q

What two main interface types are we going to discuss in this class?

A

Sets (we care about the items’ values but not their order) and sequences (we care about the order of the items)

5
Q

What are the two main DS approaches?

A
  1. Array-based (static)
  2. Pointer-based (dynamic)
6
Q

What’s the interface for a static sequence dataset?

A
  1. init
  2. get_at(index)
  3. set_at(index, value)
  4. len
  5. iterate_seq
7
Q

What’s the natural solution for the interface for a static sequence dataset?

A

a static array…

8
Q

What’s the interface for a dynamic sequence dataset? What special operations exist, and why do we care about them?

A

All of the operations of a static sequence dataset plus:

-insert_at(i, x) -> shift items i…n up to positions i+1…n+1, then write x at i

-remove_at(i) -> shift items i+1…n down to positions i…n-1

Special operations -
insert/remove first/last:
While these are just special cases of the full interface, we care about them because we might be able to implement them more efficiently.

9
Q

What’s a natural solution to the dynamic sequence interface?

A

a linked list

10
Q

What’s the time complexity of the following data structures for a dynamic sequences interface:

static array vs linked list

A

static array vs linked list:
init - O(n) vs O(n)
get_at/set_at(index) - O(1) vs O(n)
iterate - O(n) vs O(n)
insert_at/remove_at - O(n) vs O(n)

insert/delete first: O(n) vs O(1)
insert/delete last: O(n) vs O(1) (the linked-list O(1) for last only if I keep a pointer to the last node; delete_last also needs a doubly linked list)

11
Q

Explain the idea behind dynamic arrays

A

-Relax the constraint that a static array has to be exactly size n. Instead, let the allocated size be O(n) (bigger than n, of course; we’ll discuss 2n, but many choices are possible).

-Basically, maintain an array holding the n elements that I store plus extra space at the end.

-Now set/insert_last is O(1) as long as the allocated length is at least n+1. If the array is full (size == allocated length), allocate a new array of size Θ(n) (say, 2n) and copy everything over.
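A minimal sketch of this growth scheme (illustrative names, not the course’s code):

```python
class DynamicArray:
    """Grow-by-doubling array: append is O(1) amortized."""

    def __init__(self):
        self._capacity = 1
        self._data = [None] * self._capacity  # allocated slots
        self._size = 0                        # slots actually used

    def append(self, x):
        if self._size == self._capacity:      # full: reallocate 2x and copy
            self._capacity *= 2
            new_data = [None] * self._capacity
            for i in range(self._size):
                new_data[i] = self._data[i]
            self._data = new_data
        self._data[self._size] = x
        self._size += 1

    def get_at(self, i):
        if not 0 <= i < self._size:
            raise IndexError(i)
        return self._data[i]
```

A real Python list does the same thing internally (with a less aggressive growth factor).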

12
Q

What dynamic arrays exist in Python?

A

A Python list is a dynamic array.

13
Q

An intuitive way to remember that the sum 1+2+…+2^K is 2^(K+1)-1

A

In binary it’s just K+1 ones: 111…1. Add 1 and you get a 1 followed by K+1 zeros, which is 2^(K+1); so the sum is 2^(K+1)-1.
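The identity can be checked directly in Python:

```python
# sum of the geometric series 1 + 2 + ... + 2^K equals 2^(K+1) - 1
for K in range(20):
    assert sum(2 ** i for i in range(K + 1)) == 2 ** (K + 1) - 1

# the binary picture: K+1 one-bits in a row, e.g. K = 4
assert bin(2 ** 5 - 1) == "0b11111"
```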

14
Q

How much time does the resizing operation of a dynamic array cost?

A

It’s O(1) AMORTIZED. If we always double the size, the total copying work over n insertions is

1+2+4+…+n = sum(i=0..log n)(2^i) = O(n),

and this is spread over n operations, so the amortized time (average time per operation) is O(1).
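The O(n) total can be verified by simply counting the element copies that doubling causes (a sketch; function name is mine):

```python
def total_copy_work(n):
    """Count element copies done by doubling resizes over n appends."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            copies += size      # copy everything into the new array
            capacity *= 2
        size += 1
    return copies

# total copying is 1 + 2 + 4 + ... + n/2-ish < 2n, i.e. O(1) amortized per append
assert total_copy_work(1024) < 2 * 1024
```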

15
Q

Define amortized time

A

We say that an operation takes T(n) amortized time if any sequence of k operations takes at most k*T(n) time in total,

i.e., we average over the operation sequence.

16
Q

Compare the following operation(s) across these structures, and say to which interface the operation belongs:

  1. Dynamic array
  2. static array
  3. linked list

get_at,set_at

A

The operations belong to both the static sequence and dynamic sequence interfaces.

O(1) for the dynamic and static array, O(n) for the linked list

17
Q

Compare the following operation(s) across these structures, and say to which interface the operation belongs:

  1. Dynamic array
  2. static array
  3. linked list

insert_first, delete first

A

Dynamic array: O(n)
static array: O(n)
linked list: O(1)

operation is from the dynamic sequence interface

18
Q

Compare the following operation(s) across these structures, and say to which interface the operation belongs:

  1. Dynamic array
  2. static array
  3. linked list

insert_last, delete last

A

dynamic array: O(1) amortized
static array: O(n)
linked list: O(1), but only with a pointer to the last node (and delete_last needs a doubly linked list; otherwise O(n))

operation is from the dynamic sequence interface

19
Q

Compare the following operation(s) across these structures, and say to which interface the operation belongs:

  1. Dynamic array
  2. static array
  3. linked list

insert_at, delete at

A

O(n) for all of them

operation is from the dynamic sequence interface

20
Q

A simple dynamic array will have O(n) for the operation “insert_first”. How can I make it O(1)?

A

O(1) amortized:

Simply keep free space at the ‘beginning’ of the data as well - e.g., a second, flipped dynamic array whose end faces the front of the sequence.
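One way to realize the “flipped array at the front” idea (a sketch with names of my choosing; deletions and rebalancing when one side empties are omitted):

```python
class FrontBackArray:
    """Two dynamic arrays back to back. `_front` is stored reversed,
    so insert_first and insert_last are both O(1) amortized appends."""

    def __init__(self):
        self._front = []   # element 0 of the sequence is _front[-1]
        self._back = []

    def insert_first(self, x):
        self._front.append(x)   # appending here = prepending to the sequence

    def insert_last(self, x):
        self._back.append(x)

    def to_list(self):
        return self._front[::-1] + self._back
```

delete_first/delete_last would pop from the corresponding list, rebalancing the two halves when one runs empty.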

21
Q

Describe the interface of a set

A

-build(I) - given an iterable I, build a set from its items

-len

-find(key)

-insert(x)

-delete(key) - remove the object with that key

Order operations:
-iter_ord - iterate by order of keys
-find_min / find_max
-find_next(k) - find the object with the smallest key greater than k
-find_prev(k)

22
Q

Dumping all of the objects into an unsorted array is a simple way of implementing the set interface. How long does each of the operations take? For example, for a dynamic array for insert(x), how long, amortized, will it take?

A

O(n) for every single operation, including insert(x): I have to iterate the entire array to make sure that an element with the same key doesn’t already exist (even though, amortized, allocating the new memory is O(1), as in the dynamic sequence interface).

23
Q

Sorted dynamic array when implementing the set interface - how long does each of the operations take?

A

build - O(n log n)
find - O(log n) (binary search)
insert(x)/delete - O(n) (I may need to move all of the objects)

find_min/find_max - O(1)
find_next/find_prev - O(log n)

24
Q

In the context of sorting, what do the following words mean?

  1. Destructive
  2. In place

A
  1. Destructive - overwrites the input array
  2. In place - uses O(1) extra space
25
Describe selection sort. What's the time complexity?
Repeatedly select the largest element of the not-yet-sorted prefix and swap it with the rightmost element of that prefix. It's destructive, in place, and takes O(n^2).
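The card above, sketched in Python (in place and destructive, as described):

```python
def selection_sort(a):
    """Sort list a in place: put the max of a[:i+1] at index i, shrinking i."""
    for i in range(len(a) - 1, 0, -1):   # i = last position of the unsorted prefix
        max_j = 0
        for j in range(1, i + 1):        # find the index of the largest prefix item
            if a[j] > a[max_j]:
                max_j = j
        a[i], a[max_j] = a[max_j], a[i]  # swap it into its final place
    return a
```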
26
What's the comparison model? (in the context of supporting set operations)
We get to ask, for any two keys, whether they are equal and, if not, which one is greater, and we get "True" or "False" as a response.
27
How do I prove that, within the restrictions of the comparison model, finding a key must take at least log n operations?
Any algorithm has to use comparisons. If a comparison returns True, the algorithm branches to, say, the left side of its 'tree of comparisons' to perform further comparisons, and to the right side if False. The leaves represent 'algorithm stopped'. Since the algorithm must be able to output any of the n keys, plus "not found", the tree must have at least n+1 leaves. Therefore the minimum height of this binary decision tree is log(n+1), meaning I need at least Omega(log n) operations in the worst case.
28
How did the lecturer present the idea of hashing?
-We could "solve" the problem of having to compare by storing every key at its own index in a huge direct-access array of size 2^K (one slot per possible key). Instead, we can use far less space (say, m slots) and a hash function h mapping keys to {0...m-1}.
29
In the context of hashing, what are collisions, and what are 2 solutions to them?
A collision is when h(k1)=h(k2) for two different keys (inevitable whenever the space of keys is greater than the space we're reserving...). Solutions: 1. Chaining - store multiple elements at the same location via a dynamic structure such as a linked list/dynamic array. 2. Open addressing - store the element at another location in the array (this is what's used in practice, e.g. in Python) - notoriously hard to analyze. A simple example is linear probing: when inserting, go to the next address(es) until a free slot is found. When deleting by key, notice that I now have to leave a 'tombstone' so that the find algorithm knows to keep looking at subsequent addresses.
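A minimal sketch of solution 1, chaining (class and method names are mine; CPython's own dict actually uses open addressing):

```python
class ChainedHashTable:
    """Hash table resolving collisions by keeping a chain (list) per slot."""

    def __init__(self, m=8):
        self._slots = [[] for _ in range(m)]

    def _chain(self, key):
        return self._slots[hash(key) % len(self._slots)]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:                 # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def find(self, key):
        for k, v in self._chain(key):
            if k == key:
                return v
        raise KeyError(key)

    def delete(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain.pop(i)
                return
        raise KeyError(key)
```

With m = 8 slots and 20 keys, several keys necessarily share a chain, which is exactly the collision case the card describes.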
30
Give an example of an easy hash function from {0...k-1} to {0..m-1}
Division hash function: h(k) = k mod m. This is essentially what Python does :D
31
What's the idea behind all universal hash functions?
Don't fix one deterministic hash function beforehand (which bad inputs could exploit); instead, choose one at random from a family of hash functions.
32
Give an example of a family of functions from which we can choose, such that those create a good universal hash function
h_{a,b}(k) = ((a*k+b) mod p) mod m, where p is a prime larger than the key universe and a, b are chosen at random from {0,...,p-1} (with a != 0)
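The family, sketched in Python (function name is mine; the default p is the Mersenne prime 2^31 - 1, which must exceed every key used):

```python
import random

def make_universal_hash(m, p=(1 << 31) - 1):
    """Pick h(k) = ((a*k + b) mod p) mod m with random a, b in {0,...,p-1}."""
    a = random.randrange(1, p)   # a != 0
    b = random.randrange(0, p)
    return lambda k: ((a * k + b) % p) % m

h = make_universal_hash(10)
# whatever a, b were drawn, outputs always land in {0, ..., m-1}
assert all(0 <= h(k) < 10 for k in range(1000))
```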

33
What does the universality property (P[collision] <= 1/m) give us? What should I do if m (the size I pre-allocated for the hash table) becomes too small for n?
Let Xi denote the number of elements hashed to slot h(ki). Then E[Xi] <= 1 + n/m. That means that if we choose m linear in n, the chain at any slot will be, on average, of constant length. If m becomes too small, we allocate more space and rebuild the whole data structure from scratch. That's okay, since this doesn't happen "often" and costs O(1) amortized.
34
Fodoby is a company that makes customized software tools for creative people. Their newest software, Ottoshop, helps users make collages by allowing them to overlay images on top of each other in a single document. Describe a database to keep track of the images in a given document which supports the following operations: 1. make document(): construct an empty document containing no images 2. import image(x): add an image with unique integer ID x to the top of the document 3. display(): return an array of the document’s image IDs in order from bottom to top 4. move below(x, y): move the image with ID x directly below the image with ID y Operation (1) should run in worst-case O(1) time, operations (2) and (3) should each run in worstcase O(n) time, while operation (4) should run in worst-case O(log n) time, where n is the number of images contained in a document at the time of the operation
A doubly linked list of images suffices for operations 1-3. For operation 4, I need to move an image FAST, which means I need to find images fast: maintain a sorted array of image IDs as keys, with pointers into the linked-list nodes. Binary search finds the two nodes in O(log n), and relinking them is O(1), so move_below runs in worst-case O(log n).
35
How do I prove that within the restrictions of the comparison model (where I can only compare between keys and not get more insights into them), any algorithm will take at least nlogn operations?
Same argument as with finding an element, but this time there are n! different outputs (permutations), so the height of the tree is at least log(n!) = Omega(n log n).
36
What's a stable sorting algorithm?
If item A appears before item B in the input and their keys compare equal, then the sorting algorithm outputs item A before item B.
37
In what context is a stable algorithm important?
When we do "tuple sort" - sorting by the least significant digit first, then moving on to more significant digits - we don't want a later pass to ruin the work previously done.
38
Let's say our keys are of order n^2, so we represent each key as a tuple (a,b) with a = k//n and b = k%n. Which should I sort by first, the more significant digit (a) or the less significant digit (b), and why?
The less significant one. Imagine sorting (0,2),(1,1): sorting by the more significant digit first and then by the less significant one yields the output (1,1),(0,2), which is out of order.
39
What's the idea of radix sort?
We saw that for u=n^2 I can split a key into two base-n digits (a,b). Similarly, for u=n^3 I need a tuple of size 3: a = k//n^2 is the most significant digit (of size n), and the remainder k%n^2 (of size n^2) I already know how to break into two digits. In general, the idea is to break up integers of max size u into a base-n tuple of about log_n(u) digits, then tuple-sort with one stable pass per digit.
40
Explain the difference or relation between "Counting sort" and "tuple sort" from the lecture
Counting sort is the algorithm that implements each round of tuple sort.
* Instead of storing a single item at each array index, store a chain, just like hashing!
* For stability, the chain data structure should remember the order in which items were added.
After going through the sequence of items (in the order produced by the previous round), reading the chains back out in order gives the sequence for the next round of tuple sort.
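Counting sort with chains, and radix sort built on top of it, can be sketched as follows (function names are mine):

```python
def counting_sort(a, key):
    """Stable sort of list a by key(x), where key(x) is in {0, ..., u-1}."""
    u = 1 + max(key(x) for x in a)
    chains = [[] for _ in range(u)]          # one chain per possible key value
    for x in a:                              # appending in input order ...
        chains[key(x)].append(x)             # ... makes the sort stable
    return [x for chain in chains for x in chain]

def radix_sort(a, base=None):
    """Sort non-negative ints by base-n digits, least significant digit first."""
    if not a:
        return a
    base = max(base or len(a), 2)            # default base n = len(a)
    u, digit = max(a), 0
    while base ** digit <= u:                # one stable pass per base-n digit
        a = counting_sort(a, key=lambda x: (x // base ** digit) % base)
        digit += 1
    return a
```

Each pass is O(n + base), and there are about log_base(u) passes, so for u = n^c this is O(c*n).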
41
What's the general structure of a node in a binary tree?
item/key; pointers to its parent, left child, and right child
42
Define traversal order
In-order traversal order: recursively visit the left subtree of a node, then the node itself, then its right subtree.
43
What is the first node that appears in the traversal order of a tree?
Its leftmost node (keep following left children from the root; it may still have a right child, so it's not necessarily a leaf).
44
What's the successor of a node X during a traversal order of a tree? (assume x is an inner node so the question is interesting)
Either the leftmost node of its right subtree, or, if x has no right subtree, the lowest ancestor of which x is a left descendant.
45
I want to add a node immediately after (in traversal order) a given node x; how do I do that?
If x.right doesn't exist, just put the node there; otherwise put it as the leftmost node of x's right subtree (x.right's first node in traversal order).
46
Describe the algorithm to delete a node X in a tree.
0. If the node is a leaf, just detach it from the tree. Otherwise: 1. If node.left exists, swap the node's item with its immediate predecessor's and go to step 0 on that node. 2. Else node.right must exist (since the node isn't a leaf); swap with its immediate successor and go to step 0 on that node.
47
In the deletion algorithm in binary trees, we swapped with immediate predecessor if there's a left subtree/successor if there's a subtree to the right and assumed that it's down the tree. Why can it not be up the tree?
The existence of a right subtree, for example, guarantees, by the traversal order, that the immediate successor of the node is in the right subtree and not a parent.
48
Prove that the deletion algo in binary trees is O(h)
Each swap moves us down the tree along a single path, we always resume where we stopped, and we do only a constant amount of work besides traversing down, so the total running time is O(h).
49
Implementing a set binary search tree is straightforward: how?
Keep keys smaller than x in the left subtree of x, and keys larger than x in its right subtree (the BST property).
50
Implementing a set binary tree is straightforward. How do I implement a sequence binary search tree?
I need to support the operation element_at(index j). For that purpose, I maintain the number of elements in each node's subtree (the node's size) and walk down, comparing j against the left subtree's size.
51
Subtree augmentation - describe precisely and give an example
Each node stores O(1) extra data, a property P(node), such that P(node) is computable in O(1) from the node's own data and its children's P values. Examples: number of elements (subtree size), height, sum, product, min, max.
52
What's the definition of a balanced binary tree?
h=O(logn)
53
What's the guarantee of an AVL tree after updating (insert/deleting) in terms of balance?
|h(node.right)-h(node.left)| <=1
54
How do i prove that the guarantee of AVL is a guarantee of the balance of the tree?
Look at the worst case: a tree where, for every node, the right subtree's height is one greater than the left's. Let N(h) be the number of nodes of such a tree of height h. Then N(h) = N(right subtree) + N(left subtree) + 1 = N(h-1) + N(h-2) + 1 > 2*N(h-2), so N(h) > 2^(h/2), giving h < 2*log(n). Since this is the minimum number of nodes of the most unbalanced AVL tree, every AVL tree with n nodes has height O(log n).
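The recurrence can be checked numerically (a sketch; the function name is mine, not course code):

```python
def min_avl_nodes(h):
    """Minimum number of nodes in an AVL tree of height h:
    the worst case has subtrees of heights h-1 and h-2."""
    if h < 0:
        return 0
    if h == 0:
        return 1
    return min_avl_nodes(h - 1) + min_avl_nodes(h - 2) + 1

# N(h) > 2^(h/2) for h >= 1, hence h < 2*log2(n) for every AVL tree
assert all(min_avl_nodes(h) > 2 ** (h / 2) for h in range(1, 25))
```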
55
I need to get the log n biggest numbers out of n, but I have only O(log n) memory. Describe an algorithm which does that in O(n*log(log n)) time. Removing the memory constraint, what would be the best time complexity, and how?
1. Build a balanced BST, e.g. an AVL tree, from the first log n items (time: log n * log log n). 2. For every other item, insert it and then delete the minimum, keeping the log n largest items seen so far (O(log log n) per item, O(n*log log n) altogether). 3. With O(n) memory I can use a max-heap: build takes O(n), and popping the max log n times costs O(log n) each, so O(n) overall - a tight bound, since I have to read all of the data anyway, which takes O(n).
56
How do I implement the following: 1. new bid(d, b) record a new bidder ID d with bid b in O(log n) time 2.update bid(d, b) update the bid of existing bidder ID d to bid b in O(log n) time 3. get revenue() return revenue from selling to the current k highest bidders in O(1) time
1. An AVL tree keyed on bidder IDs, with pointers into the next two data structures. 2. An AVL tree of the k highest bids. 3. An AVL tree of the other n-k bids. The tree with the k highest bids is augmented with the sum of the bids it contains (so get_revenue() returns the sum of the biggest k bids in O(1)).
57
What operations does the priority queue interface support? It's a subset of what interface?
-init -insert(x) -find_max -delete_max. We think of keys as 'priorities'. This is a subset of the set interface.
58
AVL trees can implement the priority queue interface at what complexity? How? Why do we care about heaps then?
A set AVL tree solves this interface in O(log n) time for insert/delete, O(n log n) for init, and O(log n) for find_max; with a max augmentation, find_max drops to O(1). We still care about heaps because they provide an IN PLACE sorting algorithm, i.e. one needing only O(1) extra memory (their O(n) build time is nice too, but that's not the main reason for our interest).
59
If I implemented the priority queue interface using an unsorted array, and someone magically gave me the index of the max object, say j, what's the best way to implement del_max and at what time complexity?
I'd swap element j with the last element in the array and then pop it, so it would be O(1).
60
How would I use the priority queue interface to sort n items?
-Add them all to the queue. -Delete the max repeatedly. Total: T_build(n) + n*T(delete_max).
61
Define the properties of a complete binary tree
2^i nodes at depth i, except for the deepest level, where the nodes are left-justified (filled from left to right, possibly not occupying all positions).
62
How would I transform from an array representation to a complete tree and vice versa?
The mapping is unique and straightforward: the first element of the array is the root, the second and third elements are the root's direct children, and so on, level by level.
63
Is a complete binary tree balanced?
Yes - not only are they balanced, their height is at most ceil(log n), so about log n.
64
Are we going to represent the explicit heap in our system?
Nah, we're just going to use the array (heap is an implicit data structure).
65
How do we know the indices of the children/parent of some node with the index i in our heap/array representation
left child = 2i+1, right child = 2i+2, parent = (i-1)//2
66
Binary heap Q satisfies what conditions?
1. It is a complete binary tree. 2. For each node, its value is greater than or equal to its children's values (the biggest element is at the root) - the max-heap property.
67
How do I insert at a heap?
The only thing we can do cheaply in an array is insert at the last position. So insert there, then 'bubble up' the new value (swap with its parent) until its parent is greater or it reaches the root.
68
How do I delete at a heap?
Swap the root node's value with the rightmost leaf's value, remove the rightmost leaf, and then bubble the (potentially small) value down - swapping it with the larger of its two children - until it is greater than both children or becomes a leaf.
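The bubble-up and bubble-down procedures from the last two cards, sketched on a plain Python list used as a max-heap (function names are mine, not the course's):

```python
def heap_insert(h, x):
    """Append x at the last position, then bubble it up while it beats its parent."""
    h.append(x)
    i = len(h) - 1
    while i > 0 and h[i] > h[(i - 1) // 2]:
        h[i], h[(i - 1) // 2] = h[(i - 1) // 2], h[i]
        i = (i - 1) // 2

def heap_delete_max(h):
    """Swap root with the last leaf, remove it, then bubble the root down."""
    h[0], h[-1] = h[-1], h[0]
    out = h.pop()
    i = 0
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < len(h) and h[left] > h[largest]:
            largest = left
        if right < len(h) and h[right] > h[largest]:
            largest = right
        if largest == i:                 # both children smaller (or none): done
            return out
        h[i], h[largest] = h[largest], h[i]
        i = largest
```

(Python's standard heapq module implements the same idea as a min-heap.)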
69