Data structures Flashcards

1
Q

List

A

ordered collection of nodes/items/elements

key property: ordering

operations:

push: adds element to end of list
pop: removes last element of the list
shift: removes first element of the list
unshift: adds an element to the front of the list
insert: adds an element at a given position
remove: removes the element at a given position
iterate: returns the items in order

also: sorting, searching, copying, joining, splitting

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Stack

A

LIFO (last-in first-out) list

objects added (pushed) and removed (popped) from the same end of the list

use array or linked-list

linked list: add and remove in constant time

array: use int sp to track index at top of stack
resizing array: increase the capacity of the stack if it is full (doubling) / decrease the capacity if it is too big (quarter)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Amortisation / Performance of push (stack)

A

geometric series:
T (N ) = Nc + (4c + 8c + · · · + (N − 1)c/2 + (N − 1)c )
first Nc is the cost of writing N elements
rest is for copying to new arrays
time for copying is 2Nc - 6c (geometric series)
T(N) = 3Nc - 6c for this N
T(N) ≤ 3Nc - 6c for all N, so
T(N) = O(N) for N pushes
a push takes O(N)/N = Θ(1) time, on average

push time is AMORTISED Θ(1) (NOT same as Θ(1))

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Amortisation

A

amortized analysis considers a sequence of operations
cost of expensive operations is ‘amortized’ across the sequence…
can never “be in debt”

pick a value to ‘pay’ per operation - show that this value covers all costs (never in debt)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Resizable array

A

compared with normal bounded-size array:
average insert time Θ(1) for both
actual time for inserting objects is higher (3c v c)
some very expensive inserts as N increases
keep benefits over linked list: random access, easy implementation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Queue

A

a list

next object removed

  • earliest on added (FIFO)
  • one with highest priority (Priority Queue)

does not support indexed access

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Priority Queue

A

objects have keys
highest/lowest key is always at front

does not support indexed access

key property: maintain order within each branch
highest or lowest key at the root

add: O(N)
choose a short branch to keep longest branch as short as possible; start at the bottom and find the root

search: O(N)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Heap

A

a tree in an array

when adding: choose shortest branch
restore the shape then restore the order

always remove object at the root
restore shape then restore order

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Binary heap

A

O(log2 N) for add and remove one object

height is Θ(log2 N)

time limited by the height of the heap

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Set

A

unordered collection of objects each having a unique key

put (Add)
retrieve (Get)
by unique key
key is id for the data; need to compare keys, so must define < and =

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Binary Search Tree

A

T.root is a pointer to the root node
at every node n in the tree..
n.left points to the root node of n’s left subtree
n.right points to the root node of n’s right subtree
can access key found at n, e.g. n.data.key

add: new key is always added as a leaf node - start at the root, move down, comparing against each key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Red-black tree

A

extension of binary search trees
RBT’s are self-balancing BST’s
(BST search O(N) in worst case

red-black tree, 5 properties

  1. all nodes (including NILs) are either red or black
  2. the root node is black
  3. every leaf (all NIL) is black
  4. both children of a red node are black
  5. all paths from a node to a descendant leaf contain the same number of black nodes

have a limited height: O(log2 N)

put is much more complicated
new node always colored red, recolor and problem will move closer to the root

restoration procedure works up from leaf towards root
at each node performs constant theta action

get: O(log2 N)
put: O(log2 N) time

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Hash table

A

index data by key (indirectly)
like an array with m positions

hash function h to turn key into index
deterministic: key k always has same result

an object with key k is stored at T[h(k)]

time taken by h depends only on k

general set of possible keys is large if not unlimited

limitations, hash table does not support:
in order iteration
next key/object
minimum key
maximum key
because objects are stored in random order (by design)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Perfect hashing

A

every key is mapped to a different value ie no collisions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Uniform distribution (hashing)

A

if keys are not uniformly distributed, hashing can be very poor

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Collisions

A

hashing: all objects whose key hashes to h(k) are placed into a linked list
table contains a pointer to the list

17
Q

Chaining

A

resolve collisions by adding bucket at location

put: Θ(1) time
unaffected by uniformity of hashing

get: O(N) (worst case is all elements in one chain, O(N)
expected time is Θ(N/m)
N/m is called the load factor

SUHA
expected time for an unsuccessful search for key k in a hash table with m slots, containing N objects
expected length of chain: N/m
probability of needing to search chain at T[I] is 1/m
expected number of comparisons (including loop termination) is 1 + N/m
expected time complexity: Θ(N/m)
reasoning similar for successful get Θ(N/m)

if m is proportional to N, expected time for GET is amortized Θ(1)
design of table needs to ensure N/m is Θ(1)
as N increases, table needs to be resized (geometrically) - number of buckets is increased to keep N/m below some constant
all existing data has to be rehashed

18
Q

Probing

A

probe until find an empty position

simplest: linear: consecutive slots are probed beginning with h(k) up to h(k) - 1

Uniform Hashing: implies every permutation is possible

a hash function h produces uniform hashing for a table with m buckets if, for each possible key k, the probability that ⟨h(k, 0), . . . , h(k, m − 1)⟩ is some permutation p of ⟨0,…,m−1⟩ is the same for all such p

linear probing DOES NOT produce uniform hashing

assuming uniform hashing, the expected number of keys compared when inserting an object depends on the load factor N/m
each probe is to a random bucket with probability N/m it is occupied
is m proportional to N, expected time for Put and Get is amortized constant

19
Q

Simple Uniform Hashing Assumption

A

With reference to chaining

Given a hash table T with m buckets, using hash function h, the simple uniform hashing assumption (SUHA) states that each different key k is equally likely to hash into any of the buckets. So the probability that h(k) = I, for all 1 ≤ I ≤ m, is 1/m

most probable / expected outcome: same number in each bucket

expected time for an unsuccessful search for key k in a hash table with m slots, containing N objects
expected length of chain: N/m
probability of needing to search chain at T[I] is 1/m
expected number of comparisons (including loop termination) is 1 + N/m
expected time complexity: Θ(N/m)
reasoning similar for successful get Θ(N/m)

20
Q

Uniform Hashing

A

with reference to probing

a hash function h produces uniform hashing for a table with m buckets if, for each possible key k, the probability that ⟨h(k, 0), . . . , h(k, m − 1)⟩ is some permutation p of ⟨0,…,m−1⟩ is the same for all such p

linear probing DOES NOT produce uniform hashing

assuming uniform hashing, the expected number of keys compared when inserting an object depends on the load factor N/m
each probe is to a random bucket with probability N/m it is occupied
is m proportional to N, expected time for Put and Get is amortized constant