Search Trees Flashcards

1
Q

Ordered Data Sets

A

hashing stores items in random order
ordering the data by key still supports insert, remove, lookup
and additionally enables ordered operations: lower bound, upper bound, range scan

2
Q

Standard Binary Search (arrays)

A

log2(n) steps if the array is sorted
0.5 branch misses per comparison on average for random searches
access pattern is not cache friendly
on large arrays it is slower than a B-tree
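
A minimal C++ sketch of the classic lower-bound binary search described above (function name and types are illustrative):

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Classic binary search on a sorted array: returns the index of the first
    // element >= key (lower bound), or data.size() if no such element exists.
    // Each iteration contains a hard-to-predict branch (~50% mispredicted for
    // random keys) and jumps across the array, which is cache-unfriendly.
    std::size_t lower_bound_classic(const std::vector<uint64_t>& data, uint64_t key) {
        std::size_t lo = 0, hi = data.size();
        while (lo < hi) {
            std::size_t mid = lo + (hi - lo) / 2;
            if (data[mid] < key)
                lo = mid + 1;   // answer is in the right half
            else
                hi = mid;       // mid may be the answer, keep it in range
        }
        return lo;
    }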

3
Q

Branch-Free Binary Search

A

the conditional branch can be transformed into a data dependency (conditional move)
no branch misses; especially effective for small arrays
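
A sketch of the well-known branch-free lower bound (conditional-move formulation); the ternary expression is a data dependency that compilers typically turn into a cmov rather than a conditional jump:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Branch-free lower bound: the comparison feeds a conditional move (a data
    // dependency) instead of a conditional jump, so random lookups cause no
    // branch mispredictions.
    std::size_t lower_bound_branchless(const std::vector<uint64_t>& data, uint64_t key) {
        if (data.empty()) return 0;
        const uint64_t* base = data.data();
        std::size_t n = data.size();
        while (n > 1) {
            std::size_t half = n / 2;
            // Compilers typically emit a cmov for this ternary expression.
            base = (base[half] < key) ? base + half : base;
            n -= half;
        }
        return std::size_t(base - data.data()) + (*base < key ? 1 : 0);
    }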

4
Q

Interpolation Search

A

assumes keys are roughly uniformly distributed (gaps between neighboring elements are all about the same), e.g. 10 13 15 26 28 30
a single very large or very small outlier destroys the interpolation estimate
can be combined with binary search (e.g., fall back after a few bad guesses)
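
A hedged sketch of interpolation search on unsigned integer keys (names and this fallback-free formulation are illustrative):

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Interpolation search: estimate the position of key by assuming the keys
    // are spread roughly evenly between data[lo] and data[hi]. Returns an index
    // of key, or data.size() if it is not present. A single extreme outlier
    // makes the estimate useless and degrades this towards O(n).
    std::size_t interpolation_search(const std::vector<uint64_t>& data, uint64_t key) {
        if (data.empty()) return data.size();
        std::size_t lo = 0, hi = data.size() - 1;
        while (lo <= hi && key >= data[lo] && key <= data[hi]) {
            if (data[hi] == data[lo])
                return (data[lo] == key) ? lo : data.size();
            // Linear interpolation between the two boundary keys.
            double frac = double(key - data[lo]) / double(data[hi] - data[lo]);
            std::size_t pos = lo + std::size_t(frac * double(hi - lo));
            if (data[pos] == key) return pos;
            if (data[pos] < key)  lo = pos + 1;
            else                  hi = pos - 1;   // pos > 0 here, since key >= data[lo]
        }
        return data.size();   // not found
    }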

5
Q

Linear Search

A

O(n) steps
can be improved using SIMD instructions
good locality
array doesn’t have to be sorted for point search
best solution for small arrays
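
For illustration, a scalar linear point search; SIMD variants compare several keys per instruction, which this sketch omits:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Scalar linear point search: O(n), but purely sequential access with good
    // locality, and the array does not need to be sorted.
    std::size_t linear_search(const std::vector<uint64_t>& data, uint64_t key) {
        for (std::size_t i = 0; i < data.size(); ++i)
            if (data[i] == key) return i;
        return data.size();   // not found
    }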

6
Q

Blocking Layout

A

store hot keys together in blocks
recursive structure
improves locality
allows use of SIMD instructions (k-ary search)

7
Q

Growing Arrays

A

better locality than linked lists
one downside is growing when the fixed capacity is exhausted
standard approach: exponential growth plus copy
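
A minimal sketch of the exponential-growth-plus-copy strategy (the element type uint64_t and the growth factor 2 are assumptions):

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <memory>

    // Growing array with the standard strategy: when full, double the capacity
    // and copy all elements over. Appends are amortized O(1), but an append
    // that triggers growth copies the whole array.
    struct GrowingArray {
        std::unique_ptr<uint64_t[]> data;
        std::size_t size = 0, capacity = 0;

        void push_back(uint64_t v) {
            if (size == capacity) {
                std::size_t newCap = capacity ? capacity * 2 : 4;   // exponential growth
                std::unique_ptr<uint64_t[]> bigger(new uint64_t[newCap]);
                if (size)
                    std::memcpy(bigger.get(), data.get(), size * sizeof(uint64_t));
                data = std::move(bigger);
                capacity = newCap;
            }
            data[size++] = v;
        }
    };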

8
Q

Growing Arrays without Copying

A

a fixed table of 64 pointers, each pointing to an array of values; growing allocates a new block, existing elements are never copied
good locality
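
A sketch assuming block i holds 2^i elements, so 64 block pointers cover any array size and growth never copies existing elements (the exact block sizing is an assumption):

    #include <bit>       // std::bit_width, C++20
    #include <cstddef>
    #include <cstdint>
    #include <memory>

    // Segmented growing array: a fixed table of 64 block pointers, where block i
    // stores 2^i elements. Growing allocates a new block; existing elements are
    // never moved, so references into the array stay valid.
    struct SegmentedArray {
        std::unique_ptr<uint64_t[]> blocks[64];
        std::size_t size = 0;

        // Block i covers the global index range [2^i - 1, 2^(i+1) - 2].
        static void locate(std::size_t idx, unsigned& block, std::size_t& offset) {
            block = std::bit_width(idx + 1) - 1;              // floor(log2(idx + 1))
            offset = (idx + 1) - (std::size_t{1} << block);
        }

        void push_back(uint64_t v) {
            unsigned block; std::size_t offset;
            locate(size, block, offset);
            if (!blocks[block])                               // allocate block on first use
                blocks[block].reset(new uint64_t[std::size_t{1} << block]);
            blocks[block][offset] = v;
            ++size;
        }

        uint64_t& operator[](std::size_t idx) {
            unsigned block; std::size_t offset;
            locate(idx, block, offset);
            return blocks[block][offset];
        }
    };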

9
Q

Comparison-based Search Trees (binary search trees)

A

compare one element at a time
each comparison rules out half of the elements (if the tree is balanced)
log2(n) comparisons and height

10
Q

Binary Search Trees

A

each node stores one key/value pair and two child pointers
complex rebalancing logic
bad cache locality
pointer based: elements do not have to be moved physically during rebalancing

Advantages:
inserting an element only allocates a new node and links it into the tree
rebalancing only changes the pointer structure of the tree, nothing is copied
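
A minimal C++ sketch of such a pointer-based node and insert (rebalancing omitted); note that no existing element is moved, only pointers change:

    #include <cstdint>
    #include <memory>

    // One key/value pair and two child pointers per node. Insert allocates a new
    // node and links it in; nothing is copied or moved.
    struct Node {
        uint64_t key;
        uint64_t value;
        std::unique_ptr<Node> left, right;
    };

    void insert(std::unique_ptr<Node>& root, uint64_t key, uint64_t value) {
        if (!root) {
            root.reset(new Node{key, value});   // children start out empty
            return;
        }
        if (key < root->key)      insert(root->left, key, value);
        else if (key > root->key) insert(root->right, key, value);
        else                      root->value = value;   // key already present: update
    }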

11
Q

Skip List

A

elements in a sorted linked list + additional shortcut lists on top
probabilistic data structure (no worst-case guarantees)
all items are stored in sorted order by key
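
A lookup-only sketch on an assumed skip-list node layout (insertion and the random choice of tower heights are omitted):

    #include <cstdint>
    #include <vector>

    // Lookup in a skip list: every node is in the sorted bottom-level list;
    // taller nodes additionally appear in the shortcut lists above it.
    struct SkipNode {
        uint64_t key;
        std::vector<SkipNode*> next;   // next[l] = successor on level l
    };

    // head is a sentinel whose next[] spans all levels; returns the matching
    // node or nullptr.
    SkipNode* find(SkipNode* head, uint64_t key) {
        if (!head || head->next.empty()) return nullptr;
        SkipNode* node = head;
        for (int level = (int)head->next.size() - 1; level >= 0; --level) {
            // Move right on this level while the next key is still smaller.
            while (node->next[level] && node->next[level]->key < key)
                node = node->next[level];
        }
        SkipNode* candidate = node->next[0];
        return (candidate && candidate->key == key) ? candidate : nullptr;
    }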

12
Q

B+-tree

A

variant of the B-tree where inner nodes only store keys and child pointers, no values
leaf nodes are often chained to simplify range scans

13
Q

Sorted Array

A

worst case for n elements:
lookup: log2(n)
insert: n (elements may have to be shifted)

14
Q

Sorted Blocks of Fixed Size

A

split array into small sorted arrays of b elems, b is fixed

15
Q

Inner Nodes

A

indicate the path to the leaf node that holds the actual data
tree height: log_b(n)
comparisons: log2(n)
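
For example (illustrative numbers): with n = 1,000,000 keys and b = 100 entries per node, the height is about log_100(1,000,000) = 3 node accesses, while the total number of key comparisons is still about log2(1,000,000) ≈ 20.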

16
Q

B-tree logic decomposition

A

two parts:
1. structure-modification operations (split, merge, new root) determine the shape of the B-tree
2. per-node operations (lower bound, insert, delete, iterate) work on nodes locally
(2) can be changed without affecting the high-level logic (1)

17
Q

Physical Node Layouts

A

most common layout: array of sorted key/value pairs
binary search within each node
insert/delete moves half of the elements on average (shifting to the right / left)

alternatives: store keys and values separately, unsorted leaf nodes but sorted inner nodes, encode a binary search tree, trie, or hash table within the node
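
A sketch of the common layout with a sorted key/value array inside the node (the capacity of 64 entries and fixed-size uint64_t keys are assumptions):

    #include <algorithm>
    #include <cstdint>
    #include <cstring>

    // Leaf node as a sorted array of key/value pairs: binary search for lookup,
    // and an insert that shifts the tail to keep the array sorted (half of the
    // entries on average).
    struct NodeKV { uint64_t key; uint64_t value; };

    struct LeafNode {
        static constexpr unsigned kCapacity = 64;   // assumed node size
        unsigned count = 0;
        NodeKV entries[kCapacity];

        // Index of the first entry whose key is >= key (lower bound).
        unsigned lowerBound(uint64_t key) const {
            auto it = std::lower_bound(entries, entries + count, key,
                       [](const NodeKV& e, uint64_t k) { return e.key < k; });
            return static_cast<unsigned>(it - entries);
        }

        bool insert(uint64_t key, uint64_t value) {
            if (count == kCapacity) return false;        // caller must split the node
            unsigned pos = lowerBound(key);
            std::memmove(&entries[pos + 1], &entries[pos],   // shift tail to the right
                         (count - pos) * sizeof(NodeKV));
            entries[pos] = {key, value};
            ++count;
            return true;
        }
    };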

18
Q

what is the best node size?

A

in main memory, the node size (b) can be freely chosen
node size should be at least the cache line size
the optimal page size for lookups is bigger than for inserts

19
Q

B-trees with variable-length data

A

bad cache locality, because dereferencing the pointer generally misses the cache
lots of memory allocations
memory fragmentation caused by the memory allocator

20
Q

Slotted page layout

A

enables variable-length data in sorted order
slots decouple the logical ordering from the physical placement
slots store offset and length
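
A simplified sketch of a slotted page (real pages pack slots and payload into one fixed-size buffer; the sizes and field names here are assumptions):

    #include <cstdint>
    #include <cstring>

    // Slotted page sketch: slots grow from the front, variable-length key data
    // grows from the back. Each slot stores (offset, length), so the logical
    // (sorted) slot order is decoupled from the physical placement of the bytes.
    struct SlottedPage {
        struct Slot { uint16_t offset; uint16_t length; };

        static constexpr unsigned kPageSize = 4096;   // assumed page size
        static constexpr unsigned kMaxSlots = 64;     // simplified fixed slot array
        uint16_t slotCount = 0;
        uint16_t dataStart = kPageSize;               // payload area starts here
        Slot slots[kMaxSlots];
        uint8_t heap[kPageSize];

        // Insert a key at logical position pos (0 <= pos <= slotCount),
        // keeping the slot array in sorted key order.
        bool insert(uint16_t pos, const uint8_t* key, uint16_t len) {
            if (slotCount == kMaxSlots || dataStart < len) return false;   // page full
            dataStart -= len;
            std::memcpy(&heap[dataStart], key, len);          // physical placement
            std::memmove(&slots[pos + 1], &slots[pos],        // logical ordering
                         (slotCount - pos) * sizeof(Slot));
            slots[pos] = {dataStart, len};
            ++slotCount;
            return true;
        }

        const uint8_t* getKey(uint16_t pos, uint16_t& len) const {
            len = slots[pos].length;
            return &heap[slots[pos].offset];
        }
    };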

21
Q

Compaction

A

deletion in a slotted page leaves holes
keep the holes after deletion, but track how many bytes are wasted
when running out of space, trigger compaction if it is beneficial
best implemented by copying all keys to a new, empty page

22
Q

Separator Choice

A

when splitting a node: choose the shortest key around the median as the separator

23
Q

Suffix Truncation of Separator

A

keys in inner nodes are only used as signposts; they do not have to occur as keys in the leaf nodes
if the key at the split point is ABCDE and the next key is AXYZ, the separator can be truncated to AX
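
A small sketch of this truncation: take the shortest prefix of the right node's lowest key that still compares greater than the left node's highest key:

    #include <cstddef>
    #include <string>
    #include <string_view>

    // Suffix truncation: the separator must be greater than the last key of the
    // left node and not greater than the first key of the right node, so take
    // the shortest prefix of rightLow that still compares greater than leftHigh.
    std::string truncateSeparator(std::string_view leftHigh, std::string_view rightLow) {
        for (std::size_t len = 1; len <= rightLow.size(); ++len) {
            std::string_view candidate = rightLow.substr(0, len);
            if (candidate > leftHigh)
                return std::string(candidate);
        }
        return std::string(rightLow);   // cannot be shortened further
    }

    // truncateSeparator("ABCDE", "AXYZ") == "AX", matching the example above.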

24
Q

Prefix Truncation, its difficulty

A

extract the common prefix of the smallest and largest key on the node
on lookup, compare the prefix first
speeds up comparisons and saves space

issue: inserting a key may cause the prefix to shrink -> all keys would have to be enlarged, may no longer fit into the node, causing a split
solution: store fence keys on each page

25
Q

Fence keys for prefix truncation

A

use the common prefix of the fence keys
the fence keys are immutable after the page has been created by a split or merge
they can be stored on the page or derived on demand

26
Q

key head in slot optimization

A

with a slotted page layout, binary search requires loading the key for every comparison
extract or copy the first 4 bytes of each key into its slot (the key "head")
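
A sketch of extracting a 4-byte head as a big-endian integer, so integer comparison of heads agrees with the lexicographic byte order (zero-padding of short keys is an assumption):

    #include <cstdint>
    #include <cstring>
    #include <string_view>

    // Key head: the first 4 key bytes packed into a big-endian integer stored in
    // the slot. If two heads differ, the key order is already decided; only equal
    // heads require loading and comparing the full keys.
    uint32_t keyHead(std::string_view key) {
        uint8_t bytes[4] = {0, 0, 0, 0};   // short keys are zero-padded
        if (!key.empty())
            std::memcpy(bytes, key.data(), key.size() < 4 ? key.size() : 4);
        return (uint32_t(bytes[0]) << 24) | (uint32_t(bytes[1]) << 16) |
               (uint32_t(bytes[2]) << 8)  |  uint32_t(bytes[3]);
    }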

27
Q

Hints

A

extract a fixed number of equidistant heads and store them consecutively; scanning the hints narrows down the range for the binary search within the node

28
Q

Textbook merge (deletion and space management in b trees)

A

to avoid underfull nodes after deletion, one must merge or rebalance
if a node is underfull (<50% fill factor):
  find a neighboring node
  if the neighboring node has a low enough fill factor:
    merge the two nodes into one and remove the separator in the parent; if the parent becomes underfull, merge it the same way
  else:
    rebalance the elements between the two nodes

29
Q

B*-tree

A

guarantees a fill factor of at least 66%
insert and delete rebalance among 3 neighboring nodes

30
Q

Xmerge

A

instead of rebalancing during insert and delete, increase the fill factor in the background
if the fill factor is low, merge X neighboring nodes into X-1 nodes

31
Q

B-tree: Cascading Insert

A

Scenario:
* leaf node is full, have to split and insert separator into parent
* parent is also full, have to split and insert separator into its parent
* repeat…
→ Insert can cascade upwards until a new root has to be created
Alternative: eager split
* when walking the tree during insert, split full inner nodes eagerly
* thus, when the leaf is reached, its parent is guaranteed to have space
* simpler, can be faster, but the average tree fill factor is a bit lower

32
Q

why is a B-Tree usually faster than a binary tree?

A

Instead of storing one key and having two children, B-tree nodes have n keys and n+1 children, where n can be large. This shortens the tree (in terms of height) and requires much less disk access than a binary search tree would.

33
Q

in which scenario can binary search be slower than linear search?

A

on small arrays, where linear search benefits from good locality and predictable branches
also, if the data is unsorted, binary search cannot be used without sorting it first

34
Q

what data structure is a probabilistic alternative to a binary tree?

A

Skip lists

35
Q

what is the main difference between a B+-tree and a classical B-tree?

A

in a B+-tree, inner nodes store only keys and child pointers, no values; all values live in the leaf nodes, which are often chained for range scans

36
Q

how many comparisons does binary search require on an array with N entries?

A

about log2(N)

37
Q

how can you speed up lookup in a B-Tree?

A

link the leaf nodes together: range queries can follow the leaf chain instead of re-traversing inner nodes
point lookups within a node can be sped up with key heads, hints, and prefix truncation