Data Structures and Data Types Primer Flashcards Preview

Flashcards in Data Structures and Data Types Primer Deck (31)

How are lists stored in memory in Python? When is memory reallocated?

Lists in Python are variable-length arrays, not linked lists. Internally, a list is stored as a contiguous array of pointers to its elements.


Append: When an element is appended and the allocated array is full, the list is resized. To avoid a costly reallocation on every append, the resize allocates more space than is immediately needed.

Pop: When the last element is popped (removed) and the list's size falls below half of the allocated memory, the allocated memory is shrunk.
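
The extra allocation on append can be observed directly. The following is a minimal sketch, assuming CPython, where `sys.getsizeof` reports the bytes currently allocated for the list object. The exact sizes and the growth pattern are implementation details and may differ between versions.

```python
import sys

items = []
sizes = []  # allocated size in bytes after each append

for i in range(64):
    items.append(i)
    sizes.append(sys.getsizeof(items))

# The size stays flat for several appends, then jumps when the list is resized,
# so there are far fewer distinct sizes than appends.
distinct_sizes = len(set(sizes))
print(f"{len(items)} appends, {distinct_sizes} distinct allocation sizes")
```

Because most appends reuse space that was already allocated, only a few of them pay for a reallocation.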


What are the rules for Big O? (4)

  1. Add the costs of steps that run one after another
  2. Drop constants: they do not change how the running time scales
  3. Use different variables for different inputs (e.g. O(a + b), not O(n))
  4. Drop non-dominant terms
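
To make the rules concrete, here is a small sketch that counts loop iterations instead of measuring time. The function names `sequential_steps` and `nested_steps` are illustrative only. Loops that run one after another add their costs, and two different inputs keep two different variables.

```python
def sequential_steps(a, b):
    """Two loops one after another: costs add, giving O(len(a) + len(b))."""
    steps = 0
    for _ in a:
        steps += 1
    for _ in b:
        steps += 1
    return steps


def nested_steps(a, b):
    """One loop inside another: costs multiply, giving O(len(a) * len(b))."""
    steps = 0
    for _ in a:
        for _ in b:
            steps += 1
    return steps


a = range(10)
b = range(1000)
print(sequential_steps(a, b))  # 1010
print(nested_steps(a, b))      # 10000
```

In the sequential case the cost of `b` dominates here, but neither variable can be dropped in general because the two inputs vary independently.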


What's Big O notation?

Describes how running time scales as the size of the input grows


For lists, what are the operations you can do and what's their big O notation? (3)


  • Append (add): If the allocated array is full, the list is resized with additional space (i.e. more than is required for just the element(s) being appended). O(1) amortized; O(n) in the worst case, when a resize copies the array.
  • Pop (remove): When the last element is popped (removed) and the list's size falls below half of the allocated memory, the allocated memory is shrunk. O(1) amortized for popping the last element; popping from an arbitrary position is O(n) because later elements are shifted.
  • Accessing: Any element can be accessed directly by its index, because its address is computed as an offset from the pointer to the first element. This makes accessing an O(1) operation.
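
The cost of resizing can be illustrated with a toy growable array. This is a sketch only: the class `DynamicArray` and its capacity-doubling policy are assumptions made for illustration, and CPython's real growth policy is different. It counts how many element copies the resizes cost in total.

```python
class DynamicArray:
    """Toy growable array that doubles its capacity when full (assumed policy)."""

    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.slots = [None] * self.capacity
        self.copies = 0  # total element copies caused by resizing

    def append(self, value):
        if self.size == self.capacity:
            self._resize(2 * self.capacity)
        self.slots[self.size] = value
        self.size += 1

    def _resize(self, new_capacity):
        # Copy every existing element into a larger block: O(n) for this call.
        new_slots = [None] * new_capacity
        for i in range(self.size):
            new_slots[i] = self.slots[i]
            self.copies += 1
        self.slots = new_slots
        self.capacity = new_capacity

    def __getitem__(self, index):
        # Direct offset into the underlying block: O(1).
        if not 0 <= index < self.size:
            raise IndexError(index)
        return self.slots[index]


arr = DynamicArray()
n = 1000
for i in range(n):
    arr.append(i)

print(arr.copies, arr.capacity)  # 1023 1024
```

A single resize costs O(n), but the total copying for n appends stays below 2n, so the cost per append averaged over all appends is constant.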


Describe Tuples

  • Tuples are immutable: once created, their contents cannot be changed.
  • As a result, a tuple is hashable as long as all of its elements are hashable, and can then be used as a dictionary key.
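
A short sketch of both properties; the variable names are illustrative only.

```python
point = (3, 4)

# Tuples cannot be modified in place.
try:
    point[0] = 10
    modified = True
except TypeError:
    modified = False

# A tuple of hashable items works as a dictionary key.
distances = {(0, 0): 0.0, (3, 4): 5.0}
lookup = distances[point]

# A tuple that contains a list is not hashable.
try:
    hash((1, [2, 3]))
    nested_list_hashable = True
except TypeError:
    nested_list_hashable = False

print(modified, lookup, nested_list_hashable)  # False 5.0 False
```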


What's good about linked lists?


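Inserting or removing an element is cheaper than with arrays (Python lists): only the neighboring pointers are updated, and no other elements have to be shifted or copied.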


How are linked lists structured?

  • Each element has a pointer to the next element
  • Doubly linked lists: each element stores pointers to both the previous and the next element.


Describe the operations performed on linked lists and their Big O notation

  • Adding to end: Link the last element in the list to the new element. O(1) if a pointer to the last element is kept, otherwise O(n) to reach it.
  • Inserting at position i: 
    • Traverse to the element before the insertion point.
    • Point the inserted element at the element that comes after.
    • Update the preceding element's pointer to point to the inserted element.
  • Deleting at position i: Update the element at position i-1 to point to element i+1.
  • Accessing the element at index i: Traverse from the first element until the desired element is reached.

Access and search are O(n). Adding at either end is O(1) when pointers to both ends are kept. Removing from the end is O(1) only in a doubly linked list, because a singly linked list must traverse to the second-to-last element.
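
The operations on a linked list can be sketched as a minimal singly linked list. The class and method names (`Node`, `LinkedList`, `insert`, `delete`, `get`, `to_list`) are illustrative choices, not a standard API.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None  # keeping a tail pointer makes append O(1)

    def append(self, value):
        node = Node(value)
        if self.head is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node

    def _node_at(self, index):
        """Walk index steps from the head: O(n)."""
        if index < 0:
            raise IndexError(index)
        node = self.head
        for _ in range(index):
            if node is None:
                break
            node = node.next
        if node is None:
            raise IndexError(index)
        return node

    def insert(self, index, value):
        node = Node(value)
        if index == 0:
            node.next = self.head
            self.head = node
            if self.tail is None:
                self.tail = node
            return
        prev = self._node_at(index - 1)  # traverse to the element before
        node.next = prev.next            # new element points at the successor
        prev.next = node                 # predecessor points at the new element
        if prev is self.tail:
            self.tail = node

    def delete(self, index):
        if index == 0:
            if self.head is None:
                raise IndexError(index)
            self.head = self.head.next
            if self.head is None:
                self.tail = None
            return
        prev = self._node_at(index - 1)
        if prev.next is None:
            raise IndexError(index)
        if prev.next is self.tail:
            self.tail = prev
        prev.next = prev.next.next       # link i-1 directly to i+1

    def get(self, index):
        return self._node_at(index).value

    def to_list(self):
        values = []
        node = self.head
        while node is not None:
            values.append(node.value)
            node = node.next
        return values


numbers = LinkedList()
for value in (1, 2, 4):
    numbers.append(value)
numbers.insert(2, 3)   # 1 -> 2 -> 3 -> 4
numbers.delete(0)      # 2 -> 3 -> 4
print(numbers.to_list())  # [2, 3, 4]
```

The relinking itself is constant-time; the O(n) cost of inserting or deleting at position i comes from the traversal in `_node_at`.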


What's one interesting thing about traversing a linked list?

  • Singly linked lists can only be traversed in the forward direction.
  • Doubly linked lists can be traversed in either direction.


Other name for linked list



Library you can use for a linked list in Python

from collections import deque
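
A brief usage sketch: `deque` is a double-ended queue that supports fast appends and pops at both ends, which is the typical reason to reach for a linked structure. The variable names are illustrative only.

```python
from collections import deque

queue = deque([2, 3])
queue.append(4)          # add to the right end
queue.appendleft(1)      # add to the left end
last = queue.pop()       # remove from the right end
first = queue.popleft()  # remove from the left end

print(first, last, list(queue))  # 1 4 [2, 3]
```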


What are sets?

  • Unordered collections
  • Contain only unique elements

set_example = {1, 2, 3}


Big O for checking if element exists in a set

O(1) on average and O(n) in worst case


What are sets best for?

For operations that require checking whether some value is in some set of values
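
A small sketch of membership testing and de-duplication with a set; the variable names are illustrative only.

```python
visited = {"a", "b", "c"}
visited.add("a")  # adding an existing element has no effect

has_b = "b" in visited  # average O(1) membership test
has_z = "z" in visited

# Converting a list to a set removes duplicates.
unique_words = set(["to", "be", "or", "not", "to", "be"])

print(len(visited), has_b, has_z, len(unique_words))  # 3 True False 4
```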


What's the big O of dictionary operations?

Getting, adding, and deleting an item are each O(1) on average (O(n) in the worst case, when many keys collide)


What can the keys and values of dictionaries be?

  • Keys: Any hashable object (e.g. numbers, strings, tuples of hashable items)
  • Values: Any object
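
The following sketch shows the basic dictionary operations with a few key and value types. The dictionary `inventory` and its contents are made up for illustration.

```python
inventory = {}
inventory["apple"] = 3            # add with a string key
inventory[("aisle", 7)] = "tea"   # a tuple of hashable items is a valid key
inventory[42] = [1, 2, 3]         # values can be any object, e.g. a list
apples = inventory["apple"]       # get
del inventory[42]                 # delete

# A list is mutable and therefore not hashable, so it cannot be a key.
try:
    inventory[["not", "hashable"]] = 1
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(apples, len(inventory), list_key_ok)  # 3 2 False
```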


What is hashing?

A hash function is a function that maps an object to a fixed-size value (for example an integer or a fixed-length string) and satisfies the following conditions:

  1. Different objects should, as far as possible, have different hashes (though collisions are possible).
  2. It should be deterministic and provide the same output for the same input.

In general, a hash function is not reversible.
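
These properties can be seen with a deliberately weak toy function. `toy_hash` is a hypothetical example, not how Python's built-in `hash` works: it sums character codes and reduces the result modulo a small number of buckets.

```python
def toy_hash(text, buckets=8):
    """Sum the character codes and reduce modulo the number of buckets."""
    return sum(ord(ch) for ch in text) % buckets


# Deterministic: the same input always gives the same output.
same_input = toy_hash("data") == toy_hash("data")

# Collision: different inputs can share a hash ("ab" and "ba" sum equally).
collision = toy_hash("ab") == toy_hash("ba")

# Python's built-in hash() gives equal hashes for equal objects.
builtin_equal = hash((1, 2)) == hash((1, 2))

print(same_input, collision, builtin_equal)  # True True True
```

The collision also shows why the function cannot be reversed: the bucket number alone does not say which string produced it.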


What are trees?

  • A data structure in which each element (node) may have links to multiple other elements (its children).
  • Each element except the root has exactly one parent: the element that holds a link to it.
  • Trees contain no cycles.


What's a library used with trees? How's it used?

from nltk import Tree

  • Initialize: t = Tree.fromstring('(Root (Child1 (Grandchild1 Grandchild2 Grandchild3) Child2))')
  • Generate graphical representation: t.draw()


How do you navigate a tree?

Recursion is one useful way
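
As a sketch of recursive navigation, the example below represents a tree as nested `(label, children)` tuples. This representation and the function names are assumptions made for illustration, chosen to avoid any library dependency.

```python
tree = ("Root", [
    ("Child1", [
        ("Grandchild1", []),
        ("Grandchild2", []),
    ]),
    ("Child2", []),
])


def collect_labels(node):
    """Visit a node, then recurse into each child (depth-first, pre-order)."""
    label, children = node
    labels = [label]
    for child in children:
        labels.extend(collect_labels(child))
    return labels


def depth(node):
    """Number of levels in the tree rooted at this node."""
    _, children = node
    if not children:
        return 1
    return 1 + max(depth(child) for child in children)


print(collect_labels(tree))
print(depth(tree))  # 3
```

Each call handles one node and delegates the subtrees to further calls, so the code mirrors the structure of the tree itself.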



What does the number of bits allocated to store values represent?

The number of bits determines the range of values that can be represented, the precision, and the memory required.


How many bytes to store: 

  • Boolean
  • Char
  • String
  • Integers

  • Boolean: 1 byte
  • Char: 1, 2 or 4 bytes, depending on the encoding. The more characters the encoding supports, the more bytes are needed.
  • String: Allocated as a fixed-size array in which every element gets as much memory as the widest character in the encoding requires. This is because a string is an array, and every element of an array must occupy the same amount of memory.
  • Integers: int16 uses 2 bytes, int32 uses 4 bytes, and so on. Unsigned types use the same sizes (e.g. uint32 also uses 4 bytes).
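
These sizes can be checked with the standard library. The sketch below uses the `struct` module with the `<` prefix, which selects standard, platform-independent sizes, and `str.encode` to show how the encoding changes the bytes per character. The sample characters are arbitrary.

```python
import struct

# "<" selects struct's standard sizes, which do not depend on the platform.
bool_bytes = struct.calcsize("<?")    # boolean
int16_bytes = struct.calcsize("<h")   # signed 16-bit integer
uint16_bytes = struct.calcsize("<H")  # unsigned 16-bit integer
int32_bytes = struct.calcsize("<i")   # signed 32-bit integer
uint32_bytes = struct.calcsize("<I")  # unsigned 32-bit integer

# Bytes per character under two encodings: "a", e with acute accent, euro sign.
samples = ("a", "\u00e9", "\u20ac")
utf8_sizes = [len(ch.encode("utf-8")) for ch in samples]
utf32_sizes = [len(ch.encode("utf-32-le")) for ch in samples]

print(bool_bytes, int16_bytes, uint16_bytes, int32_bytes, uint32_bytes)  # 1 2 2 4 4
print(utf8_sizes)   # [1, 2, 3]
print(utf32_sizes)  # [4, 4, 4]
```

UTF-8 is variable-width, while UTF-32 spends 4 bytes on every character, which is the fixed-width layout described for strings above.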


Describe signed vs unsigned integers

  • Unsigned integers cannot store negative values, but can hold larger positive values than signed integers of the same width.
  • This is because no bit is reserved for the sign.
  • Both can represent the same number of distinct values.


Describe a float

Might need to add more here


How do you determine how many values signed and unsigned ints can represent? What about max vals?

With n bits, both can represent 2^n distinct values. E.g. int16 and uint16 each represent 2^16 = 65,536 values.

  • Max val signed = 2^(16-1) - 1 = 32,767
  • Max val unsigned = 2^16 - 1 = 65,535
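
The same arithmetic as a small sketch. `int_range` is a hypothetical helper, and it assumes the usual two's-complement layout for signed integers, where the minimum is -2^(n-1).

```python
def int_range(bits, signed):
    """Return (min, max) for an integer type with the given number of bits."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


signed_min, signed_max = int_range(16, signed=True)
unsigned_min, unsigned_max = int_range(16, signed=False)

print(signed_min, signed_max)      # -32768 32767
print(unsigned_min, unsigned_max)  # 0 65535

# Both ranges contain the same number of values: 2**16.
print(signed_max - signed_min + 1, unsigned_max - unsigned_min + 1)  # 65536 65536
```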


How do you use NumPy to get information about a specific type of float?


Which structure should be used for an ordered sequence of elements with a lot of edits? Why?


Which structure should be used for an ordered sequence of elements that is created once and read often? Why?


Which structure should be used for key-value pairs (one to many)? Why?


Which structure should be used for key-value pairs (one to one)? Why?