DS Interview 2 Flashcards
Bias and Variance
Bias = underfitting (too simple)
Variance = overfitting (too complex)
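A minimal sketch of the idea, using made-up sine data and numpy (both are illustrative assumptions, not part of the card): a too-simple model underfits (high bias, high train and test error) while a too-complex one overfits (high variance, low train error but high test error).

import numpy as np

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, 30))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, 30)
x_test = np.sort(rng.uniform(0, 1, 30))
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.3, 30)

for degree in (1, 15):
    coefs = np.polyfit(x_train, y_train, degree)   # fit polynomial of the given degree
    train_err = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    # degree 1: high bias -> both errors stay high (underfitting)
    # degree 15: high variance -> train error drops, test error stays high (overfitting)
    print(degree, round(train_err, 3), round(test_err, 3))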
Time Complexity:
looping over N elements
for i in range(n)
O(N), linear
Time Complexity:
Nested Loops
for i in range(n): for j in range(n):
O(N^2), quadratic
Time Complexity:
Sorting
- order by column (SQL)
- quick sort, merge sort
O(N log N)
Time Complexity:
Using an index
- where id = x (SQL)
O(log N), logarithmic
Space Complexity:
Single variable
O(1), constant
Space Complexity:
Array of size N
O(N), linear
Space Complexity:
2D matrix of size NxN
O(N^2), quadratic
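The complexity classes on the cards above can be illustrated with a short Python sketch (the helper functions are illustrative only, not tied to any particular database engine):

def linear_scan(items):            # O(N) time: one pass over N elements
    total = 0
    for x in items:
        total += x
    return total

def all_pairs(items):              # O(N^2) time: nested loops over N elements
    return [(a, b) for a in items for b in items]

def sort_items(items):             # O(N log N) time: comparison sort, like ORDER BY or merge sort
    return sorted(items)

def binary_search(sorted_items, target):   # O(log N) time: like an index lookup (where id = x)
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

# Space: a single counter is O(1), a list of N items is O(N),
# and an N x N matrix such as [[0] * n for _ in range(n)] is O(N^2).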
Complexity:
Select avg(sales) over (order by month rows between 2 preceding and current row) from sales_data
- O(N log N) time complexity, because the window's order by month requires sorting the rows
- O(1) space, since the moving-average frame holds at most 3 rows (2 preceding + current)
Complexity:
select customer
from orders
group by customer
having count(order_id)>2;
- Group by needs extra space of O(N) to hold the groups
- Time complexity is O(N log N): counting order_id per group is O(N), and a sort-based group by is O(N log N) (a hash-based group by would be O(N))
Complexity:
Select *
From (
    Select *, rank() over (partition by department order by salary desc) as rank_salary
    From employees
) ranked
Where rank_salary <= 3;
- O(N log N) time, because the database sorts the rows within each department partition by salary
- O(N) space for the storage required when partitioning
Complexity:
select tweet_id
from Tweets
where length(content)>15;
- Time complexity: O(N) full table scan
- O(1) space complexity, no extra storage beyond the query execution
Complexity:
def count_salary_categories(accounts: pd.DataFrame) -> pd.DataFrame:
    low = len(accounts[accounts.income < 20000])
    high = len(accounts[accounts.income > 50000])
    avg = len(accounts) - low - high
    return pd.DataFrame({
        'category': ['Low Salary', 'Average Salary', 'High Salary'],
        'accounts_count': [low, avg, high]
    })
- Time complexity: O(N). Each boolean filter scans the table (O(N)); len() itself is O(1), and building the 3-row result DataFrame is O(1).
- Space complexity: O(1) held at the end. The counts are plain integers and the result DataFrame has a constant 3 rows (the intermediate boolean masks are O(N) but are discarded).
What’s a holdout set?
A holdout set is a portion of data that is separated from the training set and used to evaluate the performance of a machine learning model after it has been trained.
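A minimal sketch of carving out a holdout set, assuming scikit-learn and its bundled iris dataset (both are illustrative choices, not part of the card):

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
# 20% of the data is held out and never touched during training.
X_train, X_holdout, y_train, y_holdout = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(model.score(X_holdout, y_holdout))   # evaluate only on the holdout set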
When should you use grid search vs. random search for hyperparameter tuning?
Grid search tries every possible parameter combination (computationally expensive). Random search tries only a random subset of combinations but usually performs nearly as well.
Use Grid Search when you have <100 combinations and want an exhaustive, precise search.
Use Random Search when you have hundreds or thousands of possible combinations and need efficiency.
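A minimal sketch contrasting the two, assuming scikit-learn's GridSearchCV and RandomizedSearchCV with a made-up random forest parameter grid:

from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
params = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}

# Grid search: tries all 9 combinations (fine when the grid is small).
grid = GridSearchCV(RandomForestClassifier(random_state=0), params, cv=3).fit(X, y)

# Random search: samples only n_iter combinations (better for large search spaces).
rand = RandomizedSearchCV(RandomForestClassifier(random_state=0), params,
                          n_iter=5, cv=3, random_state=0).fit(X, y)

print(grid.best_params_, rand.best_params_)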
Why are insertions and deletions more efficient in linked lists than arrays?
Arrays require shifting elements when inserting/deleting from the beginning or middle, making these operations O(n). Linked lists use pointers, allowing O(1) insertion/deletion at the head since no shifting is needed. However, inserting or deleting in the middle or end still requires traversal (O(n)).
What is the time complexity of inserting an element at the head of a linked list?
O(1). The new node is created, pointed to the current head, and the head pointer is updated. No shifting is required.
What is the time complexity of inserting an element in the middle of a linked list?
O(n). The list must be traversed to find the insertion point before updating pointers.
What is the time complexity of deleting an element at the head of a linked list?
O(1). The head pointer is updated to the next node, and the old head is freed.
What is the time complexity of deleting an element in the middle of a linked list?
O(n). The list must be traversed to locate the node before updating pointers.
How does inserting in an array compare to inserting in a linked list?
Inserting at the head of an array is O(n) due to shifting, while it is O(1) in a linked list. Inserting in the middle is O(n) for both. Inserting at the end is O(1) (amortized) for a dynamic array, and O(n) for a linked list unless a tail pointer is kept, which makes end insertions O(1).
How does deleting in an array compare to deleting in a linked list?
Deleting at the head of an array is O(n) due to shifting, while it is O(1) in a linked list. Deleting in the middle is O(n) for both. Deleting at the end is O(1) for an array, but O(n) for a linked list unless it is doubly linked with a tail pointer, which makes end deletions O(1).
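A minimal singly linked list sketch illustrating the cards above (the Node class and helper functions are hypothetical names): head operations are O(1), while reaching the middle requires O(n) traversal.

class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def insert_head(head, value):
    return Node(value, head)              # O(1): new node points to the old head

def delete_head(head):
    return head.next if head else None    # O(1): just move the head pointer forward

def insert_after_index(head, index, value):
    node = head
    for _ in range(index):                # O(n): traverse to the insertion point
        node = node.next
    node.next = Node(value, node.next)    # pointer update itself is O(1)
    return head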
How does Merge Sort work, and what is its time complexity?
Merge Sort is a divide-and-conquer algorithm that:
1. Recursively splits the array into two halves until each half has one element.
2. Merges the sorted halves back together in a sorted order.
Time Complexity:
- Best, Worst, and Average Case: O(n log n)
- Space Complexity: O(n) (due to extra space for merging)
- Stable? Yes (maintains relative order of equal elements)
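A minimal merge sort sketch in Python (illustrative, not an optimized implementation):

def merge_sort(arr):
    if len(arr) <= 1:                 # base case: 0 or 1 element is already sorted
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])      # recursively sort each half
    right = merge_sort(arr[mid:])
    merged, i, j = [], 0, 0           # merge the sorted halves: O(n) extra space
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:       # <= keeps equal elements in order (stable)
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 9, 1, 5, 6]))   # [1, 2, 5, 5, 6, 9]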
What is Quick Sort, and why is it efficient in practice?
Quick Sort is a divide-and-conquer algorithm that:
1. Selects a pivot element.
2. Partitions the array into elements smaller and larger than the pivot.
3. Recursively sorts the subarrays.
Time Complexity:
- Best and Average Case: O(n log n)
- Worst Case: O(n²) (if the pivot is always the smallest or largest element)
- Space Complexity: O(log n) (for recursion)
- Stable? No (swaps can change the relative order of equal elements)
- Why is it fast in practice? Has good cache locality and usually outperforms Merge Sort for large datasets.
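A minimal quick sort sketch in Python; note that this list-comprehension version is not in place, so it uses O(n) extra space rather than the O(log n) recursion space of the in-place partitioning described above:

def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]                    # pivot choice drives the worst case
    smaller = [x for x in arr if x < pivot]
    equal = [x for x in arr if x == pivot]
    larger = [x for x in arr if x > pivot]
    return quick_sort(smaller) + equal + quick_sort(larger)   # recurse on each partition

print(quick_sort([5, 2, 9, 1, 5, 6]))   # [1, 2, 5, 5, 6, 9]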