WEEK 6 Flashcards
(42 cards)
MultiIndex in pandas
Allows an index to have multiple levels, making it easier to organize and manipulate hierarchical (higher-dimensional) data.
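A minimal sketch of a two-level (state, year) index. The population numbers are made-up placeholders, not real data.

```python
import pandas as pd

# Build a two-level index (state, year) from a list of tuples
index = pd.MultiIndex.from_tuples(
    [("California", 2000), ("California", 2010),
     ("Texas", 2000), ("Texas", 2010)],
    names=["state", "year"],
)
# Hypothetical population values
pop = pd.Series([100, 110, 200, 220], index=index)

# Select every year for one state using the outer level
ca = pop.loc["California"]
# Select one year across all states using the inner level
y2010 = pop.xs(2010, level="year")
print(ca)
print(y2010)
```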
Explain unstack() and stack().
unstack() (spread it out): pivots the innermost level of a MultiIndex Series into columns, producing a wide DataFrame.
stack() (pull it together): pivots the columns of a DataFrame back into the innermost index level, producing a MultiIndex Series.
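A small round trip, assuming hypothetical population values: unstack() moves the year level into columns, and stack() moves it back.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("California", 2000), ("California", 2010),
     ("Texas", 2000), ("Texas", 2010)],
    names=["state", "year"],
)
pop = pd.Series([100, 110, 200, 220], index=index)  # hypothetical values

# unstack: inner level (year) becomes the columns -> 2x2 wide DataFrame
wide = pop.unstack()
print(wide)

# stack: columns go back into the inner index level -> MultiIndex Series
long = wide.stack()
print(long)
```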
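Why do we use the MultiIndex concept?
It lets us represent higher-dimensional data (e.g., state and year) in a one-dimensional Series or two-dimensional DataFrame, so complex datasets are easier to slice and analyze.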
Array
states = ['California', 'Texas', 'New York']
years = [2000, 2010, 2000]
Here, both states and years are arrays (lists of values).
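One way to use these two arrays is as the levels of a MultiIndex: position i of each list is paired into one index entry. The level names "state" and "year" are chosen here for illustration.

```python
import pandas as pd

states = ["California", "Texas", "New York"]
years = [2000, 2010, 2000]

# Pair states[i] with years[i] to form each index entry
index = pd.MultiIndex.from_arrays([states, years], names=["state", "year"])
print(index)
```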
Tuple
A tuple is like an array, but it groups values together as one fixed (immutable) item.
For example, ('California', 2000) is a tuple that can act as a key in a dictionary.
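A short sketch of tuple keys in practice: when a dictionary with tuple keys is passed to pd.Series, pandas builds a MultiIndex from them. The values are hypothetical.

```python
import pandas as pd

# Keys are (state, year) tuples; values are hypothetical
data = {
    ("California", 2000): 100,
    ("California", 2010): 110,
    ("Texas", 2000): 200,
}
ser = pd.Series(data)

# The tuple keys become a two-level MultiIndex
print(type(ser.index).__name__)  # MultiIndex
print(ser[("Texas", 2000)])      # 200
```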
What is the purpose of np.concatenate() in NumPy?
It combines multiple arrays into one single array.
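A minimal example of np.concatenate on 1-D arrays, plus the axis argument for 2-D arrays.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
joined = np.concatenate([a, b])
print(joined)  # [1 2 3 4 5 6]

# 2-D arrays can be joined along rows (axis=0) or columns (axis=1)
x = np.array([[1, 2], [3, 4]])
rows = np.concatenate([x, x], axis=0)  # shape (4, 2)
cols = np.concatenate([x, x], axis=1)  # shape (2, 4)
```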
What is pd.concat() used for?
It combines Series or DataFrames along rows (axis=0) or columns (axis=1).
If you pd.concat([ser1, ser2]), what happens?
It stacks ser2 below ser1, combining them into one Series.
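A quick check of that behavior with two small Series (the letters and index labels are arbitrary):

```python
import pandas as pd

ser1 = pd.Series(["A", "B", "C"], index=[1, 2, 3])
ser2 = pd.Series(["D", "E", "F"], index=[4, 5, 6])

# Default axis=0: ser2's rows are placed after ser1's
combined = pd.concat([ser1, ser2])
print(combined.tolist())        # ['A', 'B', 'C', 'D', 'E', 'F']
print(combined.index.tolist())  # [1, 2, 3, 4, 5, 6]
```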
What happens in an inner join with pd.concat(join='inner')?
Keeps only the columns shared by all the DataFrames.
What happens in an outer join with pd.concat(join='outer')?
Keeps all columns (this is the default); missing values are filled with NaN.
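A sketch comparing the two join options of pd.concat on two small DataFrames that share only columns B and C (the column names and values are arbitrary):

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
df2 = pd.DataFrame({"B": [7, 8], "C": [9, 10], "D": [11, 12]})

outer = pd.concat([df1, df2])                # default join="outer"
inner = pd.concat([df1, df2], join="inner")  # only shared columns

print(list(outer.columns))       # ['A', 'B', 'C', 'D']
print(list(inner.columns))       # ['B', 'C']
print(outer.isna().sum().sum())  # 4 (df1 rows lack D, df2 rows lack A)
```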
Does pandas keep duplicate indices when combining DataFrames?
Yes, by default pandas keeps duplicates.
What are three ways to handle duplicate indices when concatenating DataFrames?
- Reset the index (ignore_index=True)
- Add a hierarchical MultiIndex level (keys=['First', 'Second'])
- Use verify_integrity=True to raise an error if duplicates exist
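A sketch of the three options applied to two DataFrames whose indices overlap (the data is arbitrary):

```python
import pandas as pd

df1 = pd.DataFrame({"val": [1, 2]}, index=[0, 1])
df2 = pd.DataFrame({"val": [3, 4]}, index=[0, 1])

# Default: duplicate labels 0, 1, 0, 1 are kept
dup = pd.concat([df1, df2])

# 1. Discard the old index and renumber rows 0..n-1
reset = pd.concat([df1, df2], ignore_index=True)

# 2. Add an outer index level recording where each row came from
keyed = pd.concat([df1, df2], keys=["First", "Second"])

# 3. Raise an error instead of keeping duplicates
try:
    pd.concat([df1, df2], verify_integrity=True)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```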
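What happens when you use .append() on DataFrames?
It stacks one DataFrame below another, keeping the original indices unless reset. Note: DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; use pd.concat() instead.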
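Because .append() no longer exists in current pandas, here is a sketch of the pd.concat() equivalent (the data is arbitrary):

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2]})
df2 = pd.DataFrame({"x": [3, 4]})

# Old style, removed in pandas 2.0: df1.append(df2)
# concat equivalent: original indices 0, 1, 0, 1 are kept
stacked = pd.concat([df1, df2])

# ignore_index=True renumbers the rows instead
renumbered = pd.concat([df1, df2], ignore_index=True)
```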
What is relational algebra in simple terms?
A formal set of operations (such as selecting, combining, and joining) for working with table-based data; it is the theory behind SQL and relational databases.
What is the purpose of pd.merge() in pandas?
It joins two tables based on a common column, like SQL joins.
What are the three types of joins based on data relationships?
One-to-One: Both tables have unique keys.
Many-to-One: One table has repeated keys.
Many-to-Many: Both tables have repeated keys.
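A sketch of all three relationship types with pd.merge, which joins on the shared column by default. The employee, group, supervisor, and skill values are hypothetical.

```python
import pandas as pd

# One-to-one: "employee" is unique in both tables
df1 = pd.DataFrame({"employee": ["Bob", "Jake", "Lisa"],
                    "group": ["Accounting", "Engineering", "Engineering"]})
df2 = pd.DataFrame({"employee": ["Lisa", "Bob", "Jake"],
                    "hire_date": [2004, 2008, 2012]})
one_to_one = pd.merge(df1, df2)  # joins on "employee"

# Many-to-one: "group" repeats on the left but is unique in df3
df3 = pd.DataFrame({"group": ["Accounting", "Engineering"],
                    "supervisor": ["Carly", "Guido"]})
many_to_one = pd.merge(one_to_one, df3)  # joins on "group"

# Many-to-many: "group" repeats in both tables
df4 = pd.DataFrame({"group": ["Accounting", "Accounting", "Engineering"],
                    "skills": ["math", "spreadsheets", "coding"]})
many_to_many = pd.merge(df1, df4)  # Bob gets 2 rows; Jake and Lisa 1 each
```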
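What does an inner join (how='inner') do?
Keeps only rows whose key appears in both tables.
What does an outer join (how='outer') do?
Keeps all rows from both tables; values missing on either side are filled with NaN.
What does a left join (how='left') do?
Keeps all rows from the first (left) table; columns from the second table are NaN where there is no match.
What does a right join (how='right') do?
Keeps all rows from the second (right) table; columns from the first table are NaN where there is no match.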
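A sketch of the four how= options on two small tables that share only one key ("Mary"); the names, foods, and drinks are hypothetical.

```python
import pandas as pd

food = pd.DataFrame({"name": ["Peter", "Paul", "Mary"],
                     "food": ["fish", "beans", "bread"]})
drink = pd.DataFrame({"name": ["Mary", "Joseph"],
                      "drink": ["wine", "beer"]})

inner = pd.merge(food, drink, how="inner")  # only Mary is in both
outer = pd.merge(food, drink, how="outer")  # all 4 names; gaps are NaN
left = pd.merge(food, drink, how="left")    # every row of food
right = pd.merge(food, drink, how="right")  # every row of drink
```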
What does aggregation mean in pandas?
Summarizing data by applying functions like sum(), mean(), max(), etc.
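A minimal sketch of aggregation, first over a whole column and then per group with groupby (the data is arbitrary):

```python
import pandas as pd

df = pd.DataFrame({"key": ["A", "B", "A", "B"],
                   "data": [1, 2, 3, 4]})

print(df["data"].sum())   # 10
print(df["data"].mean())  # 2.5

# One summary row per key, one column per aggregation function
summary = df.groupby("key")["data"].agg(["sum", "mean", "max"])
print(summary)
```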
What does a pivot table do?
It groups data by one category for the rows and optionally another for the columns (like gender and class), then applies an aggregation to each cell to find totals, averages, etc.
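A sketch using a tiny hypothetical passenger table (it is not the real Titanic dataset): rows are grouped by sex, columns by class, and each cell holds the mean of survived.

```python
import pandas as pd

# Hypothetical data; survived: 1 = yes, 0 = no
titanic = pd.DataFrame({
    "sex": ["female", "male", "female", "male", "male", "female"],
    "class": ["First", "First", "Third", "Third", "Third", "First"],
    "survived": [1, 0, 0, 0, 1, 1],
})

# Rows = sex, columns = class, cells = mean survival rate
table = titanic.pivot_table("survived", index="sex", columns="class",
                            aggfunc="mean")
print(table)
```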
Why use pivot tables in pandas?
They make it easy to compare data across categories and summarize large datasets.
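What helps pandas clean messy text data quickly?
Vectorized string operations through the .str accessor (e.g., .str.strip(), .str.lower()), which apply to a whole column at once and skip missing values, so there is no need to write explicit loops.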
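A minimal sketch of chaining .str methods on a column of messy, hypothetical names that includes a missing value:

```python
import pandas as pd

names = pd.Series(["  peter", "Paul ", None, "MARY", "gUIDO"])

# Each .str method applies to every element; the missing entry stays missing
clean = names.str.strip().str.capitalize()
print(clean)
```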