WEEK 6 Flashcards by anna vandenappels

MultiIndex in pandas

Allows multiple levels of indexing (hierarchical indexing) on a single axis, making it easier to organize and manipulate hierarchical data.
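
A minimal sketch of a two-level index, assuming a hypothetical Series `pop` indexed by (state, year) with made-up population numbers:

```python
import pandas as pd

# Build a two-level index (state, year) from tuples
index = pd.MultiIndex.from_tuples(
    [("California", 2000), ("California", 2010),
     ("Texas", 2000), ("Texas", 2010)],
    names=["state", "year"],
)
# Made-up values, for illustration only
pop = pd.Series([100, 110, 50, 60], index=index)

# Select every year for one state (outer level)
print(pop.loc["California"])
# Select one year across all states (inner level)
print(pop.xs(2010, level="year"))
```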

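Explain unstack and stack.

unstack (spread it out): moves the innermost index level of a MultiIndex Series into the columns, turning it into a DataFrame (makes it wide).

stack (pull it together): moves the columns back into the innermost index level, turning a DataFrame back into a MultiIndex Series (makes it long).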
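
A minimal round-trip sketch, assuming the same kind of made-up (state, year) Series as a hypothetical `pop`:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("California", 2000), ("California", 2010),
     ("Texas", 2000), ("Texas", 2010)],
    names=["state", "year"],
)
pop = pd.Series([100, 110, 50, 60], index=index)  # illustrative values

wide = pop.unstack()  # inner level 'year' becomes the columns
print(wide)
long = wide.stack()   # columns move back into the inner index level
print(long)
```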

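Why do we use the MultiIndex concept?

It lets us represent data with more than two dimensions (for example state and year) inside a one-dimensional Series or two-dimensional DataFrame, which makes complex datasets easier to select, group, and reshape.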

array

states = ['California', 'Texas', 'New York']
years = [2000, 2010, 2000]

Here, both states and years are arrays (Python lists of values). Passing several such arrays to pandas creates one index level per array.
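
A minimal sketch of turning the two lists into a MultiIndex with pd.MultiIndex.from_arrays(); the level names "state" and "year" are assumptions added for readability:

```python
import pandas as pd

states = ["California", "Texas", "New York"]
years = [2000, 2010, 2000]

# Each list supplies one level; matching positions form (state, year) pairs
index = pd.MultiIndex.from_arrays([states, years], names=["state", "year"])
print(index.tolist())
```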

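Tuple

A tuple is like an array, but it is immutable and groups several values together as one item.

For example, ('California', 2000) is a tuple; because tuples are immutable (and therefore hashable), it can act as a key in a dictionary, and pandas uses such tuples as the entries of a MultiIndex.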
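
A minimal sketch, assuming a hypothetical dictionary `populations` with made-up values, showing tuple keys becoming a MultiIndex when the dictionary is passed to pd.Series():

```python
import pandas as pd

# Tuples are hashable, so they can serve as dictionary keys
populations = {
    ("California", 2000): 100,
    ("California", 2010): 110,
    ("Texas", 2000): 50,
}
# pandas turns the tuple keys into a two-level MultiIndex
pop = pd.Series(populations)
print(pop.index.nlevels)
print(pop[("California", 2010)])
```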

What is the purpose of np.concatenate() in NumPy?

It combines multiple arrays into one single array.
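
A minimal sketch with small illustrative arrays; for 2-D input the `axis` argument picks the direction:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
joined = np.concatenate([x, y])
print(joined)

# For 2-D arrays: axis=0 stacks rows, axis=1 places arrays side by side
grid = np.array([[1, 2], [3, 4]])
wide = np.concatenate([grid, grid], axis=1)
print(wide)
```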

What is pd.concat() used for?

It combines Series or DataFrames along rows (axis=0) or columns (axis=1).
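
A minimal sketch of both directions, assuming two tiny hypothetical DataFrames `df1` and `df2`:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2]}, index=[0, 1])
df2 = pd.DataFrame({"B": [3, 4]}, index=[0, 1])

side_by_side = pd.concat([df1, df2], axis=1)  # columns A and B, same rows
stacked = pd.concat([df1, df1], axis=0)       # rows appended below
print(side_by_side)
print(stacked)
```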

If you pd.concat([ser1, ser2]), what happens?

It appends ser2 after ser1, combining them into one Series; the original index labels are kept.
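
A minimal sketch with two illustrative Series `ser1` and `ser2`:

```python
import pandas as pd

ser1 = pd.Series(["A", "B", "C"], index=[1, 2, 3])
ser2 = pd.Series(["D", "E", "F"], index=[4, 5, 6])

# ser2's entries come after ser1's; index labels are carried over unchanged
combined = pd.concat([ser1, ser2])
print(combined)
```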

What happens in an inner join with pd.concat()?

With join='inner', only the columns shared by all DataFrames are kept.

What happens in an outer join with pd.concat()?

With join='outer' (the default), all columns are kept; values missing from a DataFrame are filled with NaN.
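
A minimal sketch comparing the two settings of `join` in pd.concat(), assuming two hypothetical DataFrames `df5` and `df6` that share only columns B and C:

```python
import pandas as pd

df5 = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
df6 = pd.DataFrame({"B": [7, 8], "C": [9, 10], "D": [11, 12]})

outer = pd.concat([df5, df6])                # join="outer" is the default
inner = pd.concat([df5, df6], join="inner")  # only shared columns B and C
print(outer)
print(inner)
```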

Does pandas keep duplicate indices when combining DataFrames?

Yes, by default pandas keeps duplicates.

What are three ways to handle duplicate indices when joining DataFrames?

Ignore the original index (ignore_index=True), so rows are renumbered from 0
Add a MultiIndex level that labels each source (keys=['First', 'Second'])
Use verify_integrity=True to raise an error if duplicates exist
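
A minimal sketch of the default behaviour and the three options, assuming two hypothetical DataFrames `x` and `y` whose indices overlap:

```python
import pandas as pd

x = pd.DataFrame({"A": [1, 2]}, index=[0, 1])
y = pd.DataFrame({"A": [3, 4]}, index=[0, 1])

# Default: the duplicate labels 0 and 1 are kept
kept = pd.concat([x, y])
# 1. Ignore the old index and number rows 0..n-1
renumbered = pd.concat([x, y], ignore_index=True)
# 2. Add an outer level recording where each row came from
labelled = pd.concat([x, y], keys=["First", "Second"])

# 3. Raise instead of silently keeping duplicates
error = None
try:
    pd.concat([x, y], verify_integrity=True)
except ValueError as err:
    error = err
    print("ValueError:", err)
```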

What happens when you use .append() on DataFrames?

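It stacks one DataFrame below another, keeping the original indices unless you pass ignore_index=True. Note: DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; use pd.concat() instead.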
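
A minimal sketch of the pd.concat() call that replaces .append() (which no longer exists in pandas 2.0 and later), assuming two hypothetical frames `df_a` and `df_b`:

```python
import pandas as pd

df_a = pd.DataFrame({"A": [1, 2]})
df_b = pd.DataFrame({"A": [3, 4]})

# Old style, pandas < 2.0 only: df_a.append(df_b)
combined = pd.concat([df_a, df_b])  # indices 0, 1, 0, 1 are kept
# ignore_index=True renumbers the rows 0..3
renumbered = pd.concat([df_a, df_b], ignore_index=True)
print(combined)
print(renumbered)
```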

What is relational algebra in simple terms?

A set of rules for working with table-based data; it is the formal theory behind relational databases and SQL queries.

What is the purpose of pd.merge() in pandas?

It joins two tables on one or more key columns (by default, the columns they share), like a SQL join.
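
A minimal sketch, assuming two small hypothetical tables `df1` and `df2` that share an "employee" column:

```python
import pandas as pd

df1 = pd.DataFrame({"employee": ["Bob", "Jake", "Lisa"],
                    "group": ["Accounting", "Engineering", "Engineering"]})
df2 = pd.DataFrame({"employee": ["Lisa", "Bob", "Jake"],
                    "hire_date": [2004, 2008, 2012]})

# 'employee' is the shared column; on="employee" just makes the key explicit
df3 = pd.merge(df1, df2, on="employee")
print(df3)
```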

What are the three types of joins based on data relationships?

One-to-One: Both tables have unique keys.

Many-to-One: One table has repeated keys.

Many-to-Many: Both tables have repeated keys.
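
A minimal many-to-one sketch, assuming hypothetical `employees` and `supervisors` tables. pd.merge()'s `validate` argument checks which relationship the keys actually have:

```python
import pandas as pd

employees = pd.DataFrame({"employee": ["Bob", "Jake", "Lisa"],
                          "group": ["Accounting", "Engineering", "Engineering"]})
supervisors = pd.DataFrame({"group": ["Accounting", "Engineering"],
                            "supervisor": ["Carly", "Guido"]})

# Many-to-one: 'group' repeats in employees but is unique in supervisors
merged = pd.merge(employees, supervisors, on="group", validate="many_to_one")
print(merged)

# Claiming one-to-one fails, because 'Engineering' repeats on the left
error = None
try:
    pd.merge(employees, supervisors, on="group", validate="one_to_one")
except pd.errors.MergeError as err:
    error = err
    print("MergeError:", err)
```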

What does ‘inner’ join do?

Keeps only rows whose key appears in both tables.

What does ‘outer’ join do?

Keeps all rows, fills missing values with NaN.

What does ‘left’ join do?

Keeps all rows from the first (left) table; where the second table has no match, its columns are filled with NaN.

What does ‘right’ join do?

Keeps all rows from the second (right) table; where the first table has no match, its columns are filled with NaN.
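
A minimal sketch comparing the four `how` options of pd.merge(), assuming two hypothetical tables `food` and `drink` that share only the name "Mary":

```python
import pandas as pd

food = pd.DataFrame({"name": ["Peter", "Paul", "Mary"],
                     "food": ["fish", "beans", "bread"]})
drink = pd.DataFrame({"name": ["Mary", "Joseph"],
                      "drink": ["wine", "beer"]})

# One merge per join type; unmatched cells become NaN
results = {how: pd.merge(food, drink, on="name", how=how)
           for how in ["inner", "outer", "left", "right"]}
for how, result in results.items():
    print(how, sorted(result["name"]))
```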

What does aggregation mean in pandas?

Summarizing data by applying functions like sum(), mean(), max(), etc.
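
A minimal sketch on a made-up DataFrame: aggregating a whole column, then per group with groupby():

```python
import pandas as pd

df = pd.DataFrame({"key": ["A", "B", "A", "B"],
                   "data": [1, 2, 3, 4]})

print(df["data"].sum())                            # whole column
group_means = df.groupby("key")["data"].mean()     # one mean per key
summary = df.groupby("key")["data"].agg(["sum", "max"])  # several at once
print(group_means)
print(summary)
```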

What does a pivot table do?

It groups data by one category for the rows (like gender) and, optionally, another for the columns (like class), then applies an aggregation to each cell to find totals, averages, etc.
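
A minimal sketch on a tiny made-up passenger-style dataset; the column names "sex", "class", and "survived" are assumptions for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "female", "male"],
    "class": ["First", "First", "Third", "Third", "Third", "First"],
    "survived": [0, 1, 1, 0, 0, 1],
})
# Rows = sex, columns = class, each cell = mean of 'survived'
table = df.pivot_table("survived", index="sex", columns="class", aggfunc="mean")
print(table)
```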

Why use pivot tables in pandas?

They make it easy to compare data across categories and summarize large datasets.

What helps pandas clean messy text data quickly?

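Vectorized string operations through the .str accessor, which apply a method to every element at once and skip missing values, so no explicit loop is needed.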
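
A minimal sketch with a made-up Series of messy names, including a missing value:

```python
import pandas as pd

names = pd.Series(["  peter", "Paul ", None, "MARY"])
# Each .str method runs on every element; the missing entry stays missing
clean = names.str.strip().str.capitalize()
print(clean)
```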

How can you turn text dates into proper date objects?

Use dateutil.parser.parse() for a single string, or pd.to_datetime() for a whole column.
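
A minimal sketch of both routes, using illustrative date strings:

```python
from dateutil import parser
import pandas as pd

# Parse one free-form string into a datetime.datetime
date = parser.parse("4th of July, 2015")
print(date)

# Convert a whole column of text dates at once
dates = pd.to_datetime(pd.Series(["2015-07-04", "2015-08-01"]))
print(dates.dt.month.tolist())
```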

What does datetime64 do in NumPy?

Stores dates as 64-bit integers, which enables fast, vectorized date-time operations on whole arrays.
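
A minimal sketch of vectorized date arithmetic with datetime64 (the dates are arbitrary examples):

```python
import numpy as np

date = np.array("2015-07-04", dtype=np.datetime64)
# Add 0..11 days to the date in one vectorized step
week = date + np.arange(12)
print(week[:3])

# Subtracting two dates gives a timedelta64 (here, 3 days)
gap = np.datetime64("2015-07-04") - np.datetime64("2015-07-01")
print(gap)
```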

What can pd.eval() handle?

Arithmetic, comparisons, and boolean logic.
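
A minimal sketch, assuming three hypothetical DataFrames of random integers; pd.eval() looks the names up in the calling scope:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df1, df2, df3 = (pd.DataFrame(rng.integers(0, 10, (3, 2))) for _ in range(3))

# Arithmetic
total = pd.eval("df1 + df2 + df3")
# Comparisons combined with boolean logic
mask = pd.eval("(df1 < 5) & (df2 > 2)")
print(total)
print(mask)
```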

Why use df.eval()?

It lets you refer to columns by name inside the expression string, e.g. df.eval('A + B') instead of df['A'] + df['B'].

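How do you reference Python variables inside eval()?

Prefix the variable name with the @ symbol (e.g. '@offset'). This works in df.eval() and df.query(), but not in top-level pd.eval().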
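
A minimal sketch, assuming a hypothetical DataFrame with columns A and B and an ordinary Python variable `offset`:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})
offset = 100  # a regular Python variable

# Columns are named directly; the local variable needs the @ prefix
df["C"] = df.eval("A + B + @offset")
print(df)

# The same @ syntax works in df.query()
big = df.query("B > @offset / 10")
print(big)
```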

What does slicing mean in data analysis?

Selecting a subset of data based on one dimension — like a few rows or one column.

What does dicing mean in data analysis?

Selecting a specific subset of data based on multiple dimensions or criteria.

What is the key difference between slicing and dicing?

Slicing → 1 dimension (rows or columns)
Dicing → multiple dimensions (rows and columns, or several filters)
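
A minimal sketch on a made-up sales table; "slicing" here takes rows along one dimension, while "dicing" combines conditions on two columns and also chooses columns:

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "year": [2020, 2020, 2021, 2021],
    "sales": [100, 150, 120, 170],
})

# Slicing: one dimension, e.g. the first two rows
first_rows = df.iloc[:2]
# Dicing: several criteria at once (region AND year), plus a column choice
east_2021 = df.loc[(df["region"] == "East") & (df["year"] == 2021),
                   ["region", "sales"]]
print(first_rows)
print(east_2021)
```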

Why do we use slicing or dicing?

To extract just the data we need for analysis or to focus on specific groups or time periods.

What is more expensive: dicing or slicing?

Dicing, since it usually evaluates several conditions instead of one.

What is a full outer join?

Keeps everything from both DataFrames, filling missing spots with NaN.

What does NaN stand for in pandas?

Not a Number

What is the use of .isna() in pandas?

It detects missing values: it returns a boolean mask that is True wherever a value is missing (NaN, None, or NaT).
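
A minimal sketch on a made-up Series that mixes present and missing values:

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, "hello", None])
missing = data.isna()       # True where a value is missing
print(missing.tolist())
print(data[data.notna()])   # keep only the present values
print(missing.sum())        # count of missing values
```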

What is considered bad data?

Empty cells, wrong formats, wrong data, or duplicates.

What is one way to fix wrong formats in pandas?

You can drop the incorrect data or convert it to the correct format.
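
A minimal sketch of "convert, then drop what cannot be fixed", assuming a hypothetical table whose Date column holds text in the format YYYY/MM/DD plus one bad entry:

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2020/12/01", "2020/12/02", "not a date", "2020/12/04"],
                   "Calories": [409, 479, 340, 282]})

# Convert to datetime; unparseable entries become NaT instead of raising
df["Date"] = pd.to_datetime(df["Date"], format="%Y/%m/%d", errors="coerce")
# Drop the rows whose date could not be converted
df = df.dropna(subset=["Date"])
print(df)
```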

Why is removing duplicates important?

To clean data and avoid wrong analysis or double counting.
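
A minimal sketch on a made-up table where one row appears twice:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Ben", "Ann"], "score": [90, 85, 90]})

flags = df.duplicated()        # True for each repeat of an earlier row
print(flags.tolist())
deduped = df.drop_duplicates() # keeps the first occurrence
print(deduped)
```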

(42 cards)