WEEK 6 Flashcards

(42 cards)

1
Q

What is a MultiIndex in pandas?

A

A MultiIndex allows multiple levels of indexing on a single axis, making it easier to organize and manipulate hierarchical data.
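
A minimal sketch of the idea, assuming made-up population figures and the hypothetical level names "state" and "year":

```python
import pandas as pd

# Two index levels: state (outer) and year (inner). Values are invented.
index = pd.MultiIndex.from_tuples(
    [("California", 2000), ("California", 2010), ("Texas", 2000), ("Texas", 2010)],
    names=["state", "year"],
)
pop = pd.Series([10, 20, 30, 40], index=index)

# Selecting on the outer level returns every row for that state.
california = pop.loc["California"]

# Giving both levels returns a single value.
texas_2010 = pop.loc[("Texas", 2010)]
```

Selecting on the outer level alone leaves a Series indexed only by year.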

2
Q

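Explain unstack and stack.

A

unstack (spread it out): moves the innermost index level into the columns, turning a MultiIndex Series into a DataFrame (makes it wide).

stack (pull it together): moves the columns back into the innermost index level, turning a DataFrame back into a MultiIndex Series (makes it long).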
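
A short round trip may make this concrete. The data and level names below are assumptions for illustration:

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["California", "Texas"], [2000, 2010]], names=["state", "year"]
)
pop = pd.Series([10, 20, 30, 40], index=index)  # invented values

# unstack: the inner level ("year") becomes the columns of a DataFrame.
wide = pop.unstack()

# stack: the columns go back into the inner index level.
long_again = wide.stack()
```

Here wide has one row per state and one column per year, and long_again holds the same four values as pop.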

3
Q

Why do we use the MultiIndex concept?

A

It lets us represent and analyze complex, higher-dimensional datasets in a regular Series or DataFrame.

4
Q

What is an array?

A

states = ['California', 'Texas', 'New York']
years = [2000, 2010, 2000]

Here, both states and years are arrays in the loose sense: ordered lists of values (strictly speaking, Python lists).

5
Q

What is a tuple?

A

A tuple is like an array, but it is immutable and groups values together as one item.

For example, ('California', 2000) is a tuple that can act as a single key in a dictionary.
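
As a sketch, a dictionary keyed by (state, year) tuples can be handed straight to pandas, which builds a MultiIndex from the tuples. The numbers are invented:

```python
import pandas as pd

# Each key is one tuple made of two values.
data = {
    ("California", 2000): 10,
    ("California", 2010): 20,
    ("Texas", 2000): 30,
}

# pandas turns the tuple keys into a two-level MultiIndex.
pop = pd.Series(data)
```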

6
Q

What is the purpose of np.concatenate() in NumPy?

A

It combines multiple arrays into one single array.
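
A small example, with arbitrary values chosen for illustration:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# One-dimensional arrays are joined end to end.
joined = np.concatenate([x, y])

# For two-dimensional arrays, axis=1 joins along the columns.
grid = np.array([[1, 2], [3, 4]])
side_by_side = np.concatenate([grid, grid], axis=1)
```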

7
Q

What is pd.concat() used for?

A

It combines Series or DataFrames along rows (axis=0) or columns (axis=1).

8
Q

If you pd.concat([ser1, ser2]), what happens?

A

It stacks ser2 below ser1, combining them into one Series.
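
A minimal sketch, assuming two small Series named ser1 and ser2 with non-overlapping indices:

```python
import pandas as pd

ser1 = pd.Series(["A", "B"], index=[1, 2])
ser2 = pd.Series(["C", "D"], index=[3, 4])

# Default axis=0: ser2 is placed below ser1.
combined = pd.concat([ser1, ser2])

# axis=1: each Series becomes a column; unmatched index labels get NaN.
as_columns = pd.concat([ser1, ser2], axis=1)
```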

9
Q

What happens in an inner join?

A

With pd.concat(..., join='inner'), keeps only the columns that all the DataFrames have in common.

10
Q

What happens in an outer join?

A

With pd.concat(..., join='outer') (the default), keeps all columns; entries with no data are filled with NaN.
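
The two join modes of pd.concat can be compared on a pair of hypothetical DataFrames that share only column B:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

# Inner: only the shared column B survives.
inner = pd.concat([df1, df2], join="inner")

# Outer: columns A, B and C are all kept, with NaN where a frame had no data.
outer = pd.concat([df1, df2], join="outer")
```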

11
Q

Does pandas keep duplicate indices when combining DataFrames?

A

Yes, by default pandas keeps duplicates.

12
Q

What are three ways to handle duplicate indices when joining DataFrames?

A
  • Reset index (ignore_index=True)
  • Group using MultiIndex (keys=['First', 'Second'])
  • Use verify_integrity=True to stop if duplicates exist
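
The three options above can be sketched with pd.concat on two small, made-up DataFrames that share the index labels 0 and 1:

```python
import pandas as pd

x = pd.DataFrame({"A": ["a0", "a1"]}, index=[0, 1])
y = pd.DataFrame({"A": ["a2", "a3"]}, index=[0, 1])

# Default behaviour: duplicate labels are kept.
kept = pd.concat([x, y])

# Option 1: discard the old labels and number the rows afresh.
reset = pd.concat([x, y], ignore_index=True)

# Option 2: add an outer index level naming the source of each row.
keyed = pd.concat([x, y], keys=["First", "Second"])

# Option 3: raise an error if any labels overlap.
try:
    pd.concat([x, y], verify_integrity=True)
    raised = False
except ValueError:
    raised = True
```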
13
Q

What happens when you use .append() on DataFrames?

A

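It stacks one DataFrame on top of another, keeping the original indices unless reset. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; use pd.concat() instead.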

14
Q

What is relational algebra in simple terms?

A

A set of rules for working with table-based data; it underlies relational databases and query languages such as SQL.

15
Q

What is the purpose of pd.merge() in pandas?

A

It joins two tables based on a common column, like SQL joins.
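
A minimal sketch, assuming two hypothetical tables that share an "employee" column:

```python
import pandas as pd

employees = pd.DataFrame({
    "employee": ["Bob", "Jake", "Lisa"],
    "group": ["Accounting", "Engineering", "Engineering"],
})
hire_dates = pd.DataFrame({
    "employee": ["Lisa", "Bob", "Jake"],
    "hire_date": [2004, 2008, 2012],
})

# Rows are matched on the shared column, regardless of their order.
merged = pd.merge(employees, hire_dates, on="employee")
```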

16
Q

What are the three types of joins based on data relationships?

A

One-to-One: Both tables have unique keys.

Many-to-One: One table has repeated keys.

Many-to-Many: Both tables have repeated keys.
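
The effect of repeated keys shows up in the row counts. The tables below are invented for illustration:

```python
import pandas as pd

employees = pd.DataFrame({
    "employee": ["Bob", "Jake", "Lisa"],
    "group": ["Accounting", "Engineering", "Engineering"],
})
supervisors = pd.DataFrame({
    "group": ["Accounting", "Engineering"],
    "supervisor": ["Carly", "Guido"],
})
skills = pd.DataFrame({
    "group": ["Accounting", "Accounting", "Engineering", "Engineering"],
    "skill": ["math", "spreadsheets", "coding", "linux"],
})

# Many-to-one: "group" repeats only in employees, so each employee row
# picks up exactly one supervisor.
many_to_one = pd.merge(employees, supervisors, on="group")

# Many-to-many: "group" repeats in both tables, so every matching
# pair of rows appears in the result.
many_to_many = pd.merge(employees, skills, on="group")
```

In this sketch the many-to-many result has six rows: two for Bob (one Accounting employee, two skills) and four for the two Engineering employees.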

17
Q

What does ‘inner’ join do?

A

Keeps only matching rows.

18
Q

What does ‘outer’ join do?

A

Keeps all rows, fills missing values with NaN.

19
Q

What does ‘left’ join do?

A

Keeps all rows from the first (left) table; columns from the second are filled with NaN where there is no match.

20
Q

What does ‘right’ join do?

A

Keeps all rows from the second (right) table; columns from the first are filled with NaN where there is no match.
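
The four join types can be compared side by side through the how argument of pd.merge. The tables are hypothetical and share only the name "Mary":

```python
import pandas as pd

left = pd.DataFrame({
    "name": ["Peter", "Paul", "Mary"],
    "food": ["fish", "beans", "bread"],
})
right = pd.DataFrame({
    "name": ["Mary", "Joseph"],
    "drink": ["wine", "beer"],
})

# Count the rows each join type produces.
rows = {
    how: len(pd.merge(left, right, on="name", how=how))
    for how in ["inner", "outer", "left", "right"]
}
```

Here the inner join keeps one row, the left join three, the right join two, and the outer join four.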

21
Q

What does aggregation mean in pandas?

A

Summarizing data by applying functions like sum(), mean(), max(), etc.

22
Q

What does a pivot table do?

A

It groups data by a specific category (like gender or class) and applies aggregation to find totals, averages, etc.
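
A small sketch with an invented passenger table; the column names are assumptions:

```python
import pandas as pd

passengers = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "male"],
    "class": ["First", "Third", "First", "Third", "Third"],
    "survived": [1, 1, 0, 0, 1],
})

# Plain aggregation: total survivors per sex.
survivors = passengers.groupby("sex")["survived"].sum()

# Pivot table: mean survival for every (sex, class) combination.
table = passengers.pivot_table(
    values="survived", index="sex", columns="class", aggfunc="mean"
)
```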

23
Q

Why use pivot tables in pandas?

A

They make it easy to compare data across categories and summarize large datasets.

24
Q

What helps pandas clean messy text data quickly?

A

Vectorized string operations through the .str accessor, which are faster than writing loops and skip over missing values.
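
For instance, assuming a Series of inconsistently formatted names with one missing entry:

```python
import pandas as pd

names = pd.Series(["  peter ", "PAUL", None, "mARY"])

# Each .str method is applied to every element; the missing value stays missing.
clean = names.str.strip().str.capitalize()
```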

25
Q

How can you turn text dates into proper date objects?

A

Use dateutil.parser.

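A minimal sketch, using an arbitrary example date and assuming the dateutil package is installed (it ships as a pandas dependency):

```python
from datetime import datetime

from dateutil import parser

# Parse a free-form date string into a datetime object.
date = parser.parse("July 4, 2015")

# Once parsed, standard datetime formatting is available.
weekday = date.strftime("%A")
```
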
26
Q

What does datetime64 do in NumPy?

A

It stores dates as 64-bit integers, which enables fast, vectorized date-time operations on arrays.

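As an illustration with an arbitrary date, arithmetic on a datetime64 value applies to a whole array at once:

```python
import numpy as np

date = np.array("2015-07-04", dtype=np.datetime64)

# Adding an integer array yields the three consecutive days in one step.
following = date + np.arange(3)
```
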
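27
Q

What can pd.eval() handle?

A

Arithmetic, comparisons, and boolean logic.

28
Q

Why use df.eval()?

A

It makes it easier to reference column names directly.

29
Q

How do you reference Python variables inside eval()?

A

Prefix the variable name with the @ symbol. This works in DataFrame.eval() (and DataFrame.query()), not in pd.eval().
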
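A short sketch of the three ideas, on a made-up DataFrame with a hypothetical variable named threshold:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# pd.eval: the expression is a string that refers to the DataFrame by name.
total = pd.eval("df.A + df.B")

# df.eval: columns can be named directly.
ratio = df.eval("A / B")

# The @ prefix pulls in a local Python variable.
threshold = 2
above = df.eval("A > @threshold")
```
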
30
Q

What does slicing mean in data analysis?

A

Selecting a subset of data based on one dimension, like a few rows or one column.

31
Q

What does dicing mean in data analysis?

A

Selecting a specific subset of data based on multiple dimensions or criteria.

32
Q

What is the key difference between slicing and dicing?

A

Slicing → 1 dimension (rows or columns)

Dicing → multiple dimensions (rows and columns or filters)

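In pandas terms, the difference can be sketched with boolean filters on an invented sales table:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "year": [2023, 2024, 2023, 2024],
    "revenue": [100, 120, 90, 150],
})

# Slice: filter on a single dimension (year).
slice_2024 = sales[sales["year"] == 2024]

# Dice: filter on several dimensions at once (region and year).
dice = sales[(sales["region"] == "West") & (sales["year"] == 2024)]
```
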
33
Q

Why do we use slicing or dicing?

A

To extract just the data we need for analysis or to focus on specific groups or time periods.

34
Q

Which is more expensive: dicing or slicing?

A

Dicing.

35
Q

What is a full outer join?

A

Keeps everything from both DataFrames, filling missing spots with NaN.

36
Q

What does NaN stand for in pandas?

A

Not a Number.

37
Q

What is the use of .isna() in pandas?

A

It helps detect missing values.

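For example, on a small Series with two missing entries:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, None])

# True wherever a value is missing (NaN or None).
mask = s.isna()

# Summing the mask counts the missing values.
n_missing = mask.sum()
```
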
38
Q

What is considered bad data?

A

Empty cells, wrong formats, wrong data, or duplicates.

39
Q

What is one way to fix wrong formats in pandas?

A

You can drop the incorrect data or convert it to the correct format.

40
Q

Why is removing duplicates important?

A

To clean data and avoid wrong analysis or double counting.

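One possible cleaning pass, sketched on an invented table with hypothetical column names "date" and "pulse":

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2020-12-01", "2020-12-02", "not a date", "2020-12-02"],
    "pulse": [110, 117, 103, 117],
})

# Convert to real dates; anything unparseable becomes NaT (missing).
df["date"] = pd.to_datetime(df["date"], errors="coerce")

# Drop the rows whose date could not be converted.
df = df.dropna(subset=["date"])

# Remove rows that are exact duplicates of an earlier row.
df = df.drop_duplicates()
```
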
41
42