Pandas Joins Flashcards

1
Q

What is the main library in Python for data manipulation and analysis?

A

Pandas

2
Q

What is a join operation in pandas?

A

A join operation in pandas combines two or more dataframes based on a related column or index between them.

3
Q

What function is used to perform SQL-like joins in pandas?

A

merge() function

4
Q

What is the syntax for the merge function in pandas?

A

merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False)

5
Q

What does the ‘on’ parameter in the merge function do?

A

The ‘on’ parameter specifies the column or columns on which to perform the join.

6
Q

What does the ‘how’ parameter in the merge function do?

A

The ‘how’ parameter specifies the type of join to be performed: ‘left’, ‘right’, ‘outer’, ‘inner’ (the default), or ‘cross’.

7
Q

What does a ‘left’ join do in pandas?

A

A ‘left’ join in pandas returns all the rows from the left dataframe and the matched rows from the right dataframe. Where a left row has no match, the columns from the right dataframe are filled with NaN.

8
Q

What does a ‘right’ join do in pandas?

A

A ‘right’ join in pandas returns all the rows from the right dataframe and the matched rows from the left dataframe. Where a right row has no match, the columns from the left dataframe are filled with NaN.

9
Q

What does an ‘inner’ join do in pandas?

A

An ‘inner’ join in pandas returns the rows that have matching values in both dataframes.

10
Q

What does an ‘outer’ join do in pandas?

A

An ‘outer’ join in pandas returns all rows from both dataframes. Where a row has no match in the other dataframe, the columns from that dataframe are filled with NaN.
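
The four join types described in the cards above can be compared side by side. This is a minimal sketch; the dataframes `left` and `right` and their columns are made-up sample data, not part of the original deck.

```python
import pandas as pd

# Hypothetical sample data: ids 1-3 on the left, ids 2-4 on the right.
left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [20, 30, 40]})

inner = left.merge(right, on="id", how="inner")       # only ids present in both
left_join = left.merge(right, on="id", how="left")    # every id from left
right_join = left.merge(right, on="id", how="right")  # every id from right
outer = left.merge(right, on="id", how="outer")       # every id from either side

# Unmatched rows carry NaN in the columns that came from the other side.
print(outer)
```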

11
Q

What are the left_on and right_on parameters in the merge function?

A

The left_on and right_on parameters allow you to specify the columns to join on if they have different names in the two dataframes.
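
As an illustration of left_on and right_on, here is a small sketch with invented dataframes (`orders`, `customers`) whose key columns have different names.

```python
import pandas as pd

# Hypothetical data: the key is "customer_id" on one side and "id" on the other.
orders = pd.DataFrame({"customer_id": [1, 2, 2], "amount": [10.0, 5.0, 7.5]})
customers = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Bob"]})

merged = orders.merge(customers, left_on="customer_id", right_on="id")

# Both key columns are kept in the result; drop the redundant one.
merged = merged.drop(columns="id")
print(merged)
```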

12
Q

What does the sort parameter in the merge function do?

A

The sort parameter sorts the result dataframe by the join keys in lexicographical order. Default is False.

13
Q

What function is used to concatenate Series or DataFrame objects along an axis in pandas?

A

concat() function

14
Q

What does the join_axes parameter in the concat function do?

A

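The join_axes parameter specified the exact index to use for the other axes instead of an inner or outer join. It was deprecated in pandas 0.25.0 and removed in pandas 1.0. Use .reindex or .reindex_like on the result to achieve the same functionality.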

15
Q

What is the syntax for the concat function in pandas?

A

concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)

16
Q

What does the ‘axis’ parameter in the concat function do?

A

The ‘axis’ parameter specifies the axis to concatenate along. 0 is for index (rows) and 1 is for columns.

17
Q

What does the ‘ignore_index’ parameter in the concat function do?

A

If ‘ignore_index’ is True, the index values on the concatenation axis are not used. The resulting axis is labeled 0, …, n - 1.

18
Q

What is the main difference between merge and concat in pandas?

A

merge is used to combine dataframes based on a key/column, whereas concat is used to append dataframes along a particular axis.

19
Q

What is the purpose of the keys parameter in the concat function?

A

The keys parameter is used to construct hierarchical index using the passed keys as the outermost level.
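
The effect of ignore_index and keys on the resulting index can be seen in a short sketch. The dataframes `df1` and `df2` and the labels "first" and "second" are illustrative assumptions.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2]})
df2 = pd.DataFrame({"x": [3, 4]})

# Default: original index labels are kept, so 0 and 1 appear twice.
stacked = pd.concat([df1, df2])

# ignore_index=True relabels the rows 0..n-1.
renumbered = pd.concat([df1, df2], ignore_index=True)

# keys adds an outer index level, one label per input object.
labeled = pd.concat([df1, df2], keys=["first", "second"])

# The outer level can then be used to pull one input back out.
second_part = labeled.loc["second"]
print(labeled)
```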

20
Q

What does the ‘verify_integrity’ parameter in the concat function do?

A

The ‘verify_integrity’ parameter, if True, checks whether the new concatenated axis contains duplicates. If it does, it will raise an exception.
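
A minimal sketch of verify_integrity rejecting duplicate index labels; the two dataframes are made up for the example and both use the default 0, 1 index.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2]})
df2 = pd.DataFrame({"x": [3, 4]})

# Both inputs have index labels 0 and 1, so the combined index has duplicates.
try:
    pd.concat([df1, df2], verify_integrity=True)
    raised = False
except ValueError as exc:
    raised = True
    print(f"concat refused: {exc}")
```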

21
Q

What is the join function in pandas?

A

The join function is used to combine columns of two potentially differently-indexed dataframes into a single dataframe.

22
Q

What is the syntax for the join function in pandas?

A

join(self, other, on=None, how='left', lsuffix='', rsuffix='', sort=False)

23
Q

What do the ‘lsuffix’ and ‘rsuffix’ parameters in the join function do?

A

The ‘lsuffix’ and ‘rsuffix’ parameters are suffixes to add to overlapping column names in the left and the right side, respectively.

24
Q

How can you merge two dataframes df1 and df2 on a column ‘id’ using an inner join?

A

df1.merge(df2, on='id', how='inner')

25
Q

How can you concatenate two dataframes df1 and df2 along the column axis?

A

pd.concat([df1, df2], axis=1)

26
Q

How can you join two dataframes df1 and df2 using the indexes?

A

df1.join(df2)

27
Q

When would you use merge over join in pandas?

A

When you need to combine dataframes based on a key/column rather than their index.

28
Q

When would you use join over merge in pandas?

A

When you want to combine dataframes based on their index rather than a key/column.

29
Q

When would you use concat over merge or join in pandas?

A

When you want to append dataframes along a particular axis (either rows or columns) rather than combining them based on a key or index.

30
Q

What does a suffix do when merging two dataframes with overlapping column names in pandas?

A

A suffix is added to the overlapping column names to maintain their identity after the merge.

31
Q

How can you add a suffix when merging two dataframes df1 and df2 on a column ‘id’?

A

df1.merge(df2, on='id', suffixes=('_df1', '_df2'))

32
Q

What happens if you try to merge two dataframes with different shapes in pandas?

A

You can merge dataframes with different shapes. The resulting dataframe’s shape will depend on the type of join used and the data in the dataframes.

33
Q

How can you perform an outer join on two dataframes df1 and df2 on a column ‘id’?

A

df1.merge(df2, on='id', how='outer')

34
Q

What is the difference between a one-to-one, many-to-one, and many-to-many join in pandas?

A

In a one-to-one join, each key value appears at most once in each dataframe, so each row of the first dataframe is merged with at most one row of the second. In a many-to-one join, the key is unique in the second dataframe but may repeat in the first, so several rows of the first dataframe can be merged with the same row of the second. In a many-to-many join, the key may repeat in both dataframes, and every matching pair of rows is combined.

35
Q

What is a suffix in pandas merge operation?

A

In a pandas merge operation, a suffix is a string added to the end of a column name in case of overlapping column names in two dataframes to maintain their identity.

36
Q

How to concatenate two pandas Series s1 and s2?

A

pd.concat([s1, s2])

37
Q

How to join on index using merge function in pandas?

A

Set left_index=True and/or right_index=True in the merge function to use the index of the left or right dataframe as its join key.

38
Q

What does it mean if the validate argument in the merge function is set to ‘one_to_one’?

A

If the ‘validate’ argument is set to ‘one_to_one’, it checks if merge keys are unique in both left and right datasets.

39
Q

What does it mean if the validate argument in the merge function is set to ‘one_to_many’?

A

If the ‘validate’ argument is set to ‘one_to_many’, it checks if merge keys are unique in the left dataset.

40
Q

What does it mean if the validate argument in the merge function is set to ‘many_to_one’?

A

If the ‘validate’ argument is set to ‘many_to_one’, it checks if merge keys are unique in the right dataset.

41
Q

What does it mean if the validate argument in the merge function is set to ‘many_to_many’?

A

If the ‘validate’ argument is set to ‘many_to_many’, it doesn’t perform any checks.

42
Q

What happens if validation fails when performing a merge in pandas?

A

If validation fails when performing a merge in pandas, a MergeError will be raised.
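
The validate checks from the preceding cards can be exercised with a small sketch. The sample dataframes are invented; the right-hand one repeats a key on purpose so that ‘one_to_one’ fails while ‘one_to_many’ passes.

```python
import pandas as pd
from pandas.errors import MergeError

left = pd.DataFrame({"id": [1, 2], "a": ["x", "y"]})
right = pd.DataFrame({"id": [1, 1, 2], "b": [10, 11, 20]})  # id 1 repeats

# Keys are unique on the left, so one_to_many passes.
ok = left.merge(right, on="id", validate="one_to_many")

# Keys are not unique on the right, so one_to_one fails.
try:
    left.merge(right, on="id", validate="one_to_one")
    failed = False
except MergeError as exc:
    failed = True
    print(f"validation failed: {exc}")
```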

43
Q

Can you perform a join operation on more than two dataframes in pandas?

A

Yes, you can perform a join operation on more than two dataframes in pandas by chaining multiple join or merge calls.

44
Q

How do you handle missing values when joining or merging dataframes in pandas?

A

Missing values resulting from a join or merge operation in pandas are filled with NaN by default. You can use methods like fillna() to handle these missing values after the operation.

45
Q

What is a cross join in pandas?

A

A cross join in pandas is a type of join that returns the Cartesian product of rows from the dataframes being joined. In other words, it combines every row of the first dataframe with every row of the second dataframe.

46
Q

How to perform a cross join in pandas?

A

Since pandas 1.2, you can perform a cross join by passing how='cross' to the merge function without any join keys. In older versions, you can add a temporary key column with a constant value to each dataframe, merge on this column, and then drop the temporary key column.
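
A sketch of a cross join, showing the temporary-key approach next to the built-in how='cross' option (available in pandas 1.2 and later). The dataframes and the `_key` column name are illustrative assumptions.

```python
import pandas as pd

sizes = pd.DataFrame({"size": ["S", "M"]})
colors = pd.DataFrame({"color": ["red", "blue", "green"]})

# Built-in cross join (pandas 1.2 or later); no join key is passed.
combos = sizes.merge(colors, how="cross")

# Temporary-key workaround: add a constant column, merge on it, drop it.
combos_old = (
    sizes.assign(_key=1)
    .merge(colors.assign(_key=1), on="_key")
    .drop(columns="_key")
)

print(combos)  # 2 sizes x 3 colors = 6 rows
```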

47
Q

What happens if there are no common columns to perform a merge on in pandas?

A

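If no join keys are specified and the dataframes share no column names, merge raises a MergeError. You can name the keys explicitly with left_on and right_on, or, if you want every combination of rows, perform a cross join, which combines every row from the first dataframe with every row from the second dataframe.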

48
Q

How to merge two dataframes df1 and df2 using both index and columns in pandas?

A

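You can join a column on one side to the index on the other by combining left_on with right_index (or left_index with right_on). For example: df1.merge(df2, left_on='column1', right_index=True). Passing both left_on and left_index (or both right_on and right_index) for the same side raises a MergeError.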
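
A minimal sketch of matching a column on one side against the index on the other. The dataframes and their contents are made up; `column1` follows the name used in the card.

```python
import pandas as pd

# Hypothetical data: df1 carries the key in a column, df2 carries it in its index.
df1 = pd.DataFrame({"column1": ["k1", "k2", "k3"], "a": [1, 2, 3]})
df2 = pd.DataFrame({"b": [10, 20]}, index=["k1", "k2"])

# Match df1["column1"] against df2's index; keep every row of df1.
merged = df1.merge(df2, left_on="column1", right_index=True, how="left")
print(merged)
```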

49
Q

What is the difference between the merge, join, and concat functions in pandas in terms of performance?

A

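merge, join, and concat do different jobs, so their speed is not directly comparable; it depends on the size of the dataframes and the type of operation. concat only stacks objects along an axis and aligns the other axis, without matching key values row by row. merge and join must match keys, and DataFrame.join delegates to the same merge machinery, so the two perform similarly for equivalent operations. Measure on your own data if performance matters.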

50
Q

How can you sort the result of a merge operation in pandas?

A

You can sort the result of a merge operation in pandas by setting the ‘sort’ parameter to True in the merge function or by using the sort_values method after the merge.

51
Q

What is the use of the indicator argument in the merge function in pandas?

A

The indicator argument in the merge function in pandas, when set to True, adds a column named ‘_merge’ to the resulting dataframe that records the source of each row: ‘left_only’, ‘right_only’, or ‘both’.

52
Q

What are the possible values of the _merge column when the indicator argument is set to True in the merge function in pandas?

A

The possible values of the _merge column are ‘left_only’, ‘right_only’, and ‘both’ when the indicator argument is set to True in the merge function in pandas.
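
A short sketch of the indicator argument on an outer merge; the two single-column dataframes are invented sample data.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2]})
right = pd.DataFrame({"id": [2, 3]})

# indicator=True adds a categorical "_merge" column naming each row's source.
flagged = left.merge(right, on="id", how="outer", indicator=True)
print(flagged)
```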

53
Q

What is the copy argument in the merge function in pandas?

A

The copy argument in the merge function in pandas, if set to False, avoids copying the data into the resulting dataframe. By default, it’s set to True which means the data is copied.

54
Q

What is the purpose of the suffixes argument in the merge function in pandas?

A

The suffixes argument in the merge function in pandas is used to specify the suffixes to append to overlapping column names in the left and the right dataframe.

55
Q

What is the difference between a full join and an inner join in pandas?

A

A full join (or outer join) in pandas returns all the rows from both dataframes, filling with NaN where there are no matching join variables. An inner join returns only the rows where there is a match in both dataframes.

56
Q

What is a semi-join in pandas?

A

A semi-join in pandas is a join that returns only the rows from the left dataframe where there is a match with the right dataframe, without adding any columns from the right. It’s not a built-in function, but it can be implemented with boolean indexing using isin, or with merge.

57
Q

What is an anti-join in pandas?

A

An anti-join in pandas is a join that returns only the rows from the left dataframe where there is no match with the right dataframe. It’s not a built-in function, but it can be implemented with boolean indexing using isin, or with merge and the indicator argument.

58
Q

How can you implement a semi-join between two dataframes df1 and df2 on a column ‘id’ in pandas?

A

df1[df1['id'].isin(df2['id'])]

59
Q

How can you implement an anti-join between two dataframes df1 and df2 on a column ‘id’ in pandas?

A

df1[~df1['id'].isin(df2['id'])]
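
The semi-join and anti-join expressions from these two cards can be tried on sample data. The sketch below also shows one possible merge-based anti-join using the indicator argument; the data and the `anti_via_merge` name are assumptions for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3], "v": ["a", "b", "c"]})
df2 = pd.DataFrame({"id": [2, 3, 3, 4]})

# Semi-join: rows of df1 whose id appears in df2 (no duplication, no new columns).
semi = df1[df1["id"].isin(df2["id"])]

# Anti-join: rows of df1 whose id does not appear in df2.
anti = df1[~df1["id"].isin(df2["id"])]

# Alternative anti-join: left merge with an indicator, then keep "left_only".
anti_via_merge = (
    df1.merge(df2[["id"]].drop_duplicates(), on="id", how="left", indicator=True)
    .query("_merge == 'left_only'")
    .drop(columns="_merge")
)
```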

60
Q

What is a self-join in pandas?

A

A self-join in pandas is a join where a dataframe is joined with itself.

61
Q

How can you perform a self-join in pandas?

A

You can perform a self-join in pandas by using the merge function and passing the same dataframe as both the left and right arguments. You would also use a suffix to distinguish between the two instances of the dataframe.
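
A self-join is often used for hierarchical data. The sketch below assumes a hypothetical employee table in which manager_id refers to another row's emp_id, and 0 stands for "no manager".

```python
import pandas as pd

# Hypothetical employee table; manager_id 0 means the employee has no manager.
emp = pd.DataFrame({
    "emp_id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cy"],
    "manager_id": [0, 1, 1],
})

# Join the table to itself: each employee's manager_id is matched to an emp_id.
pairs = emp.merge(
    emp,
    left_on="manager_id",
    right_on="emp_id",
    how="left",
    suffixes=("", "_mgr"),  # right-hand copies get the "_mgr" suffix
)
print(pairs[["name", "name_mgr"]])
```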

62
Q

How to specify multiple columns for joining in pandas merge function?

A

You can pass a list of column names to the ‘on’, ‘left_on’, or ‘right_on’ arguments in the merge function to specify multiple columns for joining.

63
Q

What is a natural join in pandas?

A

A natural join in pandas is a type of join that is based on all the columns with the same name in the two dataframes. It can be performed using the merge function without specifying the ‘on’, ‘left_on’, or ‘right_on’ arguments.

64
Q

What does the suffix ‘_x’ and ‘_y’ indicate in the result of a merge operation in pandas?

A

The suffix ‘_x’ indicates that the column comes from the left dataframe and ‘_y’ indicates that the column comes from the right dataframe in the result of a merge operation in pandas.

65
Q

How can you perform a merge operation without adding any suffixes to the overlapping column names in pandas?

A

Setting the ‘suffixes’ argument to (‘’, ‘’) does not work in current pandas versions: if non-key columns overlap and no suffix is given, merge raises a ValueError. To avoid suffixes, rename or drop the overlapping columns before merging, or include them in the join keys.

66
Q

How can you combine the merge and groupby functions in pandas?

A

You can combine the merge and groupby functions in pandas by first performing a merge and then applying a groupby operation on the resulting dataframe.
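
A minimal sketch of merging first and grouping afterwards; the `orders` and `customers` dataframes and their columns are invented for the example.

```python
import pandas as pd

orders = pd.DataFrame({"customer_id": [1, 2, 2, 3], "amount": [10.0, 5.0, 7.5, 2.0]})
customers = pd.DataFrame({"customer_id": [1, 2, 3], "region": ["east", "west", "east"]})

# Attach each order's region, then total the amounts per region.
totals = (
    orders.merge(customers, on="customer_id", how="left")
    .groupby("region")["amount"]
    .sum()
)
print(totals)
```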

67
Q

What is the purpose of the ‘level’ argument when joining in pandas?

A

The ‘level’ argument belongs to Index.join, not DataFrame.join. It names the level of a MultiIndex on which to join the two indexes.

68
Q

How to join two dataframes df1 and df2 on a shared index in pandas?

A

df1.join(df2, how='inner')

69
Q

What does ‘lsuffix’ mean in pandas?

A

‘lsuffix’ in pandas is a parameter used in join operations where there are overlapping columns in the dataframes. It refers to the suffix added to the overlapping columns from the left dataframe.

70
Q

What does ‘rsuffix’ mean in pandas?

A

‘rsuffix’ in pandas is a parameter used in join operations where there are overlapping columns in the dataframes. It refers to the suffix added to the overlapping columns from the right dataframe.

71
Q

How do you join two dataframes df1 and df2 with overlapping columns in pandas?

A

You can use the join method with the ‘lsuffix’ and ‘rsuffix’ parameters to join two dataframes with overlapping columns. For example: df1.join(df2, lsuffix='_df1', rsuffix='_df2')
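
A short runnable version of this example, assuming two made-up dataframes that share an index and both have a column called `value`.

```python
import pandas as pd

df1 = pd.DataFrame({"value": [1, 2]}, index=["a", "b"])
df2 = pd.DataFrame({"value": [10, 20]}, index=["a", "b"])

# Without suffixes this join would raise, because "value" exists on both sides.
joined = df1.join(df2, lsuffix="_df1", rsuffix="_df2")
print(joined)
```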

72
Q

What is the difference between pandas merge and SQL join?

A

While pandas merge and SQL join are used for similar purposes (combining data based on a common column), they have some differences. For example, pandas merge works directly on dataframes, while SQL join works on tables within a database. Additionally, the syntax and usage are different due to the difference in languages.

73
Q

What is a composite key?

A

A composite key is a key that consists of multiple columns, used for joining, merging or indexing in pandas.

74
Q

What is the purpose of ‘sort’ in pandas merge?

A

If ‘sort’ is set to True in pandas merge, it sorts the output dataframe by the join keys in lexicographical order.

75
Q

How do you merge two dataframes df1 and df2 on multiple keys in pandas?

A

You can merge two dataframes on multiple keys by passing a list of column names to the ‘on’ argument in the merge function. For example: df1.merge(df2, on=['key1', 'key2'])
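
A small sketch of a merge on a composite key; the data is invented, and only rows where both key1 and key2 agree are matched.

```python
import pandas as pd

df1 = pd.DataFrame({"key1": ["a", "a", "b"], "key2": [1, 2, 1], "x": [10, 20, 30]})
df2 = pd.DataFrame({"key1": ["a", "b", "b"], "key2": [1, 1, 2], "y": [100, 200, 300]})

# Default inner join: a row pair matches only if key1 AND key2 are both equal.
merged = df1.merge(df2, on=["key1", "key2"])
print(merged)
```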

76
Q

What is pandas.concat used for?

A

pandas.concat is used to concatenate two or more pandas objects along a particular axis.

77
Q

What is a Cartesian product?

A

In the context of pandas, a Cartesian product is the result of a cross join. It combines each row of the first dataframe with each row of the second dataframe.

78
Q

How do you handle overlapping indices when concatenating pandas objects?

A

When concatenating pandas objects, if there are overlapping indices, you can use the ‘ignore_index’ argument to reset the index in the resulting object. If you want to keep the original indices, you can use the ‘keys’ argument to create a hierarchical index.

79
Q

What is the default join type in pandas.concat?

A

The default join type in pandas.concat is ‘outer’, which means it includes all indices from all the objects being concatenated.

80
Q

What is the effect of ‘keys’ in pandas.concat?

A

‘keys’ in pandas.concat is used to create a hierarchical index for the resulting object. It takes a sequence of labels, one per object being concatenated, and these labels form the outermost level of the index.

81
Q

What does ‘copy’ do in pandas.concat?

A

If ‘copy’ is set to False in pandas.concat, it avoids copying data into the new object if possible. By default, it’s set to True, which means the data is always copied.

82
Q

What is the difference between append and concat in pandas?

A

Both append and concat can add rows to a dataframe, but DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so concat is the function to use in current versions. concat is also more versatile: it can add columns (by changing the axis parameter), concatenate multiple objects at once, and create a hierarchical index.

83
Q

What does ‘verify_integrity’ do in pandas.concat?

A

If ‘verify_integrity’ is set to True in pandas.concat, it checks whether the new concatenated index contains duplicates. If it does, it will raise a ValueError.

84
Q

What does ‘ignore_index’ do in pandas.concat?

A

If ‘ignore_index’ is set to True in pandas.concat, it resets the index in the resulting object. If it’s set to False, the original indices are kept.

85
Q

How do you join two dataframes df1 and df2 on a shared multi-index in pandas?

A

df1.join(df2, how='inner', on=['index1', 'index2'])
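
As a sketch, two dataframes that share a two-level index can also be joined on it without passing ‘on’ at all, since join matches on the index by default. The level names index1 and index2 follow the card; the data is made up.

```python
import pandas as pd

# Hypothetical MultiIndex data with the same level names on both sides.
idx1 = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["index1", "index2"]
)
idx2 = pd.MultiIndex.from_tuples(
    [("a", 1), ("b", 1), ("b", 2)], names=["index1", "index2"]
)
df1 = pd.DataFrame({"x": [1, 2, 3]}, index=idx1)
df2 = pd.DataFrame({"y": [10, 20, 30]}, index=idx2)

# Inner join keeps only the index tuples present in both dataframes.
joined = df1.join(df2, how="inner")
print(joined)
```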