Pandas Flashcards

(40 cards)

1
Q

Drop Columns

A

df.drop(columns=['Column1', 'Column2'])

2
Q

Pivots

A

df.pivot(columns='var', values='val')
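As a small illustration of the pivot card, the sketch below spreads a long table into a wide one. The sample data and the extra `index="date"` argument are assumptions added for the example; they are not part of the card.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, var) pair.
long_df = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2"],
    "var": ["x", "y", "x", "y"],
    "val": [1, 2, 3, 4],
})

# Spread the distinct 'var' values into columns; 'date' becomes the row index.
wide_df = long_df.pivot(index="date", columns="var", values="val")
```

Each distinct value of `var` becomes its own column, so `wide_df` has one row per date and the columns `x` and `y`.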

3
Q

Sort

A

df.sort_values('column1')

Order rows by values of a column (low to high).
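A short sketch of both sort directions, assuming a made-up two-column frame. `ascending=False` is the standard pandas keyword for high-to-low ordering.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"column1": [3, 1, 2], "name": ["c", "a", "b"]})

ascending = df.sort_values("column1")                    # low to high (default)
descending = df.sort_values("column1", ascending=False)  # high to low
```

Neither call modifies `df`; each returns a new, reordered DataFrame.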

4
Q

Rename Columns

A

df.rename(columns={'y': 'year'})

Rename the columns of a DataFrame

5
Q

Head

A

df.head(n)

Select first n rows

6
Q

Tail

A

df.tail(n)

Select last n rows

7
Q

Using Query

A
query() allows Boolean expressions for filtering
rows.
df.query('Length > 7')
df.query('Length > 7 and Width < 8')
df.query('Name.str.startswith("abc")',
engine="python")
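The three query() calls on this card can be tried on a small frame. The data below is hypothetical and chosen so that each filter keeps a different set of rows.

```python
import pandas as pd

# Hypothetical sample data with the column names used on the card.
df = pd.DataFrame({
    "Name": ["abc1", "xyz", "abc2"],
    "Length": [8, 9, 5],
    "Width": [7, 9, 3],
})

long_rows = df.query("Length > 7")
long_and_narrow = df.query("Length > 7 and Width < 8")
# String methods inside query() need the Python engine.
abc_rows = df.query('Name.str.startswith("abc")', engine="python")
```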
8
Q

Select rows at positions 10 through 19 (the slice end, 20, is excluded).

A

df.iloc[10:20]
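A quick check of how the slice bounds behave, using a hypothetical 30-row frame whose default index matches the row positions:

```python
import pandas as pd

# Hypothetical frame: 30 rows with the default 0..29 index.
df = pd.DataFrame({"n": range(30)})

subset = df.iloc[10:20]
first_position = subset.index[0]   # 10
last_position = subset.index[-1]   # 19, because the end of the slice is exclusive
```

`iloc` follows Python slicing rules, so the result holds 10 rows, not 11.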

9
Q

Select columns in positions 1, 2 and 5 (first column is 0).

A

df.iloc[:, [1, 2, 5]]

10
Q

Access single value by integer position (row, column)

A

df.iat[1, 2]

11
Q

Access single value by label

A

df.at[4, 'A']
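To contrast position-based and label-based access, here is a minimal sketch. The frame and its non-default index labels (4 and 7) are assumptions chosen so that positions and labels differ.

```python
import pandas as pd

# Hypothetical frame whose row labels (4, 7) differ from row positions (0, 1).
df = pd.DataFrame({"A": [10, 20], "B": [30, 40]}, index=[4, 7])

by_position = df.iat[1, 0]  # second row, first column
by_label = df.at[4, "A"]    # row labelled 4, column labelled 'A'
```

Here `by_position` reads the row labelled 7, while `by_label` reads the row labelled 4.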

12
Q

Select rows meeting a logical condition, and only the specified columns.

A

df.loc[df['a'] > 10, ['a', 'c']]
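A small worked example of combining a row condition with a column list in `.loc`. The data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"a": [5, 12, 20], "b": [1, 2, 3], "c": ["x", "y", "z"]})

# The Boolean mask picks the rows; the list picks the columns.
selected = df.loc[df["a"] > 10, ["a", "c"]]
```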

13
Q

Append rows of DataFrames

A

pd.concat([df1,df2])

14
Q

Append columns of DataFrames

A

pd.concat([df1,df2], axis=1)
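The sketch below runs both concat directions on tiny hypothetical frames. The `ignore_index=True` variant is not on the cards; it is shown because row-wise concatenation otherwise keeps the original, possibly repeated, index labels.

```python
import pandas as pd

# Hypothetical inputs.
df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"a": [3, 4]})
df3 = pd.DataFrame({"b": [5, 6]})

stacked = pd.concat([df1, df2])                        # index is 0, 1, 0, 1
renumbered = pd.concat([df1, df2], ignore_index=True)  # index is 0, 1, 2, 3
side_by_side = pd.concat([df1, df3], axis=1)           # columns a and b
```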

15
Q

Gather columns into rows.

A

pd.melt(df)
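A minimal melt example, assuming a hypothetical wide table with an `id` column that should stay fixed. The `id_vars`, `var_name` and `value_name` arguments are optional extras beyond the bare `pd.melt(df)` on the card.

```python
import pandas as pd

# Hypothetical wide-format data.
wide = pd.DataFrame({"id": [1, 2], "x": [10, 20], "y": [30, 40]})

# Keep 'id' as an identifier; gather 'x' and 'y' into (var, val) rows.
long = pd.melt(wide, id_vars="id", var_name="var", value_name="val")
```

The result has one row per (id, original column) pair, so two columns times two ids gives four rows.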

16
Q

Logic in Python (and pandas)

A
<   Less than
>   Greater than
==  Equals
<=  Less than or equals
>=  Greater than or equals
!=  Not equal to
df.column.isin(values)  Group membership
pd.isnull(obj)  Is NaN
pd.notnull(obj)  Is not NaN
&, |, ~, ^, df.any(), df.all()  Logical and, or, not, xor, any, all
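The operators above combine into Boolean masks. This is a sketch on hypothetical data; note that each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` or `==`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
df = pd.DataFrame({"a": [1, 5, np.nan, 8], "b": ["x", "y", "x", "z"]})

both = df[(df["a"] > 2) & (df["b"].isin(["x", "y"]))]  # logical and
either = df[(df["a"] > 6) | (df["b"] == "x")]          # logical or
not_missing = df[~pd.isnull(df["a"])]                  # logical not
```

Comparisons against NaN evaluate to False, so the row with the missing `a` only survives a filter through its `b` value.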
17
Q

Count number of rows with each unique value of variable

A

df['w'].value_counts()

18
Q

Tuple of # of rows, # of columns in DataFrame.

A

df.shape

19
Q

# of distinct values in a column.

A

df['w'].nunique()

20
Q

Basic descriptive statistics for each column (or GroupBy).

A

df.describe()

21
Q

Drop rows with any column having NA/null data.

A

df.dropna()

22
Q

Replace all NA/null data with value

A

df.fillna(value)

23
Q

Compute and append one or more new columns.

A

df.assign(Area=lambda df: df.Length*df.Height)
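A runnable version of the assign card on hypothetical data. The lambda receives the DataFrame being assigned to, and the original frame is left unchanged.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Length": [2, 3], "Height": [4, 5]})

# assign() returns a new DataFrame with the extra column appended.
with_area = df.assign(Area=lambda d: d.Length * d.Height)
```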

24
Q

Return a GroupBy object, grouped by values in column named "col"

A

df.groupby(by="col")

size()
Size of each group.
agg(function)
Aggregate group using function
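A short sketch of size() and agg() on a GroupBy object. The data and the choice of "sum" as the aggregation function are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"col": ["a", "b", "a"], "val": [1, 2, 3]})

grouped = df.groupby(by="col")
sizes = grouped.size()              # number of rows in each group
totals = grouped["val"].agg("sum")  # sum of 'val' within each group
```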

25
Histogram for each column
df.plot.hist()
26
Scatter chart using pairs of points
df.plot.scatter(x='w',y='h')
27
Merge: Join matching rows from bdf to adf. (Left join)
pd.merge(adf, bdf, how='left', on='x1')
28
Merge: Join matching rows from adf to bdf. (Right join)
pd.merge(adf, bdf, how='right', on='x1')
29
Merge: Join data. Retain only rows in both sets. (Inner join)
pd.merge(adf, bdf, how='inner', on='x1')
30
Merge: Join data. Retain all values, all rows. (Outer join)
pd.merge(adf, bdf, how='outer', on='x1')
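The four merge cards can be compared side by side. In this sketch the contents of `adf` and `bdf` are hypothetical, chosen so that each frame has one key the other lacks.

```python
import pandas as pd

# Hypothetical inputs: 'C' exists only in adf, 'D' only in bdf.
adf = pd.DataFrame({"x1": ["A", "B", "C"], "x2": [1, 2, 3]})
bdf = pd.DataFrame({"x1": ["A", "B", "D"], "x3": [True, False, True]})

left = pd.merge(adf, bdf, how="left", on="x1")    # keys A, B, C
right = pd.merge(adf, bdf, how="right", on="x1")  # keys A, B, D
inner = pd.merge(adf, bdf, how="inner", on="x1")  # keys A, B
outer = pd.merge(adf, bdf, how="outer", on="x1")  # keys A, B, C, D
```

Rows without a partner on the other side get NaN in the columns that come from that side.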
31
Filtering Joins: All rows in adf that have a match in bdf.
adf[adf.x1.isin(bdf.x1)]
32
Filtering Joins: All rows in adf that do not have a match in bdf.
adf[~adf.x1.isin(bdf.x1)]
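Unlike a merge, a filtering join keeps only the columns of `adf`. A sketch with hypothetical frames:

```python
import pandas as pd

# Hypothetical inputs: 'C' has no match in bdf.
adf = pd.DataFrame({"x1": ["A", "B", "C"], "x2": [1, 2, 3]})
bdf = pd.DataFrame({"x1": ["A", "B", "D"], "x3": [True, False, True]})

matched = adf[adf.x1.isin(bdf.x1)]     # rows of adf with a match in bdf
unmatched = adf[~adf.x1.isin(bdf.x1)]  # rows of adf without a match
```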
33
Set-like Operations:
pd.merge(ydf, zdf)
Rows that appear in both ydf and zdf (Intersection).
pd.merge(ydf, zdf, how='outer')
Rows that appear in either or both ydf and zdf (Union).
```
(pd.merge(ydf, zdf, how='outer', indicator=True)
   .query('_merge == "left_only"')
   .drop(columns=['_merge']))
```
Rows that appear in ydf but not zdf (Setdiff).
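The set-like operations can be checked on two small frames that share the same columns. The data below is hypothetical. With no `on` argument, `pd.merge` joins on all common columns, so whole rows are compared.

```python
import pandas as pd

# Hypothetical inputs with identical columns; rows B and C appear in both.
ydf = pd.DataFrame({"x1": ["A", "B", "C"], "x2": [1, 2, 3]})
zdf = pd.DataFrame({"x1": ["B", "C", "D"], "x2": [2, 3, 4]})

intersection = pd.merge(ydf, zdf)
union = pd.merge(ydf, zdf, how="outer")
setdiff = (
    pd.merge(ydf, zdf, how="outer", indicator=True)
    .query('_merge == "left_only"')
    .drop(columns=["_merge"])
)
```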
34
Creating Dataframes: Specify values for each column.
```
df = pd.DataFrame(
    {"a": [4, 5, 6],
     "b": [7, 8, 9],
     "c": [10, 11, 12]},
    index=[1, 2, 3])
```
35
Creating Dataframes: Specify values for each row.
```
df = pd.DataFrame(
    [[4, 7, 10],
     [5, 8, 11],
     [6, 9, 12]],
    index=[1, 2, 3],
    columns=['a', 'b', 'c'])
```
36
Creating Dataframes: Create DataFrame with a MultiIndex
```
df = pd.DataFrame(
    {"a": [4, 5, 6],
     "b": [7, 8, 9],
     "c": [10, 11, 12]},
    index=pd.MultiIndex.from_tuples(
        [('d', 1), ('d', 2), ('e', 2)],
        names=['n', 'v']))
```
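Once a MultiIndex exists, rows can be selected by one or both levels. The sketch below rebuilds the frame from this card and shows three lookups. These lookups go beyond what the card covers and are offered only as an illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]},
    index=pd.MultiIndex.from_tuples(
        [("d", 1), ("d", 2), ("e", 2)], names=["n", "v"]
    ),
)

one_row = df.loc[("d", 2)]        # a Series for that single row
group_d = df.loc["d"]             # all rows whose 'n' level is 'd'
v_equals_2 = df.xs(2, level="v")  # all rows whose 'v' level is 2
```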
37
Read CSV with pandas
pd.read_csv("ruta.csv", index_col=False)
38
Alternative way of creating a pivot table
pd.pivot_table(df, values=0, index=['col 1'], columns=['col2'], aggfunc=np.sum)
39
Save df as csv
df.to_csv("filename", index=False)
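A round trip through CSV shows what `index=False` does. To avoid touching the disk, this sketch writes to an in-memory `io.StringIO` buffer instead of a filename. That substitution, and the sample data, are assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Write without the index column, then read the text back.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
restored = pd.read_csv(io.StringIO(csv_text))
```

Because the index was not written, the first line of `csv_text` contains only the column names `a,b`, and reading it back gives a frame equal to the original.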
40
Save as XLSX
with pd.ExcelWriter("file_name.xlsx") as writer:
    df.to_excel(writer, sheet_name="name", index=False)
    df2.to_excel(writer, sheet_name="name2", index=False)