Pandas Basics Flashcards

(51 cards)

1
Q

Import library

pandas

A
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
2
Q

Import a csv into data frame

pandas

A
file = "file.csv"
df = pd.read_csv(file)
3
Q

Export a data frame to csv

pandas

A
df.to_csv("file.csv", sep = "|", index = False)
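A file written with a custom separator has to be read back with the same `sep`. The sketch below round-trips a small, made-up frame through an in-memory buffer (`io.StringIO`) instead of a file on disk; the column names and values are illustrative assumptions.

```python
import io

import pandas as pd

# Illustrative data, not from the deck.
df = pd.DataFrame({"col1": [1, 2], "col2": ["A", "B"]})

# Write to an in-memory text buffer rather than "file.csv".
buffer = io.StringIO()
df.to_csv(buffer, sep="|", index=False)
csv_text = buffer.getvalue()

# Reading back needs the same separator, otherwise each line
# is parsed as a single column.
restored = pd.read_csv(io.StringIO(csv_text), sep="|")
```

With `index=False` the row labels are not written, so the first line of `csv_text` is just the header `col1|col2`.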
4
Q

Creating a data frame from a list of lists

pandas

A
data = [[1, 2, "A"], [3, 4, "B"]]
df = pd.DataFrame(data, 
           columns = ["col1", "col2", "col3"])
5
Q

Creating a data frame from a dictionary

pandas

A
data = {'col1': [1, 2], 
        'col2': [3, 4], 
        'col3': ["A", "B"]}

df = pd.DataFrame(data=data)
6
Q

Get number of rows and columns in a data frame

pandas

A
df.shape
7
Q

Viewing top n rows

pandas

A
df.head(n)
8
Q

Displaying data type of columns

pandas

A
df.dtypes
9
Q

Modifying the data type of a column

pandas

A
df["col1"] = df["col1"].astype(np.int8)
10
Q

Display missing value stats and data type

pandas

A
df.info()
11
Q

Print descriptive stats

pandas

A
df.describe()
12
Q

Filling missing values with a specific value

pandas

A
df.fillna(0, inplace = True)
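Filling every gap with one value is not always appropriate when columns have different types. As a hedged sketch, `fillna` also accepts a dict that maps column names to fill values; the frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frame with one numeric and one text column.
df = pd.DataFrame({"col1": [1.0, np.nan], "col3": ["A", None]})

# Dict keys are column names, values are the per-column fill value.
# Without inplace=True the original frame is left untouched.
filled = df.fillna({"col1": 0, "col3": "missing"})
```

Here `filled` is a new frame, while `df` still contains its missing values.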
13
Q

Combining data frames: join (merge)

pandas

A
pd.merge(df1, df2, on = "col3")
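`pd.merge` performs an inner join by default, so keys that appear in only one frame are dropped. The sketch below contrasts that with `how="left"`; both frames are invented for the example.

```python
import pandas as pd

# Two illustrative frames that share the key column "col3".
df1 = pd.DataFrame({"col3": ["A", "B", "C"], "col1": [1, 2, 3]})
df2 = pd.DataFrame({"col3": ["A", "B", "D"], "col2": [10, 20, 40]})

# Default: inner join, only keys present in both frames survive.
inner = pd.merge(df1, df2, on="col3")

# Left join: every row of df1 is kept; unmatched rows get NaN in col2.
left = pd.merge(df1, df2, on="col3", how="left")
```

`inner` keeps only keys A and B, while `left` also keeps C with a missing `col2`.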
14
Q

Sorting a data frame

pandas

2 alternatives

A
df.sort_values("col1")
df.sort_values(by='Sales', ascending=False)
15
Q

Grouping a data frame

pandas

2 alternatives

A
df.groupby('Region')['Sales'].mean()
df.groupby("col3").agg({"col1":"sum", "col2":"max"})
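When several aggregates are computed at once, named aggregation lets each output column get its own name. This is a minimal sketch on made-up data; the output names `col1_total` and `col2_max` are hypothetical.

```python
import pandas as pd

# Illustrative data: two rows in group "A", one in group "B".
df = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": [5, 9, 7], "col3": ["A", "A", "B"]}
)

# Each keyword is an output column: (source column, aggregation name).
summary = df.groupby("col3").agg(
    col1_total=("col1", "sum"),
    col2_max=("col2", "max"),
)
```

The result is indexed by the group key, so `summary.loc["A"]` holds the aggregates for group A.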
16
Q

Renaming columns

pandas

A
df.rename(columns = {"col_A":"col1"}) #Returns a new data frame; assign the result to keep it
17
Q

Deleting columns

pandas

A
df.drop(columns = ["col1"]) #Returns a new data frame; assign the result to keep it
18
Q

Adding columns (addition method)

pandas

A
df["col3"] = df["col1"] + df["col2"]
19
Q

Adding columns (assignment method)

pandas

A
df = df.assign(col3 = df["col1"] + df["col2"])
20
Q

Filtering rows: boolean method

pandas

A
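dfx[['b', 'c']] #Selects columns b and c (a column selection, not a row filter)
df[df["col2"] > 5] #Rows where col2 is greater than 5
df[(df['Region'] == 'North') & (df['Sales'] > 100)] #Rows where both conditions hold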
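Boolean masks combine with `&` (and), `|` (or) and `~` (not), and each comparison needs its own parentheses because these operators bind more tightly than `==` or `>`. A small sketch on invented sales data:

```python
import pandas as pd

# Illustrative data.
df = pd.DataFrame(
    {"Region": ["North", "South", "North"], "Sales": [50, 150, 200]}
)

# Each comparison is wrapped in parentheses before combining.
mask_and = (df["Region"] == "North") & (df["Sales"] > 100)
mask_or = (df["Region"] == "North") | (df["Sales"] > 100)
mask_not = ~(df["Region"] == "North")

north_big = df[mask_and]   # North rows with Sales above 100
either = df[mask_or]       # rows matching at least one condition
not_north = df[mask_not]   # every row outside North
```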
21
Q

Filtering rows: from list

pandas

A
filter_list = ["A", "C"]
df[df["col3"].isin(filter_list)]
22
Q

Filtering by position

pandas

A
dfx.iloc[1] #Select single row
dfx.iloc[:,1] #Select single column
dfx.iloc[1,1] #Select single cell
dfx.iloc[:2,:2] #Select group of cells
dfx.iloc[1:,1:] #Select group of cells
23
Q

Filtering: selecting by index

pandas

A
dfx.loc[1] #Select single row
dfx.loc[:,'b'] #Select single column
dfx.loc[1,'b'] #Select single cell
dfx.loc[:2,['b', 'c']]  #Select group of cells



dfx.loc['hola'] #Select single row
dfx.loc[:,'c'] #Select single column
dfx.loc['hola','b'] #Select single cell
dfx.loc[:'hola',['b', 'c']]  #Select group of cells


data.loc[data['condition']] #Select rows where the boolean column 'condition' is True
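One difference between the two indexers is easy to miss: `loc` slices by label and includes the end label, while `iloc` slices by position and excludes the end position. A minimal sketch, assuming a frame with the default integer index:

```python
import pandas as pd

# Illustrative frame; the default index labels are 0, 1, 2, 3.
dfx = pd.DataFrame({"b": [10, 20, 30, 40], "c": [1, 2, 3, 4]})

by_label = dfx.loc[:2]      # labels 0, 1 and 2 (end label included)
by_position = dfx.iloc[:2]  # positions 0 and 1 (end position excluded)
```

So the same slice `:2` returns three rows with `loc` and two rows with `iloc`.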
24
Q

Set/reset index

pandas

A
dfx.set_index('d', inplace=True)
dfx.reset_index()
25
Finding unique values (list, count) | pandas
```
df["col3"].unique()
df["col3"].nunique()
```
26
Apply a function to a data frame | pandas
```
def add_cols(row):
    return row.col1 + row.col2

df["col3"] = df.apply(add_cols, axis=1)
```
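For simple arithmetic, a row-wise `apply` and a direct column expression give the same result, and the column expression is usually much faster on large frames. A sketch on made-up data that computes both for comparison:

```python
import pandas as pd

# Illustrative data.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [10, 20, 30]})


def add_cols(row):
    # Called once per row because of axis=1.
    return row.col1 + row.col2


row_wise = df.apply(add_cols, axis=1)

# Same result computed on whole columns at once.
vectorized = df["col1"] + df["col2"]
```

Row-wise `apply` remains useful when the logic cannot be written as column operations.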
27
Apply a function to a single column | pandas
```
def square_col(num):
    return num**2

df["col3"] = df.col1.apply(square_col)

# or, with a lambda:
data['new_column'] = data['old_column'].apply(lambda x: x * 2)
```
28
Mark duplicated rows | pandas
``` df.duplicated(keep=False) ```
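The `keep` argument controls which copies are flagged. As a small sketch on invented data, the default (`keep="first"`) leaves the first occurrence unmarked, while `keep=False` marks every copy:

```python
import pandas as pd

# Illustrative frame: rows 0 and 1 are identical.
df = pd.DataFrame({"col1": [1, 1, 2], "col2": ["A", "A", "B"]})

first_kept = df.duplicated()            # default keep="first"
all_marked = df.duplicated(keep=False)  # flag every duplicated row

# The mask can be used directly to inspect the duplicated rows.
duplicates_only = df[all_marked]
```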
29
Drop duplicated rows | pandas
``` df.drop_duplicates() ```
30
Frequency distribution | pandas
``` df.value_counts("col2") ```
31
Reset the index, drop the old index
```
print(df.reset_index())
df.reset_index(drop=True)
```
32
Crosstabulation | pandas
``` pd.crosstab(df.col1, df.col2) ```
33
Pivoting a dataset (to wide format) | pandas
``` pd.pivot_table(df, index = ["Name"], columns=["Subject"], values='Marks', fill_value=0) ```
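A worked example may make the reshaping clearer. The sketch below uses a tiny, made-up marks table with the same column names as the card; `pivot_table` averages duplicate cells by default, and `fill_value=0` fills combinations that never occur.

```python
import pandas as pd

# Illustrative long-format data: one row per (Name, Subject) pair.
df = pd.DataFrame(
    {
        "Name": ["Ann", "Ann", "Bob"],
        "Subject": ["Math", "Art", "Math"],
        "Marks": [90, 80, 70],
    }
)

# One row per Name, one column per Subject.
wide = pd.pivot_table(
    df, index=["Name"], columns=["Subject"], values="Marks", fill_value=0
)
```

Bob has no Art mark in the input, so that cell is filled with 0 in `wide`.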
34
Get the type of an object | pandas
``` type(df) ```
35
Drop rows with missing values | pandas
``` df.dropna() ```
36
Apply a lambda function | pandas
``` df['Sales'].apply(lambda x: x * 2) ```
37
Combining data frames: append | pandas
``` df2 = pd.concat([df, df]) ```
38
Get number of rows and columns
``` df.shape ```
39
Delete a dataframe
```
del df
del(df)
```
40
Add a caption to a dataframe
caption = 'This is a caption'
df.style.set_caption(caption)
41
Import from Excel
# From Excel
data = pd.read_excel('data.xlsx')
42
Import from SQL
import sqlite3
conn = sqlite3.connect('database.db')
data = pd.read_sql_query('SELECT * FROM table_name', conn)
43
Drop rows with missing values
data.dropna()
44
Trim outliers
Q1 = data['column'].quantile(0.25)
Q3 = data['column'].quantile(0.75)
IQR = Q3 - Q1
data = data[(data['column'] >= Q1 - 1.5 * IQR) & (data['column'] <= Q3 + 1.5 * IQR)]
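To see the 1.5 × IQR rule in action, the sketch below runs it on six made-up values with one obvious outlier. It uses `Series.between` as an equivalent shorthand for the two comparisons; the names `lower`, `upper` and `trimmed` are illustrative.

```python
import pandas as pd

# Illustrative data: 100 is far from the rest.
data = pd.DataFrame({"column": [10, 11, 12, 13, 14, 100]})

Q1 = data["column"].quantile(0.25)  # 11.25 with linear interpolation
Q3 = data["column"].quantile(0.75)  # 13.75
IQR = Q3 - Q1                       # 2.5

lower = Q1 - 1.5 * IQR  # 7.5
upper = Q3 + 1.5 * IQR  # 17.5

# between() is inclusive on both ends by default.
trimmed = data[data["column"].between(lower, upper)]
```

Only the value 100 falls outside the fences, so `trimmed` keeps the other five rows.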
45
Save data to csv
data.to_csv('processed_data.csv', index=False)
46
Manipulate dates
data['date_column'] = pd.to_datetime(data['date_column'])
data['month'] = data['date_column'].dt.month
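Once a column has been converted with `pd.to_datetime`, the `.dt` accessor exposes further date parts in the same way as the month. A minimal sketch with two made-up dates; the `weekday` column is an illustrative addition.

```python
import pandas as pd

# Illustrative data: dates stored as text.
data = pd.DataFrame({"date_column": ["2024-01-15", "2024-03-02"]})

# Parse the strings into datetime values.
data["date_column"] = pd.to_datetime(data["date_column"])

# Extract parts through the .dt accessor.
data["month"] = data["date_column"].dt.month
data["weekday"] = data["date_column"].dt.day_name()
```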
47
Combining dataframes by stacking rows (concat)
merged_data = pd.concat([data1, data2], axis=0)
48
Pivot table
pd.pivot_table(data, values='value', index='category', columns='date', aggfunc='sum')
49
Random sample of data
sample = data.sample(n=100)
50
Merging data frame based on common column
merged_df = pd.merge(df1, df2, on='ID')
51
Joining based on index
result = df1.join(df2)
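`DataFrame.join` aligns on the index and does a left join by default, so rows of `df1` without a match in `df2` are kept with missing values. A sketch with two invented frames:

```python
import pandas as pd

# Illustrative frames; only the index label "x" is shared.
df1 = pd.DataFrame({"col1": [1, 2]}, index=["x", "y"])
df2 = pd.DataFrame({"col2": [10, 30]}, index=["x", "z"])

# Left join on the index: the result has the index of df1.
result = df1.join(df2)
```

Label `z` exists only in `df2` and is dropped, while `y` is kept with a missing `col2`.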