Pandas_JAS Flashcards

(51 cards)

1
Q

How to refer to a single column in a DF

A

DFname['columnname'].head()  # head() returns the first 5 rows by default

2
Q

Technically, a single column is a ______ not a ______

A

Series; DataFrame

A Series is part of a DataFrame
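
A minimal sketch of the Series/DataFrame distinction. The table and its column names are made up for illustration and are not from the deck:

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({"name": ["A", "B", "C"], "pos": ["RB", "WR", "QB"]})

col = df["pos"]        # single brackets with one column name -> Series
print(type(col))       # pandas Series
print(type(df))        # pandas DataFrame
print(col.head())      # head() also works on a Series
```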

3
Q

After typing code into the ______, highlight the code of interest and hit ____ to send it to the ________

A

Editor
F9
REPL (console)

4
Q

Normally you install third-party libraries with a tool like _____, but if you’re using ________, Pandas comes preinstalled

A

pip

Anaconda Python Bundle

5
Q

When importing Pandas the convention is to name it ______

A

pd (import pandas as pd)

6
Q

To load a csv file:

DFname = pd._______(_______(DATA_DIR, 'filename.csv'))

A

read_csv
path.join

DATA_DIR is a variable holding the path to the folder that contains your file, e.g., DATA_DIR = '/Users/UserName/PythonDirectory'. path comes from the standard library (from os import path).
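
A hedged sketch of the loading pattern. To keep it runnable anywhere, DATA_DIR points at a temporary folder and the file players.csv is written first. Both the file name and its contents are assumptions for illustration:

```python
import tempfile
from os import path

import pandas as pd

# Assumption: a throwaway folder stands in for your real data directory.
DATA_DIR = tempfile.mkdtemp()
csv_path = path.join(DATA_DIR, "players.csv")  # hypothetical file name

# Write a tiny CSV so there is something to load.
with open(csv_path, "w") as f:
    f.write("name,pos\nA,RB\nB,WR\n")

df = pd.read_csv(csv_path)
print(type(df))
print(df.head())
```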

7
Q

What result does this give:

type(DFname)

A

pandas.core.frame.DataFrame

8
Q

What method returns (by default) the first 5 rows of a DataFrame?

A

head()

DFname.head()

9
Q

head() is a method because you can pass it the number of rows to print; ________ are used without parentheses and take no arguments

A

attributes

10
Q

The attribute _______ returns the name of each column in the DF

A

columns

11
Q

The attribute ______ returns the number of rows and columns in the DF as a tuple

A

shape
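
A short sketch contrasting a method (head) with attributes (columns, shape). The sample table is invented for illustration:

```python
import pandas as pd

# Hypothetical 7-row table.
df = pd.DataFrame({"name": list("ABCDEFG"),
                   "yards": [10, 20, 30, 40, 50, 60, 70]})

first_five = df.head()    # method: called with parentheses, default is 5 rows
first_two = df.head(2)    # method: accepts an argument

print(df.columns)         # attribute: no parentheses
print(df.shape)           # attribute: (rows, columns)
```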

12
Q

______ will turn any Series into a one-column DF

A

to_frame()

DFname['columnname'].to_frame().head()

13
Q

To refer to multiple columns in a DF, you pass it a ______, and the result is _______

A

list
a DataFrame

DFname[['col1', 'col2', 'col3']].head()
When selecting multiple columns you must use double brackets: the outer pair indexes the DF, the inner pair is the list of column names.
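
A minimal sketch of list-based selection and of to_frame(), using an invented table:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"name": ["A", "B"], "pos": ["RB", "WR"], "yards": [50, 120]})

subset = df[["name", "yards"]]     # list of names -> DataFrame
one_col_df = df[["pos"]]           # a one-item list still gives a DataFrame
as_frame = df["pos"].to_frame()    # Series converted to a one-column DataFrame

print(type(subset), type(one_col_df), type(as_frame))
```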

14
Q

An index is a built-in column of ________. If you don’t designate a column as the index, the default is a ________.

A

row IDs

series of numbers starting at zero

15
Q

Indexes can be _______

A

Any type of data (strings, dates, etc)

16
Q

How to assign an index to a DF?

A

DFname.set_index('columnname')

This creates a copy of the DF with this index.

17
Q

What must be passed to set_index() to create the index on the original DF?

A

inplace=True

DFname.set_index('columnname', inplace=True)

Or overwrite DFname:
DFname = DFname.set_index('columnname')

18
Q

Most DataFrame methods return copies unless ________ is explicitly included

A

inplace=True

19
Q

The opposite of set_index() is ______

A

reset_index()
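
A hedged sketch of the copy-versus-overwrite behavior of set_index() and of reset_index() undoing it. The player_id column is an invented example:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"player_id": [11, 22, 33], "name": ["A", "B", "C"]})

indexed = df.set_index("player_id")   # returns a new DataFrame
unchanged = list(df.columns)          # the original still has both columns

df = df.set_index("player_id")        # overwrite to keep the new index
restored = df.reset_index()           # index goes back to being a column

print(indexed.index.name)
print(restored.columns)
```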

20
Q

How to sort a DF?

A

DFname.sort_values('columnname', ascending=False, inplace=True)
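
A small sketch of sorting in descending order. It assigns the result to a new name instead of using inplace=True, and the data is invented:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"name": ["A", "B", "C"], "yards": [50, 120, 80]})

# Largest values first; the original df keeps its order.
sorted_df = df.sort_values("yards", ascending=False)
print(sorted_df)
```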

21
Q

How to add a column from another DF?

A

NewDF['columnname'] = DFname['columnname']

This works if the indexes are the same (pg 56 FantasyFootball). It is similar to cutting and pasting an Excel column from one workbook into another: the data has to match the row to which it belongs.
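
A sketch of why matching indexes matter: pandas aligns the assigned column on index labels, not on row position. Both tables here are invented for illustration:

```python
import pandas as pd

# Two hypothetical tables that share index labels but in a different order.
stats = pd.DataFrame({"yards": [50, 120, 80]}, index=["A", "B", "C"])
teams = pd.DataFrame({"team": ["X", "Y", "Z"]}, index=["C", "A", "B"])

# Values are matched by index label: A gets Y, B gets Z, C gets X.
stats["team"] = teams["team"]
print(stats)
```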

22
Q

How to write to csv

A

to_csv()

DFname.to_csv(path.join(DATA_DIR, 'filename.csv'), sep='|', index=False)

sep= sets the separator, in this case a pipe (|)

Use index = True to include index in output
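
A hedged sketch of writing a pipe-separated file and reading it back. The temporary folder and the file name players.csv are assumptions so the example runs anywhere:

```python
import tempfile
from os import path

import pandas as pd

DATA_DIR = tempfile.mkdtemp()                  # stand-in for a real folder
out_path = path.join(DATA_DIR, "players.csv")  # hypothetical file name

df = pd.DataFrame({"name": ["A", "B"], "pos": ["RB", "WR"]})
df.to_csv(out_path, sep="|", index=False)      # no index column in the file

with open(out_path) as f:
    text = f.read()
print(text)

# Reading it back needs the same separator.
round_trip = pd.read_csv(out_path, sep="|")
```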

23
Q

Three primary column types

A

Number
String
Boolean

24
Q

Add a column with a value or math

A

DFname['newcolumn'] = 4
Or
DFname['newcolumn'] = DFname['column2'] * 8

DFname[['newcolumn', 'column2']]
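A minimal sketch of both ways to add a column, a constant and a calculation. The column name times_eight is invented for the example:

```python
import pandas as pd

df = pd.DataFrame({"column2": [1, 2, 3]})

df["newcolumn"] = 4                      # same value in every row
df["times_eight"] = df["column2"] * 8    # element-wise math on a column

print(df[["newcolumn", "times_eight"]])
```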

25
______ is a library for math
NumPy | import numpy as np
26
_____ method to return random rows
sample() | DFname.sample(5) returns 5 random rows from the DF
27
How to concatenate strings?
+
28
How to call string methods
.str | .str.upper(), .str.replace() | DFname['columnname'].str.replace('.', '', regex=False).str.lower()
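A sketch of chaining .str methods and of concatenating string columns with +. The names are made up, and regex=False is passed so the period is treated as a literal character:

```python
import pandas as pd

# Hypothetical names.
names = pd.DataFrame({"name": ["T.J. Smith", "A.J. Brown"]})

# Remove periods, then lowercase.
names["clean"] = names["name"].str.replace(".", "", regex=False).str.lower()

# + joins strings element by element.
names["label"] = names["name"] + " (" + names["clean"] + ")"
print(names)
```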
29
How to negate (change True to False and vice versa)?
~ | DFname['is_not'] = ~(DFname['column'] == 'RB')
30
How to check multiple columns for True/False at once
(DF[['col1', 'col2']] > 100) | Returns a DataFrame of True/False values, one column per input column and one value per row
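A short sketch of flagging, negating with ~, and comparing several columns at once. The data and the is_rb / is_not_rb column names are invented:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"pos": ["RB", "WR", "RB"],
                   "rush_yards": [120, 5, 90],
                   "rec_yards": [10, 130, 20]})

df["is_rb"] = df["pos"] == "RB"      # boolean column
df["is_not_rb"] = ~df["is_rb"]       # ~ flips True and False

# Comparing a two-column DataFrame gives a two-column boolean DataFrame.
over_100 = df[["rush_yards", "rec_yards"]] > 100
print(over_100)
```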
31
When you think ‘flagging rows’ you should be thinking
Making a column of booleans (True/False)
32
Which method takes a function and _____ it to every row in a column?
apply() | applies
```
def is_skill(pos):
    return pos in ['rb', 'wr']

DF['is_skill'] = DF['pos'].apply(is_skill)
```
Applies the defined function is_skill to every value in the column.
Alternative: DF['is_skill'] = DF['pos'].apply(lambda x: x in ['rb', 'wr'])
33
Method to drop a column
DF.drop('col', axis=1, inplace=True) | The default is to drop rows; axis=1 drops a column instead
34
Method to rename a column
DF.rename(columns={'col1': 'newname'}, inplace=True)
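A sketch of dropping and renaming columns. It uses reassignment, which is equivalent in effect to inplace=True here, and the data is invented:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6]})

df = df.drop("col3", axis=1)                  # axis=1 -> drop a column
df = df.rename(columns={"col1": "newname"})   # old name -> new name

print(df.columns)
```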
35
How does numpy represent a missing value?
nan (np.nan) | Not a Number
36
Methods to detect nan or null value?
isnull() notnull() | DF['col'].isnull() returns True/False for each row
37
Method to place custom value in place of nan?
fillna() | DF['col'].fillna(-99)
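A minimal sketch of detecting and replacing missing values, using an invented column with one np.nan:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"yards": [10.0, np.nan, 30.0]})

missing = df["yards"].isnull()     # True where the value is missing
present = df["yards"].notnull()    # the opposite
filled = df["yards"].fillna(-99)   # new Series; df itself is unchanged

print(missing.tolist(), filled.tolist())
```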
38
How to parse day, month, and year from string in non-pandas Python?
```
gameid = '2021090700'
year = gameid[0:4]
month = gameid[4:6]
day = gameid[6:8]
```
39
How to parse day, month, and year in Pandas, including changing the data type?
For a gameid such as 2021090700: DF['month'] = DF['gameid'].astype(str).str[4:6] | astype(str) converts the column to strings so that .str slicing works
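A hedged sketch extending the same slicing to year, month, and day, then converting each piece to an integer. The second gameid is invented for illustration:

```python
import pandas as pd

# First id is from the card; the second is a made-up example.
df = pd.DataFrame({"gameid": [2021090700, 2021121201]})

gid = df["gameid"].astype(str)              # numbers -> strings
df["year"] = gid.str[0:4].astype(int)
df["month"] = gid.str[4:6].astype(int)      # "09" becomes 9
df["day"] = gid.str[6:8].astype(int)

print(df)
```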
40
How to change a datatype?
astype(str) | astype(int)
41
Which attribute shows data type of each column in DF?
dtypes | DF.dtypes (a single column has dtype)
42
Pandas calls a string (str) a ______
object
43
Name six summary statistic functions.
```
mean()
std()
count()
sum()
min()
max()
```
Note that min() and max() also work on strings, using alphabetical order.
44
What is the axis default for summary statistic functions, and how can it be changed?
Default is axis=0, which gives one result per column; pass axis=1 to get one result per row
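A small sketch of the axis argument on an invented two-row table:

```python
import pandas as pd

df = pd.DataFrame({"rush_yards": [100, 50], "rec_yards": [20, 30]})

col_means = df.mean()        # default axis=0: one mean per column
row_sums = df.sum(axis=1)    # axis=1: one sum per row

print(col_means)
print(row_sums)
```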
45
In Summary Stats, what values will be returned for True & False
True = 1; False = 0
46
.any() evaluates what?
If any value in a column is True
47
.all() evaluates what?
If all values in a column are True
48
Code to determine how often a certain criteria is met?
(pg[['rush_yards', 'rec_yards']] > 100).any(axis=1).sum()
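A sketch of the counting pattern: compare, collapse each row with any() or all(), then sum the booleans (True counts as 1). The four rows of data are invented:

```python
import pandas as pd

# Hypothetical per-game data.
pg = pd.DataFrame({"rush_yards": [120, 30, 101, 10],
                   "rec_yards": [5, 150, 110, 20]})

over_100 = pg[["rush_yards", "rec_yards"]] > 100

n_either = over_100.any(axis=1).sum()   # rows where at least one column > 100
n_both = over_100.all(axis=1).sum()     # rows where both columns > 100

print(n_either, n_both)
```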
49
What does value_counts() do?
Counts how often each value appears in a column: DF['position'].value_counts() gives RB 20, WR 10, QB 10. To get proportions instead of counts: DF['position'].value_counts(normalize=True) gives RB 0.50, WR 0.25, QB 0.25
50
What does crosstab do, and what is the syntax?
Similar to value_counts() but counts combinations of two columns: pd.crosstab(adp['team'], adp['position']). crosstab also takes a normalize argument
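A minimal sketch of value_counts() and pd.crosstab() on an invented four-row adp table:

```python
import pandas as pd

# Hypothetical data; team and position values are made up.
adp = pd.DataFrame({"team": ["X", "X", "Y", "Y"],
                    "position": ["RB", "WR", "RB", "RB"]})

counts = adp["position"].value_counts()                 # raw counts
shares = adp["position"].value_counts(normalize=True)   # proportions

# Rows are teams, columns are positions, cells are counts.
table = pd.crosstab(adp["team"], adp["position"])
print(table)
```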
51
How to see all Pandas methods that operate on single columns? On DataFrames?
pd.Series. and tab-completing | pd.DataFrame. and tab-completing