Data Analytics Flashcards
iloc example
df.iloc[1,2] - single cell at row position 1, column position 2, counting from 0 (200)
df.iloc[2] - entire row at position 2 (1000, 2000, 3000, 4000)
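A minimal sketch of the two lookups. The card does not show its DataFrame, so the data and column names below are assumptions, chosen so the results match the values on the card.

```python
import pandas as pd

# Hypothetical data: picked so the lookups return the values on the card.
df = pd.DataFrame(
    [[1, 2, 3, 4],
     [10, 20, 200, 40],
     [1000, 2000, 3000, 4000]],
    columns=["a", "b", "c", "d"],
)

cell = df.iloc[1, 2]  # row position 1, column position 2 (both count from 0)
row = df.iloc[2]      # whole row at position 2, returned as a Series

print(cell)
print(row.tolist())
```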
loc example
Like iloc, but selects by row and column labels (e.g. string column headings) instead of integer positions
e.g.
df.loc[2,'a']
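A small sketch contrasting label-based and position-based lookup. The frame and the labels "x", "y", "z" are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with the default integer row labels 0, 1, 2.
df = pd.DataFrame({"a": [10, 20, 30], "b": [40, 50, 60]})

by_label = df.loc[2, "a"]    # row labelled 2, column labelled "a"
by_position = df.iloc[2, 0]  # same cell, addressed by position

# With string row labels, loc follows the labels, not the positions.
named = df.set_index(pd.Index(["x", "y", "z"]))
named_cell = named.loc["z", "a"]

print(by_label, by_position, named_cell)
```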
describe
Summary statistics (count, mean, std, min, quartiles, max) of a single numeric column
df['a'].describe()
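A quick sketch of what describe returns for a numeric column. The column name 'a' comes from the card; the values are borrowed from the Mean card and are otherwise an assumption.

```python
import pandas as pd

# Hypothetical column of figures.
df = pd.DataFrame({"a": [1, 2, 2, 3, 2, 4]})

# describe() on a numeric column returns a Series of summary statistics.
summary = df["a"].describe()

print(summary)
```

The result is itself a Series, so individual statistics can be read by label, e.g. summary["mean"].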
Mean
The total of the figures, divided by the number of individual figures
1,2,2,3,2,4
Mean: 14/6 = 2.3333
Median
The middle value once the figures are sorted (the average of the two middle values when the count is even)
1,2,2,3,2,4 -> 1,2,2,2,3,4
Median: 2
Mode
The most common figure
1,2,2,3,2,4
Mode: 2
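The mean, median and mode of the figures used on these cards can be checked with Python's standard statistics module. This is just one convenient way to do it.

```python
import statistics

figures = [1, 2, 2, 3, 2, 4]

mean = statistics.mean(figures)      # total (14) divided by the count (6)
median = statistics.median(figures)  # average of the two middle sorted values
mode = statistics.mode(figures)      # most common value

print(mean, median, mode)
```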
Interquartile range
The difference between the first and third quartile values (Q3 - Q1)
Q1: 10
Q3: 50
IQR: 40
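A sketch of the same calculation with NumPy. The data are an assumption, chosen so that the quartiles come out as 10 and 50; note that quartile conventions differ between tools, and this uses NumPy's default linear interpolation.

```python
import numpy as np

# Hypothetical data, picked so that Q1 = 10 and Q3 = 50.
data = [0, 10, 30, 50, 60]

q1, q3 = np.percentile(data, [25, 75])  # first and third quartiles
iqr = q3 - q1

print(q1, q3, iqr)
```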
Nominal
Categorisation without order e.g. the books are in: English, French, German etc.
Distinctiveness ( = and != )
Ordinal
Categorisation with order e.g. the coffee was: Good, Medium, Bad
Distinctiveness ( = and != )
Order ( <,<=,>,>= )
Interval
Scale with an arbitrary zero value e.g. temperature in °C or °F, shoe size, dates
Distinctiveness ( = and != )
Order ( <,<=,>,>= )
Addition ( + and - )
Ratio
Scale with a non-arbitrary zero value e.g. distance, age, speed etc.
Distinctiveness ( = and != )
Order ( <,<=,>,>= )
Addition ( + and - )
Multiplication ( * and / )
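A small sketch of why multiplication only makes sense with a non-arbitrary zero. The two temperatures are made-up values: their difference is meaningful in Celsius, but "twice as hot" only makes sense after converting to Kelvin, which has a true zero.

```python
# Hypothetical temperatures in degrees Celsius (interval scale).
c1, c2 = 10.0, 20.0

difference = c2 - c1   # meaningful: 10 degrees warmer
naive_ratio = c2 / c1  # 2.0, but 20 °C is not "twice as hot" as 10 °C

# Kelvin has a non-arbitrary zero (ratio scale), so this ratio is meaningful.
kelvin_ratio = (c2 + 273.15) / (c1 + 273.15)

print(difference, naive_ratio, round(kelvin_ratio, 3))
```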
NOIR
Qualitative:
Nominal
Ordinal
Quantitative:
Interval
Ratio
DOAM
Distinctiveness (=, !=)
Ordering (<, <=, >, >=)
Addition (+, -)
Multiplication (*, /)
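One way to see the nominal/ordinal difference in code is pandas' Categorical type, sketched below with the example categories from the cards. Using pandas here is an assumption about tooling, not something the cards prescribe.

```python
import pandas as pd

# Nominal: categories with no order, so only = and != make sense.
languages = pd.Categorical(["English", "French", "German", "French"])
same_language = languages[1] == languages[3]

try:
    languages < "French"  # ordering is not defined for nominal data
    nominal_supports_order = True
except TypeError:
    nominal_supports_order = False

# Ordinal: the same kind of labels, but with a declared order.
ratings = pd.Categorical(
    ["Good", "Bad", "Medium"],
    categories=["Bad", "Medium", "Good"],
    ordered=True,
)
best = ratings.max()
below_good = list(ratings < "Good")

print(same_language, nominal_supports_order, best, below_good)
```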
Nominal: Binary
1/0, On/Off, Yes/No, True/False
Normal Distribution
Standard Bell Curve
Mean, median and mode are all in the centre
Left skewed
Tail is on the left, hump on the right
Left: Mean
Middle: Median
Right: Mode
“You’re mean when you walk away”
Right skewed
Tail on the right, hump on the left
Left: Mode
Middle: Median
Right: Mean
“You’re mean when you walk away”
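The ordering on the two skew cards can be checked on small made-up samples. The data below are assumptions, and the mode/median/mean ordering is a rule of thumb for typical single-humped data rather than a guarantee.

```python
import statistics

def centres(data):
    """Return (mean, median, mode) for a list of figures."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)

# Hypothetical right-skewed sample: hump on the left, long tail on the right.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 10, 15]
# Mirror image of the same sample gives a left-skewed one.
left_skewed = [16 - x for x in right_skewed]

r_mean, r_median, r_mode = centres(right_skewed)
l_mean, l_median, l_mode = centres(left_skewed)

print(r_mode, r_median, r_mean)  # mode < median < mean
print(l_mean, l_median, l_mode)  # mean < median < mode
```

In both cases the mean is the value dragged furthest toward the tail, which is what the mnemonic is pointing at.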
Tuple
Stores data but can't be changed (immutable)
myTuple = (1,2,3)
List in relation to tuple
Like a tuple but can be changed
myList = [1,2,3]
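A short sketch of the difference, using snake_case names in place of the card's myTuple and myList:

```python
my_tuple = (1, 2, 3)
my_list = [1, 2, 3]

my_list[0] = 99  # lists can be changed in place

try:
    my_tuple[0] = 99  # tuples cannot: item assignment raises TypeError
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(my_list, my_tuple, tuple_changed)
```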
List
Ordered collection of elements supporting mixed data types
Array
Similar to a list, but all elements must be of the same type
2D array or matrix
A grid of elements with uniform data types
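A sketch of the list, array and 2D array cards side by side, assuming NumPy is the array library meant (the cards do not name one). The values are made up.

```python
import numpy as np

mixed_list = [1, "two", 3.0]  # a list can mix types freely

arr = np.array([1, 2, 3])                 # 1D array: one type for every element
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])            # 2D array: a grid with one type

# Mixing ints and floats does not give a mixed array:
# NumPy converts everything to a single common type.
promoted = np.array([1, 2.5, 3])

print(arr.dtype, matrix.shape, promoted.dtype)
```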
DataFrame
Two-dimensional, potentially heterogeneous tabular data structure with labelled axes (rows and columns), allowing a different data type for each column
e.g. data loaded from a SQL table or a CSV file
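A minimal sketch of a DataFrame built from CSV text. The column names and values are invented for illustration, and an in-memory string stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical CSV content; a file path would normally go to read_csv instead.
csv_text = "name,age,height_m\nAda,36,1.65\nLinus,54,1.80\n"

df = pd.read_csv(io.StringIO(csv_text))

# Each column gets its own data type (text, integer, float).
print(df.dtypes)
print(df.shape)
```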
Measures of Dispersion
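Standard deviation and variance (the variance is the average squared distance from the mean; the standard deviation is its square root)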
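Both measures can be sketched with the standard statistics module, reusing the figures from the Mean card. The population versions are shown first; the sample version divides by n - 1 instead of n.

```python
import statistics

figures = [1, 2, 2, 3, 2, 4]

pop_variance = statistics.pvariance(figures)    # average squared distance from the mean
pop_std = statistics.pstdev(figures)            # square root of the population variance
sample_variance = statistics.variance(figures)  # divides by n - 1 instead of n

print(round(pop_variance, 4), round(pop_std, 4), round(sample_variance, 4))
```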