WEEK5 Flashcards
(51 cards)
Pandas
a Python library used to work with data in a structured way (like working with tables in Excel).
Why is pandas used
it makes it easier to store, manipulate and analyze data.
pandas series
like a column in a spreadsheet, or a list of numbers with a label attached to each value. A ONE-DIMENSIONAL array of indexed data.
what happens if you create a series without specifying an index?
pandas assigns default integer indexes (0, 1, 2, 3, ...)
.values
shows the actual values stored in the series (returned as a NumPy array)
.index
shows the index labels attached to the values
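A minimal sketch of the Series cards above. The numbers and the labels "a" to "d" are made up for illustration.

```python
import pandas as pd

# No index given, so pandas assigns the default integer index 0, 1, 2, 3
data = pd.Series([0.25, 0.5, 0.75, 1.0])

print(data.values)        # the underlying NumPy array of values
print(list(data.index))   # [0, 1, 2, 3]

# The same values with explicit labels attached to each one
labeled = pd.Series([0.25, 0.5, 0.75, 1.0], index=["a", "b", "c", "d"])
print(labeled["b"])       # 0.5
```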
numpy
a Python library for working with numbers and arrays of data in a fast and efficient way.
what's the relationship between pandas and numpy
pandas is built on top of numpy
does pandas follow the same indexing rules as python lists and NumPy arrays (data[start:end])
yes, when you slice by position (implicit index): you don't include the end
does pandas allow non-sequential numeric indexes (2, 5, 3, 7, …)
yes
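A small sketch of a non-sequential numeric index, assuming made-up values. The lookup uses the label, not the position.

```python
import pandas as pd

# The index labels are integers, but they are not 0, 1, 2, 3
s = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 5, 3, 7])

print(s.loc[5])          # 0.5, the value labeled 5 (it sits at position 1)
print(list(s.index))     # [2, 5, 3, 7]
```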
where does index matter: pandas or numpy?
pandas
what's the difference with slicing between python and pandas
python never includes the end of a slice; pandas includes the last index when you slice by explicit label (e.g. with loc)
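The two slicing rules can be compared side by side. This is an illustrative sketch with made-up values and labels.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

by_position = s.iloc[0:2]   # positions 0 and 1; end position 2 is excluded
by_label = s.loc["a":"c"]   # labels a, b and c; end label "c" is included

print(list(by_position))    # [10, 20]
print(list(by_label))       # [10, 20, 30]
```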
pandas dataframe
like a table: it is made up of multiple series aligned on a shared index. It has row indices and column names.
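A sketch of building a DataFrame from two Series that share an index. The figures are placeholders for illustration, not real statistics.

```python
import pandas as pd

# Two Series with the same index labels (placeholder numbers)
population = pd.Series({"California": 300.0, "Texas": 200.0})
area = pd.Series({"California": 60.0, "Texas": 50.0})

# Each Series becomes one column; the shared labels become the row index
states = pd.DataFrame({"population": population, "area": area})

print(list(states.index))     # ['California', 'Texas']
print(list(states.columns))   # ['population', 'area']
```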
what's special about a pandas Index once it has been created
you cannot change its individual values in place (it's immutable)
why can you not change the index values in pandas?
- prevents unintended changes
- makes data structures more efficient
- allows the same index to be used in multiple dataframes without risk of modification
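Immutability can be checked directly. In this sketch, item assignment on an Index raises a TypeError and the index is left untouched. The values are arbitrary.

```python
import pandas as pd

ind = pd.Index([2, 3, 5, 7, 11])

try:
    ind[1] = 0          # in-place item assignment is not supported
    changed = True
except TypeError:
    changed = False

print(changed)          # False
print(list(ind))        # [2, 3, 5, 7, 11]
```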
what does ‘intersection’ (&) do
finds elements that exist in both indA and indB
what does union do? (|)
combines all elements from both sets, keeping only unique values.
e.g. indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])
union: 1, 2, 3, 5, 7, 9, 11
what does ‘symmetric difference’ (^) do
finds elements that are only in one of the two sets (but not in both)
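A sketch of the three set operations on indA and indB from the cards above. It uses the method forms (intersection, union, symmetric_difference) rather than the &, | and ^ operators, because the operator forms have behaved differently across pandas versions.

```python
import pandas as pd

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

both = indA.intersection(indB)            # in indA and in indB
either = indA.union(indB)                 # in indA or indB, no duplicates
only_one = indA.symmetric_difference(indB)  # in exactly one of the two

print(list(both))       # [3, 5, 7]
print(list(either))     # [1, 2, 3, 5, 7, 9, 11]
print(list(only_one))   # [1, 2, 9, 11]
```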
Pandas provides three methods for indexing (loc, iloc, ix)
1. Using explicit index (loc): selects based on the actual index label. Works like dictionary-style lookup.
2. Using implicit index (iloc): selects based on the position number (like a list).
3. Using ix (removed): ix combined both explicit and implicit indexing, but it was deprecated and has since been removed from Pandas.
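A sketch contrasting loc and iloc on a Series whose integer labels differ from its positions, which is where the two are easiest to confuse. The values are made up.

```python
import pandas as pd

s = pd.Series(["a", "b", "c"], index=[1, 3, 5])

print(s.loc[1])            # 'a', the value with label 1
print(s.iloc[1])           # 'b', the value at position 1

print(list(s.loc[1:3]))    # ['a', 'b'], labels 1 through 3 (end included)
print(list(s.iloc[1:3]))   # ['b', 'c'], positions 1 and 2 (end excluded)
```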
Ufuncs (universal functions)
functions that work element-wise on arrays. They are useful for mathematical transformations in data analysis.
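A sketch of a NumPy ufunc applied to a Series, with made-up values. The function runs on every element, and the result keeps the original index labels.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0], index=["a", "b", "c"])

roots = np.sqrt(s)          # applied element-wise; the result is still a Series

print(list(roots))          # [1.0, 2.0, 3.0]
print(list(roots.index))    # ['a', 'b', 'c']
```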
When performing operations between two Pandas Series, how does Pandas align the data?
Pandas matches data by index labels, not by position.
What happens if an index is missing in one of the Pandas Series during an operation?
The result for that index will be NaN (Not a Number).
If population has ‘Texas’ and area has ‘Texas’, what happens when calculating population / area?
The operation works because both Series have the ‘Texas’ index.
What happens if population has ‘New York’ but area does not when calculating population / area?
The result for ‘New York’ is NaN because one value is missing.
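The alignment cards above can be sketched as follows. The population and area figures are placeholders, not real statistics.

```python
import pandas as pd

# 'New York' appears in population but not in area (placeholder numbers)
population = pd.Series({"Texas": 200.0, "New York": 100.0})
area = pd.Series({"Texas": 50.0})

# Values are matched by index label, not by position
density = population / area

print(density["Texas"])               # 4.0
print(pd.isna(density["New York"]))   # True, no matching area value
```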