Pandas Theoretical Questions. Flashcards
(10 cards)
What is the primary purpose of the Pandas library in Python, as described in the presentation?
Pandas is an open-source library used for data manipulation and analysis, particularly for numerical tables and time series, built on top of NumPy (Page 12).
Who started developing the Pandas library, and in which year did the development begin?
Wes McKinney started developing Pandas in 2008, while working at AQR Capital Management (Page 12).
Name two limitations of NumPy that Pandas addresses.
NumPy arrays have no built-in row or column labels and hold a single data type per array, so mixed values are coerced to a common type. Pandas adds labeled indexing and lets each column of a DataFrame keep its own data type (Page 9).
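A minimal sketch of the two limitations, using made-up sample values (the names `mixed` and `df` are illustrative and not from the presentation):

```python
import numpy as np
import pandas as pd

# A NumPy array has one dtype: mixing numbers and text coerces everything to strings.
mixed = np.array([1, "a", 2.5])
mixed_kind = mixed.dtype.kind  # "U" means a Unicode string dtype

# A DataFrame keeps a separate dtype per column and carries row/column labels.
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 45]}, index=["r1", "r2"])
age_of_bob = df.loc["r2", "age"]  # label-based lookup by row and column name
```

Here the integer and float in `mixed` end up stored as text, while the `age` column in `df` stays an integer column next to the text column `name`.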
What is the difference between a Pandas Series and a DataFrame?
A Series is a one-dimensional array with index labels, while a DataFrame is a two-dimensional labeled data structure with columns of potentially different types, like a spreadsheet (Page 15).
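As a small illustration with hypothetical data, the dimensionality difference can be checked directly:

```python
import pandas as pd

# One-dimensional: values plus an index of labels.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Two-dimensional: labeled rows and columns, each column with its own dtype.
df = pd.DataFrame({"city": ["Oslo", "Lima"], "pop_m": [0.7, 10.0]})

# Selecting a single column of a DataFrame yields a Series.
col = df["pop_m"]
```

`s.ndim` is 1 and `df.ndim` is 2, and `col` is itself a Series that shares the DataFrame's row index.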
What does the value_counts() function do when applied to a Pandas Series?
It returns a Series containing the count of each unique value in the Series, sorted from most to least frequent; NaN values are excluded by default (Page 16).
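A short sketch on an invented Series of color names:

```python
import pandas as pd

colors = pd.Series(["red", "blue", "red", "green", "red", "blue"])

# The unique values become the index; the counts become the values.
counts = colors.value_counts()
```

The result is indexed by the unique values, so `counts["red"]` looks up how many times "red" occurred.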
How does the pd.concat() function differ from pd.merge() in terms of functionality?
pd.concat() stacks DataFrames along rows or columns, aligning on labels without requiring a shared key, while pd.merge() performs SQL-like joins (inner, left, right, outer) that match rows on one or more common columns or on the index (Pages 29–30).
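The contrast can be sketched with two small, hypothetical tables that share an `id` column:

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Bob"]})
right = pd.DataFrame({"id": [2, 3], "score": [88, 92]})

# concat: stack the rows; columns are unioned and missing cells become NaN.
stacked = pd.concat([left, right], ignore_index=True)

# merge: SQL-style inner join that keeps only rows whose "id" appears in both.
joined = pd.merge(left, right, on="id", how="inner")
```

`stacked` has four rows and three columns, with gaps where a source table lacked a column, while `joined` has a single row for `id` 2 carrying both `name` and `score`.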
What does the dropna() function do when called without parameters on a DataFrame?
It returns a new DataFrame with every row that contains at least one missing value (NaN) removed; the original DataFrame is left unchanged (Page 26).
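A minimal example with invented data, showing the default behavior:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

# With no arguments, any row holding at least one missing value is dropped.
cleaned = df.dropna()
```

Only the first row survives in `cleaned`, and `df` still has all three rows because the call does not modify it in place.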
What is the purpose of the set_index() and reset_index() methods in a DataFrame?
set_index() sets a column as the DataFrame’s index, while reset_index() moves the index back to a column and assigns default integer indices (Page 22).
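A brief round trip on a hypothetical two-column table:

```python
import pandas as pd

df = pd.DataFrame({"id": ["a", "b"], "value": [1, 2]})

# Promote the "id" column to the row index; it is no longer a regular column.
indexed = df.set_index("id")

# Move the index back into a column and restore the default 0..n-1 index.
restored = indexed.reset_index()
```

After `set_index`, rows can be looked up by label, for example `indexed.loc["b", "value"]`. Both methods return new DataFrames rather than changing `df`.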
What types of data sources can be used to create a Pandas DataFrame?
Lists, lists of lists, dictionaries, lists of tuples, CSV files, Excel files, SQL databases or query results, NumPy arrays, and Series objects (Pages 17–18).
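Several of these sources can be sketched side by side. The data is invented, and the CSV is read from an in-memory string instead of a file so the example stays self-contained:

```python
import io

import numpy as np
import pandas as pd

from_dict = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
from_rows = pd.DataFrame([[1, 3], [2, 4]], columns=["x", "y"])
from_tuples = pd.DataFrame([(1, 3), (2, 4)], columns=["x", "y"])
from_array = pd.DataFrame(np.array([[1, 3], [2, 4]]), columns=["x", "y"])
from_series = pd.DataFrame({"x": pd.Series([1, 2]), "y": pd.Series([3, 4])})

# pd.read_csv accepts a path or any file-like object.
from_csv = pd.read_csv(io.StringIO("x,y\n1,3\n2,4\n"))
```

All six constructions produce the same two-row table with columns `x` and `y`. Excel files and SQL sources are read with `pd.read_excel` and `pd.read_sql`, which need an actual file or database connection and are therefore left out here.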
What information does the info() method provide about a DataFrame?
It prints a concise summary of the DataFrame, including the index range, column names, non-null counts, data types, and memory usage (Page 20).
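A small sketch with invented data. `info()` normally writes to standard output; here its `buf` parameter redirects the text into a string so it can be inspected:

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2, None], "b": ["x", "y", "z"]})

# Capture the summary text instead of printing it.
buf = io.StringIO()
df.info(buf=buf)
summary = buf.getvalue()
```

The captured text lists each column with its non-null count and dtype, followed by a memory usage line. Calling `df.info()` with no arguments prints the same summary directly.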