Data analysis code Flashcards
(152 cards)
What does axis=1 do in a DataFrame operation?
It applies the operation across the columns of each row (horizontally), producing one result per row.
What does axis=0 do in a DataFrame operation?
It applies the operation down the rows of each column (vertically), producing one result per column.
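A minimal sketch of the two axis values, using a small made-up DataFrame (the column names and numbers are illustrative only):

```python
import pandas as pd

# Hypothetical data, made up for illustration
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

col_sums = df.sum(axis=0)  # one value per column: a=6, b=60
row_sums = df.sum(axis=1)  # one value per row: 11, 22, 33

print(col_sums)
print(row_sums)
```

For methods such as drop, axis=1 refers to the column labels themselves, so df.drop("b", axis=1) removes column b.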
What is the purpose of the Shapiro-Wilk test?
It tests whether the data is normally distributed.
What is the interpretation of a p-value greater than 0.05 in the Shapiro-Wilk test?
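Fail to reject the null hypothesis: there is no significant evidence that the data deviates from a normal distribution. This is consistent with normality but does not prove it.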
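One way to run the test is scipy.stats.shapiro. The sketch below assumes SciPy is installed and uses a randomly generated sample rather than real data; the 0.05 threshold is the conventional choice from the card above.

```python
import numpy as np
from scipy import stats

# Hypothetical sample drawn from a normal distribution, for illustration only
rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=5, size=100)

# Returns the W statistic and the p-value
stat, p_value = stats.shapiro(sample)

alpha = 0.05  # conventional significance level
if p_value > alpha:
    print("Fail to reject H0: no significant evidence against normality")
else:
    print("Reject H0: the data deviates significantly from normality")
```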
What does df.shape return?
A tuple with the number of rows and columns: (number of rows, number of columns).
What statistics does df.describe() provide?
It provides summary statistics for each numeric column: count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%), and maximum.
What does df.mean() do in a DataFrame?
It computes the mean of each column.
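The three inspection calls above can be tried on a tiny example. The DataFrame here is hypothetical and only meant to show what each call returns.

```python
import pandas as pd

# Hypothetical measurements, made up for illustration
df = pd.DataFrame({"height": [150.0, 160.0, 170.0], "weight": [50.0, 60.0, 70.0]})

print(df.shape)          # (3, 2): 3 rows, 2 columns

summary = df.describe()  # rows: count, mean, std, min, 25%, 50%, 75%, max
print(summary)

means = df.mean()        # height 160.0, weight 60.0
print(means)
```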
How do you access a specific column in a DataFrame?
Use the column name: df['column_name'].
What does the code df['column_name'].max() do?
It returns the maximum value in the specified column.
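A short sketch of column access followed by max(), with made-up column names and values:

```python
import pandas as pd

# Hypothetical data, made up for illustration
df = pd.DataFrame({"city": ["A", "B", "C"], "temp": [21.5, 30.0, 18.2]})

temps = df["temp"]          # selecting one column gives a pandas Series
hottest = df["temp"].max()  # 30.0

print(hottest)
```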
What assumptions does the paired t-test make?
It assumes a continuous dependent variable, pairs that are independent of one another, no major outliers in the paired differences, and approximately normally distributed differences between the paired values.
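As a rough sketch, the normality assumption can be checked on the paired differences before running the test. This assumes SciPy is available; the before/after values are invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical before/after measurements on the same six subjects
before = np.array([72.0, 80.0, 65.0, 90.0, 77.0, 84.0])
after = np.array([70.0, 78.0, 66.0, 85.0, 75.0, 80.0])

# The normality assumption applies to the paired differences
differences = after - before
shapiro_stat, shapiro_p = stats.shapiro(differences)

# Paired (dependent-samples) t-test
t_stat, p_value = stats.ttest_rel(before, after)

print(shapiro_p, t_stat, p_value)
```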
How do you read a CSV file into a DataFrame in Python?
df = pd.read_csv('../folder/name.filetype')
How do you read an Excel file into a DataFrame in Python?
df = pd.read_excel('file_path')
How do you read a tab-separated file into a DataFrame?
Use the sep='\t' parameter in pd.read_csv().
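The comma- and tab-separated cases can be compared side by side. In this sketch, in-memory text (io.StringIO) stands in for files on disk so the example runs without any real file; the contents are hypothetical.

```python
import io
import pandas as pd

# In-memory text stands in for files on disk (hypothetical contents)
csv_text = "name,score\nAda,90\nBob,85\n"
tsv_text = "name\tscore\nAda\t90\nBob\t85\n"

df_csv = pd.read_csv(io.StringIO(csv_text))            # comma is the default separator
df_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")  # tab-separated values

print(df_tsv)
```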
How do you handle missing data when reading a file?
Use the na_values parameter to list the strings that should be read as NaN.
How do you specify the data type for integer columns in a DataFrame?
Use dtype=pd.Int64Dtype(), the nullable integer type, so that integer columns with missing values stay integers instead of being read as floats.
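The two parameters are often used together. In the sketch below, the "?" missing-value marker and the column names are assumptions chosen for the example, not values from the cards.

```python
import io
import pandas as pd

# "?" as a missing-value marker is an assumption for this example
text = "id,age\n1,34\n2,?\n3,51\n"

df = pd.read_csv(
    io.StringIO(text),
    na_values="?",                   # read "?" as a missing value
    dtype={"age": pd.Int64Dtype()},  # nullable integer column
)

print(df["age"].dtype)  # Int64
print(df)
```

Without the dtype argument, the missing value would force the age column to float64.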
How do you rename columns when reading a file into a DataFrame?
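Use header=None, names=['column1', 'column2', …] when the file has no header row. If the file already has a header row, use header=0 so that row is replaced instead of being read as data.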
How do you skip rows from the top when reading a file?
Use the skiprows=… parameter.
How do you skip rows from the bottom when reading a file?
Use the skipfooter=… parameter; it is supported by the Python parser engine (engine='python'), not the default C engine.
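A sketch combining names, skiprows, and skipfooter on a hypothetical export that has a title line on top, a summary line at the bottom, and no header row:

```python
import io
import pandas as pd

# Hypothetical file contents: title line, two data rows, summary line
text = "Exported report\n1,Ada\n2,Bob\nTotal rows: 2"

df = pd.read_csv(
    io.StringIO(text),
    header=None,          # the file has no header row
    names=["id", "name"], # column names supplied by hand (illustrative)
    skiprows=1,           # drop the title line at the top
    skipfooter=1,         # drop the summary line at the bottom
    engine="python",      # skipfooter is not supported by the C engine
)

print(df)
```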
How do you set a specific column as the index when reading a file?
Use the index_col parameter with a zero-based position or a column name; for example, index_col=1 sets the second column as the index.
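A small sketch of index_col given by position and by name, using made-up in-memory data:

```python
import io
import pandas as pd

# Hypothetical file contents
text = "id,name,score\n1,Ada,90\n2,Bob,85\n"

# Position 1 is the second column ("name"), since positions are zero-based
df = pd.read_csv(io.StringIO(text), index_col=1)

# The same column selected by name
same = pd.read_csv(io.StringIO(text), index_col="name")

print(df)
```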
How do you update a specific value in a DataFrame?
Use df.at['row_name', 'column_name'] = new_value.
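A minimal sketch of a single-cell update with df.at, on a hypothetical DataFrame indexed by name:

```python
import pandas as pd

# Hypothetical scores indexed by row label
df = pd.DataFrame({"score": [90, 85]}, index=["Ada", "Bob"])

# Update one cell, addressed by row label and column label
df.at["Bob", "score"] = 88

print(df)
```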
What does df.info() display?
It displays the number of entries, the index range, the column names, the non-null count and dtype of each column, and the memory usage.
What does the data type float64 represent?
It represents decimal (floating-point) numbers stored in 64 bits.
What does the data type int64 represent?
It represents whole numbers (integers).
What does the data type object represent?
It usually represents strings (text); it can also hold mixed Python objects.
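The dtypes above can be seen with df.info() and df.dtypes. The DataFrame below is hypothetical, and the dtype reported for the text column depends on the pandas version.

```python
import pandas as pd

# Hypothetical data with one float, one integer, and one text column
df = pd.DataFrame({"price": [1.5, 2.25], "qty": [3, 4], "label": ["x", "y"]})

df.info()         # entries, index range, non-null counts, dtypes, memory usage
print(df.dtypes)  # price: float64, qty: int64
                  # label: object in pandas 2.x; newer versions may report a string dtype
```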