Data 101 Flashcards
(11 cards)
What is data preparation and cleaning?
The process of transforming raw data into a more structured, readable, and reliable format for analysis.
What is the importance of handling missing data?
It ensures a dataset’s accuracy and reliability, preventing unreliable and non-representative analysis.
What can result from a large volume of missing values in a dataset?
Unreliable and non-representative analysis.
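As a rough illustration of the two cards on missing data, the sketch below measures how much of a column is missing and then fills the gaps with the mean of the observed values. Mean imputation is only one possible strategy, chosen here as an assumption; the function names and the sample `ages` list are hypothetical.

```python
from statistics import mean

def missing_fraction(values):
    """Share of entries that are missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)

def impute_mean(values):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Made-up sample column with two missing entries.
ages = [30, None, 40, None, 50]
share_missing = missing_fraction(ages)
filled = impute_mean(ages)
```

When a large share of a column is missing, the filled values start to dominate the column, which is one way the analysis can become non-representative.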
Why is error correction important in data science?
To ensure data is accurate, unbiased, and reliable.
What types of errors are particularly concerning when correcting data?
Values wrongly recorded as zero and values that are implausibly large.
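A minimal sketch of how such values might be flagged for review. The `upper_limit` threshold and the sample heights are assumptions made for illustration; a zero is not always an error, so the function only flags candidates and does not change the data.

```python
def flag_suspect(values, upper_limit):
    """Return the indices of values that are zero or exceed upper_limit."""
    return [i for i, v in enumerate(values) if v == 0 or v > upper_limit]

# Made-up heights in centimetres: one zero and one implausibly large entry.
heights_cm = [172, 0, 165, 1800, 181]
suspect = flag_suspect(heights_cm, upper_limit=250)
```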
What is the significance of dealing with outliers in data?
It ensures the data remains representative and prevents skewed results.
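One common way to spot outliers is the interquartile range (IQR) rule, sketched below. The card does not name a method, so the IQR rule, the multiplier `k=1.5`, and the sample `readings` are all assumptions.

```python
from statistics import quantiles

def find_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # first, second, third quartile
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up sensor readings with one extreme value.
readings = [10, 12, 11, 13, 12, 11, 95]
outliers = find_outliers(readings)
```

Whether a flagged value should be removed, capped, or kept depends on whether it is a recording error or a genuine extreme observation.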
What is the role of standardization and normalization in data preparation?
To ensure all data is on the same scale for consistency and accurate analysis.
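The sketch below shows one common reading of the two terms: min-max normalization rescales values to the range 0 to 1, and z-score standardization rescales them to mean 0 and standard deviation 1. Usage of the terms varies between sources, so these definitions are an assumption, as are the function names. Both functions assume the values are not all identical.

```python
from statistics import mean, pstdev

def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to mean 0 and population standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Made-up incomes on a much larger scale than, say, ages.
incomes = [20000, 35000, 50000]
normalized = min_max_normalize(incomes)
z_scores = standardize(incomes)
```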
What is the purpose of removing duplicates in data preparation?
To save storage space and to keep the data, and any model built on it, accurate.
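A minimal sketch of removing exact duplicates while keeping the first occurrence and the original order. It assumes each row is a hashable tuple; the sample `rows` are made up.

```python
def drop_duplicates(rows):
    """Return rows with exact repeats removed, preserving first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique

# Made-up customer rows; the first row appears twice.
rows = [("ada", 36), ("grace", 45), ("ada", 36)]
deduplicated = drop_duplicates(rows)
```

Near-duplicates, such as the same person with different spellings, need fuzzier matching than this exact comparison.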
What does data munging or wrangling involve?
Converting data into a user-friendly format, such as a relational table or CSV file.
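As one possible illustration of wrangling, the sketch below flattens nested records into rows and writes them out as CSV text. The record layout, the field names, and the helper names are hypothetical.

```python
import csv
import io

def flatten(record):
    """Pull the nested city field up into a flat row."""
    return {"name": record["name"], "city": record["contact"]["city"]}

def to_csv(records):
    """Render the flattened records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "city"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(flatten(r) for r in records)
    return buffer.getvalue()

# Made-up nested records, as might come from a JSON source.
records = [
    {"name": "Ada", "contact": {"city": "London"}},
    {"name": "Grace", "contact": {"city": "New York"}},
]
csv_text = to_csv(records)
```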
Fill in the blank: The key steps of preparing and cleaning data include handling missing data, error correction, dealing with outliers, standardization and normalization, removing duplicates, and _______.
data munging or wrangling
Name the six main steps of data cleaning.
Handling missing data, error correction, dealing with outliers, standardization and normalization, removing duplicates, and data munging or wrangling.