Data Science Flashcards
(55 cards)
What is data analysis?
The process of inspecting, cleaning, transforming, and modeling data to discover useful information and support decision-making.
What is the difference between qualitative and quantitative data?
Qualitative data is descriptive (e.g., names, categories), while quantitative data is numerical (e.g., height, age).
What is a data analyst?
A professional who collects, processes, and performs statistical analyses on large datasets.
What is data cleaning?
The process of fixing or removing incorrect, corrupted, or incomplete data within a dataset.
What is Exploratory Data Analysis (EDA)?
A process of analyzing datasets to summarize their main characteristics, often using visual methods.
What is the mean?
The average of a dataset, calculated by summing all values and dividing by the number of values.
What is the median?
The middle value in an ordered dataset; with an even number of values, the average of the two middle values.
What is the mode?
The most frequently occurring value in a dataset.
What is standard deviation?
A measure of how spread out values are around the mean, equal to the square root of the variance.
What is variance?
The average of the squared differences from the mean.
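Example (a minimal sketch; the sample values are made up for illustration). Python's standard `statistics` module computes the mean, median, mode, variance, and standard deviation described in the preceding cards:

```python
import statistics

# Hypothetical sample, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)            # sum of values / number of values
median = statistics.median(values)        # middle of the sorted values (average of the two middle ones here)
mode = statistics.mode(values)            # most frequently occurring value
variance = statistics.pvariance(values)   # population variance: average squared difference from the mean
std_dev = statistics.pstdev(values)       # square root of the population variance

print(mean, median, mode, variance, std_dev)
```

Note: `pvariance` and `pstdev` divide by the number of values, matching the definition on the card. For a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` divide by n - 1 instead.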
What is a normal distribution?
A bell-shaped distribution that is symmetrical about the mean.
What is a p-value?
The probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true.
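Example (a sketch under stated assumptions). Suppose a coin is flipped 10 times and lands heads 9 times, and the null hypothesis is that the coin is fair. The one-sided p-value is the probability of seeing 9 or more heads from a fair coin. The function name and the scenario are hypothetical, chosen only to illustrate the idea:

```python
from math import comb


def binomial_tail_p_value(n_flips: int, n_heads: int, p_heads: float = 0.5) -> float:
    """One-sided p-value: P(X >= n_heads) for X ~ Binomial(n_flips, p_heads)."""
    return sum(
        comb(n_flips, k) * p_heads**k * (1 - p_heads) ** (n_flips - k)
        for k in range(n_heads, n_flips + 1)
    )


# Null hypothesis: the coin is fair (p_heads = 0.5).
p_value = binomial_tail_p_value(n_flips=10, n_heads=9)
print(f"p-value = {p_value:.4f}")
```

Here the p-value is 11/1024, roughly 0.011. A small p-value means the observed result would be unusual if the null hypothesis were true; it is not the probability that the null hypothesis is true.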
What is a confidence interval?
A range of values, computed from a sample, that is expected to contain the true population parameter at a stated confidence level (e.g., 95%).
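Example (a minimal sketch, not a definitive method). The function below builds a confidence interval for a population mean using the normal approximation. The function name and the height data are hypothetical; for small samples a t-distribution is usually more appropriate than the normal approximation assumed here:

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error of the mean, using the sample standard deviation (n - 1).
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Critical value from the standard normal distribution (about 1.96 for 95%).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


# Hypothetical heights in centimetres, for illustration only.
heights_cm = [168, 172, 165, 180, 175, 170, 169, 174, 171, 176]
low, high = mean_confidence_interval(heights_cm)
print(f"95% CI for the mean: ({low:.1f}, {high:.1f})")
```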
What is correlation?
A statistical measure that describes the extent to which two variables are related.
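Example (a minimal sketch). One common correlation measure is the Pearson coefficient, which ranges from -1 (perfect negative linear relationship) to 1 (perfect positive linear relationship). The function name and the study-hours data are hypothetical:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: hours studied and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 78]
r = pearson_r(hours, scores)
print(f"r = {r:.3f}")
```

A value of r near 1 here only says the two variables move together; it does not show that one causes the other.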
What is causation?
A relationship where one variable causes a change in another variable.
What is SQL?
Structured Query Language, used to communicate with databases.
What does SELECT do in SQL?
Retrieves data from a database.
What does WHERE do in SQL?
Filters records based on specified conditions.
What is a JOIN in SQL?
Combines rows from two or more tables based on a related column.
What is GROUP BY in SQL?
Groups rows that share a value in the specified columns so that aggregate functions (e.g., COUNT, SUM, AVG) can be applied to each group.
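Example (a sketch using Python's built-in `sqlite3` module and an in-memory database). The `customers` and `orders` tables and their contents are hypothetical. A single query combines SELECT, JOIN, WHERE, and GROUP BY from the preceding cards:

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.0), (3, 2, 5.0), (4, 2, 50.0);
""")

query = """
    SELECT c.name, COUNT(*) AS order_count, SUM(o.amount) AS total  -- columns to retrieve
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id                     -- match rows on a related column
    WHERE o.amount >= 10                                            -- keep only orders of 10 or more
    GROUP BY c.name                                                 -- one result row per customer
    ORDER BY c.name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

conn.close()
```

The WHERE clause filters individual rows before grouping, so the 5.0 order is excluded from Grace's count and total.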
What is a pivot table?
A tool in Excel and other spreadsheet programs that summarizes and analyzes data by grouping and aggregating it.
What does VLOOKUP do in Excel?
Searches for a value in the first column of a table and returns a value in the same row from another column.
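Example (a rough Python analogue, not Excel's implementation). The hypothetical `vlookup` function below mimics an exact-match lookup, which corresponds to passing FALSE as VLOOKUP's last argument; by default Excel performs an approximate match instead. The product table is made up for illustration:

```python
def vlookup(lookup_value, table, col_index):
    """Exact-match lookup on the first column; col_index is 1-based, as in Excel."""
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None  # Excel would display #N/A when no match is found


# Hypothetical table: product code, product name, unit price.
products = [
    ("P-001", "Notebook", 2.50),
    ("P-002", "Stapler", 4.50),
    ("P-003", "Marker", 1.20),
]

print(vlookup("P-002", products, 3))  # price from the third column of the matching row
```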
What is conditional formatting?
A feature that changes the appearance of cells based on conditions.
What is data validation?
A tool that restricts the type of data entered into a cell.