Data Science Flashcards
(51 cards)
What is the difference between qualitative and quantitative data?
Qualitative data is descriptive (e.g., names, categories), while quantitative data is numerical (e.g., height, age).
What is data cleaning?
The process of fixing or removing incorrect, corrupted, or incomplete data within a dataset.
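A minimal Python sketch of data cleaning, using only the standard library and a small hypothetical list of records. It trims stray whitespace, drops rows with a missing or non-numeric age, and removes duplicates. Dropping rows is only one option; imputing missing values is another, and real pipelines often use a library such as pandas instead.

```python
# Hypothetical raw records: stray whitespace, a missing value,
# an unparseable age, and a duplicate that appears once whitespace is fixed.
raw = [
    {"name": " Alice ", "age": "34"},
    {"name": "Bob", "age": ""},
    {"name": "Carol", "age": "abc"},
    {"name": "Alice", "age": "34"},
    {"name": "Dan", "age": "29"},
]

def clean(records):
    """Trim text, drop rows with missing/invalid ages, and remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        name = rec["name"].strip()       # fix formatting: strip whitespace
        age_text = rec["age"].strip()
        if not age_text:                  # incomplete: age is missing
            continue
        try:
            age = int(age_text)           # incorrect/corrupted: not a number
        except ValueError:
            continue
        key = (name, age)
        if key in seen:                   # duplicate after normalization
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned

result = clean(raw)
print(result)  # [{'name': 'Alice', 'age': 34}, {'name': 'Dan', 'age': 29}]
```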
What is EDA?
Exploratory Data Analysis
A process of analyzing datasets to summarize their main characteristics, often using visual methods.
What is the mean?
The arithmetic average of a dataset: the sum of all values divided by the number of values.
What is the median?
The middle value of a dataset sorted in order. If there is an even number of values, the median is the average of the two middle values.
What is the mode?
The most frequently occurring value in a dataset. A dataset can have more than one mode.
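A short Python illustration of the three measures of central tendency, using a hypothetical six-value sample with an even count, so the median is the average of the two middle values. It uses the standard-library `statistics` module; `statistics.multimode` (Python 3.8+) returns every mode when there is more than one.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical sample, already sorted

mean = sum(data) / len(data)        # 30 / 6 = 5.0
median = statistics.median(data)    # even count: (3 + 5) / 2 = 4.0
mode = statistics.mode(data)        # 3 occurs most often

print(mean, median, mode)  # 5.0 4.0 3
```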
What is standard deviation?
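A measure of how spread out values are around the mean, equal to the square root of the variance. Unlike the variance, it is expressed in the same units as the data.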
What is variance?
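The average of the squared differences from the mean. For a sample, the sum of squared differences is usually divided by n - 1 instead of n, which gives an unbiased estimate of the population variance.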
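A Python sketch computing variance and standard deviation by hand on a hypothetical sample, then checking the results against the standard-library `statistics` functions. `pvariance`/`pstdev` divide by n (population), while `variance` divides by n - 1 (sample).

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample; mean = 40 / 8 = 5
mean = sum(data) / len(data)
squared_diffs = [(x - mean) ** 2 for x in data]  # 9, 1, 1, 1, 0, 0, 4, 16

pop_var = sum(squared_diffs) / len(data)           # 32 / 8 = 4.0
sample_var = sum(squared_diffs) / (len(data) - 1)  # 32 / 7, about 4.571
pop_sd = math.sqrt(pop_var)                        # sqrt(4.0) = 2.0

print(pop_var, pop_sd)  # 4.0 2.0
```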
What is a normal distribution?
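A symmetric, bell-shaped continuous probability distribution fully described by its mean and standard deviation; its mean, median, and mode are equal. About 68% of values lie within one standard deviation of the mean, about 95% within two, and about 99.7% within three.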
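A small check of the normal distribution's symmetry and the one-standard-deviation rule, using `statistics.NormalDist` (Python 3.8+). The mean of 100 and standard deviation of 15 are hypothetical parameters chosen for illustration.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)  # hypothetical parameters

# Symmetry about the mean: half of the probability lies below the mean.
print(dist.cdf(100))  # 0.5

# Probability of falling within one standard deviation of the mean.
within_one_sd = dist.cdf(115) - dist.cdf(85)
print(round(within_one_sd, 3))  # 0.683
```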
What is a p-value?
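The probability, assuming the null hypothesis is true, of obtaining results at least as extreme as those actually observed. A small p-value means the data would be unlikely under the null hypothesis; it is not the probability that the null hypothesis is true.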
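A worked Python example of a one-sided p-value, assuming a hypothetical experiment: a coin lands heads 9 times in 10 flips, and the null hypothesis is that the coin is fair. The p-value is the probability of seeing 9 or more heads if the null hypothesis holds, computed exactly from the binomial distribution. The helper name `one_sided_binomial_p_value` is hypothetical.

```python
from math import comb

def one_sided_binomial_p_value(heads, flips, p=0.5):
    """P(X >= heads) for X ~ Binomial(flips, p), i.e. assuming H0 is true."""
    return sum(
        comb(flips, k) * p**k * (1 - p) ** (flips - k)
        for k in range(heads, flips + 1)
    )

# 9 heads in 10 flips: outcomes at least this extreme are 9 or 10 heads,
# so p = (C(10, 9) + C(10, 10)) / 2**10 = 11 / 1024, about 0.0107.
p_value = one_sided_binomial_p_value(9, 10)
print(p_value)
```

A p-value this small suggests that 9 heads would be unusual for a fair coin. It does not say the coin is unfair with 98.9% probability.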
What is a confidence interval?
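A range of values computed from sample data that is likely to contain an unknown population parameter. A 95% confidence level means that if the sampling were repeated many times, about 95% of the intervals constructed this way would contain the true parameter.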
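A minimal Python sketch of a 95% confidence interval for a mean, assuming a hypothetical sample and using the normal (z) approximation: mean ± z × (sample SD / √n). With only eight observations, a t critical value would be more appropriate and would give a somewhat wider interval; the z version is used here only because it needs nothing beyond the standard library.

```python
import math
from statistics import NormalDist, mean, stdev

sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]  # hypothetical

n = len(sample)
x_bar = mean(sample)
se = stdev(sample) / math.sqrt(n)   # standard error of the mean
z = NormalDist().inv_cdf(0.975)     # about 1.96 for a two-sided 95% interval

lower, upper = x_bar - z * se, x_bar + z * se
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```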
What is correlation?
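A statistical measure of the strength and direction of the relationship between two variables. The Pearson correlation coefficient r ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to 1 (perfect positive linear relationship).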
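A Python sketch of the Pearson correlation coefficient: the covariance of the two variables divided by the product of their standard deviations. The study-hours and exam-score data are hypothetical. Python 3.10+ also provides `statistics.correlation`, which computes the same quantity.

```python
import math

def pearson_r(xs, ys):
    """Pearson r: covariance divided by the product of standard deviations.

    The 1/n factors cancel, so raw sums are used throughout.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

hours = [1, 2, 3, 4, 5]         # hypothetical study hours
scores = [52, 55, 61, 64, 70]   # hypothetical exam scores
r = pearson_r(hours, scores)
print(round(r, 3))  # strongly positive, close to 1
```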
What is causation?
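A relationship in which a change in one variable directly produces a change in another. Correlation alone does not establish causation, because a third variable or coincidence can also produce a correlation.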
What is SQL?
Structured Query Language, the standard language for querying and managing data in relational databases.
What does SELECT do in SQL?
Retrieves data (specific columns, or all columns with SELECT *) from one or more tables in a database.
What does WHERE do in SQL?
Filters records based on specified conditions.
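A runnable illustration of SELECT and WHERE, using Python's built-in `sqlite3` module with an in-memory database. The `employees` table and its rows are hypothetical. The `?` placeholder passes the filter value safely instead of pasting it into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", "Sales", 52000), ("Ben", "IT", 61000), ("Cy", "IT", 58000)],
)

# SELECT chooses the columns; WHERE keeps only rows that meet the condition.
rows = conn.execute(
    "SELECT name, salary FROM employees WHERE dept = ? ORDER BY name",
    ("IT",),
).fetchall()
print(rows)  # [('Ben', 61000), ('Cy', 58000)]
```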
What is a JOIN in SQL?
Combines rows from two or more tables based on a related column.
What is GROUP BY in SQL?
Groups rows that share the same values in the specified columns so that aggregate functions such as COUNT, SUM, or AVG can be computed for each group.
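A combined JOIN and GROUP BY example, again using Python's `sqlite3` with hypothetical `customers` and `orders` tables. The JOIN matches each order to its customer through the shared key, and GROUP BY then collapses the joined rows to one row per customer so COUNT and SUM are computed per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.0), (3, 2, 10.0);
""")

# JOIN on the related column, then aggregate per customer.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id) AS n_orders, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Ana', 2, 55.0), ('Ben', 1, 10.0)]
```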
What is a pivot table?
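A tool in Excel (and other spreadsheet programs) that summarizes data by grouping rows and aggregating their values, for example summing sales by region and quarter.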
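The grouping-and-aggregating idea behind a pivot table can be sketched in plain Python. This is an analogy, not how Excel implements pivot tables: the sales records are hypothetical, regions become rows, quarters become columns, and each cell holds the sum of revenue, which is what a basic pivot table with those fields would display.

```python
from collections import defaultdict

# Hypothetical sales records: (region, quarter, revenue)
sales = [
    ("North", "Q1", 100), ("North", "Q2", 150),
    ("South", "Q1", 80), ("North", "Q1", 20), ("South", "Q2", 60),
]

# Rows = region, columns = quarter, values = SUM(revenue).
pivot = defaultdict(lambda: defaultdict(int))
for region, quarter, revenue in sales:
    pivot[region][quarter] += revenue

quarters = sorted({q for _, q, _ in sales})
print("Region", *quarters)
for region in sorted(pivot):
    print(region, *(pivot[region][q] for q in quarters))
# Region Q1 Q2
# North 120 150
# South 80 60
```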
What does VLOOKUP do in Excel?
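Searches for a value in the first column of a table and returns the value in the same row from a specified column. Setting the last argument to FALSE requests an exact match; if it is TRUE or omitted, VLOOKUP performs an approximate match, which requires the first column to be sorted in ascending order.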
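A Python sketch of the exact-match behavior of `VLOOKUP(value, table, col_index, FALSE)`. The function name `vlookup` and the product table are hypothetical. Like Excel, the column index is 1-based. Unlike Excel, the comparison here is case-sensitive, and a missing value raises an exception instead of returning #N/A.

```python
def vlookup(lookup_value, table, col_index):
    """Exact-match lookup modeled on VLOOKUP(value, table, col_index, FALSE).

    Scans the first column top to bottom and returns the value from
    column `col_index` (1-based, as in Excel) of the first matching row.
    """
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    raise KeyError(f"{lookup_value!r} not found")  # Excel would show #N/A

products = [
    ("A100", "Widget", 2.50),
    ("B200", "Gadget", 4.75),
    ("C300", "Doohickey", 1.20),
]
print(vlookup("B200", products, 3))  # 4.75
```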
What is conditional formatting?
A feature that automatically changes the appearance of cells (e.g., fill color, font, icons) based on rules applied to their values.
What is a histogram?
A histogram is a graphical representation of the distribution of numerical data. It groups data into bins (or intervals) and shows how many data points fall into each bin using bars. Unlike a bar chart, the bars in a histogram touch each other, indicating the data is continuous.
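The binning step behind a histogram can be sketched in Python: split the range of the data into equal-width bins and count the values in each one. The ages are hypothetical, and the sketch assumes the values are not all identical, which would make the bin width zero. Bins are half-open except the last, which includes the maximum; numpy.histogram uses the same convention.

```python
def histogram_counts(values, num_bins):
    """Count how many values fall into each of `num_bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins  # assumes hi > lo
    counts = [0] * num_bins
    for v in values:
        # Bins are [lo, lo+w), [lo+w, lo+2w), ...; the max value goes in the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts, width

ages = [21, 23, 25, 29, 31, 34, 35, 38, 42, 47, 55, 59]  # hypothetical
counts, width = histogram_counts(ages, 4)

# Text rendering: adjacent bins share edges, which is why histogram bars touch.
start = min(ages)
for c in counts:
    print(f"{start:5.1f}-{start + width:5.1f} | {'#' * c}")
    start += width
```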
What is a bar chart?
A bar chart is a graphical display of categorical data using rectangular bars. Each bar represents a category, and the height or length of the bar shows the frequency or value. Unlike histograms, bars do not touch because the categories are discrete, not continuous.
What is a scatter plot?
A scatter plot is a type of graph that shows the relationship between two numerical variables. Each point on the plot represents an observation with values on the x-axis and y-axis. It’s useful for identifying correlations, trends, outliers, and patterns in data.