Data Science Flashcards

(55 cards)

1
Q

What is data analysis?

A

The process of inspecting, cleaning, transforming, and modeling data to discover useful information and support decision-making.

2
Q

What is the difference between qualitative and quantitative data?

A

Qualitative data is descriptive (e.g., names, categories), while quantitative data is numerical (e.g., height, age).

3
Q

What is a data analyst?

A

A professional who collects, processes, and performs statistical analyses on large datasets.

4
Q

What is data cleaning?

A

The process of fixing or removing incorrect, corrupted, or incomplete data within a dataset.

5
Q

What is EDA?

A

Exploratory Data Analysis

A process of analyzing datasets to summarize their main characteristics, often using visual methods.

6
Q

What is the mean?

A

The average of a dataset, calculated by summing all values and dividing by the number of values.

7
Q

What is the median?

A

The middle value in an ordered dataset; with an even number of values, the average of the two middle values.

8
Q

What is the mode?

A

The most frequently occurring value in a dataset.
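
As a quick check of the three measures of central tendency above, here is a minimal sketch using Python's standard `statistics` module. The sample values are made up for illustration.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
values = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(values)      # sum of the values divided by their count
median = statistics.median(values)  # even count, so the two middle values are averaged
mode = statistics.mode(values)      # the most frequently occurring value

print(mean, median, mode)
```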

9
Q

What is standard deviation?

A

A measure of how spread out values are around the mean, equal to the square root of the variance.

10
Q

What is variance?

A

The average of the squared differences from the mean.
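
The sketch below works through variance and standard deviation with Python's `statistics` module. The sample values are hypothetical. Note the assumption that the definition above refers to the population variance (dividing by n); the sample variance divides by n - 1 instead.

```python
import statistics

# Hypothetical values with a mean of 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance: the average of the squared differences from the mean.
variance = statistics.pvariance(values)

# Population standard deviation: the square root of the population variance.
std_dev = statistics.pstdev(values)

# Sample variance divides by n - 1 rather than n, so it is slightly larger.
sample_variance = statistics.variance(values)

print(variance, std_dev, sample_variance)
```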

11
Q

What is a normal distribution?

A

A bell-shaped distribution that is symmetrical about the mean.

12
Q

What is a p-value?

A

The probability of observing results at least as extreme as those actually observed, assuming the null hypothesis is true.

13
Q

What is a confidence interval?

A

A range of values computed from a sample that is expected to contain the true population parameter at a stated confidence level (e.g., 95%).
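
One common way to build such an interval for a mean is sketched below. It assumes a normal approximation with the critical value 1.96 for roughly 95% confidence; for a sample this small a t-distribution critical value would give a somewhat wider interval. The measurements are hypothetical.

```python
import math
import statistics

# Hypothetical sample of measurements.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]

n = len(sample)
mean = statistics.mean(sample)

# Standard error of the mean: sample standard deviation divided by sqrt(n).
std_err = statistics.stdev(sample) / math.sqrt(n)

# Assumption: 1.96 is the approximate two-sided 95% critical value of the
# standard normal distribution.
z = 1.96
lower = mean - z * std_err
upper = mean + z * std_err

print(f"{mean:.2f} ({lower:.2f}, {upper:.2f})")
```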

14
Q

What is correlation?

A

A statistical measure of the strength and direction of the relationship between two variables, commonly expressed as a coefficient between -1 and 1.
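
The Pearson correlation coefficient is one widely used measure of this kind. The sketch below computes it from scratch; the function name `pearson_r` and the data are illustrative assumptions, not part of the deck.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to each variance).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: hours studied and test scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
r = pearson_r(hours, scores)
print(round(r, 3))
```

A value near 1 or -1 indicates a strong linear relationship, but on its own it says nothing about causation.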

15
Q

What is causation?

A

A relationship where one variable causes a change in another variable.

16
Q

What is SQL?

A

Structured Query Language, used to communicate with databases.

17
Q

What does SELECT do in SQL?

A

Retrieves data from a database.

18
Q

What does WHERE do in SQL?

A

Filters records based on specified conditions.

19
Q

What is a JOIN in SQL?

A

Combines rows from two or more tables based on a related column.

20
Q

What is GROUP BY in SQL?

A

Groups rows that share a value in the specified column(s) so that aggregate functions (e.g., COUNT, SUM) can be applied to each group.
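
To see GROUP BY in action without installing a database server, the sketch below runs it against an in-memory SQLite database through Python's built-in `sqlite3` module. The `orders` table and its rows are hypothetical.

```python
import sqlite3

# In-memory database with a small hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 20.0), ("ben", 15.0), ("ana", 30.0)],
)

# One output row per customer, with a row count and a total per group.
rows = conn.execute(
    "SELECT customer, COUNT(*), SUM(amount) "
    "FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

print(rows)
```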

21
Q

What is a pivot table?

A

A tool in Excel used to summarize and analyze data.

22
Q

What does VLOOKUP do in Excel?

A

Searches for a value in the first column of a table and returns a value in the same row from another column.

23
Q

What is conditional formatting?

A

A feature that changes the appearance of cells based on conditions.

24
Q

What is data validation?

A

A tool that restricts the type of data entered into a cell.

25
What is the difference between relative and absolute references in Excel?
Relative references (e.g., A1) adjust when a formula is copied to another cell; absolute references (e.g., $A$1) stay fixed.
26
What is a histogram?
A chart that displays the distribution of a numeric dataset by grouping values into bins and showing how many fall in each.
27
What is a bar chart?
A chart that represents data with rectangular bars.
28
What is a scatter plot?
A graph that plots paired values of two variables as points to reveal relationships between them.
29
What is a line chart used for?
To show trends over time.
30
What is data visualization?
The graphical representation of information and data.
31
What is Pandas in Python?
A library used for data manipulation and analysis.
32
What is NumPy?
A library for numerical operations in Python.
33
What is a DataFrame?
A 2-dimensional labeled data structure with columns of potentially different types.
34
What does .groupby() do in Pandas?
Splits the data into groups based on some criteria so that a function, such as an aggregation, can be applied to each group.
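
A minimal sketch of the split-then-aggregate pattern, assuming Pandas is installed. The DataFrame contents and column names are hypothetical.

```python
import pandas as pd

# Hypothetical sales records.
df = pd.DataFrame(
    {
        "region": ["east", "west", "east", "west"],
        "sales": [100, 80, 50, 20],
    }
)

# Split rows by region, then sum the sales column within each group.
totals = df.groupby("region")["sales"].sum()
print(totals)
```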
35
What does .merge() do in Pandas?
Combines DataFrames using database-style joins.
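
The sketch below contrasts an inner and a left join with `.merge()`, assuming Pandas is installed. Both DataFrames are hypothetical; customer 99 deliberately has no match.

```python
import pandas as pd

# Hypothetical tables; customer_id 99 has no matching customer row.
orders = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": [10, 20, 99]})
customers = pd.DataFrame({"customer_id": [10, 20], "name": ["Ana", "Ben"]})

# Inner join keeps only orders whose customer_id appears in both tables.
inner = orders.merge(customers, on="customer_id", how="inner")

# Left join keeps every order; unmatched rows get a missing name.
left = orders.merge(customers, on="customer_id", how="left")

print(inner)
print(left)
```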
36
What is a KPI?
Key Performance Indicator, a measurable value that indicates how effectively a company is achieving key objectives.
37
What is hypothesis testing?
A method of making decisions using data, typically involving p-values and significance levels.
38
What is a P
39
What is A/B testing?
An experiment comparing two versions of something to determine which performs better.
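
One common way to judge an A/B test on conversion rates is a two-proportion z-test, sketched below with only the standard library. The function name `two_proportion_z_test` and the visitor counts are illustrative assumptions; other tests may suit other metrics better.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both versions convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 100 of 1000 visitors convert on A, 130 of 1000 on B.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(round(z, 2), round(p_value, 3))
```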
40
What is root cause analysis?
A method of problem solving that tries to identify the primary cause of a problem.
41
What is outlier detection?
Identifying values that are significantly different from others in the dataset.
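
A frequently used rule of thumb flags values more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies it with the standard `statistics` module; the function name `iqr_outliers`, the multiplier default, and the data are illustrative assumptions.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low = q1 - k * iqr
    high = q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical readings with one obviously unusual value.
data = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(data)
print(outliers)
```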
42
What is time series analysis?
Analyzing data points collected or recorded at specific time intervals.
43
What is the SQL syntax to select all columns from a table named 'customers'?
SELECT * FROM customers;
44
How do you filter rows in SQL where age is greater than 30?
SELECT * FROM table_name WHERE age > 30;
45
What is the SQL syntax to count the number of rows in a table?
SELECT COUNT(*) FROM table_name;
46
How do you sort results in SQL by a column named 'price' in descending order?
SELECT * FROM table_name ORDER BY price DESC;
47
What is the SQL syntax for an INNER JOIN between 'orders' and 'customers' on 'customer_id'?
SELECT * FROM orders INNER JOIN customers ON orders.customer_id = customers.customer_id;
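
The statements in cards 44 to 47 can be tried against an in-memory SQLite database using Python's built-in `sqlite3` module, as sketched below. The table contents are hypothetical, and `table_name` from the cards is replaced by the concrete tables `customers` and `orders`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER, name TEXT, age INTEGER);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, price REAL);
    INSERT INTO customers VALUES (1, 'Ana', 34), (2, 'Ben', 28);
    INSERT INTO orders VALUES (100, 1, 9.5), (101, 1, 20.0), (102, 3, 5.0);
    """
)

# Filter rows with WHERE.
over_30 = conn.execute("SELECT * FROM customers WHERE age > 30").fetchall()

# Count rows with COUNT(*).
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# Sort in descending order with ORDER BY ... DESC.
by_price = conn.execute(
    "SELECT order_id FROM orders ORDER BY price DESC"
).fetchall()

# INNER JOIN drops order 102, whose customer_id has no match in customers.
joined = conn.execute(
    "SELECT orders.order_id, customers.name FROM orders "
    "INNER JOIN customers ON orders.customer_id = customers.customer_id "
    "ORDER BY orders.order_id"
).fetchall()
conn.close()

print(over_30, order_count, by_price, joined)
```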
48
How do you import Pandas in Python?
import pandas as pd
49
What is the syntax to read a CSV file named 'data.csv' using Pandas?
pd.read_csv('data.csv')
50
How do you display the first 5 rows of a DataFrame called df?
df.head()
51
How do you select the 'age' column from a DataFrame called df?
df['age']
52
What is the syntax to filter rows where 'salary' > 50000 in a DataFrame df?
df[df['salary'] > 50000]
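
The Pandas calls in cards 48 to 52 fit together as in the sketch below, assuming Pandas is installed. To keep it self-contained, the CSV text is read from an in-memory string that stands in for the file 'data.csv'; the rows are hypothetical.

```python
import io

import pandas as pd

# In-memory CSV text standing in for the file 'data.csv'.
csv_text = "name,age,salary\nAna,34,62000\nBen,28,48000\nCai,41,75000\n"
df = pd.read_csv(io.StringIO(csv_text))

first_rows = df.head()                    # first 5 rows (all 3 rows here)
ages = df["age"]                          # a single column, returned as a Series
high_earners = df[df["salary"] > 50000]   # boolean mask keeps matching rows

print(first_rows)
print(high_earners)
```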
53
What is the syntax to loop through a list called 'items' in Python?
for item in items: print(item)
54
How do you create a list of numbers from 0 to 9 in Python?
list(range(10))
55
How do you import NumPy in Python?
import numpy as np