About Data Analysis Flashcards

Q

What Does a Data Analyst Do?

A

A data analyst is a professional who collects data, processes it, and produces insights that can help solve a problem.

Q

What Are the Most Important Skills for a Data Analyst?

A
  • Data collection and organization
  • Statistical techniques to analyze data
  • Data visualization tools like Power BI and Tableau
  • Problem-solving approaches
  • Verbal and written communication
Q

Define the Data Analysis Process

A

Data analysis is the process of collecting, cleaning, transforming, and analyzing data to generate insights that can solve a problem or improve business results.

Q

What Process Would You Follow While Working on a Data Analytics Project?

A
  1. Understanding the business problem
  2. Collecting data
  3. Data exploration and preparation
  4. Data analysis
  5. Presenting your analysis
  6. Predictive analytics
Q

Does a Data Analyst Need Data Analytics Tools? If So, Name the Top Ones.

A

Yes. The most widely used ones include Excel, SQL, Power BI, Tableau, and Python.

Q

Define Data Cleansing.

A

Data cleansing is the process of identifying and correcting irrelevant, incorrect, and incomplete data. It ensures that the final dataset contains usable and consistent data that can produce valuable insights.
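A minimal Python sketch of these steps, assuming hypothetical customer records with `name` and `age` fields. The `clean_records` helper, the field names, and the valid age range of 1 to 119 are illustrative assumptions. The sketch drops irrelevant fields, normalizes text, rejects incomplete or implausible values, and removes duplicates.

```python
def clean_records(raw_rows):
    """Return cleaned copies of hypothetical name/age records."""
    seen = set()
    cleaned = []
    for row in raw_rows:
        # Normalize text: trim whitespace and use consistent capitalization.
        name = (row.get("name") or "").strip().title()
        age_text = str(row.get("age", "")).strip()

        # Incomplete or incorrect: missing name or non-numeric age.
        if not name or not age_text.isdecimal():
            continue
        age = int(age_text)

        # Incorrect: implausible age (assumed valid range 1..119).
        if not 0 < age < 120:
            continue

        # Duplicates: skip records already seen after normalization.
        key = (name, age)
        if key in seen:
            continue
        seen.add(key)

        # Irrelevant: keep only the fields the analysis needs.
        cleaned.append({"name": name, "age": age})
    return cleaned


raw = [
    {"name": " alice ", "age": "34", "browser": "Firefox"},
    {"name": "Alice", "age": 34},
    {"name": "bob", "age": ""},
    {"name": "carol", "age": "-5"},
    {"name": "dave", "age": "250"},
]
print(clean_records(raw))  # [{'name': 'Alice', 'age': 34}]
```

Normalizing before deduplicating matters here: " alice " and "Alice" only match once both are trimmed and capitalized the same way.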

Q

Data Mining vs Data Profiling: What Is the Difference?

A

Data mining involves processing data to find patterns that are not immediately apparent in it. The focus is on analyzing the dataset to detect dependencies and correlations within it.

Data profiling, on the other hand, involves examining a dataset to identify the attributes of its data, such as data types, value distributions, and functional dependencies.
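To make profiling concrete, here is a small Python sketch that profiles each column of a hypothetical table: which data types appear, how many values are missing, how many are distinct, and the numeric range. The `profile_column` helper and the sample columns are assumptions for illustration, and functional dependencies are not covered.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: types present, missing count, distinct count, range."""
    present = [v for v in values if v is not None]
    types = Counter(type(v).__name__ for v in present)
    profile = {
        "types": dict(types),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    # Report a range only for genuinely numeric values (bool is excluded).
    numeric = [v for v in present
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numeric:
        profile["min"] = min(numeric)
        profile["max"] = max(numeric)
    return profile


table = {
    "order_id": [1, 2, 3, 4],
    "amount": [19.99, None, 5.0, "n/a"],
}
for column, values in table.items():
    print(column, profile_column(values))
```

Unlike data mining, nothing here searches for patterns. The output simply describes the columns, and the mixed `float`/`str` types in `amount` flag a quality problem before any analysis starts.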

Q

Define Outlier. Explain Steps To Treat an Outlier in a Dataset.

A

An outlier is a data point that differs significantly from the rest of the dataset it belongs to. Outliers can be valuable sources of information or indications of anomalies, errors, or rare events.

Two common methods identify outliers before you decide how to treat them:

Box plot method. In this method, a value is classified as an outlier if it lies more than 1.5 times the interquartile range (IQR = Q3 − Q1) above the upper quartile (Q3) or below the lower quartile (Q1).

Standard deviation method. In this method, a value is classified as an outlier if it lies more than three standard deviations above or below the mean of the data, that is, outside mean ± 3 × standard deviation.
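Both detection rules can be sketched with Python's standard `statistics` module. The helper names, the `k=1.5` and `n_std=3.0` defaults, and the sample readings are illustrative assumptions, and the quartile method (`"inclusive"`) is one of several reasonable choices.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Box plot (Tukey) rule: flag values beyond k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def std_outliers(values, n_std=3.0):
    """Standard deviation rule: flag values more than n_std SDs from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mean) > n_std * sd]


data = [9, 10, 11] * 10 + [100]  # hypothetical readings with one extreme value
print(iqr_outliers(data))  # [100]
print(std_outliers(data))  # [100]
```

Quartiles can be computed in slightly different ways (`statistics.quantiles` defaults to `method="exclusive"`), so fences may differ a little between tools. On very small samples, a single extreme value inflates the standard deviation enough that the three-standard-deviation rule can miss it, while the box plot rule still catches it.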

Q

What Is the Difference Between Data Analysis and Data Mining?

A

Data analysis is the broad process of collecting, cleaning, modeling, and transforming data to gain important insights.

Data mining is the more specific practice of finding rules and patterns in data.

Q

What Is Metadata?

A

Metadata is data that describes the data in a dataset. That is, it is not the data you are working with itself, but data about that data. Metadata can tell you things like who produced a piece of data, how different types of data are related, and who has access rights to the data you are working with.

Q

What Is KNN Imputation?

A

K-Nearest Neighbors (KNN) imputation is an algorithmic method for replacing missing values in a dataset with plausible values. It assumes that a missing value can be approximated from the values of the k records most similar to it (its nearest neighbors). It is often more accurate than filling gaps with the mean, median, or mode, and it can be performed easily with libraries like scikit-learn.
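The idea can be sketched in plain Python. This simplified version fills each missing entry (`None`) with the average of that column over the k nearest rows, measuring Euclidean distance only across the columns both rows have observed. The `knn_impute` name and the sample rows are hypothetical. In practice you would more likely use scikit-learn's `KNNImputer`, which by default uses a NaN-aware Euclidean distance that also rescales for missing coordinates.

```python
import math


def knn_impute(rows, k=2):
    """Return a copy of rows with None entries filled by KNN imputation."""
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for col, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for j, other in enumerate(rows):
                # Donors must be other rows that actually observed this column.
                if j == i or other[col] is None:
                    continue
                # Compare only on columns observed in both rows.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                distance = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                candidates.append((distance, other[col]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            if nearest:  # leave the gap if no usable neighbors exist
                filled[i][col] = sum(v for _, v in nearest) / len(nearest)
    return filled


rows = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],  # missing third feature
    [0.9, 1.9, 2.9],
    [10.0, 20.0, 30.0],
]
print(knn_impute(rows, k=2)[1])
```

The far-away row `[10.0, 20.0, 30.0]` is ignored, so the gap is filled from the two similar rows (about 2.95). A plain column mean would instead be pulled up to about 11.97.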

Q

What Is Data Visualization? How Many Types of Visualization Are There?

A

Data visualization is the practice of representing data in graphical form. Visualization makes it easier for viewers to quickly spot trends and outliers in a dataset.

There are several types of data visualizations, including:

Pie charts
Column charts
Bar graphs
Scatter plots
Heat maps
Line graphs

Q

What Is Data Wrangling?

A

Data wrangling is the process of taking raw data and cleaning and enriching it so that it can be analyzed easily to reveal trends and patterns.

Q

What Is a Pivot Table?

A

A pivot table is a data analysis tool that groups records from a larger dataset and presents the grouped values in tabular form for easier analysis.
Applying an aggregation function (such as a sum, count, or average) to each group makes it easier to spot figures and trends in the data.
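A minimal pure-Python sketch of what a pivot table computes: records are grouped by one field for the rows and another for the columns, and each cell holds an aggregate of a value field. The `pivot` helper and the sales records are hypothetical. Spreadsheet pivot tables and pandas `pivot_table` do the same thing with many more options.

```python
from collections import defaultdict


def pivot(records, row_key, col_key, value_key, agg=sum):
    """Group records by (row_key, col_key) and aggregate value_key per cell."""
    groups = defaultdict(list)
    for record in records:
        groups[(record[row_key], record[col_key])].append(record[value_key])
    row_labels = sorted({r for r, _ in groups})
    col_labels = sorted({c for _, c in groups})
    # Cells with no matching records stay None, like a blank pivot cell.
    return {
        r: {c: agg(groups[(r, c)]) if (r, c) in groups else None
            for c in col_labels}
        for r in row_labels
    }


sales = [
    {"region": "East", "quarter": "Q1", "amount": 100},
    {"region": "East", "quarter": "Q1", "amount": 50},
    {"region": "East", "quarter": "Q2", "amount": 70},
    {"region": "West", "quarter": "Q1", "amount": 80},
]
for region, cells in pivot(sales, "region", "quarter", "amount").items():
    print(region, cells)
# East {'Q1': 150, 'Q2': 70}
# West {'Q1': 80, 'Q2': None}
```

Swapping the aggregation function (for example `agg=len` for a count) changes what each cell summarizes without changing the grouping.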

Q

How Do You Tackle Missing Data in a Dataset?

A

There are two main ways to deal with missing data in data analysis.

Imputation is the technique of making an informed guess about what a missing data point could be. It is used when the amount of missing data is low and there appears to be natural variation within the available data.

The other option is to remove the data. This is usually done if data is missing at random and there is no way to draw reasonable conclusions about what the missing values might be.
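Both options can be sketched in a few lines of Python, with `None` marking a missing value. Mean imputation is used here as the simplest kind of informed guess. The helper names and the sample rows are hypothetical, and real projects often prefer better-informed methods such as KNN imputation.

```python
import statistics


def drop_incomplete(rows):
    """Removal: keep only rows with no missing (None) values."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_impute(rows):
    """Imputation: replace each None with the mean of its column's observed values."""
    means = []
    for column in zip(*rows):
        observed = [v for v in column if v is not None]
        # A column with no observed values has nothing to impute from.
        means.append(statistics.mean(observed) if observed else None)
    return [[means[i] if v is None else v for i, v in enumerate(row)]
            for row in rows]


rows = [[1.0, 10.0], [2.0, None], [3.0, 30.0]]
print(drop_incomplete(rows))  # [[1.0, 10.0], [3.0, 30.0]]
print(mean_impute(rows))      # [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
```

Removal shrinks the dataset (here from three rows to two), while imputation keeps every row at the cost of introducing estimated values.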

Q

What is the significance of Exploratory Data Analysis (EDA)?

A
  • Helps you understand the data better
  • Helps discover hidden trends and insights in the data
Q

Explain descriptive, predictive, and prescriptive analytics.

A

DESCRIPTIVE:
- Provides insights into the past to answer “what has happened”
- Uses data aggregation and data mining techniques

PREDICTIVE:
- Uses past data to answer “what could happen” in the future
- Uses statistical models and forecasting techniques

PRESCRIPTIVE:
- Suggests various courses of action to answer “what should you do”
- Uses simulation algorithms and optimization techniques to advise on possible outcomes
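A toy Python sketch contrasting the three kinds of analytics on the same hypothetical monthly sales figures. The data, the helper names, and the profit and cost parameters are assumptions. The predictive step is a least-squares straight-line trend, and the prescriptive step is a brute-force search over a few stock levels, a very simple stand-in for the simulation and optimization techniques real tools use.

```python
import statistics

monthly_sales = [100.0, 110.0, 120.0, 130.0]  # hypothetical past sales


def describe(history):
    """Descriptive: summarize what has happened."""
    return {"total": sum(history), "mean": statistics.mean(history)}


def forecast_next(history):
    """Predictive: extend a least-squares straight-line trend one step ahead."""
    if len(history) < 2:
        raise ValueError("need at least two points to fit a trend")
    xs = range(len(history))
    x_mean = statistics.mean(xs)
    y_mean = statistics.mean(history)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return intercept + slope * len(history)


def best_stock_level(expected_demand, options, unit_profit=5.0, unit_cost=2.0):
    """Prescriptive: choose the option with the highest expected profit."""
    def profit(stock):
        sold = min(stock, expected_demand)
        return unit_profit * sold - unit_cost * stock
    return max(options, key=profit)


print(describe(monthly_sales))  # {'total': 460.0, 'mean': 115.0}
demand = forecast_next(monthly_sales)
print(demand)  # 140.0
print(best_stock_level(demand, [120, 140, 160]))  # 140
```

Each step builds on the previous one: the summary describes the past, the trend turns it into an expectation, and the prescriptive step turns that expectation into a recommended action.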

Q

How do you treat outliers in a dataset?

A

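Drop the outlier records
Assign a new value (for example, cap it at a threshold or replace it with an imputed value)
Try a transformation (for example, a log transform) that reduces the outlier's influence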
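The three treatments can be sketched in Python using box plot fences to decide what counts as an outlier. The helper names, the 1.5 × IQR fences, the choice to cap at the nearest fence, and the sample readings are illustrative assumptions. `math.log1p` computes log(1 + v) and is only meaningful here for non-negative data.

```python
import math
import statistics


def iqr_fences(values, k=1.5):
    """Lower and upper box plot fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values):
    """Option 1: drop records that fall outside the fences."""
    lower, upper = iqr_fences(values)
    return [v for v in values if lower <= v <= upper]


def cap_outliers(values):
    """Option 2: assign a new value by capping at the nearest fence."""
    lower, upper = iqr_fences(values)
    return [min(max(v, lower), upper) for v in values]


def log_transform(values):
    """Option 3: compress large values with log(1 + v); assumes v >= 0."""
    return [math.log1p(v) for v in values]


data = [9, 10, 11] * 10 + [100]  # hypothetical readings
print(len(drop_outliers(data)))  # 30
print(cap_outliers(data)[-1])  # 14.0
print(round(log_transform(data)[-1], 2))  # 4.62
```

Dropping loses the record entirely, capping keeps it but limits its pull on averages, and the log transform keeps every value while shrinking the gap between 100 and the rest of the readings.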