Understanding and visualizing data Flashcards

1
Q

You are working as a data scientist for a fintech company. At the moment, you are working on a regression model that predicts how much money customers will spend on their credit card transactions in the next month. You believe you have created a good model; however, you want to complete your residual analysis to confirm that the model errors are randomly distributed around zero. What is the best chart for performing this residual analysis?

A

Scatter plot. In this case, you want to show the distribution of the model errors, and a scatter plot of the residuals (for example, against the predicted values) is a good way to present that analysis. Residuals scattered randomly around zero, with no visible pattern, are evidence that the model is not systematically biased; on their own they do not rule out overfitting, which is better checked on held-out data. Histograms are also useful for error analysis.
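
As a rough illustration of what the residual scatter plot is built from, the sketch below computes residuals for a handful of made-up values. The numbers and variable names are assumptions for illustration only, not data from the card.

```python
# Hypothetical actual and predicted monthly spend (illustrative values only).
actual = [120.0, 80.0, 200.0, 150.0, 95.0]
predicted = [115.0, 86.0, 195.0, 154.0, 95.0]

# Residual = actual value minus predicted value.
residuals = [a - p for a, p in zip(actual, predicted)]
mean_residual = sum(residuals) / len(residuals)

# Each (predicted, residual) pair is one point on the residual scatter plot.
points = list(zip(predicted, residuals))

for x, r in points:
    print(f"predicted={x:7.1f}  residual={r:+6.1f}")
print(f"mean residual: {mean_residual:.2f}")
```

On the chart, the points should sit on both sides of the zero line without forming a curve or a funnel shape.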

2
Q

Although you believe that two particular variables are highly correlated, you think this is not a linear correlation. Knowing the type of correlation between these two variables is crucial for determining the type of algorithm that you are going to use in later phases of this project. Which type of chart could you use to show the correlations between these two variables?

A

Scatter plot. Correlation is a type of relationship. In this case, we have just two variables, so a scatter plot is a good approach: it shows the shape of the relationship directly, including whether it is linear or nonlinear.
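
To see why a chart matters here, consider a minimal sketch with made-up data where one variable is the square of the other. The helper `pearson` is a hypothetical name written for this example; it computes the standard Pearson coefficient, which only measures linear association.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient (measures linear association only)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Illustrative data: y depends entirely on x, but not linearly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson(xs, ys)
print(f"Pearson r = {r:.2f}")
```

The coefficient comes out at zero even though the variables are fully dependent, whereas a scatter plot of the same points would show the U shape immediately.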

3
Q

You are working for a call center company, and the business team wants to see the percentage of calls on each channel of their interactive voice response device. You just got those percentages, broken down per month. What’s the best way to show this monthly information to your users?

A

Stacked 100% column chart. In this case, you want to show the composition of your data across different periods. Several chart types can show a composition; however, a stacked 100% column chart is the appropriate one for showing that composition for each month.
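
A stacked 100% column chart draws one column per month, with segments that add up to 100. The sketch below assumes you start from raw call counts (the card says the percentages are already available, in which case this step is unnecessary). The channel names and counts are hypothetical.

```python
# Hypothetical call counts per IVR channel, per month.
calls = {
    "Jan": {"billing": 120, "support": 60, "sales": 20},
    "Feb": {"billing": 90, "support": 90, "sales": 120},
}

def to_percentages(counts):
    """Convert one month's counts into shares that sum to 100."""
    total = sum(counts.values())
    return {channel: 100 * n / total for channel, n in counts.items()}

# One dictionary of shares per month = one stacked column per month.
shares = {month: to_percentages(counts) for month, counts in calls.items()}

for month, parts in shares.items():
    print(month, {channel: f"{pct:.0f}%" for channel, pct in parts.items()})
```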

4
Q

You are working as a data scientist for a telecom company. The company offers many different services related to broadband, TV, phone, and mobile. In total, the company offers more than 50 services. You want to see the number of services that each customer usually signs up for. What’s the most appropriate chart to use for this visualization?

A

Histogram. Since the company has so many services and you want to see the distribution of the number of services per customer, the most appropriate chart would be a histogram. Keep in mind that, most of the time, when the goal of the analysis is showing distributions, your most common options are histograms and box plots. Depending on the use case, scatter plots can also be used.
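
A histogram of this kind is just a count of customers for each number of services. A minimal sketch, using a short made-up list of services per customer and a text rendering in place of a real chart:

```python
from collections import Counter

# Hypothetical number of services signed up for, one entry per customer.
services_per_customer = [1, 2, 2, 3, 1, 2, 5, 2, 3, 1]

# Frequency of each value: these counts are the heights of the histogram bars.
counts = Counter(services_per_customer)

for n_services in range(1, max(counts) + 1):
    bar = "#" * counts[n_services]
    print(f"{n_services:2d} services | {bar}")
```

With more than 50 possible values, you would typically group neighboring values into wider bins rather than keep one bar per value.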

5
Q

You are working on a sales report for your company and you need to make different types of comparisons regarding your company’s sales across different regions. Which types of charts are suitable for presenting comparisons in your data?

A

a) Bar chart
b) Column chart
c) Line chart

You have multiple options when it comes to using charts to present comparisons in your data. Use bar charts when you have a single variable and no time dimension; use column charts when you have one or two variables changing over time; and use line charts when you have three or more variables changing over time. Finally, remember that it is also possible to show comparisons using tables. This is especially helpful when you have three or more variables.

6
Q

You are working as a data scientist for a meteorological organization. Your company is measuring the daily temperature of a particular region during the entire summer. The summer is over and it is now time to present a detailed report about the data that has been captured. What types of analysis are valid?

A

a) Create a box plot to show some statistics about the data; for example, the median temperature, lower and upper quartiles, and outlier points.
b) Create a histogram to show the distribution of the data across all the different days when temperatures have been measured.
c) Create a key performance indicator just to present the average temperature that has been measured during the summer, in that particular region.
d) Create a line chart to show the evolution of the temperature, day after day. That will give you a sense of the increasing and decreasing trends.
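
The numbers behind options a) and c) can be sketched with the standard library. The readings below are made up, and the 1.5 × IQR rule for flagging outliers is a common box-plot convention assumed here, not something the card specifies.

```python
import statistics

# Hypothetical daily temperatures in °C (illustrative values only).
temps = [22, 24, 25, 25, 26, 27, 28, 29, 30, 41]

# Box plot statistics: median and lower/upper quartiles.
median = statistics.median(temps)
q1, _, q3 = statistics.quantiles(temps, n=4)

# Assumed convention: points beyond 1.5 * IQR from the quartiles are outliers.
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [t for t in temps if t < low_fence or t > high_fence]

# KPI: a single average value for the whole period.
average = statistics.mean(temps)

print(f"median={median}, Q1={q1}, Q3={q3}, outliers={outliers}")
print(f"average temperature: {average:.1f}")
```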

7
Q

You are working in a financial company and your team is responsible for the investment portfolio. You have multiple options for investments and the key characteristics of all of them are annual rate, investment period, and investment amount. There is a direct relationship between these three components. You have built a bubble chart to show the relationship among these investment options, but now you want to compare them and add a few more dimensions, such as level of risk and year-over-year performance. What is the most compact way to present this information?

A

Create a table to present this information, where each investment option is a row and each metric is a column. This is the most compact way.

8
Q

You are working as a data scientist for a marketing company. At the moment, you are building a machine learning model to predict churn. One of the features that you are using in your model is annual salary. Due to the nature of this variable, you suspect that a log transformation would help bring this feature closer to a normal distribution. What type of chart would help you confirm that a log transformation is indeed a good idea for this feature?

A

Histogram. Whether a log transformation is a good idea depends on the distribution of the data. If the distribution is skewed, a log transformation could help bring the data closer to a normal distribution.
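
Alongside the histogram, a skewness figure can back up what the chart shows. The sketch below uses made-up salaries and a hand-written `skewness` helper (a hypothetical name; it computes the moment-based sample skewness) to compare the feature before and after a log transformation.

```python
import math

def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical annual salaries with a long right tail.
salaries = [20000, 25000, 30000, 35000, 40000,
            50000, 60000, 80000, 150000, 400000]
log_salaries = [math.log(s) for s in salaries]

skew_raw = skewness(salaries)
skew_log = skewness(log_salaries)

print(f"skewness before log: {skew_raw:.2f}")
print(f"skewness after log:  {skew_log:.2f}")
```

For this sample the skewness drops after the transformation but stays positive, so the histogram of the transformed feature is still worth checking.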
