Data Science Orientation Flashcards

1
Q

What does a Data Scientist do?

A

As a data scientist, the majority of our time is typically spent dealing with data: munging it and doing what we call feature engineering, that is, taking the raw input data and transforming it in a way that actually provides value.

A great deal of my time is spent cleaning data and getting it ready to be trustworthy.

2
Q

What are some key skills you need as a data scientist?

A
  1. Math
  2. Statistics
  3. Programming (R / Python)
  4. Data Visualization and Modeling
3
Q

What is the first thing you can do as a data scientist?

A

I think the best thing you can do with data is to understand what you have, start asking yourself questions, and start exploring. From there you can do incredible things for businesses and organizations, helping them drive improvements in whatever problems they are trying to solve.

4
Q

What are the types of questions you may ask of the data?

A
  1. Descriptive
  2. Associative
  3. Comparative
  4. Predictive
5
Q

You are working in Excel with a large table of data that includes a column of student scores.

You want to highlight the scores in the table so you can more easily find the rows for the students who scored in the top 10% of grades.

Which Excel visualization should you use?

A

Conditional Formatting

6
Q

You create a line chart in Excel, showing monthly rainfall levels over a period of 10 years.

You want the chart to indicate whether rainfall is generally rising, falling, or remaining at a consistent level.

What should you add to the chart?

A

A trendline
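
For intuition outside Excel, a linear trendline can be thought of as a least-squares straight line fitted through the points, where the sign of the slope says whether the values are rising or falling. The sketch below is only an illustration: the function name `fit_trendline` and the rainfall numbers are made up, and it assumes a linear trend.

```python
def fit_trendline(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared x deviations and sum of x*y cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up monthly rainfall values (mm), purely for illustration.
months = list(range(12))
rainfall_mm = [80, 75, 78, 70, 72, 68, 65, 66, 60, 62, 58, 55]

slope, intercept = fit_trendline(months, rainfall_mm)
direction = "rising" if slope > 0 else "falling" if slope < 0 else "flat"
print(f"slope per month: {slope:.2f} ({direction})")
```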

7
Q

How is data in a normal distribution spread around the mean, in terms of standard deviations?

A

In a normal distribution, about 68 percent of your data will fall between plus 1 and minus 1 standard deviations from the mean; about 95 percent will fall between plus 2 and minus 2; and about 99.7 percent between plus 3 and minus 3.
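
These percentages can be checked with Python's standard library. This is a minimal sketch using `statistics.NormalDist`; the helper name `share_within` is made up for the example.

```python
from statistics import NormalDist


def share_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.1%}")
```

The three printed shares come out at roughly 68, 95, and 99.7 percent.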

8
Q

What is Standard error?

A

SE = s / √n

The standard error is simply the standard deviation divided by the square root of the number of observations that you have.

This is important because any value that we obtain from a set of data is just an estimate of the true population value.
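
A minimal sketch of the formula, assuming s is the sample standard deviation (which is what `statistics.stdev` computes). The function name and the scores are invented for illustration.

```python
import math
import statistics


def standard_error(values):
    """SE = s / sqrt(n), with s the sample standard deviation."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    return statistics.stdev(values) / math.sqrt(n)


scores = [72, 85, 90, 66, 78, 88, 95, 70]  # made-up data
print(f"mean = {statistics.fmean(scores):.2f}, SE = {standard_error(scores):.2f}")
```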

9
Q

What is Kurtosis?

A

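Kurtosis describes how heavy the tails of the data are compared with a normal distribution, so it is used as a rough check of normality; the closer excess kurtosis (kurtosis minus 3) is to 0, the more likely the data is normally distributed.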

10
Q

What range is considered normal for kurtosis and skewness?

A

In both cases, values between minus 2 and plus 2 are generally considered acceptable for statistical analysis.
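
The rule of thumb can be sketched as follows. Note the assumptions: these are the simple moment-based estimators, not the bias-corrected versions some spreadsheet functions use, kurtosis is reported as excess kurtosis (0 for a normal distribution), and the function names are made up for the example.

```python
import statistics


def skewness(values):
    """Moment-based skewness: mean of cubed z-scores (0 for symmetric data)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumes the values are not all identical
    return sum(((x - mean) / sd) ** 3 for x in values) / len(values)


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((x - mean) / sd) ** 4 for x in values) / len(values) - 3.0


def within_rule_of_thumb(values, limit=2.0):
    """True if both statistics fall between -limit and +limit."""
    return abs(skewness(values)) <= limit and abs(excess_kurtosis(values)) <= limit


sample = [1, 2, 3, 4, 5]  # made-up, perfectly symmetric data
print(skewness(sample), excess_kurtosis(sample), within_rule_of_thumb(sample))
```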

11
Q

What is the range of a correlation?

A

First, correlations always range from negative 1 to positive 1.

The closer the correlation is to positive 1 or negative 1, the more in alignment those variables are: they move together in the same direction if it’s a positive correlation, or in opposite directions if it’s a negative correlation. A correlation near 0 means little or no linear relationship.
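
As an illustration, here is a small sketch of the Pearson correlation coefficient, assuming that is the kind of correlation meant here. The function name and the data are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient; always between -1 and +1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


hours = [1, 2, 3, 4, 5]            # made-up data
same_direction = [2, 4, 6, 8, 10]  # rises with hours
opposite = [10, 8, 6, 4, 2]        # falls as hours rise

print(pearson_r(hours, same_direction))  # close to +1
print(pearson_r(hours, opposite))        # close to -1
```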

12
Q

What is variance?

A

Variance is a measure of dispersion in the data: how far, on average, the values are from the mean, calculated as the average of the squared distances from the mean.
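
A short sketch of the calculation, assuming the population form (dividing by n; the sample form divides by n - 1 instead). The function name and data are made up, and the result is compared against `statistics.pvariance`.

```python
import statistics


def population_variance(values):
    """Average of the squared distances from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data with mean 5
print(population_variance(data))    # hand-rolled
print(statistics.pvariance(data))   # standard library, same definition
```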

13
Q

What is good to remember about standard error?

A

The standard error gets smaller as the number of samples grows, because we divide the standard deviation by the square root of N.
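
A quick worked example, holding the standard deviation fixed at an assumed value of 10 and only changing N. The helper name `se_from_sd` is made up.

```python
import math


def se_from_sd(sd: float, n: int) -> float:
    """Standard error for a given standard deviation and sample size."""
    return sd / math.sqrt(n)


sd = 10.0  # assumed standard deviation, for illustration only
for n in (25, 100, 400):
    print(f"N = {n:>3}: SE = {se_from_sd(sd, n)}")
```

With these numbers the standard error goes from 2.0 to 1.0 to 0.5: quadrupling N halves it.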
