Overview Flashcards
Recall all important concepts presented in the course
What is statistics? (informal)
It is a field that takes data from the world and transforms it into information that can be used to make decisions.
What is a Scatter Plot? Give a reason to use it.
In this kind of graph, each data point is plotted as a dot on a Cartesian plane.
On a scatter plot we can see whether the data has a linear relationship or not.
What are outliers?
They are data points that deviate strongly from what the rest of the data leads us to expect, and they can distort the mean of the data set.
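A minimal sketch of how one outlier pulls the mean. The salary figures and variable names are made up for illustration and do not come from the course.

```python
from statistics import mean

# Hypothetical salaries (illustrative values only).
salaries = [50_000, 52_000, 48_000, 51_000, 49_000]

# The same data with one deliberately extreme value appended.
with_outlier = salaries + [1_000_000]

mean_without = mean(salaries)      # mean of the five typical salaries
mean_with = mean(with_outlier)     # mean once the outlier is included

print("mean without outlier:", mean_without)
print("mean with outlier:   ", round(mean_with, 2))
```

A single extreme value moves the mean far away from where most of the data sits.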
What is a bar chart? Give a reason to use it.
In a bar chart, we choose intervals on the x axis and aggregate the values of the data points that fall in each interval, creating a bar with the mean of those values.
That way we smooth out the noise and get a better understanding of the global trend of the data.
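The aggregation described in this card could be sketched as follows. The function name `bar_heights`, the interval width, and the (size, price) pairs are assumptions chosen for illustration.

```python
from collections import defaultdict

def bar_heights(points, width):
    """Group (x, y) points into x intervals of the given width and
    return {interval_start: mean of the y values in that interval}."""
    buckets = defaultdict(list)
    for x, y in points:
        start = (x // width) * width   # left edge of the interval containing x
        buckets[start].append(y)
    return {start: sum(ys) / len(ys) for start, ys in sorted(buckets.items())}

# Hypothetical (house size, price) pairs.
points = [(1200, 100), (1400, 120), (1800, 140),
          (2100, 200), (2300, 220), (2900, 240)]

heights = bar_heights(points, 1000)
print(heights)
```

Each key is the start of an x interval and each value is the height of the bar drawn over it.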
What is a histogram?
A histogram is a special case of a Bar Chart.
While Bar Charts look at 2D data, histograms look at 1D data.
In histograms, the X axis shows the variable we are looking at (e.g. salary), split into intervals, and the Y axis shows the frequency (or count) of data points that fall into each interval.
Ex: how many employees fall into the $120,000 to $130,000 salary bucket?
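The bucket counting behind a histogram could be sketched like this. The salaries are made up, and treating each bucket as a half-open interval (lower edge included, upper edge excluded) is an assumption of this sketch.

```python
from collections import Counter

def histogram(values, width):
    """Count how many values fall into each interval [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical salaries (illustrative values only).
salaries = [95_000, 104_000, 121_000, 125_500, 129_999, 130_000, 142_000]

counts = histogram(salaries, 10_000)

# Number of employees in the 120,000 to 130,000 bucket.
print(counts[120_000])
```

With half-open buckets, a salary of exactly 130,000 is counted in the next bucket, not in the 120,000 one.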
What is Simpson's paradox, also known as the reversal paradox?
It is a phenomenon in probability and statistics in which a trend appears in several different groups of data but disappears or reverses when these groups are combined. It is sometimes given the descriptive title reversal paradox or amalgamation paradox.
It was seen in the UC Berkeley gender bias study.
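A small sketch of the reversal, using invented admission counts. These are not the Berkeley figures; the department names and numbers are hypothetical and chosen only so the effect is easy to see.

```python
# Hypothetical counts per department and group: (applicants, admitted).
data = {
    "Dept A": {"women": (10, 9), "men": (100, 80)},
    "Dept B": {"women": (100, 30), "men": (10, 2)},
}

# Admission rate of each group inside each department.
per_dept = {
    dept: {group: admitted / applicants
           for group, (applicants, admitted) in groups.items()}
    for dept, groups in data.items()
}

def combined_rate(group):
    """Admission rate of a group after pooling all departments."""
    applicants = sum(groups[group][0] for groups in data.values())
    admitted = sum(groups[group][1] for groups in data.values())
    return admitted / applicants

women_all = combined_rate("women")
men_all = combined_rate("men")

print(per_dept)
print("combined:", round(women_all, 3), round(men_all, 3))
```

In this made-up data women have the higher rate in each department, yet the lower rate overall, because most women applied to the department that admits few applicants.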
What is probability theory?
Probability theory is the branch of math that deals with probability. Probability measures how likely an event is to occur.
What is the first law of probability?
The probabilities (P) of all possible mutually exclusive outcomes always sum to 1.
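A quick check of this rule on a fair six-sided die. The die is an example chosen here, not one from the course. Exact fractions are used so the sum is not affected by floating-point rounding.

```python
from fractions import Fraction

# Each face of a fair die has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

total = sum(die.values())     # sum over all possible outcomes
p_not_six = 1 - die[6]        # consequence: P(not A) = 1 - P(A)

print(total, p_not_six)
```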
What is the probability of multiple independent events all happening?
It is the product of their probabilities.
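As an illustration, assume three flips of a fair coin. The sketch compares the product rule with a direct count over all equally likely outcomes.

```python
from fractions import Fraction
from itertools import product

p_heads = Fraction(1, 2)

# Product rule: three independent flips all landing heads.
p_product = p_heads * p_heads * p_heads

# Cross-check by listing all 2**3 equally likely outcomes.
outcomes = list(product("HT", repeat=3))
favourable = [o for o in outcomes if o == ("H", "H", "H")]
p_enumerated = Fraction(len(favourable), len(outcomes))

print(p_product, p_enumerated)
```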
What is a conditional probability? Give an example.
It is the probability of an event happening given that another event, on which it depends, has happened.
Ex: the probability of a cancer test being positive given that the patient has cancer.
What is the notation for the conditional probability of outcome A given that outcome B has happened?
P( Outcome A | Outcome B )
Given an event A that depends on event B, what is the total probability of A? Use as examples of A and B a cancer test being positive and the patient having cancer.
P(Positive) = P(Positive|Cancer)*P(Cancer) +
P(Positive|!Cancer)*P(!Cancer)
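The formula above can be sketched as a small function. The three input probabilities are hypothetical numbers picked for illustration, and `total_probability` is a name invented for this sketch.

```python
def total_probability(p_b, p_a_given_b, p_a_given_not_b):
    """P(A) = P(A|B) * P(B) + P(A|!B) * P(!B), with P(!B) = 1 - P(B)."""
    return p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

# Hypothetical values: P(Cancer), P(Positive|Cancer), P(Positive|!Cancer).
p_positive = total_probability(p_b=0.01, p_a_given_b=0.9, p_a_given_not_b=0.1)

print(round(p_positive, 4))
```

With these assumed numbers, most positive results come from the large group of patients without cancer.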
What is Bayes' Rule?
It is a method to find the probability of an event given the outcome of another event that depends on it (e.g. the probability of having cancer given that a test result is positive).
Write an example Bayes' Rule calculation for having cancer given that the test was positive. Show which terms are the Posterior, the Joint and the Prior probabilities.
P(Cancer) - prior probability
P(Positive | Cancer) * P(Cancer) - joint probability of Positive and Cancer
P(Positive) = P(Positive | Cancer) * P(Cancer) + P(Positive | !Cancer) * P(!Cancer) - total probability (normalizer)
P(Cancer | Positive) = P(Positive | Cancer) * P(Cancer) / P(Positive) - posterior probability
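A minimal sketch of the same calculation in code. The prior and the two test probabilities are hypothetical values, and `bayes_posterior` is a name invented here. Each joint term is a conditional probability multiplied by the matching prior.

```python
def bayes_posterior(prior, p_pos_given_cancer, p_pos_given_no_cancer):
    """Return P(Cancer | Positive) from the prior and the test probabilities."""
    joint_cancer = p_pos_given_cancer * prior               # P(Positive, Cancer)
    joint_no_cancer = p_pos_given_no_cancer * (1 - prior)   # P(Positive, !Cancer)
    normalizer = joint_cancer + joint_no_cancer             # P(Positive)
    return joint_cancer / normalizer

# Hypothetical values: P(Cancer) = 0.01, P(Positive|Cancer) = 0.9,
# P(Positive|!Cancer) = 0.1.
posterior = bayes_posterior(0.01, 0.9, 0.1)

print(round(posterior, 4))
```

Under these assumed numbers the posterior stays well below the test's 0.9 hit rate, because the prior is small.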
In a continuous probability distribution, what is the probability of one specific data point?
Zero
What is a density function?
It is a function that represents a continuous probability distribution. The probability of an interval is the area under the function over that interval.
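To make the last two cards concrete, here is a sketch that assumes a uniform density over a 60-minute window. The uniform distribution, the window, and the function names are choices made for this example, not taken from the course. Probability comes from the area under the density, so a single point, which has zero width, gets probability zero.

```python
def uniform_density(x, low=0.0, high=60.0):
    """Density of a uniform distribution on [low, high]."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

def interval_probability(a, b, low=0.0, high=60.0):
    """Area under the uniform density between a and b."""
    a, b = max(a, low), min(b, high)          # clip to where the density is non-zero
    return max(b - a, 0.0) / (high - low)     # width times constant height

p_quarter = interval_probability(0, 15)   # first 15 minutes of the hour
p_point = interval_probability(30, 30)    # exactly minute 30: zero width

print(uniform_density(30), p_quarter, p_point)
```

The density at minute 30 is not zero, but the probability of landing exactly on it is, which is why the density value itself should not be read as a probability.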