Statistics Flashcards
(56 cards)
3 Basic Classifications of Statistics:
Classical statistics - parameters (mu, sigma^2) are unknown to us but they are fixed, and we want to make inferences about them from sample quantities such as X bar.
Bayesian statistics - parameters are not fixed but treated as random; more parametric, because you have to impose a distribution on them.
Nonparametric statistics - does not assume normality; makes the fewest assumptions.
What are Descriptive Statistics?
- First approach to turn data into information
- Summarize large amounts of data - ease of interpretation
- It consists of tables, graphs, summary measures, images, or anything that illustrates the information contained in the data.
- A picture is worth a thousand words
Types of Statistical Variables:
1) Qualitative: sex, socioeconomic status, marital status
2) Quantitative:
a) Discrete - the number of times a particular phenomenon has happened.
b) Continuous - the result of a random experiment whose sample space (set of possibilities) is uncountable.
Types of Statistical Data:
1) Ordinal: 1, 2, 3, …; A, B, C, …
2) Non-ordinal: Married, Divorced, Single, Widowed …
3) Time Series: Poverty over time
4) Cross Section: Population in 200 countries at a given time (say for January 2010)
5) Panel Data: Population in 200 countries over the last 30 years.
numerical measures:
• Location: Average, median, mode, quartiles, quintiles, deciles, percentiles (quantiles in general), trimmed mean, weighted mean, geometric mean, harmonic mean, etc.
• Scale: Range, interquartile range, variance, pseudovariance, standard deviation, etc.
• Other: Coefficient of Variation, Sharpe Ratio, skewness, kurtosis…
mean:
the arithmetic average
mean = ∑X / N
- It is important to remember that although the mean provides a useful piece of information, it does not tell you how spread out the scores are (variance), whether outliers are skewing it, etc.
median
the number in the distribution that marks the 50th percentile/the number in the middle of the entire distribution
mode
the number that has the highest frequency(occurs most often)
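A minimal Python sketch of the mean, median, and mode cards, using the standard-library statistics module. The scores list is made-up data, chosen only to show that a single outlier moves the mean far more than the median.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]          # hypothetical data, not from the cards
with_outlier = scores + [100]         # same data plus one extreme value

mean_value = statistics.mean(scores)        # sum of the values divided by N
median_value = statistics.median(scores)    # middle of the sorted values
mode_value = statistics.mode(scores)        # most frequent value

print("mean:", mean_value, "median:", median_value, "mode:", mode_value)
print("mean with outlier:", statistics.mean(with_outlier))
print("median with outlier:", statistics.median(with_outlier))
```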
Quantiles
quartiles: split the ranked data into 4 segments with an equal number of values per segment
quintiles: split the ranked data into 5 segments…
deciles: split the ranked data into 10 segments…
percentiles: split the ranked data into 100 segments…
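One possible way to compute these cut points in Python is statistics.quantiles (available from Python 3.8). The data below is hypothetical, and the exact cut values depend on the interpolation method the function uses, so the sketch only checks how many cut points come back.

```python
import statistics

data = list(range(1, 101))   # hypothetical data: the integers 1..100

# Each call returns the n - 1 cut points that split the data into n segments.
quartiles = statistics.quantiles(data, n=4)
quintiles = statistics.quantiles(data, n=5)
deciles = statistics.quantiles(data, n=10)
percentiles = statistics.quantiles(data, n=100)

print(len(quartiles), len(quintiles), len(deciles), len(percentiles))
print("second quartile:", quartiles[1], "median:", statistics.median(data))
```

The second quartile, the fifth decile, and the 50th percentile all mark the median.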
Trimmed mean/Truncated mean
A method of averaging that removes a small percentage of the largest and smallest values before calculating the mean.
- After removing the specified observations, the trimmed mean is found using the ordinary arithmetic averaging formula (see the website below).
https://www.easycalculation.com/statistics/learn-trimmed-mean.php
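A rough sketch of the trimmed mean in Python. The function name trimmed_mean and the data are illustrative only, and it assumes the given proportion is removed from each end (conventions differ on this) and is less than 0.5.

```python
def trimmed_mean(values, proportion):
    """Drop `proportion` of the observations from each end, then average the rest."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)      # number of values removed from EACH end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

data = [1, 4, 5, 6, 7, 8, 9, 10, 11, 95]    # hypothetical data with two extreme values

print("ordinary mean:", sum(data) / len(data))
print("10% trimmed mean:", trimmed_mean(data, 0.10))
```

With 10 values and a 10% trim, one value is dropped from each end (here 1 and 95) before averaging.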
Weighted mean
Instead of each data point contributing equally to the final mean, some data points contribute more “weight” than others.
Formula (example with weights that sum to 1): (X1 x .40) + (X2 x .30) + (X3 x .20) + (X4 x .10); in general, ∑(weight x X) / ∑(weight)
- If all the weights are equal, then the weighted mean equals the arithmetic mean (the regular “average” you’re used to)
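A small Python sketch of the weighted mean, reusing the weights .40, .30, .20, .10 from the card. The grades are made-up values, and weighted_mean is an illustrative helper name. Dividing by the sum of the weights keeps the function valid even when the weights do not sum to 1.

```python
def weighted_mean(values, weights):
    """Sum of weight x value, divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

grades = [90, 80, 70, 60]                 # hypothetical X1..X4
weights = [0.40, 0.30, 0.20, 0.10]        # weights from the card

print("weighted mean:", weighted_mean(grades, weights))
# Equal weights reduce to the ordinary arithmetic mean.
print("equal weights:", weighted_mean(grades, [1, 1, 1, 1]))
```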
range
The difference between the lowest and highest value.
Example: In {4, 6, 9, 3, 7} the lowest value is 3, and the highest is 9, so the range is 9 − 3 = 6.
Interquartile Range(IQR)/H-spread
Also called the midspread or middle fifty, it is a measure of statistical dispersion: IQR = Q3 - Q1, the spread of the middle 50% of the data. It “chops off” the top 25% and the bottom 25% (ignores 50% of the data).
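As a sketch, the range and the IQR can be computed in Python like this, reusing the set {4, 6, 9, 3, 7} from the range card. The quartile values come from statistics.quantiles, and other quartile conventions may give a slightly different IQR for such a small data set.

```python
import statistics

data = [4, 6, 9, 3, 7]                      # the set from the range card

value_range = max(data) - min(data)         # highest minus lowest

# quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                               # spread of the middle 50%

print("range:", value_range)
print("IQR:", iqr)
```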
variance
the expectation of the squared deviation of a random variable from its mean, and it informally measures how far a set of (random) numbers are spread out from their mean(dispersion).
σ^2 = [ ∑(x-mean)^2] / N
standard deviation
a measure that is used to quantify the average amount of variation or dispersion of a set of data values from the mean.
(represented by the Greek letter sigma σ or the Latin letter s)
Square root of the variance: σ = √( [ ∑(x-mean)^2 ] / N )
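A short Python sketch of the variance and standard deviation formulas, computed by hand and then compared with the standard library's population versions (pvariance and pstdev, which also divide by N). The data is hypothetical.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]             # hypothetical data

n = len(data)
mean = sum(data) / n
variance = sum((x - mean) ** 2 for x in data) / n   # divide by N (population form)
std_dev = math.sqrt(variance)                       # square root of the variance

print("variance:", variance, "standard deviation:", std_dev)
# The library's population versions should agree with the manual result.
print(statistics.pvariance(data), statistics.pstdev(data))
```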
Parameters vs Estimators
Parameters correspond to the population. They are theoretical quantities, many times unknown.
Estimators correspond to the sample. They are practical quantities that can be computed from the data.
stochastic model
tool for estimating probability distributions for a collection of random variables over time
3 methods of assigning probability
Classical Method - based on the assumption of equally likely outcomes -> counting techniques
Relative Frequency Method - based on experimentation or historical data
Subjective Method - based on judgement, still can be scientific
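To contrast the first two methods, here is a small Python simulation of a fair coin. It is only a sketch: the seed and the number of tosses are arbitrary choices, and the coin is assumed to be fair.

```python
import random

random.seed(0)   # fixed seed so the run is repeatable

# Classical method: two equally likely outcomes, one of them is heads.
classical_p_heads = 1 / 2

# Relative frequency method: simulate tosses and count how often heads occurs.
n_tosses = 10_000
heads = sum(1 for _ in range(n_tosses) if random.random() < 0.5)
relative_frequency = heads / n_tosses

print("classical:", classical_p_heads)
print("relative frequency:", relative_frequency)
```

With many tosses the relative frequency should land close to the classical value, though it will rarely match it exactly.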
Complement vs Union vs Intersection
The complement of an event A is defined to be the event consisting of all sample points that are not in A
- it is denoted as A^c
The union of events A and B is the event containing all sample points that are in A or B (or both)
- denoted as A ∪ B
The intersection of events A and B is the set of all sample points that are in both A and B
- denoted as A ∩ B
Addition Law
provides a way to compute the probability of event A, or B, or both A and B occurring
- the law is written as P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
- P(A ∩ B) is subtracted so that sample points lying in both A and B are not counted twice
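The complement, union, intersection, and addition law can be checked with Python sets. This sketch assumes a hypothetical experiment (one roll of a fair die) and two illustrative events. The helper prob uses the classical method of equally likely outcomes.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}    # hypothetical experiment: one fair die roll
A = {2, 4, 6}                        # event: the roll is even
B = {4, 5, 6}                        # event: the roll is greater than 3

def prob(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(len(event), len(sample_space))

complement_A = sample_space - A      # A^c
union = A | B                        # A ∪ B
intersection = A & B                 # A ∩ B

lhs = prob(union)                                   # P(A ∪ B)
rhs = prob(A) + prob(B) - prob(intersection)        # P(A) + P(B) - P(A ∩ B)
print(lhs, rhs)
```

Without subtracting P(A ∩ B), the outcomes 4 and 6 would be counted once in P(A) and again in P(B).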
Mutually exclusive events
have no sample points in common, cannot happen at the same time.
For example: when tossing a coin, the result can either be heads or tails but cannot be both.
Conditional probability
The probability of an event given that another event has occurred
- Denoted as P(A|B) and computed mathematically as P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0
Independent Events
Events A and B are independent if the probability of event A is not changed by the occurrence of event B
- That is, P(A|B) = P(A). To check mathematically whether two events are independent, test whether P(A ∩ B) = P(A) x P(B); if the equality fails, the events are dependent.
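A sketch of conditional probability and the independence check in Python, again assuming one roll of a fair die. The events and the helper names (prob, conditional, independent) are illustrative, not taken from the cards.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}    # hypothetical experiment: one fair die roll

def prob(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(len(event), len(sample_space))

def conditional(a, b):
    """P(A|B) = P(A ∩ B) / P(B); assumes P(B) > 0."""
    return prob(a & b) / prob(b)

def independent(a, b):
    """True when P(A ∩ B) equals P(A) x P(B)."""
    return prob(a & b) == prob(a) * prob(b)

A = {2, 4, 6}        # even roll
B = {1, 2, 3, 4}     # roll of 4 or less
C = {4, 5, 6}        # roll greater than 3

print(conditional(A, B), prob(A), independent(A, B))   # B does not change P(A)
print(conditional(A, C), prob(A), independent(A, C))   # C does change P(A)
```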
Bayes’ Theorem
describes the probability of an event, based on conditions that might be related to the event.
-For example, suppose one is interested in whether a person has cancer, and knows the person’s age. If cancer is related to age, then, using Bayes’ theorem, information about the person’s age can be used to more accurately assess the probability that they have cancer.
Bayes’ theorem provides the means for revising the prior probabilities: P(A|B) = P(B|A) x P(A) / P(B).
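A worked sketch of the cancer and age example in Python. All three input probabilities are hypothetical numbers chosen to keep the arithmetic simple; they are not real medical figures. The point is only to show how the prior P(cancer) is revised once the person is known to be in an older age group.

```python
from fractions import Fraction

# All three numbers are hypothetical, chosen only to make the arithmetic easy.
p_cancer = Fraction(1, 100)              # prior P(cancer)
p_older_given_cancer = Fraction(1, 2)    # P(older | cancer)
p_older_given_healthy = Fraction(1, 5)   # P(older | no cancer)

# Total probability of the evidence, P(older).
p_older = (p_older_given_cancer * p_cancer
           + p_older_given_healthy * (1 - p_cancer))

# Bayes' theorem: P(cancer | older) = P(older | cancer) x P(cancer) / P(older).
p_cancer_given_older = p_older_given_cancer * p_cancer / p_older

print("prior:", p_cancer)
print("posterior:", p_cancer_given_older, float(p_cancer_given_older))
```

Under these assumed numbers the revised probability is larger than the prior, because being older is more common among people with cancer than among people without it.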