Simulations Flashcards
(59 cards)
random module
Python's built-in module for generating pseudo-random numbers
random.seed()
If you initialize the random number generator with a specific seed using random.seed(), the sequence of numbers generated will always be the same (no matter how many times you run it)
What is the meaning of 1 in random.seed(1)?
The 1 has nothing to do with the values that are generated; it is the seed, the starting value that initializes the random number generator. Using the same seed again reproduces the same sequence of numbers.
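A minimal sketch of the reproducibility described above. The variable names are illustrative only; the point is that re-seeding with the same value restarts the same sequence.

```python
import random

random.seed(1)                                    # fix the generator's starting state
first_run = [random.random() for _ in range(3)]

random.seed(1)                                    # reset to the same starting state
second_run = [random.random() for _ in range(3)]

print(first_run == second_run)                    # True: same seed, same sequence
```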
np.random.choice()
used to generate a random sample from given array or list
np.arange()
generates an array of evenly spaced values within a specified range (stop value excluded) -> similar to range(), but range() returns a range object (a list in Python 2) and np.arange() returns a NumPy array
x = np.random.choice(np.arange(1, 7), 3)
draws 3 random numbers (with replacement by default) from the array [1, 2, 3, 4, 5, 6] and stores them in x
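The same call can be broken into steps as a sketch. The seed and the name `faces` are assumptions added for illustration; they are not part of the card.

```python
import numpy as np

np.random.seed(1)                  # optional: makes the draw reproducible
faces = np.arange(1, 7)            # array([1, 2, 3, 4, 5, 6]); the stop value 7 is excluded
x = np.random.choice(faces, 3)     # 3 draws, with replacement by default
print(x)
```

Passing replace=False to np.random.choice() would prevent the same face from being drawn twice.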
What is a difference between simulation probability and mathematical probability?
Simulation probability is an approximation, whereas mathematical probability is exact. However, mathematical probability is often calculated in ideal conditions, whereas simulations can be adjusted to real-life scenarios
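As an illustration of the difference, the sketch below estimates the probability that two dice sum to 7 by simulation and compares it with the exact value 6/36. The dice example, the seed and the number of rolls are assumptions chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)               # seeded generator for reproducibility
n_rolls = 100_000                            # arbitrary number of simulated rolls

# Roll two dice n_rolls times; integers() excludes the upper bound 7
dice = rng.integers(1, 7, size=(n_rolls, 2))
simulated = np.mean(dice.sum(axis=1) == 7)   # fraction of rolls that sum to 7

exact = 6 / 36                               # 6 favourable outcomes out of 36
print(f"simulated: {simulated:.4f}, exact: {exact:.4f}")
```

The simulated value gets closer to the exact one as n_rolls grows, but it stays an approximation.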
one-sample t-test
parametric test
examines whether the mean of a population is statistically different from a known or hypothesized value (i.e. it checks whether the difference between the sample mean and the hypothesized value differs from 0)
What are data requirements for one-sample t-test?
1) continuous test variable
2) scores on test variable are independent (there is no relationship between scores)
3) random sample of data from population
4) normal distribution of sample and population on test variable
5) homogeneity of variances (variances are approx equal in both sample and population)
6) no outliers
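A hedged sketch of running the test with scipy.stats.ttest_1samp. The sample is simulated and the hypothesized mean of 50 is made up for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical sample: 30 measurements from a population with mean 52, sd 5
sample = rng.normal(loc=52, scale=5, size=30)

# H0: the population mean equals the hypothesized value 50
result = stats.ttest_1samp(sample, popmean=50)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```

A small p-value (commonly below 0.05) is taken as evidence that the population mean differs from the hypothesized value.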
What is linear regression?
estimates the coefficients of the linear equation, involving one or more independent variables that best predict the value of the dependent variable. Linear regression fits a straight line or surface that minimizes the discrepancies between predicted and actual output values.
Y = β0 + β1X + ε
Y = dependent variable
β0 = Y-intercept
β1 = slope
X = independent variable
ε = error
How to calculate the slope (β1)?
change in Y / change in X (rise over run)
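A small sketch of both ideas: the slope as change in Y over change in X, and a least-squares fit with scipy.stats.linregress. The data are made up and lie exactly on the line Y = 1 + 2X, so both approaches should agree.

```python
import numpy as np
from scipy import stats

# Hypothetical data lying exactly on the line Y = 1 + 2X
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

# Slope between two points: change in Y divided by change in X
slope_two_points = (y[-1] - y[0]) / (x[-1] - x[0])

# Least-squares fit; with noise-free data it recovers the same line
fit = stats.linregress(x, y)
print(slope_two_points, fit.slope, fit.intercept)
```

With noisy real data the two-point slope depends on which points are picked, which is why the least-squares estimate is used instead.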
What is polynomial linear regression?
It is used when there is nonlinear relationship between the predictor and response variable (when data points form a curve)
Y = β0 + β1X + β2X^2 + … + βhX^h + ε
In this equation, h is referred to as the degree of the polynomial.
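One way to fit such a model is np.polyfit. The sketch below uses made-up, noise-free data from a degree-2 curve so the recovered coefficients are easy to check.

```python
import numpy as np

# Hypothetical data from the curve Y = 1 + 2X + 3X^2 (no noise, for clarity)
x = np.linspace(-2, 2, 9)
y = 1 + 2 * x + 3 * x**2

# Fit a polynomial of degree h = 2; np.polyfit returns the highest power first
b2, b1, b0 = np.polyfit(x, y, deg=2)
print(b0, b1, b2)   # approximately 1, 2, 3
```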
What are advantages of polynomial regression?
- A polynomial can closely approximate a curved relationship between the dependent and independent variable.
- A broad range of functions can be fitted with it.
- It accommodates a wide range of curvature.
What are disadvantages of polynomial regression?
- One or two outliers in the data can seriously distort the fitted curve; polynomial fits are very sensitive to outliers.
- There are fewer model validation tools for detecting outliers in nonlinear regression than there are for linear regression.
- More complex (higher-degree) models are prone to overfitting.
What is continuous uniform distribution?
symmetric probability distribution in which every value in an interval [a, b] is equally likely (constant density over the interval)
How to draw samples from uniform distribution?
stats.uniform.rvs(0, 1, size=100)
generates 100 random samples from a uniform distribution
loc = 0 (sets the lower bound)
scale = 1 (sets the width of the interval)
so samples are drawn from the interval [loc, loc + scale] = [0, 1]
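A short sketch of how loc and scale set the interval. The random_state argument and the second interval [2, 5] are assumptions added for illustration.

```python
from scipy import stats

# loc is the lower bound and scale is the width: the interval is [loc, loc + scale]
samples = stats.uniform.rvs(loc=0, scale=1, size=100, random_state=1)
print(samples.min(), samples.max())   # both fall between 0 and 1

# A different interval, e.g. [2, 5]: loc=2, scale=3
shifted = stats.uniform.rvs(loc=2, scale=3, size=100, random_state=1)
print(shifted.min(), shifted.max())   # both fall between 2 and 5
```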
What is binomial distribution?
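discrete distribution of the number of successes in n independent trials, where each trial ends in either success or failure and the probability of success p is the same in every trial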
How to generate random variables from binomial distribution?
stats.binom.rvs(n=1, p=0.5, size=100)
n - number of trials
p - probability of success
size - number of random values to generate
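A sketch showing how n changes what each generated value means. The n=10 case and random_state are assumptions added for illustration.

```python
from scipy import stats

# n=1: each value is a single trial (0 = failure, 1 = success)
coin_flips = stats.binom.rvs(n=1, p=0.5, size=100, random_state=1)

# n=10: each value counts the successes in 10 trials, so it lies between 0 and 10
heads_in_10 = stats.binom.rvs(n=10, p=0.5, size=100, random_state=1)
print(coin_flips[:10], heads_in_10[:10])
```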
What is normal distribution?
data are symmetrically distributed; bell-shaped - most values clustering around central region
mean = median = mode
How to generate random samples from normal distribution in Python?
stats.norm.rvs(loc=0, scale=1, size=100)
loc = mean
scale = standard deviation
size = number of random samples to generate
OR
np.random.normal(loc=0, scale=1, size=100)
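Both calls can be tried side by side, as in this sketch. The seeds are added only so the output is reproducible.

```python
import numpy as np
from scipy import stats

# SciPy version: random_state fixes the seed for this call
a = stats.norm.rvs(loc=0, scale=1, size=100, random_state=1)

# NumPy version: seed the global generator first
np.random.seed(1)
b = np.random.normal(loc=0, scale=1, size=100)

print(a.mean(), a.std())   # near 0 and 1, but not exact with only 100 samples
print(b.mean(), b.std())
```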
What does the function stats.norm.cdf do?
Cumulative Distribution Function
area under the curve
For a normal distribution, the CDF at a point x gives the area under the probability density function (PDF) curve to the left of x. This area corresponds to the probability that a randomly selected value from the distribution is less than or equal to x.
What does the function stats.norm.pdf do?
Probability Density Function
The PDF of a continuous random variable gives the relative likelihood of the random variable taking on a specific value.
It is the height of the curve at x
What is stats.norm.ppf?
Percent Point Function
inverse cdf!
given the probability of numbers smaller than x, what is x?
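The three functions can be compared on the standard normal distribution, as in this sketch. The point x = 1.96 is chosen for illustration because about 97.5% of the area lies to its left.

```python
from scipy import stats

x = 1.96
p = stats.norm.cdf(x)          # area to the left of x, about 0.975
height = stats.norm.pdf(0)     # curve height at the mean, 1/sqrt(2*pi), about 0.399
x_back = stats.norm.ppf(p)     # inverse of the CDF: recovers x

print(round(p, 3), round(height, 3), round(x_back, 2))   # 0.975 0.399 1.96
```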
How to generate random samples from exponential distribution?
stats.expon.rvs
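A possible call with arguments filled in. The values scale=2 and size=1000 and the random_state are assumptions for illustration; in SciPy, scale is the mean of the distribution (1 divided by the rate).

```python
from scipy import stats

# scale is the mean of the distribution, i.e. 1 / rate (lambda)
samples = stats.expon.rvs(scale=2, size=1000, random_state=1)
print(samples.mean())   # close to 2 for a sample this large
```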