Model Fitting Flashcards
(42 cards)
{x1, …, xM} =
random sample from pdf p(x) with mean μ and variance σ^2
sample mean
μ hat = (1/M) × sum from i=1 to M of xi
as the sample size increases, the sample mean
becomes increasingly concentrated near the true mean
var(μ hat)=
σ^2/M
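The σ^2/M scaling can be checked numerically. The sketch below is only an illustration, not part of the original cards: it assumes a Uniform(0, 1) parent pdf (variance 1/12), and the function name sample_mean_variance, the trial count and the seed are made up for the example.

```python
import random
import statistics

def sample_mean_variance(m, n_trials=5000, seed=0):
    """Empirical variance of the sample mean over n_trials samples of size m."""
    rng = random.Random(seed)
    # Assumed parent pdf: Uniform(0, 1), which has mean 0.5 and variance 1/12.
    means = [
        statistics.fmean([rng.random() for _ in range(m)])
        for _ in range(n_trials)
    ]
    return statistics.pvariance(means)

sigma2 = 1 / 12
for m in (5, 20, 80):
    # Columns: sample size, simulated var(mu hat), predicted sigma^2 / M
    print(m, sample_mean_variance(m), sigma2 / m)
```

The two printed variances should agree to within a few percent, and both shrink by a factor of four each time M is quadrupled.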
for any pdf with finite variance σ^2, as M approaches infinity, μ hat follows
approximately a normal pdf with mean μ and variance σ^2/M (this is the central limit theorem)
the central limit theorem explains
the importance of the normal pdf in statistics
but it is still based on the asymptotic behaviour of an infinite ensemble of samples that we didn’t actually observe
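A rough way to see the central limit theorem at work is to draw sample means from a clearly non-normal pdf and ask how often they land within one standard error (σ/√M) of μ. For a normal pdf that fraction is about 0.683. The sketch below assumes an exponential parent pdf with rate 1 (so μ = 1 and σ = 1); the function name and trial count are illustrative choices.

```python
import math
import random
import statistics

def fraction_within_one_sigma(m, n_trials=4000, seed=1):
    """Fraction of sample means lying within one standard error of the true mean."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # assumed parent pdf: Exponential(rate=1)
    std_err = sigma / math.sqrt(m)
    hits = 0
    for _ in range(n_trials):
        mu_hat = statistics.fmean([rng.expovariate(1.0) for _ in range(m)])
        if abs(mu_hat - mu) < std_err:
            hits += 1
    return hits / n_trials

for m in (1, 5, 50):
    print(m, fraction_within_one_sigma(m))
```

For M = 1 the fraction reflects the skewed exponential pdf itself; as M grows it should drift towards the normal value of roughly 0.683.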
bivariate normal pdf
p(x,y), which is specified by five parameters: μx, μy, σx, σy, p
often used in the physical sciences to model the joint pdf of two random variables
the first four parameters of the bivariate normal pdf are
equal to the following expectation values
E(x)=μx
E(y)=μy
var(x)=σx^2
var(y)=σy^2
the parameter p is known as the
correlation coefficient
what does the correlation coefficient satisfy?
E[(x-μx)(y-μy)]=pσxσy
if p=0, then
x and y are independent (true for the bivariate normal pdf; in general, zero correlation does not imply independence)
what is E[(x-μx)(y-μy)] also known as?
the covariance of x and y, often denoted cov(x,y)
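The relation cov(x,y) = pσxσy can be checked by simulation. The sketch below is one standard way to draw bivariate normal pairs from two independent standard normals; it is an illustration under assumed parameter values (μx = 1, μy = -2, σx = 2, σy = 3, p = 0.6), and the function names are hypothetical.

```python
import math
import random
import statistics

def sample_bivariate_normal(n, mu_x, mu_y, sigma_x, sigma_y, p, seed=2):
    """Draw n (x, y) pairs with correlation coefficient p from two independent normals."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        xs.append(mu_x + sigma_x * z1)
        # Mixing z1 into y with weight p gives x and y correlation p.
        ys.append(mu_y + sigma_y * (p * z1 + math.sqrt(1.0 - p * p) * z2))
    return xs, ys

def sample_covariance(xs, ys):
    """Average of (x - x_bar)(y - y_bar) over the sample."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / len(xs)

xs, ys = sample_bivariate_normal(20000, 1.0, -2.0, 2.0, 3.0, 0.6)
# Columns: sample covariance, predicted p * sigma_x * sigma_y
print(sample_covariance(xs, ys), 0.6 * 2.0 * 3.0)
```

With 20000 pairs the sample covariance should sit close to the predicted 3.6; setting p = 0 should give a value near zero.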
what does the covariance define?
how one variable (x) varies together with another variable (y)
p>0
positive correlation
y tends to increase as x increases
p<0
negative correlation
y tends to decrease as x increases
contours become narrower and steeper as
|p| approaches 1
what is Pearson’s product-moment correlation coefficient?
r
given sampled data, used to estimate the correlation between variables
if p(x,y) is bivariate normal, then r is
an estimator of p
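As a small worked example, r can be computed directly from sampled data as the sum of products of deviations divided by the square root of the product of the summed squared deviations. The data values and the function name pearson_r below are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired samples."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # exactly linear, increasing: r = 1.0
print(pearson_r(x, [10, 8, 6, 4, 2]))   # exactly linear, decreasing: r = -1.0
print(pearson_r(x, [2, 1, 4, 3, 5]))    # noisy increasing trend: r = 0.8
```

Because the deviations are divided by their own spread, r is unitless and always lies between -1 and 1.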
the correlation coefficient is a unitless version of
the covariance
if x and y are independent variables, cov(x,y)=
0
because independence means p(x,y)=p(x)p(y)
the method of least squares
workhorse method for fitting lines and curves to data in the physical sciences
useful demonstration of underlying statistical principles
ordinary least squares
the scatter in the plot of (x,y) is assumed to arise from errors in only one of the two variables
ordinary least squares - can write
yi=a+bxi+Ɛi
what is Ɛi
the residual of the ith data point
i.e. the difference between the observed value of yi and the value predicted by the best-fit line, which is characterised by the parameters a and b
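A minimal sketch of an ordinary least squares straight-line fit, using the standard closed-form solution that minimises the sum of squared residuals (b = Sxy/Sxx, a = y_bar - b × x_bar). The cards do not state these formulas, and the data values and the function name fit_line are assumptions for the example.

```python
def fit_line(xs, ys):
    """Return (a, b) minimising the sum of squared residuals of y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Illustrative data: errors are assumed to be in y only.
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

a, b = fit_line(xs, ys)
# Residual of each point: observed y minus the value predicted by the fit.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(a, b)
print(residuals)
```

For these data the fit gives a close to 1.06 and b close to 1.97, and the residuals sum to zero (up to rounding), which is a general property of a least squares fit that includes an intercept.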