Model Fitting Flashcards by Holly Fitzgerald

{x1, …, xM} =

a random sample of M values drawn from a pdf p(x) with mean μ and variance σ^2

sample mean

μ hat = (1/M) sum from i=1 to M of xi

as the sample size increases, the sample mean

becomes increasingly concentrated near the true mean

var(μ hat)=

σ^2/M
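A quick Monte Carlo check of var(μ hat) = σ^2/M, as a minimal sketch using only the Python standard library. The normal pdf used to generate the samples, the seed and the number of trials are arbitrary illustrative choices (the σ^2/M result itself holds for any pdf with finite variance), and the function names are hypothetical.

```python
import random
import statistics


def sample_mean(xs):
    """mu hat = (1/M) * sum of the M sample values."""
    return sum(xs) / len(xs)


def variance_of_sample_mean(mu, sigma, m, n_trials=20000, seed=1):
    """Monte Carlo estimate of var(mu hat) for samples of size m.

    Each trial draws m values from a normal pdf (an illustrative choice)
    with mean mu and standard deviation sigma and records the sample mean.
    The spread of the recorded means estimates var(mu hat).
    """
    rng = random.Random(seed)
    means = [
        sample_mean([rng.gauss(mu, sigma) for _ in range(m)])
        for _ in range(n_trials)
    ]
    return statistics.pvariance(means)
```

With σ = 2, variance_of_sample_mean(0.0, 2.0, 16) should come out close to 4/16 = 0.25, and quadrupling m should cut it by roughly a factor of four.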

for any pdf with finite variance σ^2, as M approaches infinity, the distribution of μ hat tends to

a normal pdf with mean μ and variance σ^2/M
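One way to see the central limit theorem at work is to average M draws from a clearly non-normal pdf and check how often the standardised mean lands inside the central 95% interval of a standard normal. This is a sketch under assumptions: the uniform pdf on [0, 1) (mean 1/2, variance 1/12), the 95% criterion and the function name are all illustrative choices.

```python
import math
import random
from statistics import NormalDist


def clt_coverage(m, n_trials=10000, seed=2):
    """Fraction of standardised sample means inside the central 95% normal interval.

    Each trial averages m uniform draws, standardises the mean using the
    CLT variance sigma^2 / m, and checks whether |z| < 1.96. If mu hat is
    close to normal, the fraction should be close to 0.95.
    """
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    z_crit = NormalDist().inv_cdf(0.975)  # about 1.96
    inside = 0
    for _ in range(n_trials):
        mu_hat = sum(rng.random() for _ in range(m)) / m
        z = (mu_hat - mu) / (sigma / math.sqrt(m))
        if abs(z) < z_crit:
            inside += 1
    return inside / n_trials
```

With m = 1 every standardised value lies inside the interval (a single uniform draw is never more than about 1.73 standard deviations from its mean), so the fraction is 1.0, far from 0.95; by m = 30 it should be close to 0.95.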

the central limit theorem explains

the importance of the normal pdf in statistics

but it is still based on the asymptotic behaviour of an infinite ensemble of samples that we did not actually observe

bivariate normal pdf

p(x,y), which is specified by five parameters: μx, μy, σx, σy and p

often used in the physical sciences to model the joint pdf of two random variables

the first four parameters of the bivariate normal pdf are

equal to the following expectation values

E(x)=μx
E(y)=μy
var(x)=σx^2
var(y)=σy^2
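A minimal sketch of drawing (x, y) pairs from a bivariate normal pdf with the five parameters above, so the expectation values can be checked against sample averages. It assumes the standard construction from two independent standard normal variables z1 and z2; the function name is hypothetical.

```python
import math
import random


def sample_bivariate_normal(mu_x, mu_y, sigma_x, sigma_y, p, n, seed=0):
    """Draw n (x, y) pairs from a bivariate normal pdf with correlation p.

    Built from two independent standard normal draws z1, z2:
        x = mu_x + sigma_x * z1
        y = mu_y + sigma_y * (p * z1 + sqrt(1 - p^2) * z2)
    so that E(x) = mu_x, E(y) = mu_y, var(x) = sigma_x^2, var(y) = sigma_y^2
    and E[(x - mu_x)(y - mu_y)] = p * sigma_x * sigma_y.
    """
    if not -1.0 <= p <= 1.0:
        raise ValueError("p must lie in [-1, 1]")
    rng = random.Random(seed)
    root = math.sqrt(1.0 - p * p)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(mu_x + sigma_x * z1)
        ys.append(mu_y + sigma_y * (p * z1 + root * z2))
    return xs, ys
```

Averaging x, y, (x - μx)^2, (y - μy)^2 and (x - μx)(y - μy) over many draws should reproduce μx, μy, σx^2, σy^2 and pσxσy.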

the parameter p is known as the

correlation coefficient (often written ρ)

what does the correlation coefficient satisfy?

E[(x-μx)(y-μy)]=pσxσy

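for the bivariate normal pdf, if p=0, then

x and y are independent (for other pdfs, zero correlation does not in general imply independence)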

what is E[(x-μx)(y-μy)] also known as?

the covariance of x and y, often denoted cov(x,y); for the bivariate normal pdf it equals pσxσy
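From data, the expectation E[(x-μx)(y-μy)] is estimated by replacing the unknown means with sample means and averaging the products. A minimal sketch, assuming the common 1/(N - 1) normalisation; the function name is hypothetical.

```python
def covariance(xs, ys):
    """Sample estimate of cov(x, y) = E[(x - mu_x)(y - mu_y)].

    The unknown means are replaced by the sample means, and the sum of
    products is divided by N - 1 (the usual unbiased normalisation).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
```

covariance(x, x) is just the sample variance of x; on Python 3.10 and later, statistics.covariance uses the same N - 1 normalisation.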

what does the covariance measure?

how one variable (x) varies together with another variable (y)

p>0

positive correlation
y tends to increase as x increases

p<0

negative correlation
y tends to decrease as x increases

the contours of the bivariate normal pdf become narrower and steeper as

|p| approaches 1

what is Pearson’s product-moment correlation coefficient, r?

a statistic computed from sampled data and used to estimate the correlation between two variables

if p(x,y) is bivariate normal, then r is

an estimator of p
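A minimal sketch of r computed from paired samples, assuming the standard definition r = sum of (xi - x̄)(yi - ȳ) divided by the square root of [sum of (xi - x̄)^2 × sum of (yi - ȳ)^2]. Dividing by the spreads cancels the units, which is why r (like p) is dimensionless. The function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient r for paired samples.

    r = sum((x - xbar)(y - ybar)) / sqrt(sum((x - xbar)^2) * sum((y - ybar)^2)),
    i.e. the sample covariance divided by the product of the sample standard
    deviations, so the units of x and y cancel and -1 <= r <= 1.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)
```

Rescaling or shifting either variable (for example converting units) leaves r unchanged, while an exact straight line gives r = +1 or -1 depending on the sign of the slope.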

the correlation coefficient is a unitless version of

the covariance: p = cov(x,y)/(σxσy)

if x and y are independent variables, cov(x,y)=

0, because independence means p(x,y)=p(x)p(y), so E[(x-μx)(y-μy)] factorises into E[x-μx]E[y-μy] = 0 × 0
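Written out as integrals (assuming continuous variables with finite means), the factorisation step for independent x and y is:

```latex
\begin{aligned}
\operatorname{cov}(x,y) &= \iint (x-\mu_x)(y-\mu_y)\, p(x,y)\, dx\, dy \\
&= \iint (x-\mu_x)(y-\mu_y)\, p(x)\, p(y)\, dx\, dy \\
&= \left[\int (x-\mu_x)\, p(x)\, dx\right] \left[\int (y-\mu_y)\, p(y)\, dy\right] \\
&= (\mu_x-\mu_x)(\mu_y-\mu_y) = 0 .
\end{aligned}
```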

the method of least squares

workhorse method for fitting lines and curves to data in the physical sciences

it is also a useful demonstration of the underlying statistical principles

ordinary least squares

the scatter in the (x,y) data is assumed to arise from errors in only one of the two variables (here y)

ordinary least squares: we can write each data point as

yi = a + bxi + εi

what is εi?

the residual of the ith data point

i.e. the difference between the observed value of yi and the value a + bxi predicted by the best fit, characterised by the parameters a and b

we assume that the εi are

an independent and identically distributed random sample from some underlying pdf with mean zero and variance σ^2 (residuals are equally likely to be positive or negative and all have equal variance)

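∂S/∂a = 0 when

a = a_LS, the least-squares estimate of a, where S = sum over i of εi^2 = sum over i of (yi - a - bxi)^2 is the sum of squared residuals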
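Setting both ∂S/∂a = 0 and ∂S/∂b = 0 and solving gives the standard closed-form estimates b_LS = sum of (xi - x̄)(yi - ȳ) / sum of (xi - x̄)^2 and a_LS = ȳ - b_LS x̄. A minimal sketch (function name hypothetical); one consequence of ∂S/∂a = 0 is that the fitted residuals sum to zero.

```python
def ordinary_least_squares(xs, ys):
    """Return (a_LS, b_LS) minimising S = sum over i of (y_i - a - b*x_i)^2.

    dS/da = 0 gives a_LS = ybar - b_LS * xbar, and dS/db = 0 then gives
    b_LS = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b_ls = sxy / sxx
    a_ls = my - b_ls * mx
    return a_ls, b_ls
```

For example, the points (0, 1), (1, 2), (2, 2) give b_LS = 0.5 and a_LS = 7/6, and their residuals add up to zero.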

Weighted least squares is an efficient method that makes good use of

small data sets

weighted least squares - in the case where σi^2 is constant for all i, the formulae

reduce to those for the unweighted case
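A minimal sketch of weighted least squares for a straight line, assuming each point has a known error σi and weight wi = 1/σi^2. Setting ∂S/∂a = ∂S/∂b = 0 gives two linear equations that are solved here in closed form; the function name is hypothetical.

```python
def weighted_least_squares(xs, ys, sigmas):
    """Fit y = a + b*x by minimising S = sum(((y_i - a - b*x_i) / sigma_i)^2).

    Each point carries weight w_i = 1 / sigma_i^2. The normal equations
        sum(w*y)   = a*sum(w)   + b*sum(w*x)
        sum(w*x*y) = a*sum(w*x) + b*sum(w*x*x)
    are solved directly for a and b.
    """
    if not (len(xs) == len(ys) == len(sigmas)) or len(xs) < 2:
        raise ValueError("need equal-length inputs with at least two points")
    w = [1.0 / s ** 2 for s in sigmas]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, xs))
    swy = sum(wi * y for wi, y in zip(w, ys))
    swxx = sum(wi * x * x for wi, x in zip(w, xs))
    swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    delta = sw * swxx - swx ** 2
    if delta == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    a = (swxx * swy - swx * swxy) / delta
    b = (sw * swxy - swx * swy) / delta
    return a, b
```

With equal σi the weights cancel and the result matches the unweighted fit; giving one point a huge σi effectively removes it from the fit.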

principle of maximum likelihood is a method to

estimate the parameters of a distribution that best fit the observed data

principle of maximum likelihood - first

decide which model we think best describes the process of generating the data.

Maximum likelihood estimation is a method that will find the values

of μ and σ (for a normal model) that result in the curve that best fits the data

Assuming all events are independent, the total probability of observing all of the data is

the product of the probabilities of observing each data point individually
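A minimal sketch of maximum likelihood for a normal model, assuming independent data. The likelihood is the product of the individual densities; in practice its logarithm (a sum) is maximised instead, which has the same maximum. For the normal pdf the maximum has a known closed form: μ hat is the sample mean and σ hat uses a 1/N normalisation. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def likelihood(data, mu, sigma):
    """Total probability (density) of independent data: a product of terms."""
    model = NormalDist(mu, sigma)
    return math.prod(model.pdf(x) for x in data)


def log_likelihood(data, mu, sigma):
    """Log of the product above, written as a sum (numerically safer)."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)
        for x in data
    )


def normal_mle(data):
    """Maximum-likelihood estimates (mu hat, sigma hat) for a normal model.

    mu hat is the sample mean; sigma hat divides by N, which is what
    maximising the likelihood gives (not N - 1).
    """
    n = len(data)
    mu_hat = sum(data) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)
    return mu_hat, sigma_hat
```

For the data 2, 4, 4, 4, 5, 5, 7, 9 this gives μ hat = 5 and σ hat = 2, and nudging either value away lowers the log-likelihood.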

when is chi2 used?

when the data are counts of definite outcomes, e.g. flipping a coin, or testing whether an email arrival rate is constant in time => no errors on measurement
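For counts of definite outcomes, the usual statistic is Pearson's chi2, the sum over outcome categories of (observed - expected)^2 / expected. A minimal sketch; the function name and the 60/40 coin-flip numbers are hypothetical.

```python
def pearson_chi2(observed, expected):
    """chi2 = sum over outcome categories of (O - E)^2 / E.

    observed: counts seen in each category (e.g. heads, tails)
    expected: counts predicted by the null hypothesis for the same categories
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same categories")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

For 60 heads and 40 tails in 100 flips of a supposedly fair coin, the expected counts are 50 and 50, so chi2 = 100/50 + 100/50 = 4.0.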

when is reduced chi2 used?

when there is a known uncertainty or variance on each measured quantity, e.g. measuring the flux from a galaxy => errors on measurement
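When each point has its own error σi, chi2 is the sum of squared residuals measured in units of σi, and the reduced chi2 divides that by the degrees of freedom. A minimal sketch, assuming the degrees of freedom are N minus the number of model parameters fitted to the data; as a rough rule of thumb, a reduced chi2 near 1 suggests the scatter about the model is consistent with the quoted errors. Function names and the flux numbers are hypothetical.

```python
def chi2_with_errors(ys, model_ys, sigmas):
    """chi2 = sum over data points of ((y_i - model_i) / sigma_i)^2."""
    if not (len(ys) == len(model_ys) == len(sigmas)):
        raise ValueError("inputs must have equal length")
    return sum(((y - m) / s) ** 2 for y, m, s in zip(ys, model_ys, sigmas))


def reduced_chi2(ys, model_ys, sigmas, n_fitted_params=0):
    """chi2 divided by the degrees of freedom, N minus the number of fitted parameters."""
    dof = len(ys) - n_fitted_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    return chi2_with_errors(ys, model_ys, sigmas) / dof
```

For fluxes 10, 12 and 9 with errors 1, 2 and 1 compared with a constant model of 10, chi2 = 0 + 1 + 1 = 2; pass n_fitted_params when the model values come from a fit to the same data.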

Poisson distribution, number of parameters k =

1 (mean)

normal distribution, number of parameters k =

2 (mean and variance)

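degrees of freedom for the chi2 test =

N - k - 1, where N is the number of outcome categories (bins) and k is the number of distribution parameters estimated from the data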
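A worked example of the degrees-of-freedom formula, using a hypothetical histogram with N = 10 bins. The extra -1 reflects that the expected counts are constrained to the same total as the observed counts, and k = 0 corresponds to a fully specified null hypothesis such as a fair coin:

```latex
\nu = N - k - 1
\quad\Rightarrow\quad
\begin{cases}
\text{Poisson, } N = 10 \text{ bins, } k = 1: & \nu = 10 - 1 - 1 = 8 \\
\text{normal, } N = 10 \text{ bins, } k = 2: & \nu = 10 - 2 - 1 = 7 \\
\text{fair coin, } N = 2 \text{ outcomes, } k = 0: & \nu = 2 - 0 - 1 = 1
\end{cases}
```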

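For the reduced chi2 there is no fixed number of outcome categories, so the degrees of freedom are

the number of data points, N, minus the number of model parameters fitted to the data (just N if no parameters are fitted)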

p-value

If the null hypothesis were true, how probable is it that we would measure a value of chi2 as large as, or larger than, the one we observed?
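The p-value is the upper-tail probability of the chi-squared pdf beyond the observed chi2. A minimal standard-library sketch, restricted (as an assumption) to integer degrees of freedom, where the tail probability can be built up exactly from erfc or exp with a simple recurrence; the function name is hypothetical, and a statistics library would normally be used instead.

```python
import math


def chi2_sf(chi2, dof):
    """p-value: P(chi^2 >= chi2) for a chi-squared pdf with integer dof >= 1.

    This is the regularised upper incomplete gamma function Q(dof/2, chi2/2),
    built up with the recurrence Q(s + 1, y) = Q(s, y) + y^s e^(-y) / Gamma(s + 1)
    from Q(1/2, y) = erfc(sqrt(y)) (odd dof) or Q(1, y) = e^(-y) (even dof).
    """
    if dof < 1 or dof != int(dof):
        raise ValueError("dof must be a positive integer")
    if chi2 < 0:
        raise ValueError("chi2 cannot be negative")
    y = chi2 / 2.0
    if y == 0.0:
        return 1.0
    if int(dof) % 2 == 0:
        s, q = 1.0, math.exp(-y)
    else:
        s, q = 0.5, math.erfc(math.sqrt(y))
    while s < dof / 2.0:
        # log-space term avoids overflow in Gamma(s + 1) for larger dof
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return q
```

For example, 60 heads and 40 tails in 100 flips of a supposedly fair coin gives chi2 = 4.0 with 1 degree of freedom, and chi2_sf(4.0, 1) is about 0.046.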

standard threshold for rejecting a hypothesis

a p-value < 0.05

If we obtain a very small p-value (e.g. a few percent or less), we can interpret this as

providing little support for the null hypothesis, which we may then choose to reject.

(42 cards)