Nonparametric and semiparametric estimation Flashcards

1
Q

Basic. What is a non-parametric model?

A

In a non-parametric model, we assume as little as possible. E.g.,

$$
y =m(x)+\epsilon
$$

where $m(\cdot)$ is an unspecified function of $x$. This is thus a non-parametric regression model: the conditional mean of $y$ given $x$ is left entirely unspecified.

We need a lot of data to use non-parametric estimation.

A histogram is actually a nonparametric estimator of the density of our variable of interest. If we want something smoother, we use the kernel density estimator.

2
Q

How does it work when we estimate a non-parametric model?

A

When estimating a non-parametric regression model, a local weighted regression line is fitted at each point $x$, using a centered subset containing the closest $h \times N$ observations, where $h$ is the bandwidth and $N$ is the sample size. The weights decline as we move away from $x$.
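
To make the mechanics concrete, here is a minimal Python sketch of this idea. It is an illustration only: the function `local_fit`, the LOWESS-style tricube weights, and the toy data are assumptions, not the exact estimator from the lecture notes.

```python
import numpy as np

def local_fit(x, y, x0, h=0.3):
    """Fit a weighted regression line around x0 using roughly the closest h*N points.

    Illustrative sketch only (LOWESS-style tricube weights).
    """
    n = len(x)
    k = max(int(np.ceil(h * n)), 2)        # number of local observations, h*N
    dist = np.abs(x - x0)
    idx = np.argsort(dist)[:k]             # indices of the k nearest neighbours of x0
    d = dist[idx] / dist[idx].max()        # scaled distances in [0, 1]
    w = (1 - d**3) ** 3                    # tricube weights: decline away from x0
    # weighted least squares of y on (1, x) using only the local subset
    X = np.column_stack([np.ones(k), x[idx]])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[idx])
    return beta[0] + beta[1] * x0          # value of the local regression line at x0

# toy data: y = sin(x) + noise
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 6, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=200)
print(local_fit(x, y, x0=3.0))             # roughly sin(3) ≈ 0.14, up to noise and smoothing bias
```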

3
Q

What is a kernel?

A

The kernel estimate is a weighted average of observations within the bandwidth at the current point of evaluation. Data closest to the current point of evaluation are given more weight, as specified by a function called the kernel.

This is essentially a moving average over our data set (for the density).

4
Q

How do we think about bandwidth in the context of kernels?

A

Bandwidth = h

The bandwidth decides how much data we use in our moving average. Using more data will create a smoother estimate.

Choosing the smallest bandwidth leads to a jagged density estimate, while the largest bandwidth over-smooths the data.

5
Q

What is the point of estimating a density function? And what estimators can we use?

A

The point of these estimations is to estimate the density $f(x_0)$ of $x$ evaluated at some point $x_0$. For this we can use:
- The histogram estimator (like a uniform kernel)
- The Kernel density estimator

6
Q

Formulate the histogram estimator and describe its parts

A

See notion.

Where $h$ is the bandwidth and $N$ is the sample size. This estimator gives all observations in $x_0 \pm h$ equal weight. This leads to a density estimate that is a step function, even if the underlying density is continuous.
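
A standard way to write it, matching the description above (presumably the formula in the notes), is

$$
\hat f(x_0) = \frac{1}{2hN} \sum_{i=1}^{N} 1\bigl(|x_i - x_0| \le h\bigr),
$$

i.e. the fraction of observations falling in $x_0 \pm h$, divided by the width $2h$ of that interval.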

7
Q

Formulate the general kernel estimator.

A

See notion.
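
A standard way to write the general kernel density estimator (presumably the formula the notes refer to) is

$$
\hat f(x_0) = \frac{1}{Nh} \sum_{i=1}^{N} K\!\left(\frac{x_i - x_0}{h}\right),
$$

where $K(\cdot)$ is the kernel function and $h$ the bandwidth. The histogram estimator above is the special case with a uniform kernel.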

8
Q

What needs to be true for a kernel function?

A

The kernel function $K(·)$ must be continuous, symmetric
around zero, and integrate to unity.

9
Q

What is the most important thing when it comes to non-parametric estimation: the kernel or the bandwidth (BW)?

A

In practice the choice of kernel is not a huge deal; the choice of bandwidth is more important.

10
Q

What can be said about the mean obtained from the kernel density estimator?

A

The kernel density estimator is biased, with a bias term $b(x_0)$ that depends on the bandwidth, the curvature of the true density, and the kernel used. The bias disappears asymptotically if $h \to 0$ as $N \to \infty$.

11
Q

What is the kernel density bias?

A

See notion.
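
The leading bias term usually derived in textbooks (stated here as the standard result, presumably what the notes contain) is

$$
b(x_0) = \frac{1}{2} h^2 f''(x_0) \int z^2 K(z)\, dz ,
$$

which depends on the bandwidth ($h^2$), the curvature of the true density ($f''(x_0)$), and the kernel (through $\int z^2 K(z)\,dz$), exactly as stated in the previous card.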

12
Q

What can be said about the variance of the kernel density estimator?

A

The variance disappears if $Nh \to \infty$, which requires that $h \to 0$ more slowly than $1/N$, so that $Nh$ still diverges as $N \to \infty$.
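
For reference, the standard approximation of that variance (the textbook result, not copied from the notes) is

$$
V\bigl(\hat f(x_0)\bigr) \approx \frac{f(x_0)}{Nh}\int K(z)^2\, dz ,
$$

which indeed vanishes only if $Nh \to \infty$.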

13
Q

What is the bias-variance trade-off when using kernel density?

A

The choice of bandwidth $h$ is much more important than the choice of kernel function $K(·)$.

There is a tension between setting $h$ small to reduce bias and setting $h$ large to ensure smoothness. A natural metric to use is some form of the mean-squared error (MSE).

We therefore want a way to choose the bandwidth optimally. This is done by minimizing some function of the integrated squared error (ISE), e.g., its expectation $E(\mathrm{ISE}(h))$, the mean integrated squared error (MISE).

The optimal bandwidth $h$ goes to zero as $N \to \infty$.
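
Minimising the leading terms of the expected ISE gives the familiar rate $h^* \propto N^{-1/5}$. A common plug-in version is Silverman's rule of thumb for a Gaussian kernel (quoted here as the standard reference formula, not necessarily the rule used in the course):

$$
h^* = 1.06\, \hat\sigma\, N^{-1/5} .
$$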

14
Q

What is non-parametric regression?

A

Another interesting application of nonparametric methods is in the estimation of a regression function:

$$
y_i = m(x_i) + \epsilon_i
$$

Since the functional form of $m(x_i)$ is unspecified, we cannot use OLS. Instead we use a weighted sample average:

$$
\hat m(x_0) = \sum_i w_{i0,h}\, y_i
$$

where $w_{i0,h}$ are local weights.

The estimator is unbiased, but for consistency we need $N_0 \to \infty$ (the number of observations used around $x_0$) as $N \to \infty$, so that the variance goes to zero.

15
Q

What can be said about the bias-variance trade-off in non-parametric regression?

A

Here we also have a bias-variance tradeoff. As $h$ becomes smaller $\hat m(x_0)$ becomes less biased, as only observations close to $x_0$ are being used, but more variable, as fewer observations are being used.

16
Q

What are the features of the local constant kernel regressor?

A

This estimator is known as the local constant estimator, as it assumes that $m(x)$ is constant in a neighborhood of $x_0$. Again, $h$ is the bandwidth and determines how many of the $x_i$s around $x_0$ are used in forming the average.

If $h = \infty$ we get the sample average of $y$;

if $h = 0$ we use only a single observation.

For the formula, see Notion; it is what we derive in the PS.
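
The formula referred to is presumably the Nadaraya–Watson (local constant) estimator, a kernel-weighted average of the $y_i$:

$$
\hat m(x_0) = \frac{\sum_{i=1}^{N} K\!\left(\frac{x_i - x_0}{h}\right) y_i}{\sum_{i=1}^{N} K\!\left(\frac{x_i - x_0}{h}\right)} .
$$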

17
Q

What is the boundary problem in kernel regressions?

A

Since this type of regression is basically a moving average, we will have fewer observations at the endpoints, and thus over- and under-estimation there. This problem is reduced when using local linear (or local polynomial) regression.

18
Q

What do we assume in the local linear kernel regression?

A

We assume that $m(x)$ is linear in the neighbourhood of $x_0$.
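
Concretely, the local linear estimator solves a kernel-weighted least-squares problem around $x_0$ (the standard textbook formulation, stated here for reference rather than copied from the notes):

$$
(\hat a, \hat b) = \arg\min_{a,b} \sum_{i=1}^{N} K\!\left(\frac{x_i - x_0}{h}\right)\bigl(y_i - a - b\,(x_i - x_0)\bigr)^2,
\qquad \hat m(x_0) = \hat a .
$$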

19
Q

What can be said about the bias in the local linear regression?

A

$$
b(x_0) = \frac{1}{2}h^2 m''(x_0)\int z^2 K(z)\,dz
$$

Unlike the local constant estimator, its bias does not involve the slope of $m$ (only its curvature). This is especially beneficial for overcoming boundary problems; therefore, we use local linear regression in RDD.

20
Q

What is penalised regression?

A

We are in a setting with an ordinary linear model and a lot of regressors. We might then want some guidance on how to select the right right-hand-side variables. This problem of model selection is an application of machine learning.

We then use penalised regression models, which add a penalisation term to the OLS objective function.

These models help us find out which variables are “irrelevant” for explaining our outcome and which we should instead keep.

This does not help us solve any problem of causal inference, but of prediction: which regressors best predict our outcome.

Lasso and Ridge regression punish coefficients that are far from zero.
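
For reference, the two penalised objective functions in their standard form (with $\lambda \ge 0$ the penalty parameter, typically chosen by cross-validation):

$$
\text{Ridge:} \quad \min_\beta \sum_i (y_i - x_i'\beta)^2 + \lambda \sum_j \beta_j^2
$$

$$
\text{Lasso:} \quad \min_\beta \sum_i (y_i - x_i'\beta)^2 + \lambda \sum_j |\beta_j|
$$

The Lasso's absolute-value penalty can set coefficients exactly to zero, which is what makes it useful for deciding which regressors to keep.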

21
Q

How should we think about kernel in multivariate settings?

A

We can generalize the kernel density estimator to a multivariate setting. However, as the dimension of $x$ gets large, we are likely to run into problems of sparseness (not enough observations in a neighborhood of $x_0$).

22
Q

State the uniform, triangular and Epanechnikov kernels

A

Uniform: $K(z) = \frac{1}{2}\, 1(|z|<1)$

Triangular: $K(z) = (1-|z|)\, 1(|z|<1)$

Epanechnikov kernel: $K(z) = \frac{3}{4}(1-z^2)\, 1(|z|<1)$
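
As a sanity check, these three kernels can be plugged into the kernel density estimator from card 7. The Python sketch below is illustrative only: the function names and the simulated data are assumptions.

```python
import numpy as np

# the three kernels from this card, each supported on |z| < 1
def uniform(z):      return 0.5 * (np.abs(z) < 1)
def triangular(z):   return (1 - np.abs(z)) * (np.abs(z) < 1)
def epanechnikov(z): return 0.75 * (1 - z**2) * (np.abs(z) < 1)

def kde(x, x0, h, K):
    """Kernel density estimate: f_hat(x0) = (1/(N*h)) * sum_i K((x_i - x0)/h)."""
    return np.mean(K((x - x0) / h)) / h

rng = np.random.default_rng(1)
x = rng.normal(size=5_000)          # true density: standard normal
for K in (uniform, triangular, epanechnikov):
    # each estimate should be close to the true value 1/sqrt(2*pi) ≈ 0.399
    print(K.__name__, round(kde(x, x0=0.0, h=0.4, K=K), 3))
```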

23
Q

What are the important features of the kernel

A

$$
\int K(z)\, dz = 1
$$

$$
\int z K(z)\, dz = 0
$$

Also $\int z^3 K(z)\, dz = 0$: all odd moments vanish because the kernel is symmetric around zero.

24
Q

Under which condition does the bias in the kernel estimator disappear?

A

When $m(x)$ is constant in the neighbourhood of $x_0$, so that the local constant approximation is exact.