Generalised Linear Model (week 6-8) Flashcards

1
Q

What are GLMs used for?

A

Pricing in general and health insurance

2
Q

What is the GLM formula?

A

g(μ) = g(E(Y)) = α + β1X1 + … + βkXk = η

3
Q

What does g represent?

A

The link function

4
Q

What is η?

A

The linear predictor

5
Q

What is μ?

A

μ = g^(-1)(η): the mean of Y, obtained by applying the inverse link function to the linear predictor
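
A minimal sketch of how μ is recovered from η for a few common links. The function names (`inv_identity`, `inv_log`, `inv_logit`, `mean_response`) and the example coefficients are hypothetical, chosen only for illustration.

```python
import math

def inv_identity(eta):
    # Identity link: g(mu) = mu, so mu = eta
    return eta

def inv_log(eta):
    # Log link: g(mu) = log(mu), so mu = exp(eta)
    return math.exp(eta)

def inv_logit(eta):
    # Logit link: g(mu) = log(mu / (1 - mu)), so mu = 1 / (1 + exp(-eta))
    return 1.0 / (1.0 + math.exp(-eta))

def mean_response(alpha, betas, xs, inverse_link):
    # Linear predictor: eta = alpha + beta_1 * x_1 + ... + beta_k * x_k
    eta = alpha + sum(b * x for b, x in zip(betas, xs))
    # Mean: mu = g^(-1)(eta)
    return inverse_link(eta)

# Made-up coefficients and covariates, same eta under two different links
print(mean_response(0.2, [0.5, -0.1], [1.0, 3.0], inv_log))
print(mean_response(0.2, [0.5, -0.1], [1.0, 3.0], inv_logit))
```

The same linear predictor gives a different mean under each link, which is why the link has to be stated alongside the distribution.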

6
Q

What is the symbol for the dispersion parameter?

A

φ (phi)

7
Q

What does b''(θ) from the PDF represent?

A

The variance function

8
Q

What is the canonical link?

A

The link function that transforms the mean μ into the natural (canonical) parameter θ of the exponential family, i.e. g(μ) = θ

9
Q

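Why do we need GLMs?

A

When the response is normally distributed, we can use the normal PDF to calculate p-values and confidence intervals. When the response is not normally distributed, is heteroskedastic, or has a non-linear relationship with the predictors, we use a GLM.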

10
Q

What does the link function do?

A

It transforms the predicted mean μ so that g(μ) equals the linear predictor. The dependent variable Y itself is not transformed.

11
Q

Which model is used for a binomial (binary) response?

A

Logistic regression

12
Q

When do we use the Poisson distribution?

A

When we have a skewed, discrete (count) distribution, e.g. “number of times you …”

13
Q

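When do we use the negative binomial distribution?

A

For count data where the mean and variance differ (overdispersion), unlike the Poisson, which assumes they are equal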

14
Q

When do we use the gamma distribution?

A

For a continuous response variable that must be strictly positive (> 0)

15
Q

How to fit a GLM by hand? (long)

A
  1. Identify the distribution.
  2. Look at the table on the formula sheet to see which μ to use.
  3. Write the likelihood function ∏ f(y).
  4. Compute the log-likelihood: change ∏ f(y) into ∑ log f(y).
  5. f(y) comes from formula sheet page 5 (remember that log cancels exp directly: log(exp(x)) = x).
  6. Substitute the μ from step 2 into the f(y) from step 5.
  7. Differentiate with respect to α and β and set the derivatives to 0 (when a term has x in front, the x stays and cannot be removed; when there is no x, the α and β can be dropped directly).
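
The steps above can be sketched numerically for one concrete case. This is a minimal illustration, assuming a Poisson response with a log link and a single covariate (log μ = α + βx). The function names and data are made up, and Newton's method stands in for solving the two score equations by hand.

```python
import math

def log_likelihood(x, y, alpha, beta):
    # Steps 3-6: sum of log f(y) with mu = exp(alpha + beta * x) substituted in.
    # For the Poisson, log f(y) = y * log(mu) - mu - log(y!)
    total = 0.0
    for xi, yi in zip(x, y):
        eta = alpha + beta * xi
        total += yi * eta - math.exp(eta) - math.lgamma(yi + 1)
    return total

def fit_poisson_log_link(x, y, iterations=25):
    # Step 7: set d(loglik)/d(alpha) = 0 and d(loglik)/d(beta) = 0,
    # solved here with Newton's method.
    alpha, beta = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(alpha + beta * xi) for xi in x]
        # Derivative w.r.t. alpha has no x; derivative w.r.t. beta keeps the x in front.
        score_a = sum(yi - mi for yi, mi in zip(y, mu))
        score_b = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Negative second derivatives (the information matrix)
        i_aa = sum(mu)
        i_ab = sum(xi * mi for xi, mi in zip(x, mu))
        i_bb = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i_aa * i_bb - i_ab * i_ab
        # Newton step: (alpha, beta) += information^(-1) * score
        alpha += (i_bb * score_a - i_ab * score_b) / det
        beta += (i_aa * score_b - i_ab * score_a) / det
    return alpha, beta

# Made-up count data
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1, 2, 2, 4, 6]
alpha_hat, beta_hat = fit_poisson_log_link(x, y)
print(alpha_hat, beta_hat, log_likelihood(x, y, alpha_hat, beta_hat))
```

At the solution both score sums are zero, which is exactly the "set to 0" condition in step 7.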
16
Q

What are information criteria (IC)?

A

- They assess goodness-of-fit and parameter parsimony
- They are used to compare different linear predictors or link functions

17
Q

How to choose a model using an IC?

A

Choose the model with the lowest value

18
Q

What are the 2 types of IC?

A

AIC and BIC (BIC penalises extra parameters more heavily, so it is more likely to underfit)
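
A small sketch of the two criteria using their standard definitions, AIC = 2k − 2ℓ and BIC = k·ln(n) − 2ℓ, where ℓ is the maximised log-likelihood, k the number of parameters and n the number of observations. The log-likelihood values in the example are made up.

```python
import math

def aic(log_lik, k):
    # Akaike information criterion: penalty of 2 per parameter
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    # Bayesian information criterion: penalty of ln(n) per parameter
    return k * math.log(n) - 2 * log_lik

# Hypothetical fits: (name, log-likelihood, number of parameters)
n_obs = 200
models = [("small", -310.0, 3), ("large", -306.5, 6)]
for name, ll, k in models:
    print(name, aic(ll, k), bic(ll, k, n_obs))

# Pick the lowest value under each criterion
best_aic = min(models, key=lambda m: aic(m[1], m[2]))[0]
best_bic = min(models, key=lambda m: bic(m[1], m[2], n_obs))[0]
print(best_aic, best_bic)
```

Because ln(n) exceeds 2 once n is 8 or more, BIC charges more per parameter than AIC and tends to favour the smaller model.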

19
Q

How do forward and backward selection use AIC/BIC?

A

Same idea: at each step, keep the model with the lowest value

20
Q

When are Pearson and deviance residuals used?

A

Pearson residuals: when Y is normally distributed
Deviance residuals: otherwise, because they are usually closer to normally distributed
If Y is normally distributed, the Pearson and deviance residuals are equal
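
To make the difference concrete, here is a minimal sketch of both residuals for a Poisson response, where the variance function is V(μ) = μ. The function names are hypothetical, and the formulas are the standard unscaled Poisson ones, which is an assumption about what the course uses.

```python
import math

def pearson_residual_poisson(y, mu):
    # (observed - fitted) / sqrt(variance function), with V(mu) = mu
    return (y - mu) / math.sqrt(mu)

def deviance_residual_poisson(y, mu):
    # sign(y - mu) * sqrt(unit deviance), where the Poisson unit deviance is
    # 2 * [y * log(y / mu) - (y - mu)] and y * log(y / mu) is taken as 0 when y = 0
    term = y * math.log(y / mu) if y > 0 else 0.0
    unit_deviance = 2.0 * (term - (y - mu))
    return math.copysign(math.sqrt(max(unit_deviance, 0.0)), y - mu)

# Made-up observations and fitted means
for y, mu in [(0, 2.0), (2, 2.0), (4, 2.0), (9, 2.0)]:
    print(y, mu, pearson_residual_poisson(y, mu), deviance_residual_poisson(y, mu))
```

For a normal response the unit deviance is (y − μ)², so the deviance residual reduces to y − μ and matches the Pearson residual.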

21
Q

What does a positive trend mean?

A

When the absolute standardised residuals are plotted against the scaled fitted values, a positive trend means that b''(θ) (the variance function) increases too slowly

22
Q

What does a negative trend mean?

A

When the absolute standardised residuals are plotted against the scaled fitted values, a negative trend means that b''(θ) (the variance function) increases too fast

23
Q

Short-tailed line of business

A

Takes less than a few years to settle all claims, e.g. motor, home, fire

24
Q

Long-tailed line of business

A

Takes more than a few years to settle all claims, e.g. workers' compensation, public and product liability

25
What are Outstanding Claim Liabilities (OCL)?
Claims incurred before the valuation date but not yet paid by the valuation date
26
IBNR is
Incurred but not reported
27
How to estimate OCL?
Express past claims data as a run-off triangle, then apply a reserving method (Chain Ladder Method)
28
How to construct the claims run-off triangle?
Going across (left to right) is the development year (the year the claim is paid); going down is the accident year (the year the accident occurred)
29
How to apply the chain ladder method?
1. Convert the triangle to cumulative claims.
2. Find each development factor: sum of column 2 / sum of column 1, with both columns cut to the same length.
3. Because the lengths were matched, the last value in each column/row is left unused; multiply it by the result of step 2 (start from the top-right corner, or column 9).
4. Same as step 3, but this time also multiply by the step 2 results that come after it, e.g. the column 8 value is multiplied by development factors 8-9 and 9-10.
5. Subtract the last observed value of each column/row from the result.
6. Repeat as in step 4, so more and more development factors are multiplied in.
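
The chain ladder steps can be sketched as follows. This is a minimal illustration on a made-up 3x3 incremental triangle, stored as a list of accident-year rows in which each later row is one development year shorter. The function names are hypothetical.

```python
def cumulate(incremental):
    # Step 1: turn incremental payments into cumulative payments, row by row
    cumulative = []
    for row in incremental:
        total, out = 0.0, []
        for value in row:
            total += value
            out.append(total)
        cumulative.append(out)
    return cumulative

def development_factors(cumulative):
    # Step 2: factor j = sum(column j+1) / sum(column j), using only the rows
    # that have both columns, so the two sums cover the same accident years
    n = len(cumulative)
    factors = []
    for j in range(n - 1):
        rows = [r for r in cumulative if len(r) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    return factors

def chain_ladder(incremental):
    cumulative = cumulate(incremental)
    factors = development_factors(cumulative)
    n = len(cumulative)
    ultimates, outstanding = [], []
    for row in cumulative:
        latest = row[-1]
        ultimate = latest
        # Steps 3, 4, 6: multiply the latest value by every remaining factor
        for j in range(len(row) - 1, n - 1):
            ultimate *= factors[j]
        ultimates.append(ultimate)
        # Step 5: outstanding = projected ultimate - latest observed cumulative
        outstanding.append(ultimate - latest)
    return factors, ultimates, outstanding

# Made-up incremental triangle: rows are accident years, columns development years
triangle = [[100, 50, 25], [110, 55], [120]]
factors, ultimates, outstanding = chain_ladder(triangle)
print(factors)
print(ultimates)
print(sum(outstanding))  # estimated total OCL
```

The oldest accident year is fully developed and contributes nothing, while the newest is multiplied by every development factor.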
30
residual bootstrapping
allows for both process error and parameter error
31
What is process error?
Process uncertainty: the inherent randomness of future outcomes
32
What is parameter error?
Uncertainty in the parameter estimates that comes from fitting the model to data
33
How to do residual bootstrapping?
1. Bootstrap the data: pick the residuals randomly with replacement.
2. Combine the bootstrapped residuals with the fitted values to generate pseudo data.
3. Fit a new model to the pseudo data.
4. Estimate the expected OCL for the pseudo data.
5. Repeat!
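
A rough sketch of this loop, assuming a chain ladder model and unscaled Pearson residuals of the form (actual − fitted) / sqrt(fitted) on incremental claims. The card does not say which residual the course uses, so both choices are assumptions, as are the function names and the data. The sketch covers only the resample-and-refit loop (parameter error). A separate simulation step for process error is left out.

```python
import math
import random

def cumulate(incremental):
    # Running total along each accident-year row
    out = []
    for row in incremental:
        total, acc = 0.0, []
        for v in row:
            total += v
            acc.append(total)
        out.append(acc)
    return out

def dev_factors(cum):
    # Chain ladder factors from columns cut to the same length
    n = len(cum)
    return [sum(r[j + 1] for r in cum if len(r) > j + 1) /
            sum(r[j] for r in cum if len(r) > j + 1) for j in range(n - 1)]

def total_reserve(cum, f):
    # Sum over rows of (projected ultimate - latest observed cumulative)
    n, total = len(cum), 0.0
    for row in cum:
        ult = row[-1]
        for j in range(len(row) - 1, n - 1):
            ult *= f[j]
        total += ult - row[-1]
    return total

def fitted_incrementals(cum, f):
    # Back-cast from the latest diagonal with the fitted factors, then difference
    fitted = []
    for row in cum:
        back = [0.0] * len(row)
        back[-1] = row[-1]
        for j in range(len(row) - 2, -1, -1):
            back[j] = back[j + 1] / f[j]
        fitted.append([back[0]] + [back[j] - back[j - 1] for j in range(1, len(row))])
    return fitted

def bootstrap_reserves(incremental, n_boot=1000, seed=0):
    rng = random.Random(seed)
    cum = cumulate(incremental)
    fitted = fitted_incrementals(cum, dev_factors(cum))
    # Assumes every fitted incremental is positive, so the square root is defined
    resid = [(c - m) / math.sqrt(m)
             for rc, rm in zip(incremental, fitted) for c, m in zip(rc, rm)]
    reserves = []
    for _ in range(n_boot):
        # Steps 1-2: resample residuals with replacement and rebuild pseudo data
        pseudo = [[m + rng.choice(resid) * math.sqrt(m) for m in row] for row in fitted]
        # Steps 3-4: refit the chain ladder and estimate the OCL
        pcum = cumulate(pseudo)
        reserves.append(total_reserve(pcum, dev_factors(pcum)))
    return reserves  # step 5: one estimate per repetition

# Made-up incremental triangle
triangle = [[100, 60, 30, 10], [110, 58, 35], [95, 62], [120]]
sims = bootstrap_reserves(triangle, n_boot=500, seed=42)
print(sum(sims) / len(sims), min(sims), max(sims))
```

The spread of the simulated reserves gives a rough picture of how much the OCL estimate moves when the model is refitted to resampled data.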
34
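What does pmax do?
It takes the element-wise maximum, so it can be used to set a minimum value for every element (in R, pmax(x, 0) floors each element of x at 0)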
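
For readers working outside R, a simplified Python analogue of the flooring use of `pmax`. It is only a sketch: this hypothetical helper takes a single scalar floor, whereas R's `pmax` accepts several vectors.

```python
def pmax(values, floor):
    # Element-wise maximum of each value and the floor: no result is below `floor`
    return [max(v, floor) for v in values]

# Example: negative values are raised to 0, the rest are unchanged
print(pmax([-2.0, 0.5, 3.0], 0.0))
```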
35
What is logistic regression?
A GLM with a binomial distribution and the logit link function (the canonical link)
36
What is the estimated probability for logistic regression?
p = 1 / (1 + e^(−η)), the inverse of the logit link applied to the linear predictor η
37
What happens if we lower the threshold?
True positives increase, but false positives also increase
38
What is a false positive?
When the model predicts "yes" but the truth is no
39
What happens if we increase the threshold?
False positives decrease, but false negatives increase
40
What is a false negative?
When the model predicts "no" but the truth is yes
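
A small sketch of how the four counts move with the threshold. The predicted probabilities, labels and function name are made up for illustration.

```python
def confusion_counts(probs, labels, threshold):
    # Predict "yes" (1) when the estimated probability is at least the threshold
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for p, actual in zip(probs, labels):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and actual == 1:
            counts["TP"] += 1
        elif predicted == 1 and actual == 0:
            counts["FP"] += 1  # said "yes" but the truth is no
        elif predicted == 0 and actual == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1  # said "no" but the truth is yes
    return counts

# Hypothetical estimated probabilities and true labels
probs = [0.1, 0.3, 0.45, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1]
for t in (0.25, 0.5, 0.8):
    print(t, confusion_counts(probs, labels, t))
```

Lowering the threshold from 0.5 to 0.25 catches the missed positive but adds a false positive, while raising it to 0.8 removes the false positive at the cost of more false negatives.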
41
Downside of logistic regression?
It is sensitive to class imbalance: the model may predict the majority class more frequently
42
How to solve the class imbalance?
Oversample the minority class
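
A minimal sketch of random oversampling, assuming the simplest variant: minority rows are duplicated at random (with replacement) until every class is as large as the biggest one. The function name and data are hypothetical.

```python
import random

def oversample_minority(rows, label_of, seed=0):
    rng = random.Random(seed)
    # Group the rows by class label
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw extra copies with replacement until the class reaches the target size
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

# Made-up data: (feature, label) pairs with 8 "no" rows and 2 "yes" rows
data = [(x, 0) for x in range(8)] + [(10, 1), (11, 1)]
balanced = oversample_minority(data, label_of=lambda row: row[1])
print(sum(1 for r in balanced if r[1] == 0), sum(1 for r in balanced if r[1] == 1))
```

Only the training data should be oversampled this way. Duplicated rows add no new information, they only reweight the classes.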
43
How to do logistic regression?
1. Oversample the data and summarise it.
2. Model the data using the explanatory variables (data$explanatory var).
3. Fit with glm using family = binomial and link = logit.
4. Combine explanatory variables.
5. Using the new, improved model, do model checking.
6. Check TP, FP, TN, FN.
7. Check the ratio between TP and FP.
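
To show what the fitting step is doing underneath, here is a hand-rolled sketch of a logistic regression with one explanatory variable, fitted by Newton's method. It is a stand-in for illustration, not the glm call the card refers to. The function name and data are made up.

```python
import math

def fit_logistic(x, y, iterations=25):
    # Binomial GLM with logit link: log(p / (1 - p)) = alpha + beta * x
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(alpha + beta * xi))) for xi in x]
        # Score equations (derivatives of the log-likelihood)
        score_a = sum(yi - pi for yi, pi in zip(y, p))
        score_b = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix entries, weighted by p * (1 - p)
        w = [pi * (1.0 - pi) for pi in p]
        i_aa = sum(w)
        i_ab = sum(xi * wi for xi, wi in zip(x, w))
        i_bb = sum(xi * xi * wi for xi, wi in zip(x, w))
        det = i_aa * i_bb - i_ab * i_ab
        # Newton step
        alpha += (i_bb * score_a - i_ab * score_b) / det
        beta += (i_aa * score_b - i_ab * score_a) / det
    return alpha, beta

# Made-up binary data that is not perfectly separable
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
alpha_hat, beta_hat = fit_logistic(x, y)
probs = [1.0 / (1.0 + math.exp(-(alpha_hat + beta_hat * xi))) for xi in x]
print(alpha_hat, beta_hat)
print([round(p, 3) for p in probs])
```

The fitted probabilities are what get compared with a threshold to produce the TP, FP, TN and FN counts in steps 6 and 7. Perfectly separable data would make this iteration diverge, which is why the example data overlaps.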