### Role of Generalised Linear Model in non-life pricing Phase3

Phase1: http://www.actuarysense.com/2018/10/role-of-generalised-linear-model-in-non.html
Phase2: http://www.actuarysense.com/2018/11/role-of-generalised-linear-model-in-non.html
So, we know that the purpose of a GLM is to find the relationship between the mean of the response variable and the covariates.

Linear Predictor: Let’s denote it by “η” (eta). The linear predictor is a function of the covariates. For example, in the normal linear model, where the mean of Y is B0 + B1x, the linear predictor is η = B0 + B1x. Always note that the linear predictor has to be linear in its parameters (here B0 and B1); it does not have to be linear in the covariates.
But the question remains: how did I come up with B0 + B1x as the function?
First of all, note that broadly there are two types of covariates.
1. Variables: These take numerical values. For example: age of policyholder, years of experience etc.
2. Factors: These take categorical values. For example: sex of policyholder, car colour of policyholder etc.

Let’s look at different scenarios:
1. Age is the only covariate.
Linear Predictor: Age: η = B0 + B1x, where x is the age of the policyholder.
2. Age and Sex are both covariates (one is a variable and the other is a factor).
Linear Predictor: Age + Sex: η = ai + Bx, where x is the age of the policyholder and i = 1 for male, 2 for female.
3. Age and Sex are covariates, with an interaction between them too.
Linear Predictor: Age + Sex + Age.Sex: η = ai + Bix, so here the slope B also changes with the value of “i”.

The reason this formulation of the linear predictor is desirable is its efficiency. In the 1st case we need to estimate only 2 parameters (B0, B1), in the 2nd case 3 parameters (a1, a2, B), and in the last case 4 parameters (a1, a2, B1, B2). So as covariates keep being added, the model becomes more complex and we need to estimate more and more parameters.
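To make the parameter counts concrete, here is a minimal sketch in plain Python of how each scenario turns one policyholder into a row of inputs, with one entry per parameter. The function names `design_row` and `linear_predictor`, the scenario labels and the parameter values are all made up for illustration; they are not from any particular library.

```python
def design_row(age, sex, scenario):
    """Return the inputs that multiply each parameter for one policyholder."""
    is_male = 1.0 if sex == "M" else 0.0
    is_female = 1.0 - is_male
    if scenario == "Age":
        return [1.0, age]                      # B0, B1
    if scenario == "Age + Sex":
        return [is_male, is_female, age]       # a1, a2, B
    if scenario == "Age + Sex + Age.Sex":
        # a1, a2, B1, B2: each sex gets its own intercept and its own slope
        return [is_male, is_female, is_male * age, is_female * age]
    raise ValueError("unknown scenario: " + scenario)


def linear_predictor(row, params):
    """eta is linear in the parameters: a weighted sum of the row entries."""
    return sum(x * p for x, p in zip(row, params))


for scenario in ["Age", "Age + Sex", "Age + Sex + Age.Sex"]:
    row = design_row(30, "F", scenario)
    print(scenario, "->", len(row), "parameters")

# Hypothetical parameter values (a1, a2, B1, B2), purely for illustration.
params = [0.5, 0.3, 0.01, 0.02]
eta_female_30 = linear_predictor(design_row(30, "F", "Age + Sex + Age.Sex"), params)
```

For the 30-year-old female in the interaction model, only a2 and B2 contribute, which is exactly the "B changes with i" behaviour of η = ai + Bix.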

So, does it simply mean that I can estimate as many parameters as there are data points to get the perfect model? That is not a good idea, as it hurts the efficiency of the model. However, we can use that type of model as a benchmark. It is known as the “Saturated Model”.
Saturated Model: It is the model that provides a perfect fit to the data. The saturated model is not useful from a predictive point of view; however, it is a good benchmark against which to compare the fit of other models via the scaled deviance.
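A small sketch of the idea, assuming a normal model where age is the only covariate and the numbers are invented: with 5 data points, a model with 5 parameters (here a degree-4 polynomial in age, built by Lagrange interpolation) reproduces every observation, while the 2-parameter line η = B0 + B1x leaves residuals. For the normal model the deviance is just the residual sum of squares, so the saturated fit scores zero.

```python
# Hypothetical data: average claim cost by age of policyholder (made up).
ages = [20.0, 30.0, 40.0, 50.0, 60.0]
costs = [520.0, 410.0, 380.0, 400.0, 450.0]


def saturated_fit(xs, ys, x_new):
    """Degree n-1 polynomial through all n points (Lagrange interpolation)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x_new - xj) / (xi - xj)
        total += yi * weight
    return total


def fit_line(xs, ys):
    """Least-squares estimates of B0 and B1 for eta = B0 + B1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b1 * x_bar, b1


b0, b1 = fit_line(ages, costs)

# Residual sum of squares = deviance for the normal model.
rss_saturated = sum((y - saturated_fit(ages, costs, x)) ** 2
                    for x, y in zip(ages, costs))
rss_line = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ages, costs))

print("saturated model RSS:", rss_saturated)
print("2-parameter line RSS:", rss_line)
```

The saturated fit has used up all the data just to reproduce it, which is why it serves as a yardstick rather than as a pricing model.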

Now, different people come up with different models for pricing motor insurance. Which model is good and which is not? We can check this using the Likelihood Ratio Test.
Let’s see an example:
There are 2 models: Model P and Model Q.
For Model P, the scaled deviance is SP = 2(lS - lP), where lS is the log-likelihood of the saturated model and lP is the log-likelihood of Model P. So now you can relate this to topic 1 of this series: we put the distribution of every response variable into exponential family form so that we can easily take its log-likelihood and use it for comparing models. 😊
I can do the same thing for Model Q too: SQ = 2(lS - lQ).

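So, now I have the scaled deviance for both models, and we can use it as a measure of the fit of a model: the smaller the scaled deviance, the better the fit. If the data are normally distributed, the scaled deviance has a chi-square distribution exactly; for other distributions this holds only approximately.
SP - SQ = 2(lS - lP) - 2(lS - lQ) = 2(lQ - lP). If Model P is a sub-model of Model Q, we compare this difference with a chi-square distribution whose degrees of freedom equal the number of extra parameters in Model Q, at (say) the 5% significance level.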

Caution: We can only compare two models this way if one of them is a sub-model of the other. I mean, both models should use the same distribution for the data and the same link function, and one model should be a sub-model of the other. Through this we can check whether adding a new parameter significantly improves the fit of the model or not.
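The whole comparison can be sketched in a few lines of plain Python. Everything here is an assumption for illustration: the claim counts are invented, the response is taken to be Poisson (so the scale parameter is 1), Model P has one common mean, and Model Q adds Sex as a factor, so P is a sub-model of Q with one parameter fewer. With so few data points the chi-square approximation is rough; the point is only to show the mechanics.

```python
import math

# Hypothetical claim counts per policy, split by the factor Sex (made up).
male = [2, 3, 1, 4]
female = [5, 7, 6, 8]
y = male + female


def poisson_loglik(obs, means):
    """Poisson log-likelihood: sum of y*log(mu) - mu - log(y!)."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(obs, means))


# Saturated model: fitted mean equals each observation (all counts are > 0).
l_S = poisson_loglik(y, y)

# Model P: one common mean (1 parameter); its MLE is the overall average.
mu_P = sum(y) / len(y)
l_P = poisson_loglik(y, [mu_P] * len(y))

# Model Q: one mean per sex (2 parameters); the MLEs are the group averages.
mu_m = sum(male) / len(male)
mu_f = sum(female) / len(female)
l_Q = poisson_loglik(y, [mu_m] * len(male) + [mu_f] * len(female))

S_P = 2 * (l_S - l_P)   # scaled deviance of Model P
S_Q = 2 * (l_S - l_Q)   # scaled deviance of Model Q
lr_stat = S_P - S_Q     # equals 2 * (l_Q - l_P)

# Q has 1 extra parameter, so compare with chi-square on 1 degree of freedom.
# For 1 degree of freedom the upper tail is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(lr_stat / 2))
CRITICAL_5PCT_1DF = 3.841
q_is_better = lr_stat > CRITICAL_5PCT_1DF

print("S_P =", round(S_P, 3), " S_Q =", round(S_Q, 3))
print("S_P - S_Q =", round(lr_stat, 3), " p-value =", round(p_value, 4))
print("Adding Sex improves the fit at 5%:", q_is_better)
```

Note that lS cancels out of the difference, so in practice only the log-likelihoods of the two fitted models are needed.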

Seems easy, right? 😊

Thanks and Regards
Actuary Sense