Before reading this article, make sure that you have read Phase 1.

In this topic we will talk about the link function and the linear predictor.

We saw in the previous topic that the exponential family is

f( y; θ, φ) = exp[{(yθ – b(θ))/a(φ)} – c(y, φ)]

where θ is the natural parameter and φ is the dispersion parameter.

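The response distribution can be Gamma, Binomial, Poisson, Normal or Exponential, and we then write it in the form of the exponential family. (A Lognormal response is usually handled by modelling log Y as Normal, because the Lognormal density itself cannot be written in the form above.)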
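As a worked illustration of "writing it in the form of the exponential family", here is the Poisson case, using the same sign convention as the formula above (so the last term enters with a minus sign). This is a sketch of the standard algebra, not a step taken from the previous topic.

```latex
\begin{align*}
f(y;\mu) &= \frac{e^{-\mu}\,\mu^{y}}{y!} \\
         &= \exp\left[\, y\log\mu - \mu - \log y! \,\right] \\
         &= \exp\left[\frac{y\theta - b(\theta)}{a(\phi)} - c(y,\phi)\right],
\end{align*}
\text{with}\quad \theta = \log\mu,\qquad b(\theta) = e^{\theta},\qquad a(\phi) = 1,\qquad c(y,\phi) = \log y!
```

Note that the natural parameter comes out as θ = log µ, which is the same log function that is used as the link for the Poisson example later in this article.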

The relationship between the response and the covariates is defined through the mean of the response distribution, i.e. E[Y].

Let’s take the example of the linear model, where we defined Y = B_{0} + B_{1}x (learn about the linear model in the previous topic):

Y ~ N(µ, σ^{2}), where µ = B_{0} + B_{1}x.

__Always keep in mind that the purpose of a GLM is to find the relationship between the mean of the response variable and the covariates.__


So, what are the components required for a GLM? Let’s see them one by one.

1. Distribution of the Data: We have to look at what we want to model. For example, if we want to model the number of claims, we should use the Poisson distribution. If we want to model the probability of getting a heart attack, we should use the Binomial distribution. Remember that we can then write it in the form of the exponential family.

2. Linear Predictor: Let’s denote it by “η” (eta). The linear predictor is a function of the covariates. For example, in the normal linear model the function is Y = B_{0} + B_{1}x, so the linear predictor is η = B_{0} + B_{1}x. Always note that the linear predictor has to be linear in its parameters. In this case the parameters are B_{0} and B_{1}.

3. Link Function: We saw in the bold line above that the purpose is to find the relationship between the mean of the response variable and the covariates, so there has to be a link between the two. What we do is take a function of the mean of the response variable and equate it to the function of the covariates (which is the linear predictor, i.e. η).

So, g(µ) = η. Assuming that the link function is invertible, µ = g^{-1}(η).
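The three components above can be sketched in a few lines of Python. This is only an illustration: the function names and the coefficient values are hypothetical, not part of any library or of the previous topic. It also shows what "linear in its parameters" means, since using x and x² as features still gives a weighted sum of the B's.

```python
import math

def linear_predictor(betas, features):
    """eta = B_0 + B_1*f_1 + B_2*f_2 + ... (a weighted sum, linear in the betas)."""
    return betas[0] + sum(b * f for b, f in zip(betas[1:], features))

def mean_from_covariates(betas, features, inverse_link):
    """mu = g^{-1}(eta): map the linear predictor back to the mean."""
    return inverse_link(linear_predictor(betas, features))

# Hypothetical coefficients, chosen only for illustration.
betas = [1.0, 0.5, -0.1]
x = 2.0

# Using x and x**2 as features keeps eta linear in the parameters,
# even though it is no longer linear in x.
features = [x, x ** 2]
eta = linear_predictor(betas, features)  # 1.0 + 0.5*2 - 0.1*4

# Identity link: the mean equals the linear predictor.
mu_identity = mean_from_covariates(betas, features, lambda e: e)

# Log link: the mean is exp(eta), the inverse of g(mu) = log(mu).
mu_log = mean_from_covariates(betas, features, math.exp)
```

The same linear predictor gives different means depending on which inverse link is applied, which is why the link function is listed as a separate component.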

I guess we have read enough core concepts; let’s see how it works in practice.

**Follow us on LinkedIn : Actuary Sense**

Say I want to model the number of claims on bike insurance. My response variable Y is the number of claims, and let’s suppose that there is only one factor that we are considering, which is age. We denote age by X. For a GLM we need three things:

1. Distribution of Data: I will use the Poisson distribution for my response variable, so Y ~ Poi(µ).

2. Linear Predictor: I will define η = B_{0} + B_{1}x.

3. Link Function: We saw that it will be g(µ) = η. The question is what g(µ) should be. I will say that it will be log µ.

So let’s equate: log µ = B_{0} + B_{1}x. Since the link function is invertible, µ = exp(B_{0} + B_{1}x).

So we have finally defined the relationship between the mean of the response variable (µ) and the function of the covariates.

But the question is how I came up with g(µ) = log µ. Let’s see.
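To make the bike insurance example concrete, here is a minimal Python sketch of µ = exp(B_{0} + B_{1}x). The coefficient values are made up for illustration and are not estimated from any data; the function name `expected_claims` is also hypothetical.

```python
import math

# Hypothetical coefficients for illustration only (not fitted to data).
B0 = -1.0
B1 = -0.02

def expected_claims(age):
    """Mean number of claims under a log link: mu = exp(B0 + B1 * age)."""
    eta = B0 + B1 * age   # linear predictor
    return math.exp(eta)  # inverse of the log link

mu_25 = expected_claims(25)
mu_26 = expected_claims(26)

# Under a log link, one extra year of age multiplies the mean by exp(B1),
# whatever the starting age is.
ratio = mu_26 / mu_25

for age in (20, 40, 60):
    print(f"age {age}: expected claims = {expected_claims(age):.4f}")
```

One consequence of the log link is visible in `ratio`: the effect of a covariate on the mean is multiplicative rather than additive.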

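We used the Poisson distribution, and we know that its mean cannot be negative. The linear predictor, however, can take any real value. The log function maps a positive mean onto the whole real line, so µ = exp(η) is always positive whatever value η takes. That is why we used log µ. What about other distributions?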

1. Normal Distribution: Identity function: g(µ) = µ, because the mean of a normal distribution can take any value, positive or negative.

2. Binomial: Logit function: g(µ) = log(µ/(1 - µ)), because µ here is the probability of success, which lies between 0 and 1.
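The pairing of distributions and link functions above can be checked numerically. The sketch below is an illustration only; the dictionary `LINKS` and the helper `round_trip` are hypothetical names. It stores each link with its inverse and confirms that any real value of η is mapped back to a valid mean.

```python
import math

# Each entry holds (g, g_inverse) for the link discussed in the text.
LINKS = {
    "normal (identity)": (lambda mu: mu, lambda eta: eta),
    "poisson (log)": (math.log, math.exp),
    "binomial (logit)": (
        lambda mu: math.log(mu / (1 - mu)),
        lambda eta: 1 / (1 + math.exp(-eta)),
    ),
}

def round_trip(name, mu):
    """Apply g and then g^{-1}; the result should be mu again."""
    g, g_inv = LINKS[name]
    return g_inv(g(mu))

# The linear predictor can be any real number, negative or positive.
etas = [-5.0, 0.0, 5.0]

# The inverse log link always returns a positive mean.
poisson_means = [LINKS["poisson (log)"][1](e) for e in etas]

# The inverse logit link always returns a value strictly between 0 and 1.
binomial_means = [LINKS["binomial (logit)"][1](e) for e in etas]
```

For instance, `round_trip("binomial (logit)", 0.3)` should return 0.3 up to floating-point error, which is what invertibility of the link means in practice.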

In the next topic we will look at the role of linear predictors in detail.
