It is easy to lie with Statistics. It is hard to tell the truth without it by Andrejs Dunkels, Mathematician and Writer
MultiCollinearity Problem
When you are going for an interview, the interviewer seeks for only unique qualities in the candidate. Qualities that make a different candidate from others. Similarly, When two or more than two variables carry same information to the target variable, this problem occurs. For modeling, target variable only require unique information from each variable.
To be more precise, when one independent variable is highly correlated with one or more of other variables, multicollinearity lives there.
Output
 |
VifCor |
Variance Inflation Factor(VIF) used to detect strong correlation between two or more than two independent variable. Formula to Calculate:-
 |
Calculate Vif |
- If VIF = 1, not correlated
- VIF = between 1 and 5, moderate correlated
- VIF = greater than 5, highly correlated
From above observation, it is certain that there is a collinearity problem. Before removing the variable, let's look at the correlation plot too.
Output
 |
Correlation plot |
This plot helps to understand how variables are correlated with each other. Here, we are getting the same result as obtained from VIF, temp and atemp are highly correlated that means carry the same information.
While removing highly correlated variable "atemp", we are also removing removing variable that does not carry meaningful information to our objective i.e. "instant", "dteday".
Feel free to comment your opinionsđź’“
If you are new on this post, visit it's previous post also.
0 Comments