"Without big data analytics, companies are blind and deaf, wandering out onto the web like deer on a freeway." (Geoffrey Moore, author, speaker and consultant)
The metrics you use to evaluate a model depend entirely on the type of model you are building. The model that we are working on is a regression model.
Mean Absolute Percentage Error (MAPE)
It is a statistical measure of how accurate a prediction system is, and it is one of the most common measures of forecast error. Why prefer it over other regression metrics? MAE, for example, is expressed in the units of the target variable and can range from 0 to infinity, which makes a single value hard to interpret without comparing it to the scale of the data.
MAPE is closely related to MAE but expresses each error as a percentage of the actual value, which removes that dependence on scale. This makes the model performance easy to understand and the result easy to interpret.
MAPE value of Multiple linear regression model is: 0.6761611
Error percentage: 0.6761611 * 100 = 67.62%
MAPE value of Random forest model is: 0.04155425
Error percentage: 0.04155425 * 100 = 4.16%
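To make the difference between MAE and MAPE concrete, here is a minimal Python sketch. The function names and the three sample values are made up for illustration; they are not the code or data from this series (that lives in the Part 6 post). Note that MAPE divides by the actual value, so this sketch refuses inputs where an actual value is zero.

```python
def mae(actual, predicted):
    """Mean absolute error, in the same units as the target."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.05 means 5%)."""
    if any(a == 0 for a in actual):
        # Each error is divided by the actual value, so zeros are not allowed.
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative values only (hypothetical, not from the project's dataset).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print("MAE :", mae(actual, predicted))          # in target units
print("MAPE:", mape(actual, predicted) * 100, "%")  # scale-free percentage
```

For these sample values the MAE is about 6.67 units, which says little until you know the scale of the target, while the MAPE is about 5%, which can be read directly.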
Root Mean Square Error (RMSE)
It represents the standard deviation of the residuals (i.e. the differences between the model predictions and the true values). It provides an estimate of how widely the residuals are dispersed.
RMSE value of Multiple linear regression model is: 2697.235
RMSE value of Random forest model is: 172.1481
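With RMSE, the more widely the residuals are dispersed, the higher the value and the worse the model is performing.
To calculate accuracy, subtract the MAPE (as a fraction) from 1, or equivalently subtract the error percentage from 100.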
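As a rough illustration of how RMSE reacts to dispersed residuals, here is a small Python sketch. The function name and the sample numbers are hypothetical and chosen only to show the effect; they are not the project's data.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared residual."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values only (hypothetical).
actual = [100.0, 200.0, 400.0]
tight = [110.0, 190.0, 400.0]   # residuals stay close to zero
loose = [110.0, 190.0, 340.0]   # one residual is far from zero

print("RMSE, tight predictions:", rmse(actual, tight))
print("RMSE, loose predictions:", rmse(actual, loose))
```

Because the residuals are squared before averaging, the single large miss in the second set of predictions pushes its RMSE well above the first.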
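Applied to the two MAPE values reported above, that subtraction can be sketched in Python as follows. The helper name `accuracy_from_mape` is made up for this example; the two MAPE values are the ones from this post.

```python
def accuracy_from_mape(mape_fraction):
    """Accuracy as a fraction, taken as 1 minus the MAPE (also a fraction)."""
    return 1 - mape_fraction


# MAPE values reported in this post for the two models.
mape_by_model = {
    "Multiple linear regression": 0.6761611,
    "Random forest": 0.04155425,
}

for name, value in mape_by_model.items():
    accuracy = accuracy_from_mape(value)
    print(f"{name}: accuracy = {accuracy * 100:.2f}%")
```

This gives roughly 32.38% for multiple linear regression and 95.84% for random forest.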
The code to calculate all these metrics is already explained in the Part 6 post. We can clearly see that the error percentage for the multiple linear regression model is very high, which may indicate that we need more data to train the model or that we should pre-process the data again until the number goes down. You might also have to experiment with some features to improve the model's performance. The error percentage and RMSE of the random forest model, however, are much better, so we will accept these values. Our model is good to go with the random forest algorithm, but always remember that this is an iterative process.
There is much more work to do if the data is imbalanced. This project series follows a methodology that you can apply to other problems as well.
Feel free to comment your opinions 💓
