Exploring the Most Popular Regression Metrics for Data Scientists
When applying regression to real-world data, it is necessary to make sure the model is optimized to predict correct outputs and make fewer errors. Below are some of the popular metrics used to evaluate a model’s performance.
MAE (Mean Absolute Error)
MAE is the mean of the absolute errors between the actual and predicted values. Taking the absolute value removes the sign of each error, so positive and negative errors do not cancel each other out.
Advantages:
- The error unit is the same as the output variable, which makes it easy to interpret.
- Robust to outliers.
Disadvantages:
- It is not differentiable at zero, which can complicate gradient-based optimization.
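As a minimal sketch, MAE can be computed by hand in plain Python (the sample values below are made up for illustration):

```python
def mae(actual, predicted):
    # Average of the absolute differences between actual and predicted values
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical example values
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
print(mae(y_true, y_pred))  # -> 0.75
```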
MSE (Mean Squared Error)
Instead of taking the absolute value of the error, MSE squares each error, which both eliminates negative signs and penalizes large errors more heavily.
Advantages:
- It is differentiable everywhere, so it works well with gradient-based optimization.
- Can be used as a loss function.
Disadvantages:
- The error unit is the square of the output variable’s unit, which makes it harder to interpret.
- Not robust to outliers.
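A minimal MSE sketch in plain Python, using the same made-up example values:

```python
def mse(actual, predicted):
    # Average of the squared differences between actual and predicted values
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical example values
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
print(mse(y_true, y_pred))  # -> 0.875
```

Note how the single error of 1.5 contributes 2.25 to the sum; squaring is what makes MSE sensitive to outliers.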
RMSE (Root Mean Squared Error)
It is the same as MSE, except that at the end of the calculation we take the square root, which brings the error back to the same unit as the output variable (y).
Advantages:
- The square root makes the error unit the same as the output variable (y), which makes it easy to interpret.
Disadvantages:
- Not robust to outliers
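RMSE is simply the square root of MSE, as this short sketch shows (same hypothetical values as above):

```python
import math

def rmse(actual, predicted):
    # Square root of the mean squared error, so the result is in the units of y
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)

# Hypothetical example values
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
print(rmse(y_true, y_pred))  # sqrt(0.875), roughly 0.935
```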
R2 Score
It is also called the coefficient of determination or goodness of fit. It evaluates the model’s performance by comparing the model’s sum of squared prediction errors to the sum of squared errors of a baseline that always predicts the mean of the actual values.
The closer the R2 score is to 1, the better the model explains the variance in the data; as it approaches 0, model performance decreases.
The R2 score can be inflated by irrelevant features added to the dataset, because it never decreases when another predictor is added.
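The comparison against the mean-predicting baseline can be written directly as R2 = 1 - SS_res / SS_tot, sketched here with the same hypothetical values:

```python
def r2_score(actual, predicted):
    mean_y = sum(actual) / len(actual)
    # Residual sum of squares: the model's squared errors
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: squared errors of always predicting the mean
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical example values
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
print(r2_score(y_true, y_pred))  # roughly 0.72
```

A score of 1 means a perfect fit; a score of 0 means the model is no better than predicting the mean.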
Adjusted R2 Score
The adjusted R2 score penalizes irrelevant features: it increases only when a new feature improves the model more than would be expected by chance, making it a fairer way to compare models with different numbers of predictors on the same dataset.
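The penalty comes from the standard formula Adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1), where n is the number of samples and k the number of predictors. A minimal sketch (the values passed in are made up):

```python
def adjusted_r2(r2, n, k):
    # n = number of samples, k = number of predictors
    # Adding a predictor (k + 1) shrinks the denominator, lowering the score
    # unless R2 rises enough to compensate.
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical values: R2 = 0.90 from 100 samples and 5 features
print(adjusted_r2(0.90, 100, 5))  # slightly below 0.90
```

Note that with few samples and many features the correction is large, while for large n it barely changes R2.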