**Evaluation metric for Supervised Learning:**

Evaluation metrics explain the **performance of a model**. An important aspect of evaluation metrics is their capability to discriminate among model results.

In machine learning, we mainly deal with two types of tasks: classification and regression. In classification, a predictive model is trained to assign data to different classes, for example a model that predicts whether a loan applicant will default or not. In regression, a model is built to predict a continuous variable, for example house prices for the upcoming year.

In both tasks we do the basic data processing and then split the data into training and testing sets. We use the training data to train the model, whereas the testing data is used to compute the model's predictions. Many different algorithms can be used for classification as well as regression problems, but the idea is to choose the **algorithm that works effectively** on our data. This can be done by **evaluating the model using error metrics.** Different evaluation methods are used, such as the **confusion matrix, accuracy score, classification report, mean square error, etc.**

We have different evaluation metrics for supervised learning. Here is the list of the evaluation metrics:

**Evaluation metrics for Regression:**

- Mean Absolute Error (MAE)
- Mean Square Error (MSE)
- Root Mean Square Error (RMSE)
- Root Mean Square Log Error (RMSLE)
- R2 and Adjusted R2

**Evaluation metrics for Classification:**

- Confusion Matrix
- Accuracy
- Alternatives to Accuracy
- Recall (TPR, Sensitivity)
- Precision
- F-Score
- ROC AUC
- FPR (Type I Error)
- FNR (Type II Error)
- Log Loss
- Gini Coefficient

# Now we will discuss the regression metrics:

1. Mean Absolute Error (MAE)

MAE measures the difference between two continuous variables, here the actual and the predicted values: it is the average of the absolute differences between them. On a scatter plot of predicted against actual values, MAE is the average vertical distance between each point and the identity line (y = x), which is also the average horizontal distance between each point and that line.
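As a minimal sketch of the definition, MAE can be computed by hand in a few lines. The function name and the sample values below are invented for illustration and are not taken from any dataset in this article.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical house prices (in thousands), invented for illustration
actual = [200, 250, 300, 350]
predicted = [210, 240, 320, 330]

mae = mean_absolute_error(actual, predicted)
print(mae)  # absolute errors are 10, 10, 20, 20, so the mean is 15.0
```

Because every error enters with the same weight, one large miss moves MAE only in proportion to its size.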

2. Mean Square Error (MSE)

**Mean Squared Error** is mainly used when we want to **punish large deviations in prediction**, because every error is squared before averaging. Values range from 0 upward with no upper bound, so on targets with a wide range the MSE can run into the millions.

**Mean Square Error (MSE)** measures how far the data are from the model’s predicted values.

**Disadvantage of MSE:**

- **Sensitive to outliers**: if we make a **single very bad** prediction, taking the **square** will make the **error even worse**, and it may skew the metric towards overestimating the model’s badness.
- On the other hand, if all the errors are **smaller than 1**, then squaring works in the opposite direction: we may underestimate the model’s badness.
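Both effects can be seen on a toy example. The sketch below compares MSE with MAE on invented numbers: a set of predictions with one very bad value, and a set where every error is below 1. The helper names and data are illustrative assumptions.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences, used here only for comparison."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [10, 12, 14, 16]
good = [11, 11, 15, 15]     # every error is exactly 1
one_bad = [11, 11, 15, 36]  # same as above, but the last error is 20

mse_good = mean_squared_error(actual, good)        # 1.0
mse_one_bad = mean_squared_error(actual, one_bad)  # (1 + 1 + 1 + 400) / 4 = 100.75
mae_one_bad = mean_absolute_error(actual, one_bad) # (1 + 1 + 1 + 20) / 4 = 5.75

# Errors smaller than 1 shrink when squared: 0.5 becomes 0.25
mse_small = mean_squared_error([1.0, 2.0], [1.5, 2.5])  # 0.25
mae_small = mean_absolute_error([1.0, 2.0], [1.5, 2.5]) # 0.5

print(mse_good, mse_one_bad, mae_one_bad, mse_small, mae_small)
```

A single outlier multiplies the MSE by about 100 here, while the MAE grows by less than a factor of 6.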

3. Root Mean Square Error (RMSE)

- The most commonly used metric for regression tasks is **RMSE (root-mean-square error)**. This is defined as the square root of the average squared distance between the actual score and the predicted score. **RMSE** is sensitive to **outliers** and can **exaggerate** results if there are **outliers in the data set**.
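A small sketch of that definition follows, with invented values. Three predictions are exact and one is off by 4, which makes the sensitivity to a single outlier visible when RMSE is compared with the plain average absolute error.

```python
import math


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference between actual and predicted."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


# Invented values: three exact predictions and one that is off by 4
actual = [10, 20, 30, 40]
predicted = [10, 20, 30, 44]

rmse = root_mean_squared_error(actual, predicted)
mean_abs = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

print(rmse)      # sqrt(16 / 4) = 2.0
print(mean_abs)  # 4 / 4 = 1.0
```

Unlike MSE, RMSE is expressed in the same units as the target, which makes it easier to read next to the data.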

4. Root Mean Square Log Error (RMSLE)

RMSLE is used when:

- we **don’t want to penalize big differences** when **both the predicted and the actual are big numbers**.
- we want to penalize **underestimates** more than **overestimates**.

**Range: [0, ∞)**

**Squared Logarithmic Error (SLE)** = (log(prediction + 1) − log(actual + 1))²

**RMSLE** = sqrt(mean(squared logarithmic errors))

where:

- **n** is the total number of observations
- **p** is the **predicted value**
- **a** is the **actual value**
- **1** is added as a constant to the **actual and predicted** values because they can be **0** and the **log of 0 is undefined.**

**RMSLE** measures the **ratio** between **actual and predicted**.

log(pi+1) − log(ai+1)

can be written as log((pi+1)/(ai+1))
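The two properties listed for RMSLE can be checked with a short sketch built from the SLE and RMSLE definitions given earlier. The function name and the sample numbers are assumptions made for illustration.

```python
import math


def rmsle(actual, predicted):
    """Root mean squared logarithmic error, using log(x + 1) as in the formula above."""
    squared_log_errors = [
        (math.log(p + 1) - math.log(a + 1)) ** 2
        for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Same ratio (p + 1) / (a + 1) = 2, very different absolute errors (100 vs 10000)
small_scale = rmsle([99], [199])
large_scale = rmsle([9999], [19999])

# Same absolute error of 400, once as an underestimate and once as an overestimate
under = rmsle([1000], [600])
over = rmsle([1000], [1400])

print(small_scale, large_scale)  # both are log(2), roughly 0.6931
print(under > over)              # True: the underestimate is penalized more
```

The first pair shows that only the ratio matters, and the second pair shows the asymmetry between under- and overestimation.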

*Example:1*

*Example:2*

- Imagine that we have a simple predictive model, for example, a linear regression that predicts the following values.

- The metrics for these values would be: *RMSE: 2.5495*, *RMSLE: 0.5358*

## Outliers

- One difference is the influence that outlier values have on the error. When the values are transformed to a **logarithmic** scale, the outliers are softened, and so is the error. This is known as **robustness.**
- We will calculate the metrics by adding one outlier observation to the table above.
- If we **look at the metrics again**, we can see that the **RMSE** is **strongly affected**: it has increased a lot due to the new values that have been added, while the RMSLE has barely moved. *RMSE: 28.9421*, *RMSLE: 0.5890*
- This effect can also be understood visually on a graph: the **logarithmic curve is not symmetric**, since one of its sides is **flatter** than the other, so it **penalizes underestimation more than overestimation**.

5. R Squared (R2)

- R2 measures **how far** the data are from the model’s **predicted values** compared to how far the data are from the **mean model** (a model predicting the mean value for every sample).
- It always has values between **-∞** and **1.**
- When the interest is in the **relationship between variables**, not in prediction, the **R2 is less important.**

```python
from sklearn.metrics import r2_score

# y_test: actual target values, y_pred: the model's predictions on the test set
r2 = r2_score(y_test, y_pred)
print(r2)
```
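To see where the range from -∞ to 1 comes from, here is a hand-written sketch of the usual R2 definition, 1 minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the data are illustrative assumptions, not part of scikit-learn.

```python
def r_squared(actual, predicted):
    """1 - (squared error of the model) / (squared error of the mean model)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [1, 2, 3, 4, 5]

perfect = r_squared(actual, [1, 2, 3, 4, 5])     # 1.0, every prediction is exact
mean_model = r_squared(actual, [3, 3, 3, 3, 3])  # 0.0, no better than the mean
reversed_ = r_squared(actual, [5, 4, 3, 2, 1])   # -3.0, worse than the mean model

print(perfect, mean_model, reversed_)
```

A model that does worse than always predicting the mean gets a negative score, and there is no lower limit to how bad it can get.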

6. Adjusted R Squared

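```python
n = 40  # number of observations
k = 2   # number of independent variables (predictors)

# r2 is the R2 score computed above
adj_r2_score = 1 - ((1 - r2) * (n - 1) / (n - k - 1))
print(adj_r2_score)
```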
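Adjusted R2 corrects R2 for the number of predictors: for the same R2, a model that uses more predictors gets a lower adjusted score. The sketch below wraps the formula above in a function and compares two predictor counts. The R2 value of 0.80 is a made-up number for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 for n observations and k predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


r2 = 0.80  # hypothetical R2 value, chosen only for illustration

few_predictors = adjusted_r2(r2, n=40, k=2)
many_predictors = adjusted_r2(r2, n=40, k=10)

print(round(few_predictors, 4), round(many_predictors, 4))
print(many_predictors < few_predictors < r2)  # True
```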

*Summary:*

- If you have **outliers** in the data and you **want to ignore** them, **MAE** is a better option, but if you **want to account** for them in your loss function, go for **MSE/RMSE**.
- When the interest is in the **relationship between variables**, not in prediction, the **R2 is less important.**

# Now we will discuss the classification metrics:

1. Confusion Matrix

- To check how many predictions a classifier got right, we usually build a matrix of counts called the confusion matrix. For 2 class labels, where each output is either 1 or 0, it is a 2x2 matrix.
- A **confusion matrix** is a tabular summary of the **number of correct and incorrect predictions** made by a classifier. It can be used to **evaluate the performance of a classification model** through the calculation of performance **metrics like accuracy, precision, recall, and F1-score**.

- False Positive 21
- False Negative 2
- True Negative 4
- True Positive 34
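As a rough sketch of how such counts are obtained, the four cells can be tallied directly from the actual and predicted labels. The labels below are invented, and the function name is a hypothetical helper rather than a library call.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count the four cells of a 2x2 confusion matrix for labels 0 and 1."""
    counts = Counter(zip(y_true, y_pred))  # keys are (actual, predicted) pairs
    return {
        "TP": counts[(1, 1)],
        "FN": counts[(1, 0)],
        "FP": counts[(0, 1)],
        "TN": counts[(0, 0)],
    }


# Invented labels for illustration
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

cells = confusion_counts(y_true, y_pred)
print(cells)  # {'TP': 3, 'FN': 1, 'FP': 1, 'TN': 3}
```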

2. Accuracy

**Accuracy** simply measures how often the classifier makes the correct prediction. It’s the ratio between the number of correct predictions and the total number of predictions. While accuracy is easy to understand, the **accuracy metric** is **not suited** for **unbalanced classes**. Hence, we also need to explore other metrics for classification.

There are 4 important terms:

**a. TP(True Positive): The cases in which,**

Actual value is 1

Predicted value is 1

**b. FN(False Negative)**

Actual value is 1

Predicted value is 0

**c. FP(False Positive)**

Actual value is 0

Predicted value is 1

**d. TN(True Negative)**

Actual value is 0

Predicted value is 0.
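With these four terms, accuracy is the share of TP and TN among all predictions. The sketch below applies it to the counts listed in the confusion matrix section (TP = 34, TN = 4, FP = 21, FN = 2), and then to an invented imbalanced case where a model that always predicts 0 still looks good.

```python
def accuracy(tp, tn, fp, fn):
    """Share of correct predictions among all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


# Counts from the confusion matrix example in this article
acc = accuracy(tp=34, tn=4, fp=21, fn=2)
print(round(acc, 4))  # 38 / 61, roughly 0.623

# Invented imbalanced data: 95 negatives, 5 positives, model always predicts 0
always_negative = accuracy(tp=0, tn=95, fp=0, fn=5)
print(always_negative)  # 0.95, although not a single positive case was found
```

The second result is why accuracy alone is misleading on unbalanced classes.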

3. Alternatives to Accuracy

4,5. Precision and Recall:

**Balanced Dataset**

Accuracy = (TP+TN) / (TP+FP+FN+TN)

**Imbalanced Dataset**

*a. RECALL (TPR + SENSITIVITY)*

TPR : True Positive Rate

RECALL = TP / (TP+FN)

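Suppose the task is to predict whether a person has cancer or not. If **the person is suffering from cancer** but the model predicted **not suffering from cancer**, that is a false negative (FN), and recall is the metric that exposes it.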

*b. PRECISION = TP / (TP+FP)*

Precision is also called the positive predictive value (PPV). Out of all the results predicted as positive, how many were actually positive?

**In Spam Detection**: we need to focus on **precision**

a. Suppose a **mail is not spam** but the model **predicted it as spam**: this is an FP (False Positive). We always try to reduce FP.

b. Whenever the **False Positive** is much more important, use **PRECISION**

c. Whenever the **False Negative** is much more important, use **RECALL**
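The two formulas can be tried on the counts from the confusion matrix section (TP = 34, FP = 21, FN = 2). This is only a sketch with hypothetical helper names; it shows a model with high recall but much lower precision.

```python
def precision(tp, fp):
    """Of everything predicted positive, the share that is actually positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Of everything actually positive, the share the model found."""
    return tp / (tp + fn)


# Counts from the confusion matrix example in this article
p = precision(tp=34, fp=21)
r = recall(tp=34, fn=2)

print(round(p, 4))  # 34 / 55, roughly 0.6182
print(round(r, 4))  # 34 / 36, roughly 0.9444
```

Here the model misses few positives (only 2 false negatives) but raises many false alarms (21 false positives).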

6. F Score

What if, for a use case, we are trying to get the best precision and recall at the same time? The F Score is the **harmonic mean** of the precision and recall values for a classification problem.
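Written out, the harmonic mean of precision P and recall R is 2PR / (P + R), commonly called the F1 score. The sketch below is a hand-rolled version with an assumed function name. It uses the precision and recall implied by the article's confusion matrix counts, plus an invented lopsided case to show how the harmonic mean differs from a plain average.

```python
def f1_score_from(precision, recall):
    """Harmonic mean of precision and recall (returns 0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall from the article's counts: TP = 34, FP = 21, FN = 2
f1 = f1_score_from(34 / 55, 34 / 36)
print(round(f1, 4))  # equals 2*TP / (2*TP + FP + FN) = 68 / 91, roughly 0.7473

# Invented lopsided case: perfect precision, very poor recall
lopsided = f1_score_from(1.0, 0.1)
arithmetic_mean = (1.0 + 0.1) / 2
print(round(lopsided, 4), arithmetic_mean)  # the F score stays close to the weaker value
```

The harmonic mean is pulled towards the smaller of the two values, so a model cannot score well by excelling at only one of them.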

7. ROC / AUC

The **AUC-ROC** curve is a performance measurement for **classification problems** at **various threshold settings**. **ROC** is a **probability curve** and **AUC** represents the degree or measure of *separability*. **The higher the AUC**, the better the model is at predicting **0s as 0s and 1s as 1s**. By analogy, the higher the AUC, the better the model is at distinguishing between *patients with disease and no disease.*

*a. AUC : Area Under Curve*

One of the widely used metrics for **binary classification** is the **Area Under Curve (AUC)**. AUC represents the probability that the classifier will rank a randomly chosen positive example higher than a randomly chosen negative example. The AUC is based on a plot of the false positive rate vs the true positive rate, which are defined as:

## Defining terms used in AUC and ROC Curve

## 1. TPR (True Positive Rate) / Recall / Sensitivity

Sensitivity tells us what proportion of the **positive class** got correctly classified. A simple example would be to determine what proportion of the *actual sick people* were **correctly detected** by the model.

## 2. Specificity / TNR (True Negative Rate)

Specificity tells us what proportion of the **negative class** got correctly classified. Taking the same example as in Sensitivity, Specificity would mean determining the proportion of *healthy people* who were *correctly identified* by the model.

## 3. FPR (False Positive Rate)

FPR tells us what proportion of the **negative class** got *incorrectly classified* by the classifier. A higher TNR and a lower FPR is desirable since we want to correctly classify the negative class.

## 4. FNR (False Negative Rate)

False Negative Rate (FNR) tells us what proportion of the **positive class** got *incorrectly classified* by the classifier. A higher TPR and a lower FNR is desirable since we want to correctly classify the positive class.

The AUC is the area under the curve obtained when the true positive rate is plotted against the false positive rate.

AUC ranges between 0 and 1.

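A value of 0 means the model ranks every negative example above every positive one, so 100% of its predictions are incorrect. A value of 0.5 is what random guessing gives, and a value of 1 means that 100% of the model's predictions are correct.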

**b. ROC : Receiver Operating Characteristic Curve**

The ROC curve is plotted with TPR against the FPR where TPR is on y-axis and FPR is on the x-axis.
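The sketch below builds the ROC points by sweeping the threshold over the predicted scores, computes the area with the trapezoid rule, and checks it against the ranking interpretation of AUC (the probability that a random positive is scored above a random negative). All function names and the four sample scores are assumptions made for illustration, not a library API.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by using each distinct score as a threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear curve through the given (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def auc_pairwise(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented labels and predicted probabilities
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
auc_area = auc_trapezoid(points)
auc_rank = auc_pairwise(y_true, scores)

print(points)              # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc_area, auc_rank)  # 0.75 0.75
```

Three of the four positive/negative pairs are ordered correctly, which is exactly the 0.75 area under the curve.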

## Relation between Sensitivity, Specificity, FPR and Threshold

Sensitivity and Specificity trade off against each other. So when we increase Sensitivity, Specificity tends to decrease, and vice versa.

When we decrease the threshold, we get more positive predictions, which increases the sensitivity and decreases the specificity.

Similarly, when we increase the threshold, we get more negative predictions, so we get higher specificity and lower sensitivity.

As we know, FPR is 1 - specificity. So when we increase TPR, FPR also increases, and vice versa.
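The relationship can be checked numerically. The sketch below computes the four rates (TPR = TP/(TP+FN), TNR = TN/(TN+FP), FPR = FP/(FP+TN), FNR = FN/(FN+TP)) at a low, a middle, and a high threshold. The scores, labels, and helper name are invented for illustration.

```python
def rates_at_threshold(y_true, scores, threshold):
    """TPR, TNR, FPR and FNR when scores >= threshold are predicted as 1."""
    tp = fn = fp = tn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1 and predicted == 0:
            fn += 1
        elif y == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return {
        "TPR": tp / (tp + fn),  # sensitivity / recall
        "TNR": tn / (tn + fp),  # specificity
        "FPR": fp / (fp + tn),  # 1 - specificity
        "FNR": fn / (fn + tp),  # 1 - sensitivity
    }


# Invented labels and predicted probabilities
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.4, 0.45, 0.6, 0.7, 0.8, 0.9]

low = rates_at_threshold(y_true, scores, 0.3)
mid = rates_at_threshold(y_true, scores, 0.5)
high = rates_at_threshold(y_true, scores, 0.75)

print(low)   # {'TPR': 1.0, 'TNR': 0.5, 'FPR': 0.5, 'FNR': 0.0}
print(mid)   # {'TPR': 0.75, 'TNR': 0.75, 'FPR': 0.25, 'FNR': 0.25}
print(high)  # {'TPR': 0.5, 'TNR': 1.0, 'FPR': 0.0, 'FNR': 0.5}
```

Raising the threshold from 0.3 to 0.75 lowers sensitivity from 1.0 to 0.5 while specificity rises from 0.5 to 1.0, and TPR and FPR fall together.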

10. Log Loss

Log loss is a pretty good evaluation metric for binary classifiers, and it is sometimes the optimization objective as well, as in the case of Logistic Regression and Neural Networks.

The binary log loss for one example is given by the formula below, where y is the true label (0 or 1) and p is the predicted probability of class 1.

Log loss = -(y*log(p) + (1-y)*log(1-p))

The log loss decreases as we become more certain in our prediction of 1 when the true label is 1.
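A minimal sketch of binary log loss, averaged over examples, is shown below. It uses the standard form -(y·log(p) + (1-y)·log(1-p)), where y is the true label and p is the predicted probability of class 1. The clipping constant `eps` is an assumption added here to avoid log(0), and the probabilities are invented.

```python
import math


def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Average of -(y*log(p) + (1-y)*log(1-p)) over all examples."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # keep p away from exactly 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# True label is 1 in all three cases; only the predicted probability changes
confident_right = binary_log_loss([1], [0.9])
unsure = binary_log_loss([1], [0.5])
confident_wrong = binary_log_loss([1], [0.1])

print(round(confident_right, 4), round(unsure, 4), round(confident_wrong, 4))
print(confident_right < unsure < confident_wrong)  # True
```

A confident wrong prediction costs far more than an unsure one, which is what makes log loss useful for judging predicted probabilities rather than only labels.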

11. Gini Coefficient

- The Gini coefficient is sometimes used in classification problems. It can be derived straight away from the AUC ROC number. Gini is the ratio between the area between the ROC curve and the diagonal line and the area of the triangle above the diagonal.

- The formula for Gini Coefficient,

Gini = 2*AUC-1

- As a rule of thumb, a Gini above 60% indicates a good model. An important point to note is that this Gini coefficient is different from the Gini index we encounter in decision trees.
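The conversion from AUC to Gini is a one-liner. The sketch below applies Gini = 2*AUC - 1 to a few illustrative AUC values, which are made up for the example.

```python
def gini_from_auc(auc):
    """Gini coefficient derived from the ROC AUC."""
    return 2 * auc - 1


random_model = gini_from_auc(0.5)   # 0.0, no better than random guessing
decent_model = gini_from_auc(0.8)   # about 0.6, the 60% rule-of-thumb level
perfect_model = gini_from_auc(1.0)  # 1.0

print(random_model, round(decent_model, 4), perfect_model)
```

So the 60% guideline for Gini corresponds to an AUC of about 0.8.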

# Conclusion

We have discussed the evaluation metrics for both the classification and regression problems. We can always try improving the model performance using a good amount of feature engineering and Hyperparameter Tuning. Read more about error metrics here

“Top 10 Model evaluation Machine Learning Enthusiast should know”
