Evaluation Metric
How to evaluate the performance of ML models

Evaluation metrics help assess the performance of machine learning models and guide model selection and tuning. Different types of models, such as linear regression, logistic regression, and multi-class classifiers, call for different metrics.


Here are some commonly used evaluation metrics for each of these types of models:


Linear Regression


  • Mean Absolute Error (MAE) measures the average absolute difference between the actual and predicted values. It is less sensitive to outliers compared to MSE.

  • Mean Squared Error (MSE) measures the average squared difference between actual and predicted values. It gives more weight to large errors.

  • Root Mean Squared Error (RMSE) is the square root of MSE and provides an interpretable measure in the same units as the target variable.

  • R-squared (R²), also known as the coefficient of determination, measures the proportion of variance in the target variable explained by the model. It typically ranges from 0 to 1, with higher values indicating a better fit.
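
As a minimal sketch, these regression metrics can be computed with scikit-learn; the toy y_true and y_pred arrays below are invented purely for illustration.

  import numpy as np
  from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

  # Hypothetical actual and predicted values (illustrative only)
  y_true = np.array([3.0, 5.0, 2.5, 7.0])
  y_pred = np.array([2.8, 5.4, 2.9, 6.1])

  mae = mean_absolute_error(y_true, y_pred)   # average absolute error
  mse = mean_squared_error(y_true, y_pred)    # average squared error
  rmse = np.sqrt(mse)                         # same units as the target
  r2 = r2_score(y_true, y_pred)               # proportion of variance explained

  print(f"MAE: {mae:.3f}  MSE: {mse:.3f}  RMSE: {rmse:.3f}  R²: {r2:.3f}")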


Logistic Regression (Binary Classification)


  • Accuracy measures the proportion of correctly classified instances out of all instances. It's a common metric but may not be suitable for imbalanced datasets.

  • Precision measures the proportion of true positives among all predicted positives. It's useful when minimizing false positives is important.

  • Recall (Sensitivity or True Positive Rate) measures the proportion of true positives among all actual positives. It's important when minimizing false negatives is critical.

  • F1-Score is the harmonic mean of precision and recall. It balances the two and is especially useful when dealing with imbalanced datasets.

  • Area Under the Receiver Operating Characteristic Curve (AUC-ROC) measures the model's ability to distinguish between positive and negative classes across different probability thresholds. It's useful for assessing the overall performance of the model.
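
A minimal sketch of these binary classification metrics with scikit-learn, again on invented labels and predicted probabilities. Note that AUC-ROC is computed from the probabilities themselves, while the other metrics use hard predictions obtained by thresholding.

  import numpy as np
  from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                               f1_score, roc_auc_score)

  # Hypothetical labels and predicted probabilities (illustrative only)
  y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
  y_prob = np.array([0.2, 0.8, 0.6, 0.3, 0.9, 0.4, 0.7, 0.45])
  y_pred = (y_prob >= 0.5).astype(int)  # threshold probabilities at 0.5

  print("Accuracy :", accuracy_score(y_true, y_pred))
  print("Precision:", precision_score(y_true, y_pred))
  print("Recall   :", recall_score(y_true, y_pred))
  print("F1-score :", f1_score(y_true, y_pred))
  print("AUC-ROC  :", roc_auc_score(y_true, y_prob))  # uses probabilities, not labels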


Multi-Class Classification


  • Accuracy measures the proportion of correctly classified instances out of all instances. Similar to binary classification, it may not be suitable for imbalanced datasets or multi-class problems with unequal class sizes.

  • Precision, Recall, F1-Score (per class) can be computed separately for each class in a multi-class problem. They provide insights into the model's performance for each class.

  • Macro and Micro Averaging are used to compute precision, recall, and F1-score when dealing with multi-class problems. Macro averaging computes metrics for each class independently and then takes the unweighted average, while micro averaging aggregates the contributions of each class to compute overall metrics.

  • Confusion Matrix is a table that summarizes the model's predictions, showing the true positives, true negatives, false positives, and false negatives for each class.

  • Cohen's Kappa is a statistic that measures the agreement between the model's predictions and the actual classes, considering chance agreement. It's useful when dealing with imbalanced datasets.

  • Log Loss (Cross-Entropy Loss) measures how closely the predicted class probabilities match the actual classes. Lower log loss values indicate better model performance.
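
A minimal sketch of these multi-class metrics with scikit-learn, once more on invented data. Log loss takes the full probability matrix, while the remaining metrics take hard class predictions.

  import numpy as np
  from sklearn.metrics import (precision_recall_fscore_support, confusion_matrix,
                               cohen_kappa_score, log_loss)

  # Hypothetical 3-class labels and predicted probabilities (illustrative only)
  y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1])
  y_prob = np.array([
      [0.7, 0.2, 0.1],
      [0.1, 0.8, 0.1],
      [0.2, 0.3, 0.5],
      [0.1, 0.2, 0.7],
      [0.3, 0.5, 0.2],
      [0.6, 0.3, 0.1],
      [0.2, 0.2, 0.6],
      [0.4, 0.5, 0.1],
  ])
  y_pred = y_prob.argmax(axis=1)  # predicted class = highest probability

  # Per-class and averaged precision/recall/F1
  print(precision_recall_fscore_support(y_true, y_pred, average=None))     # per class
  print(precision_recall_fscore_support(y_true, y_pred, average="macro"))  # unweighted mean
  print(precision_recall_fscore_support(y_true, y_pred, average="micro"))  # global counts

  print(confusion_matrix(y_true, y_pred))   # rows: actual class, columns: predicted class
  print(cohen_kappa_score(y_true, y_pred))  # chance-corrected agreement
  print(log_loss(y_true, y_prob))           # needs probabilities, not labels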


Remarks


The choice of evaluation metric depends on the specific problem, the goals of the analysis, and the characteristics of the dataset. It's often recommended to use a combination of metrics to gain a comprehensive understanding of the model's performance.
