
Steps to Evaluate Fairness in Machine Learning Models

While machine learning models often show impressive performance metrics, a crucial question remains: are they fair?


[Image: Meme on the fairness of ML solutions]

Businesses prioritize metrics that impact their bottom line, while data scientists often focus on accuracy. However, bias in a model can lead to allocative harms (unequal distribution of benefits) and representation harms (downplaying certain groups). These, coupled with opaque black-box models, can ultimately erode trust in AI.

Data scientists therefore play a critical role in developing responsible AI/ML solutions. Cloud platforms offer various fairness-testing tools to help identify and mitigate bias, and it is also possible to incorporate fairness testing directly into Python notebooks.

One powerful open-source tool I use for fairness assessment in my projects is Fairlearn, which quantifies bias in models with respect to sensitive features such as age, gender, and race.
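Before reaching for a library, it helps to see what "quantifying bias with respect to a sensitive feature" means at its simplest: compare a metric, such as the selection rate, across groups. Here is a minimal pure-Python sketch on hypothetical predictions (the data and function name are illustrative, not part of Fairlearn):

```python
# Hypothetical binary predictions, each paired with a sensitive feature value.
y_pred = [1, 1, 0, 1, 0, 0, 0, 1]
gender = ["F", "F", "F", "M", "M", "M", "M", "M"]

def selection_rate_by_group(preds, groups):
    """Fraction of positive predictions within each group."""
    rates = {}
    for g in set(groups):
        group_preds = [p for p, grp in zip(preds, groups) if grp == g]
        rates[g] = sum(group_preds) / len(group_preds)
    return rates

rates = selection_rate_by_group(y_pred, gender)
# A large gap between the groups' rates is a signal worth investigating.
```

Fairlearn automates exactly this kind of per-group comparison, for many metrics at once.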

Below is a code template that will help you assess your model's fairness. It also surfaces an example of unequal distribution of count and selection rate among different groups.

!pip install --no-cache-dir fairlearn
# accuracy_score and precision_score come from scikit-learn;
# the group-fairness metrics come from Fairlearn.
from sklearn.metrics import accuracy_score, precision_score
from fairlearn.metrics import (MetricFrame, false_positive_rate,
                               false_negative_rate, selection_rate, count)

metrics = {
    "accuracy": accuracy_score,
    "precision": precision_score,
    "false positive rate": false_positive_rate,
    "false negative rate": false_negative_rate,
    "selection rate": selection_rate,
    "count": count,
}
# df is assumed to hold the true labels, the model's predictions,
# and the sensitive feature for each record
metric_frame = MetricFrame(
    metrics=metrics, y_true=df['y_true'], y_pred=df['y_pred'], sensitive_features=df['race']
)
metric_frame.by_group.plot.bar(
    subplots=True,
    layout=[3, 3],
    legend=False,
    figsize=[12, 8],
    title="Fairness testing",
)
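Beyond the per-group bar charts, MetricFrame also offers one-number disparity summaries via `metric_frame.difference()` and `metric_frame.ratio()`. The idea behind those summaries can be sketched with plain pandas on a hypothetical set of predictions:

```python
import pandas as pd

# Hypothetical predictions labelled with a sensitive feature (illustrative only).
df = pd.DataFrame({
    "race":   ["A", "A", "A", "B", "B", "B", "B"],
    "y_pred": [1,   1,   0,   1,   0,   0,   0],
})

# Selection rate per group: fraction of positive predictions in each group.
by_group = df.groupby("race")["y_pred"].mean()

# Two common one-number summaries of disparity:
diff = by_group.max() - by_group.min()   # "demographic parity difference"
ratio = by_group.min() / by_group.max()  # "demographic parity ratio"
```

A difference near 0 (or a ratio near 1) suggests the groups are treated similarly on this metric; large gaps warrant a closer look.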

Running the above code on a biased dataset produced the charts below.

Notice how some metrics, such as selection rate and false positive rate, favor certain groups within the population. The count plot also shows how unevenly the groups are represented in the population. Even with good overall accuracy, bias against certain minority groups can undermine their rights. We can measure this only when we shift our focus to how our metrics impact people's lives.